From 201d7939d03836b675e8d16fcf69b03313e6b5d3 Mon Sep 17 00:00:00 2001 From: Johannes Soltwedel <38459088+jo-mueller@users.noreply.github.com> Date: Thu, 26 Mar 2026 23:12:58 +0100 Subject: [PATCH 01/10] specification: Explain how to do pixel coordinate transformations --- index.md | 148 ++++++++++++++++++++++++++++++++++--------------------- 1 file changed, 93 insertions(+), 55 deletions(-) diff --git a/index.md b/index.md index d5c8383d..781f06bf 100644 --- a/index.md +++ b/index.md @@ -280,61 +280,6 @@ Here, we refer to any method that obtains values at real-valued coordinates usin As such, label images may be interpolated using "nearest neighbor" to obtain labels at points along the continuum. ``` -#### Array coordinate systems - -The dimensions of an array do not have an interpretation -until they are associated with a coordinate system via a coordinate transformation. -Nevertheless, it can be useful to refer to the "raw" coordinates of the array. -Some applications might prefer to define points or regions-of-interest in "pixel coordinates" rather than "physical coordinates," for example. -Indicating that choice explicitly will be important for interoperability. -This is possible by using **array coordinate systems**. - -Every array has a default coordinate system whose parameters need not be explicitly defined. -The dimensionality of each array coordinate system equals the dimensionality of its corresponding Zarr array. -Its name is the path to the array in the container, -its axes have `"type": "array"`, are unitless, and have default names. -The i-th axis has `"name": "dim_i"` (these are the same default names used by [xarray](https://docs.xarray.dev/en/stable/user-guide/terminology.html)). -As with all coordinate systems, the dimension names must be unique and non-null. 
- -:::{dropdown} Example -```json -{ - "arrayCoordinateSystem" : { - "name" : "myDataArray", - "axes" : [ - {"name": "dim_0", "type": "array"}, - {"name": "dim_1", "type": "array"}, - {"name": "dim_2", "type": "array"} - ] - } -} - -``` - -For example, if 0/zarr.json contains: -```json -{ - "zarr_format": 3, - "node_type": "array", - "shape": [4, 3, 5], - //... -} -``` - -Then `dim_0` has length 4, `dim_1` has length 3, and `dim_2` has length 5. - -::: - -The axes and their order align with the shape of the corresponding Zarr array, -and whose data depends on the byte order used to store chunks. -As described in the [Zarr array metadata](https://zarr.readthedocs.io/en/stable/spec/v3.html#arrays), -the last dimension of an array in "C" order are stored contiguously on disk or in-memory when directly loaded. - -The name and axes names MAY be customized by including a `arrayCoordinateSystem` field -in the user-defined attributes of the array whose value is a coordinate system object. -The length of `axes` MUST be equal to the dimensionality. -The value of `type` for each object in the axes array MUST equal `"array"`. - #### Coordinate convention **The pixel/voxel center is the origin of the continuous coordinate system.** @@ -603,6 +548,99 @@ to do so by estimating the transformations' inverse if they choose to. ``` ::: +**Transformations in pixel units**: Some applications might prefer to define points, regions-of-interest or transformation parameters +in "pixel coordinates" rather than "physical coordinates". +Because transformations are agnostic to whether they refer to pixel or physical coordinates, +indicating that choice explicitly will be important for interoperability. +This can be expressed in the metadata in multiple ways, including: +- One can embed a transformation defined in pixel units into a `sequence` transformation + that includes the appropriate scale transformation and its inverse to convert to physical units (see example below). 
- One can define a unitless coordinate system and connect it to the "intrinsic" coordinate system
+  with a scale transformation that has the appropriate scale factors to convert to physical units.
+
+:::{dropdown} Example: Embedded expression
+
+In the context of [`scene`](#scene-md), one may want to express a transformation between two images in pixel units,
+even though the coordinate systems of the two images are in physical units.
+This can be achieved by embedding the pixel-unit transformation into a `sequence` transformation like this:
+
+```json
+{ "scene":
+    {
+        "type": "sequence",
+        "input": {"name": "intrinsic", "path": "imageA"},
+        "output": {"name": "intrinsic", "path": "imageB"},
+        "transformations": [
+            {
+                "type": "scale",
+                "scale": [0.5, 0.5]
+            },
+            {
+                "type": "translation",
+                "translation": [10, 20],
+                "name": "translation in pixel units"
+            },
+            {
+                "type": "scale",
+                "scale": [2, 2]
+            }
+        ]
+    }
+}
+```
+:::
+
+:::{dropdown} Example: Unitless coordinate system
+
+Alternatively, users may choose to define a unitless coordinate system and connect it to the "intrinsic" coordinate system
+with a scale transformation that has the appropriate scale factors to convert to physical units.
+In the context of multiscales metadata, this could look like this: + +```json +{ + "multiscales": [ + { + "coordinateSystems": [ + { + "name": "intrinsic", + "axes": [{"name": "x", "type": "space", "unit": "micrometer"}, {"name": "y", "type": "space", "unit": "micrometer"}] + }, + { + "name": "array", + "axes": [{"name": "x", "type": "space"}, {"name": "y", "type": "space"}] + } + ], + "datasets": [ + { + "path": "s0", + "coordinateTransformations": [ + { + "type": "scale", + "scale": [0.5, 0.5], + "input": "s0", + "output": "intrinsic" + } + ] + } + ], + "coordinateTransformations": [ + { + "type": "scale", + "scale": [2.0, 2.0], + "input": {"name": "intrinsic"}, + "output": {"name": "pixel"} + } + ] + } + ] +} +``` +In this case, the `scale` transformation under `coordinateTransformations` +defines the mapping from the "intrinsic" coordinate system to the unitless "pixel" coordinate system. +::: + + + #### Matrix transformations (matrix-trafo-md)= From 626f75685c3e849674a161c6d465fc3fd3cea2c5 Mon Sep 17 00:00:00 2001 From: Johannes Soltwedel <38459088+jo-mueller@users.noreply.github.com> Date: Thu, 26 Mar 2026 23:28:13 +0100 Subject: [PATCH 02/10] chore: revert axes ordering --- index.md | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/index.md b/index.md index 781f06bf..fb6bd734 100644 --- a/index.md +++ b/index.md @@ -603,11 +603,17 @@ In the context of multiscales metadata, this could look like this: "coordinateSystems": [ { "name": "intrinsic", - "axes": [{"name": "x", "type": "space", "unit": "micrometer"}, {"name": "y", "type": "space", "unit": "micrometer"}] + "axes": [ + {"name": "y", "type": "space", "unit": "micrometer"}, + {"name": "x", "type": "space", "unit": "micrometer"} + ] }, { "name": "array", - "axes": [{"name": "x", "type": "space"}, {"name": "y", "type": "space"}] + "axes": [ + {"name": "y", "type": "space"}, + {"name": "x", "type": "space"} + ] } ], "datasets": [ From dc2f68fbbb9515f55962c2f77d98786aff6a1b55 Mon 
Sep 17 00:00:00 2001 From: Johannes Soltwedel <38459088+jo-mueller@users.noreply.github.com> Date: Thu, 26 Mar 2026 23:28:22 +0100 Subject: [PATCH 03/10] chore: add explanatory sentence to example --- index.md | 1 + 1 file changed, 1 insertion(+) diff --git a/index.md b/index.md index fb6bd734..81a10ce5 100644 --- a/index.md +++ b/index.md @@ -643,6 +643,7 @@ In the context of multiscales metadata, this could look like this: ``` In this case, the `scale` transformation under `coordinateTransformations` defines the mapping from the "intrinsic" coordinate system to the unitless "pixel" coordinate system. +Another transformation (e.g. in a `scene`) could then use the "pixel" coordinate system as an input or output to define transformations in pixel units. ::: From 193b8df9630ac6fb26f2d3bc534a56bfa7771f83 Mon Sep 17 00:00:00 2001 From: Johannes Soltwedel <38459088+jo-mueller@users.noreply.github.com> Date: Thu, 26 Mar 2026 23:29:02 +0100 Subject: [PATCH 04/10] chore: fix example --- index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/index.md b/index.md index 81a10ce5..0a649bab 100644 --- a/index.md +++ b/index.md @@ -634,7 +634,7 @@ In the context of multiscales metadata, this could look like this: "type": "scale", "scale": [2.0, 2.0], "input": {"name": "intrinsic"}, - "output": {"name": "pixel"} + "output": {"name": "array"} } ] } @@ -642,8 +642,8 @@ In the context of multiscales metadata, this could look like this: } ``` In this case, the `scale` transformation under `coordinateTransformations` -defines the mapping from the "intrinsic" coordinate system to the unitless "pixel" coordinate system. -Another transformation (e.g. in a `scene`) could then use the "pixel" coordinate system as an input or output to define transformations in pixel units. +defines the mapping from the "intrinsic" coordinate system to the unitless "array" coordinate system. +Another transformation (e.g. 
in a `scene`) could then use the "array" coordinate system as an input or output to define transformations in array units.
 :::

From 3ba7903f4a71c610513797c3716fd6b1eac320ff Mon Sep 17 00:00:00 2001
From: Johannes Soltwedel <38459088+jo-mueller@users.noreply.github.com>
Date: Sat, 28 Mar 2026 23:55:40 +0100
Subject: [PATCH 05/10] specification: clarify coordinate conventions

---
 index.md | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/index.md b/index.md
index 0a649bab..ccc90a69 100644
--- a/index.md
+++ b/index.md
@@ -284,15 +284,29 @@ As such, label images may be interpolated using "nearest neighbor" to obtain lab
 
 **The pixel/voxel center is the origin of the continuous coordinate system.**
 
-It is vital to consistently define relationship
-between the discrete/array and continuous/interpolated coordinate systems.
-A pixel/voxel is the continuous region (rectangle) that corresponds to a single sample in the discrete array, i.e.,
-the area corresponding to nearest-neighbor (NN) interpolation of that sample.
-The center of a 2d pixel corresponding to the origin `(0,0)` in the discrete array
-is the origin of the continuous coordinate system `(0.0, 0.0)` (when the transformation is the identity).
-The continuous rectangle of the pixel is given
-by the half-open interval `[-0.5, 0.5) x [-0.5, 0.5)` (i.e., -0.5 is included, +0.5 is excluded).
-See chapter 4 and figure 4.1 of the ITK Software Guide.
+It is vital to consistently define the relationship between the discrete/array and continuous/interpolated coordinate systems.
+The following conventions apply in this specification:
+
+- The discrete coordinate grid for a Zarr array of shape `[N₀, N₁, ..., Nₖ]`
+  is defined as zero-based, with indices ranging from 0 to Nᵢ - 1 for each dimension i.
+  For example, given an array with shape (3, 2),
+  the discrete coordinate system for that array defines the following array of points:
+  ```
+  [
+    [(0, 0), (0, 1)],
+    [(1, 0), (1, 1)],
+    [(2, 0), (2, 1)],
+  ]
+  ```
+- A "pixel"/"voxel" is the continuous region (rectangle/box) that corresponds to a single sample in the discrete array,
+  i.e., the area corresponding to nearest-neighbor (NN) interpolation of that sample.
+- The center of a 2d pixel corresponding to the origin (0,0) in the discrete array
+  is the origin of the continuous coordinate system (0.0, 0.0) (when the transformation is the identity).
+- The continuous rectangle of the pixel is given
+  by the half-open interval [-0.5, 0.5) x [-0.5, 0.5) (i.e., -0.5 is included, +0.5 is excluded).
+
+For a more formal and in-depth definition,
+see chapter 4 and figure 4.1 of the [ITK Software Guide](https://itk.org/ItkSoftwareGuide.pdf).
 
 ### bioformats2raw.layout
 
From 575345f091a5d93f382b02867980e55fd36c5de0 Mon Sep 17 00:00:00 2001
From: Johannes Soltwedel <38459088+jo-mueller@users.noreply.github.com>
Date: Sat, 28 Mar 2026 23:55:56 +0100
Subject: [PATCH 06/10] refactor: use array coordinates terminology rather than pixel

---
 index.md | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/index.md b/index.md
index ccc90a69..3512d496 100644
--- a/index.md
+++ b/index.md
@@ -562,21 +562,22 @@ to do so by estimating the transformations' inverse if they choose to.
 ```
 :::
 
-**Transformations in pixel units**: Some applications might prefer to define points, regions-of-interest or transformation parameters
-in "pixel coordinates" rather than "physical coordinates".
+**Transformations in array coordinate units**:
+Some applications might prefer to define points, regions-of-interest or transformation parameters
+in array coordinates rather than physical units.
+Because transformations are agnostic to whether they refer to array or physical coordinates, indicating that choice explicitly will be important for interoperability. This can be expressed in the metadata in multiple ways, including: -- One can embed a transformation defined in pixel units into a `sequence` transformation +- One can embed a transformation defined in array units into a `sequence` transformation that includes the appropriate scale transformation and its inverse to convert to physical units (see example below). - One can define a unitless coordinate system and connect it to the "intrinsic" coordinate system with a scale transformation that has the appropriate scale factors to convert to physical units. :::{dropdown} Example: Embedded expression -In the context of [`scene`](#scene-md), one may want to express a transformation between two images in pixel units, +In the context of [`scene`](#scene-md), one may want to express a transformation between two images in array units, even though the coordinate systems of the two images are in physical units. 
-This can be achieved by embedding the pixel-unit transformation into a `sequence` transformation like this: +This can be achieved by embedding the array-unit transformation into a `sequence` transformation like this: ```json { "scene": @@ -592,7 +593,7 @@ This can be achieved by embedding the pixel-unit transformation into a `sequence { "type": "translation", "translation": [10, 20], - "name": "translation in pixel units" + "name": "translation in array units" }, { "type": "scale", From 7284adb5eabe124e096559eeff724148fece20a3 Mon Sep 17 00:00:00 2001 From: Johannes Soltwedel <38459088+jo-mueller@users.noreply.github.com> Date: Sat, 28 Mar 2026 23:56:21 +0100 Subject: [PATCH 07/10] chore: reword --- index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/index.md b/index.md index 3512d496..ddfe74c2 100644 --- a/index.md +++ b/index.md @@ -564,7 +564,7 @@ to do so by estimating the transformations' inverse if they choose to. **Transformations in array coordinate units**: Some applications might prefer to define points, regions-of-interest or transformation parameters -in array coordinates rather than physical units. +in array coordinates (also referred to as pixel coordinates) rather than physical units. Because transformations are agnostic to whether they refer to array or physical coordinates, indicating that choice explicitly will be important for interoperability. 
This can be expressed in the metadata in multiple ways, including:

From 769f5196c6be48e55b8f3877cbeced8168f42234 Mon Sep 17 00:00:00 2001
From: Johannes Soltwedel <38459088+jo-mueller@users.noreply.github.com>
Date: Wed, 1 Apr 2026 09:57:33 +0200
Subject: [PATCH 08/10] chore: Improve example

---
 index.md | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/index.md b/index.md
index ddfe74c2..f51434cb 100644
--- a/index.md
+++ b/index.md
@@ -575,9 +575,9 @@ This can be expressed in the metadata in multiple ways, including:
 
 :::{dropdown} Example: Embedded expression
 
-In the context of [`scene`](#scene-md), one may want to express a transformation between two images in array units,
+In the context of [`scene`](#scene-md), one may want to express a transformation between two images in dimensionless units,
 even though the coordinate systems of the two images are in physical units.
-This can be achieved by embedding the array-unit transformation into a `sequence` transformation like this:
+This can be achieved by embedding the transformation into a `sequence` transformation like this:
 
 ```json
 { "scene":
@@ -588,21 +588,27 @@ This can be achieved by embedding the array-unit transformation into a `sequence
         "transformations": [
             {
                 "type": "scale",
-                "scale": [0.5, 0.5]
+                "scale": [2, 2]
             },
             {
                 "type": "translation",
                 "translation": [10, 20],
-                "name": "translation in array units"
+                "name": "translation in dimensionless units"
             },
             {
                 "type": "scale",
-                "scale": [2, 2]
+                "scale": [0.5, 0.5]
             }
         ]
     }
 }
 ```
+
+This example assumes that the coordinate system named `"intrisinc"` in both referenced images is in physical units,
+and is linked to the highest-resolution level (e.g., `s0`) of the multiscale image with a `scale` transformation that has the scale factors `[0.5, 0.5]`.
+In this case, the first `scale` transformation in this example converts the input coordinates from physical to dimensionless units.
+The `translation` transformation is applied in dimensionless units, +and finally the second `scale` transformation converts the coordinates back to physical units. ::: :::{dropdown} Example: Unitless coordinate system From 1d2c470534010b87d3b406cea88c2ab394ce2773 Mon Sep 17 00:00:00 2001 From: Johannes Soltwedel <38459088+jo-mueller@users.noreply.github.com> Date: Wed, 1 Apr 2026 10:04:49 +0200 Subject: [PATCH 09/10] chore: improve wording --- index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/index.md b/index.md index f51434cb..5eacd2ff 100644 --- a/index.md +++ b/index.md @@ -565,7 +565,7 @@ to do so by estimating the transformations' inverse if they choose to. **Transformations in array coordinate units**: Some applications might prefer to define points, regions-of-interest or transformation parameters in array coordinates (also referred to as pixel coordinates) rather than physical units. -Because transformations are agnostic to whether they refer to array or physical coordinates, +Because transformations are agnostic to whether they operate on array or physical coordinates, indicating that choice explicitly will be important for interoperability. 
This can be expressed in the metadata in multiple ways, including:
 - One can embed a transformation defined in array units into a `sequence` transformation

From 6236d70966ed451538e85212e64a1ad49d624e9d Mon Sep 17 00:00:00 2001
From: Johannes Soltwedel <38459088+jo-mueller@users.noreply.github.com>
Date: Wed, 1 Apr 2026 10:05:46 +0200
Subject: [PATCH 10/10] chore: typo

---
 index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/index.md b/index.md
index 5eacd2ff..5332bae7 100644
--- a/index.md
+++ b/index.md
@@ -604,7 +604,7 @@ This can be achieved by embedding the transformation into a `sequence` transform
 }
 ```
 
-This example assumes that the coordinate system named `"intrisinc"` in both referenced images is in physical units,
+This example assumes that the coordinate system named `"intrinsic"` in both referenced images is in physical units,
 and is linked to the highest-resolution level (e.g., `s0`) of the multiscale image with a `scale` transformation that has the scale factors `[0.5, 0.5]`.
 In this case, the first `scale` transformation in this example converts the input coordinates from physical to dimensionless units.
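A note on the embedded-sequence pattern introduced in PATCH 08: it can be checked numerically. The following sketch (not part of the patch series; the scale factors come from the example, the sample point is arbitrary) verifies that wrapping a translation given in dimensionless units between a scale and its inverse is equivalent to a translation in physical units:

```python
import numpy as np

# Per the example, "intrinsic" physical coordinates relate to array indices
# via a pixel spacing of 0.5 per dimension.
to_dimensionless = np.array([2.0, 2.0])  # first "scale": physical -> dimensionless
shift = np.array([10.0, 20.0])           # "translation" in dimensionless units
to_physical = np.array([0.5, 0.5])       # second "scale": dimensionless -> physical

def apply_sequence(p):
    """Apply the three transformations of the "sequence" in order."""
    return to_physical * (to_dimensionless * np.asarray(p) + shift)

# The net effect is a physical-unit translation by to_physical * shift = [5, 10].
p = np.array([1.25, 3.0])
assert np.allclose(apply_sequence(p), p + to_physical * shift)
```

This also makes clear why the order matters: the scale converting physical to dimensionless units must come first and its inverse last, so that the translation is applied in dimensionless units only.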