multiscales.datasets coordinate transformations: example inconsistency and spec clarification #494
Description
I'm opening this issue to write down two points that occurred to me regarding transformations and resolution levels in v0.6:
- An inconsistency in two examples
- A question of whether the spec could benefit from a brief clarification
Warning: the text here ended up being rather long, but I hope/promise the point itself is relatively concise 🤗🙏
Context
In the v0.6 spec, the multiscales metadata contains coordinateTransformations for the objects listed in multiscales.datasets. These transformations make sure all resolution levels are consistently represented in the "intrinsic coordinate system".
For each resolution, the ratio between the parameters of the scale transformation of the current resolution and those of the first resolution gives the scaling factor between the two resolutions. A translation transformation compensates for potential offsets between the different resolutions.
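As a small illustration of the ratio described above, the per-axis downsampling factor can be recovered from the scale parameters of two levels. The helper name `downsampling_factors` and the pixel sizes are hypothetical, chosen only for this sketch.

```python
def downsampling_factors(scale_level0, scale_levelN):
    """Per-axis factor between resolution N and resolution 0 (hypothetical helper)."""
    if len(scale_level0) != len(scale_levelN):
        raise ValueError("scale vectors must have the same number of axes")
    return [sN / s0 for s0, sN in zip(scale_level0, scale_levelN)]


# Hypothetical 3D (z, y, x) image: 0.5 units in z and 0.1 in y/x at level 0,
# downsampled by 2 in y and x only at level 1.
factors = downsampling_factors([0.5, 0.1, 0.1], [0.5, 0.2, 0.2])
print(factors)  # [1.0, 2.0, 2.0]
```

The scale alone only encodes the factor; it says nothing about where the first pixel of level N sits, which is what the translation is for.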
In practice, these offsets depend on the downsampling method used to create the arrays for the lower resolutions. For many downsampling methods (most, in practice?), e.g. classical binning with the binning window aligned with the image origin (think of the first three pixels being summarised into a single pixel), the first pixel of a downsampled resolution no longer maps to the same coordinate in the "intrinsic coordinate system" as the first pixel of the first resolution (due to the coordinate convention). A translation transformation is then needed to account for this offset.
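A quick numerical sketch of the binning case, assuming (per my reading of the coordinate convention) that pixel centers lie at integer indices at resolution 0. Binning the level-0 pixel *coordinates* by a factor of 3 shows where the centers of the binned pixels land. The helper names `bin_1d` and `first_pixel_offset` are made up for illustration.

```python
import numpy as np


def bin_1d(values, factor):
    """Classical binning: average non-overlapping windows aligned with index 0."""
    n = (len(values) // factor) * factor  # drop an incomplete trailing window
    return values[:n].reshape(-1, factor).mean(axis=1)


def first_pixel_offset(factor, pixel_size0=1.0):
    """Position of the first binned pixel's center, assuming level-0 pixel
    centers sit at integer indices times pixel_size0."""
    return (factor - 1) / 2 * pixel_size0


coords0 = np.arange(9, dtype=float)  # centers of 9 level-0 pixels: 0, 1, ..., 8
coords1 = bin_1d(coords0, 3)         # centers of the 3 binned pixels
print(coords1)                       # [1. 4. 7.]

# Binned pixel i maps to scale * i + translation, with scale 3 and translation 1.
scale, translation = 3.0, first_pixel_offset(3)
print(np.allclose(coords1, scale * np.arange(3) + translation))  # True
```

With scale alone (translation 0), binned pixel 0 would be placed at coordinate 0 instead of 1, i.e. shifted by one level-0 pixel.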
multiscales examples
So most downsampling methods will require a (non-zero) translation transformation on all resolutions except the first one. See also the discussion in ome/ome-zarr-py#403.
The example multiscales_strict/multiscales_example.json represents a multiscale image with three resolution levels. The coordinate transformations associated with each resolution level contain only a scale transformation. This is not inconsistent by itself, because a downsampling method might have been used that doesn't require a translation to align the resolutions. However, the example explicitly states in the multiscales.metadata field that skimage.transform.pyramid_gaussian was used for downsampling, and this method requires a translation transformation as described here.
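For comparison, a sketch of what consistent `datasets` entries could look like for origin-aligned downsampling by an exact integer factor per axis. The helper `dataset_entry`, the paths, and the pixel sizes are hypothetical, and the exact JSON layout (e.g. whether `coordinateTransformations` holds a list, or whether the sequence needs `input`/`output` fields) is an assumption to be checked against the v0.6 schema.

```python
import json


def dataset_entry(path, pixel_size0, factors):
    """Hypothetical helper: transformation for one resolution level, assuming
    origin-aligned (binning-style) downsampling by integer `factors`."""
    scale = [p * f for p, f in zip(pixel_size0, factors)]
    # Offset of the first pixel center: (pixel size at level N - pixel size at level 0) / 2
    translation = [(s - p) / 2 for s, p in zip(scale, pixel_size0)]
    if not any(translation):
        # Level 0 (no downsampling): a single scale is enough.
        transform = {"type": "scale", "scale": scale}
    else:
        transform = {
            "type": "sequence",
            "transformations": [
                {"type": "scale", "scale": scale},
                {"type": "translation", "translation": translation},
            ],
        }
    return {"path": path, "coordinateTransformations": [transform]}


# Hypothetical 2D (y, x) image, 0.5 units per pixel at level 0, downsampled by 2 per level.
datasets = [dataset_entry(str(level), [0.5, 0.5], [2**level, 2**level]) for level in range(3)]
print(json.dumps(datasets, indent=2))
```

Level 0 keeps a single scale, while levels 1 and 2 get translations of 0.25 and 0.75, i.e. (factor - 1) / 2 level-0 pixels.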
The other multiscales example with more than one resolution, multiscales_example_relative.json, also doesn't include translation transformations. However, it doesn't explicitly mention a downsampling method, so the example is not technically inconsistent, but perhaps slightly misleading.
Beyond the inconsistency, I find it a bit confusing that the two examples don't represent a realistic use case: an implementer looking at them has to find out on their own that a translation transformation is, in practice, probably required. Therefore: should we add consistent translation transformations to the multiscales examples mentioned above? Some explanatory comments could also be nice. There are more examples in the tests, but these are probably less confusing.
Clarifications in the spec text?
On this topic, what I found in the spec text is the coordinate convention, the explanations of the intrinsic coordinate system, and the following paragraph:
The transformation MUST be one of the following:
- A single scale or identity transformation
- A sequence transformation containing one scale and one translation transformation.
In these cases, the scale transformation specifies the pixel size in physical units or time duration. If scaling information is not available or applicable for one of the axes, the value MUST express the scaling factor between the current resolution and the first resolution for the given axis, defaulting to 1.0 if there is no downsampling along the axis. This is strongly recommended so that the “intrinsic” coordinate system of the image avoids more complex transformations.
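The allowed forms in the quoted paragraph are simple enough to check mechanically. Below is a hypothetical validator sketch operating on plain dicts; the function name is made up, and it deliberately does not check the order of scale and translation because the quoted text does not state one.

```python
def check_dataset_transformation(t):
    """Hypothetical check of the forms allowed by the quoted paragraph:
    a single scale or identity, or a sequence of one scale and one translation."""
    kind = t.get("type")
    if kind in ("scale", "identity"):
        return True
    if kind == "sequence":
        inner = [x.get("type") for x in t.get("transformations", [])]
        # Exactly one scale and one translation, in any order.
        return sorted(inner) == ["scale", "translation"]
    return False


ok_scale = {"type": "scale", "scale": [1.0, 2.0, 2.0]}
ok_seq = {
    "type": "sequence",
    "transformations": [
        {"type": "scale", "scale": [1.0, 2.0, 2.0]},
        {"type": "translation", "translation": [0.0, 0.5, 0.5]},
    ],
}
bad = {"type": "translation", "translation": [0.0, 0.5, 0.5]}
print([check_dataset_transformation(t) for t in (ok_scale, ok_seq, bad)])  # [True, True, False]
```

Note that such a check accepts a scale-only entry for a binned level: the validity rules alone cannot catch the misalignment discussed above.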
While this is not a strictly technical reading, the fact that translation transformations are optional suggests to me that they are useful for some advanced use cases but not generally needed. Therefore, would it make sense to clarify this aspect in the spec text?
One suggestion would be adding something like the following to the paragraph above:
"Consistent positional alignment across resolutions in the intrinsic coordinate system must be ensured using translation transforms if required. E.g. for the most common downsampling methods such as classical binning, a translation of (pixel-size-at-resolution-N - pixel-size-at-resolution-0) / 2 is required for alignment."