Take axes metadata into account#19
Take axes metadata into account#19dominikl wants to merge 19 commits intoglencoesoftware:masterfrom dominikl:axes
Conversation
sbesson
left a comment
There was a problem hiding this comment.
68f2d2a has also been fixed in #18, could you merge origin/master so that the GitHub actions build run and show the failing tests?
On the failures themselves which I can reproduce locally, part of the difficulty is that the issue might come either from the reading of multiscales 5D arrays with arbitrary order in ZarrPixelBuffer or the generation of these arrays in ZarrPixelBufferTest.
To reduce the number of parameters, would it make sense to use the existing --dimension-order option of bioformats2raw via assertBioFormats2Raw(input, output, "--dimension-order", dimensionOrder);? For fake files, this means the tests can also use FakeReader.readSpecialPixels to decode the plane index and help the troubleshooting.
src/test/java/com/glencoesoftware/omero/zarr/ZarrPixelBufferTest.java
Outdated
Show resolved
Hide resolved
|
PR fixing the test and adding new one opened against @dominikl's branch |
|
Code-wise, there are a few Thinking of the functional testing of this proposal, are there already reference images that would have been converted into OME-Zarr using different supported dimensions orders? |
|
We do not have any reference dataset in a different order as far as I am aware This leads to a different problem. The support for |
Thanks, re-reading this section carefully, any order other than the default XYZTC will indeed be non compliant with the OME-NGFF 0.4 specification. So effectively there is no functionality introduced by this work at this stage and nothing to test functionally apart from checking for regression. Is the next step on your end to introduce support for XYZ, XYC etc here? |
you mean XYZCT
According to the zarr spec, the only possible options are available by rotating We could re-introduce the manual creation of the zarr file for |
Even if it is more permissive for the order of the spatial axes, |
not really |
|
(there's no integration test yet for 7d9f338 , working on it...) |
| try { | ||
| int[] chunks = getChunks()[resolutionLevel]; | ||
| return new Dimension(chunks[4], chunks[3]); | ||
| return new Dimension(chunks[axes.get(Axes.Y)], chunks[axes.get(Axes.X)]); |
There was a problem hiding this comment.
Tbh, I'm not really sure about this. I would have thought rather Dimension(chunks[axes.get(Axes.X)], chunks[axes.get(Axes.Y)]) but the chunks[4], chunks[3] confused me a bit.
There was a problem hiding this comment.
Of course its x, y... tripped me again that in default order x/y is swapped (...chunks must be stored in the (t, c, z, y, x) order...). Fixed now.
|
Everythings works now, tested with the zarrs from: https://github.com/BioNGFF/TestZarr/tree/main/examples "Just" have to add integration tests. That'll probably be more code and work than the functionality itself. |
|
Can't get github action integration tests started :-/ Maybe it would actually be better if we fork this repository, then we also could add it to our build infracstructure. |
|
Thanks @sbesson for starting the build again. Still failing. I don't know why, all tests passing when I run it locally. |
|
The tests should now pass. blosc was missing. |
sbesson
left a comment
There was a problem hiding this comment.
Added a couple of inline comments from a code review perspective. Overall, I think we are close to a state where we can start scheduling some functional testing on our end.
Functionally, our minimal requirement will be to ensure there is no regression for the OME-Zarr datasets that are covered by this implementation as described in the README i.e. 5D images or label images stored either on the filesystem or AWS S3.
In addition, we will look into gathering or generating examples of OME-Zarr multiscales newly supported as per this PR. My understanding is that the full list of supported dimensions is the following:
- 5D arrays with "tczyx" dimensions (current implementation)
- 4D arrays with "tzyx" dimensions
- 4D arrays with "czyx" dimensions
- 4D arrays with "tzyx" dimensions
- 3D arrays with "tyx" dimensions
- 3D arrays with "cyx" dimensions
- 3D arrays with "zyx" dimensions
- 2D arrays with "yx" dimensions
From an API perspective, it is possible to construct a few scenarios outside the scope of this extension:
- Zarr arrays with less than 2 or more than 5 dimensions
- Zarr arrays with dimensions that are not named "xyzct"
- Zarr arrays with unsupported dimension order e.g. XYTCZ - see glencoesoftware/bioformats2raw#278
Should the implementation be defensive and fail initialization when encountering any of these?
In terms of unit testing, this PR supplements the existing set of unit tests and introduces a new utility for generating single-resolution Zarr multiscale groups with different dimensionalities. There is a bit of redundancy with what bioformats2raw does but also additional flexibility so ultimately the updated pixel buffer should be able to support arrays generated by bioformats2raw as well as low-level API so it makes sense to keep both.
As a side note, both the FakeReader and TestZarr introduced here share a similar functionality: the ability to produce synthetic data where each plane encodes its current dimension index: via FakeReader.readSpecialPixels and via generateGreyscaleImageWithText. While the former is more amenable to a programmatic usage, the latter is particularly useful when generating sample data that is displayed in a viewer, similarly to how some of the sample OME-TIFF are generated. This makes me wonder whether we could work towards of a combined API that would offer synthetic data with both features /cc @melissalinkert
| } | ||
| this.axes = getAxes(); | ||
| if (!axes.containsKey(Axes.X) || !axes.containsKey(Axes.Y)) { | ||
| throw new IllegalArgumentException("Missing X or Y axis!"); |
There was a problem hiding this comment.
Assuming additional validation is performed on the axes dimensions, should this logic happen here or directly in getAxes?
| for (int i=0; i<axesData.size(); i++) { | ||
| Map<String, Object> axis = axesData.get(i); | ||
| String name = axis.get("name").toString().toUpperCase(); | ||
| axes.put(Axes.valueOf(name), i); |
There was a problem hiding this comment.
What is the expectation if axes.get(i).get("name") is not in the list of enums?
| axes.put(Axes.valueOf(name), i); | ||
| } | ||
| } catch (Exception e) { | ||
| log.warn("No axes metadata found, defaulting to standard axes TCZYX"); |
There was a problem hiding this comment.
Should this logic only happen when there is no axes metadata?
There are a few conditions under which the block above will throw an exception and fallback to the default 5D dimension order - is that the expectations?
| for (int z=0; z<fullResZ; z++) { | ||
| zIndexMap.put(z, Math.round(z * arrayZ / fullResZ)); | ||
|
|
||
| if (zIndexMap == null) { |
There was a problem hiding this comment.
Is the extra indent on purpose here and in the following lines?
| } | ||
|
|
||
| /** Maps axes to their corresponding array indexes */ | ||
| private Map<Axes, Integer> axes; |
There was a problem hiding this comment.
During review, I found the usage of axes confusing as it refers to:
- an enum for the XYZCT dimension names
- a map allowing to store the
axesmetadata
| int sizeY = 256; | ||
| int sizeX = 512; | ||
| int resolutions = 1; | ||
| String order = DimensionOrder.VALUE_XYCTZ; |
There was a problem hiding this comment.
Same as above, is this scenario legit?
| private final AsyncLoadingCache<Path, ZarrArray> zarrArrayCache; | ||
|
|
||
| public enum Axes { | ||
| X, Y, Z, C, T; |
There was a problem hiding this comment.
The usage of enums here is probably conditional to whether we want to handle OME-Zarr datasets with dimensions that are not one of xyzct
| * @param text The text to write on the image | ||
| * @return byte array containing the grayscale image data | ||
| */ | ||
| public static byte[] generateGreyscaleImageWithText(int width, int height, String text) { |
There was a problem hiding this comment.
Any reason to add this to a separate utility class rather than TestZarr directly?
| @Test | ||
| public void test_XY() throws IOException, InvalidRangeException { | ||
| testDimensions(512, 1024, 0, 0, 0); | ||
| } |
There was a problem hiding this comment.
Should this also include testXYZ, testXYT and testXYZT ?
| } | ||
| } | ||
|
|
||
| Path series_path = path.resolve("0"); // image 0 (one image) |
There was a problem hiding this comment.
This is possible something to revisit and simplify at some point given this pixel buffer is exclusively concerned about the multiscales group context.
|
Discussed next steps with @dominikl @jburel @kkoz. Closing this PR and other open contributions from the OME Dundee team which will be reopened from https://github.com/ome/omero-zarr-pixel-buffer. |


The current limitation is that "the chunks must be stored in the (t, c, z, y, x) order". With this PR the order doesn't matter, it will be taken from the 'axes' metadata (if available, if not default to TCZYX).
Once this works, I'm planning to open another PR which relaxes the requirement of having 5D arrays, and allow 3 or 4D (dropping Z and/or T).Edit: Added the <5d support to this PR too.(FYI @sbesson )