Skip to content

Take axes metadata into account#19

Closed
dominikl wants to merge 19 commits intoglencoesoftware:masterfrom
dominikl:axes
Closed

Take axes metadata into account#19
dominikl wants to merge 19 commits intoglencoesoftware:masterfrom
dominikl:axes

Conversation

@dominikl
Copy link
Copy Markdown
Contributor

@dominikl dominikl commented Apr 30, 2025

The current limitation is that "the chunks must be stored in the (t, c, z, y, x) order". With this PR the order doesn't matter, it will be taken from the 'axes' metadata (if available, if not default to TCZYX). Once this works, I'm planning to open another PR which relaxes the requirement of having 5D arrays, and allow 3 or 4D (dropping Z and/or T). Edit: Added the <5d support to this PR too.

(FYI @sbesson )

Copy link
Copy Markdown
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

68f2d2a has also been fixed in #18, could you merge origin/master so that the GitHub actions build run and show the failing tests?

On the failures themselves which I can reproduce locally, part of the difficulty is that the issue might come either from the reading of multiscales 5D arrays with arbitrary order in ZarrPixelBuffer or the generation of these arrays in ZarrPixelBufferTest.
To reduce the number of parameters, would it make sense to use the existing --dimension-order option of bioformats2raw via assertBioFormats2Raw(input, output, "--dimension-order", dimensionOrder);? For fake files, this means the tests can also use FakeReader.readSpecialPixels to decode the plane index and help the troubleshooting.

@jburel
Copy link
Copy Markdown
Contributor

jburel commented Jun 4, 2025

PR fixing the test and adding new one opened against @dominikl's branch

@dominikl dominikl marked this pull request as ready for review June 5, 2025 08:32
@sbesson
Copy link
Copy Markdown
Member

sbesson commented Jun 5, 2025

Code-wise, there are a few System.out.println statements across the implementation which I assume were introduced for debugging purposes and should be removed.

Thinking of the functional testing of this proposal, are there already reference images that would have been converted into OME-Zarr using different supported dimensions orders?

@jburel
Copy link
Copy Markdown
Contributor

jburel commented Jun 5, 2025

We do not have any reference dataset in a different order as far as I am aware
Reading the specification

"the entries MUST be ordered by "type" where the "time" axis must come first (if present), followed by the "channel" or custom axis (if present) and the axes of type "space". If there are three spatial axes where two correspond to the image plane ("yx") and images are stacked along the other (anisotropic) axis ("z"), the spatial axes SHOULD be ordered as "zyx"."  https://ngff.openmicroscopy.org/0.4/index.html

This leads to a different problem. The support for --dimension-order flag in bioformats2raw could generate invalid ome-zarr files
e.g.
bioformats2raw /path/to/file.mrxs /path/to/zarr-pyramid --dimension-order XYZTC
will have an Zarray with order CTZYX
This is, according to the spec, invalid.
We should consider retracting the --dimension-order option from bioformats2raw
The following https://ngff.openmicroscopy.org/rfc/3/index.html could change that

@sbesson
Copy link
Copy Markdown
Member

sbesson commented Jun 5, 2025

This is according to the spec invalid.

Thanks, re-reading this section carefully, any order other than the default XYZTC will indeed be non compliant with the OME-NGFF 0.4 specification.

So effectively there is no functionality introduced by this work at this stage and nothing to test functionally apart from checking for regression. Is the next step on your end to introduce support for XYZ, XYC etc here?

@jburel
Copy link
Copy Markdown
Contributor

jburel commented Jun 5, 2025

Thanks, re-reading this section carefully, any order other than the default XYZTC will indeed be non compliant with the OME-NGFF 0.4 specification.

you mean XYZCT

So effectively there is no functionality introduced by this work at this stage and nothing to test functionally apart from checking for regression. Is the next step on our end to introduce support for XYZ, XYC etc here?

According to the zarr spec, the only possible options are available by rotating XYZ. Bioformats2raw cannot be used in that case since XY must be the first 2 dimensions as specified in https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome_xsd.html#Pixels_DimensionOrder.

We could re-introduce the manual creation of the zarr file for XYZ rotation, other rotation options are already tested using zarr file created by bioformats2raw dimension-order option

@sbesson
Copy link
Copy Markdown
Member

sbesson commented Jun 5, 2025

We could re-introduce the manual creation of the zarr file for XYZ rotation, other rotation options are already tested using zarr file created by bioformats2raw dimension-order option

Even if it is more permissive for the order of the spatial axes, XYZ / "zyx" in axes is still the recommendation (SHOULD) from the OME-NGFF 0.4 specification. Is there a real world driver/use case for this?

@jburel
Copy link
Copy Markdown
Contributor

jburel commented Jun 5, 2025

We could re-introduce the manual creation of the zarr file for XYZ rotation, other rotation options are already tested using zarr file created by bioformats2raw dimension-order option

Even if it is more permissive for the order of the spatial axes, XYZ / "zyx" in axes is still the recommendation (SHOULD) from the OME-NGFF 0.4 specification. Is there a real world driver/use case for this?

not really
It was more if we wanted to test that option
Axes support is the first step towards supporting image with dimension <5. This is the real driver

@dominikl
Copy link
Copy Markdown
Contributor Author

dominikl commented Jun 5, 2025

(there's no integration test yet for 7d9f338 , working on it...)

try {
int[] chunks = getChunks()[resolutionLevel];
return new Dimension(chunks[4], chunks[3]);
return new Dimension(chunks[axes.get(Axes.Y)], chunks[axes.get(Axes.X)]);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh, I'm not really sure about this. I would have thought rather Dimension(chunks[axes.get(Axes.X)], chunks[axes.get(Axes.Y)]) but the chunks[4], chunks[3] confused me a bit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course its x, y... tripped me again that in default order x/y is swapped (...chunks must be stored in the (t, c, z, y, x) order...). Fixed now.

@dominikl
Copy link
Copy Markdown
Contributor Author

dominikl commented Jun 6, 2025

I was a bit too optimistic. The only non 5d which actually works is 4d without timeopoints, the other ones fail.

Screenshot 2025-06-06 at 11 13 34

@dominikl
Copy link
Copy Markdown
Contributor Author

Everythings works now, tested with the zarrs from: https://github.com/BioNGFF/TestZarr/tree/main/examples
Screenshot 2025-06-10 at 13 50 36

"Just" have to add integration tests. That'll probably be more code and work than the functionality itself.

@dominikl dominikl closed this Jun 12, 2025
@dominikl dominikl reopened this Jun 12, 2025
@sbesson sbesson linked an issue Jun 12, 2025 that may be closed by this pull request
@dominikl dominikl closed this Jun 12, 2025
@dominikl dominikl reopened this Jun 12, 2025
@dominikl dominikl closed this Jun 13, 2025
@dominikl dominikl reopened this Jun 13, 2025
@dominikl dominikl closed this Jun 13, 2025
@dominikl dominikl reopened this Jun 13, 2025
@dominikl
Copy link
Copy Markdown
Contributor Author

Can't get github action integration tests started :-/ Maybe it would actually be better if we fork this repository, then we also could add it to our build infracstructure.

@dominikl
Copy link
Copy Markdown
Contributor Author

Thanks @sbesson for starting the build again. Still failing. I don't know why, all tests passing when I run it locally.

@jburel
Copy link
Copy Markdown
Contributor

jburel commented Jun 13, 2025

The tests should now pass. blosc was missing.
I have also made an adjustment so action can be run on personal account
https://github.com/jburel/omero-zarr-pixel-buffer/actions/runs/15635379640 is green

Copy link
Copy Markdown
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple of inline comments from a code review perspective. Overall, I think we are close to a state where we can start scheduling some functional testing on our end.

Functionally, our minimal requirement will be to ensure there is no regression for the OME-Zarr datasets that are covered by this implementation as described in the README i.e. 5D images or label images stored either on the filesystem or AWS S3.
In addition, we will look into gathering or generating examples of OME-Zarr multiscales newly supported as per this PR. My understanding is that the full list of supported dimensions is the following:

  • 5D arrays with "tczyx" dimensions (current implementation)
  • 4D arrays with "tzyx" dimensions
  • 4D arrays with "czyx" dimensions
  • 4D arrays with "tzyx" dimensions
  • 3D arrays with "tyx" dimensions
  • 3D arrays with "cyx" dimensions
  • 3D arrays with "zyx" dimensions
  • 2D arrays with "yx" dimensions

From an API perspective, it is possible to construct a few scenarios outside the scope of this extension:

  • Zarr arrays with less than 2 or more than 5 dimensions
  • Zarr arrays with dimensions that are not named "xyzct"
  • Zarr arrays with unsupported dimension order e.g. XYTCZ - see glencoesoftware/bioformats2raw#278
    Should the implementation be defensive and fail initialization when encountering any of these?

In terms of unit testing, this PR supplements the existing set of unit tests and introduces a new utility for generating single-resolution Zarr multiscale groups with different dimensionalities. There is a bit of redundancy with what bioformats2raw does but also additional flexibility so ultimately the updated pixel buffer should be able to support arrays generated by bioformats2raw as well as low-level API so it makes sense to keep both.

As a side note, both the FakeReader and TestZarr introduced here share a similar functionality: the ability to produce synthetic data where each plane encodes its current dimension index: via FakeReader.readSpecialPixels and via generateGreyscaleImageWithText. While the former is more amenable to a programmatic usage, the latter is particularly useful when generating sample data that is displayed in a viewer, similarly to how some of the sample OME-TIFF are generated. This makes me wonder whether we could work towards of a combined API that would offer synthetic data with both features /cc @melissalinkert

}
this.axes = getAxes();
if (!axes.containsKey(Axes.X) || !axes.containsKey(Axes.Y)) {
throw new IllegalArgumentException("Missing X or Y axis!");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming additional validation is performed on the axes dimensions, should this logic happen here or directly in getAxes?

for (int i=0; i<axesData.size(); i++) {
Map<String, Object> axis = axesData.get(i);
String name = axis.get("name").toString().toUpperCase();
axes.put(Axes.valueOf(name), i);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expectation if axes.get(i).get("name") is not in the list of enums?

axes.put(Axes.valueOf(name), i);
}
} catch (Exception e) {
log.warn("No axes metadata found, defaulting to standard axes TCZYX");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this logic only happen when there is no axes metadata?
There are a few conditions under which the block above will throw an exception and fallback to the default 5D dimension order - is that the expectations?

for (int z=0; z<fullResZ; z++) {
zIndexMap.put(z, Math.round(z * arrayZ / fullResZ));

if (zIndexMap == null) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the extra indent on purpose here and in the following lines?

}

/** Maps axes to their corresponding array indexes */
private Map<Axes, Integer> axes;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During review, I found the usage of axes confusing as it refers to:

  • an enum for the XYZCT dimension names
  • a map allowing to store the axes metadata

int sizeY = 256;
int sizeX = 512;
int resolutions = 1;
String order = DimensionOrder.VALUE_XYCTZ;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, is this scenario legit?

private final AsyncLoadingCache<Path, ZarrArray> zarrArrayCache;

public enum Axes {
X, Y, Z, C, T;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage of enums here is probably conditional to whether we want to handle OME-Zarr datasets with dimensions that are not one of xyzct

* @param text The text to write on the image
* @return byte array containing the grayscale image data
*/
public static byte[] generateGreyscaleImageWithText(int width, int height, String text) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to add this to a separate utility class rather than TestZarr directly?

@Test
public void test_XY() throws IOException, InvalidRangeException {
testDimensions(512, 1024, 0, 0, 0);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also include testXYZ, testXYT and testXYZT ?

}
}

Path series_path = path.resolve("0"); // image 0 (one image)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is possible something to revisit and simplify at some point given this pixel buffer is exclusively concerned about the multiscales group context.

@sbesson
Copy link
Copy Markdown
Member

sbesson commented Jun 25, 2025

Discussed next steps with @dominikl @jburel @kkoz. Closing this PR and other open contributions from the OME Dundee team which will be reopened from https://github.com/ome/omero-zarr-pixel-buffer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handle images that are not 5D

4 participants