Initial Zarr v3 support #148
Expect v3 tests to fail right now as there is no released version of bioformats2raw that includes the `--v3` option. This also brings the JUnit version into alignment with what bioformats2raw uses, so we don't need to keep track of two test APIs.
Last two commits here cover everything I had initially planned. The build failures are expected as noted in the commit message for 8577ea3. Expected next steps are:
98f04a2 updates to use the new RC of bioformats2raw which includes the
Down to just one failing test here; see zarr-developers/zarr-java#47 for a minimal example that reproduces the failing test with zarr-java alone.
Build now passing, so marking as ready for review. This does not switch over to zarr-java for v2 reading; I can do that here if preferred, or as a separate PR after this is merged.
sbesson
left a comment
Functionally tested using the same public samples as the ones described in glencoesoftware/bioformats2raw#290 (review), which cover brightfield whole-slide imaging, multiplexed fluorescence whole-slide imaging, and high-content screening.
Various OME-Zarr datasets were generated from these samples using bioformats2raw with each of the following option sets:
- `--compact`
- `--ngff-version 0.5`
- `--ngff-version 0.5 --compact`
All these datasets were converted back to OME-TIFF using this utility. File sizes are consistent regardless of the source:
3.8G 0.4_compact/Leica-1.ome.tiff
2.1G 0.4_compact/LuCa-7color_Scan1_XYC.ome.tiff
2.7G 0.4_compact/NIRHTa+001.ome.tiff
3.8G 0.5/Leica-1.ome.tiff
2.1G 0.5/LuCa-7color_Scan1.ome.tiff
2.7G 0.5/NIRHTa+001.ome.tiff
3.8G 0.5_compact/Leica-1.ome.tiff
2.1G 0.5_compact/LuCa-7color_Scan1.ome.tiff
2.7G 0.5_compact/NIRHTa+001.ome.tiff
26G total
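The consistency claim can be checked mechanically. Below is a minimal sketch that parses the listing above, groups the sizes by output directory, and compares the groups. The helper name `sizes_by_variant` is hypothetical, and the sketch assumes every size is reported in gigabytes, as it is here. Sizes are compared per directory rather than per file name because one of the 0.4 outputs carries an `_XYC` suffix.

```python
from collections import defaultdict

# Listing copied from the review comment above (sizes as reported, in G).
LISTING = """\
3.8G 0.4_compact/Leica-1.ome.tiff
2.1G 0.4_compact/LuCa-7color_Scan1_XYC.ome.tiff
2.7G 0.4_compact/NIRHTa+001.ome.tiff
3.8G 0.5/Leica-1.ome.tiff
2.1G 0.5/LuCa-7color_Scan1.ome.tiff
2.7G 0.5/NIRHTa+001.ome.tiff
3.8G 0.5_compact/Leica-1.ome.tiff
2.1G 0.5_compact/LuCa-7color_Scan1.ome.tiff
2.7G 0.5_compact/NIRHTa+001.ome.tiff
"""


def sizes_by_variant(listing):
    """Group file sizes (in G) by the top-level output directory."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        size, path = line.split(maxsplit=1)
        variant = path.split("/", 1)[0]
        groups[variant].append(float(size.rstrip("G")))
    # Sort so that the comparison does not depend on listing order.
    return {variant: sorted(values) for variant, values in groups.items()}


groups = sizes_by_variant(LISTING)
# True when every directory holds the same set of file sizes.
consistent = len({tuple(values) for values in groups.values()}) == 1
total = round(sum(sum(values) for values in groups.values()), 1)
print(consistent, total)
```

The summed sizes come to 25.8G, which matches the rounded `26G total` line.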
The data was loaded into OMERO Plus for visual assessment, which confirmed that the binary data is identical regardless of the OME-Zarr source. The conversion failed for OME-Zarr v3 datasets generated with sharding options, but these are issues with the source data that are already captured in glencoesoftware/bioformats2raw#295. As we address them upstream, we might need to retest the conversion via raw2ometiff.
The code changes are fairly significant but mostly revolve around (1) importing the relevant utilities from zarr-java, (2) isolating the v2-specific calls into dedicated methods, (3) adding v3 counterparts to these methods, and (4) adding v2/v3 switches wherever necessary. A possible next step is to look into using zarr-java as the single library for conversion of both Zarr v2 and v3, which might reduce the complexity of this code. Unless it would be functionally advantageous to make the transition in one go, I am happy to get this in (and possibly tagged as a release candidate) and look into using zarr-java for all input datasets as a follow-up.
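To illustrate what a v2/v3 switch has to decide, here is a minimal Python sketch. It is not the code in this PR, which is Java. It relies only on the on-disk convention that Zarr v3 stores metadata in `zarr.json`, while Zarr v2 uses `.zgroup`, `.zarray` and `.zattrs`. The function names `detect_zarr_version` and `attributes_file` are hypothetical.

```python
import tempfile
from pathlib import Path


def detect_zarr_version(root):
    """Return 2 or 3 depending on which Zarr metadata file is present."""
    root = Path(root)
    if (root / "zarr.json").is_file():
        return 3
    if (root / ".zgroup").is_file() or (root / ".zarray").is_file():
        return 2
    raise ValueError(f"no Zarr metadata found in {root}")


def attributes_file(root):
    """Pick the file holding user attributes for the detected version."""
    # v2 keeps attributes in .zattrs; v3 embeds them in zarr.json.
    names = {2: ".zattrs", 3: "zarr.json"}
    return Path(root) / names[detect_zarr_version(root)]


# Build two throwaway groups, one per format, and run the switch on each.
with tempfile.TemporaryDirectory() as tmp:
    v2_root = Path(tmp, "v2.zarr")
    v2_root.mkdir()
    (v2_root / ".zgroup").write_text('{"zarr_format": 2}')

    v3_root = Path(tmp, "v3.zarr")
    v3_root.mkdir()
    (v3_root / "zarr.json").write_text('{"zarr_format": 3, "node_type": "group"}')

    detected = (detect_zarr_version(v2_root), detect_zarr_version(v3_root))
    attribute_names = (attributes_file(v2_root).name, attributes_file(v3_root).name)

print(detected, attribute_names)
```

Keeping the detection in one place means each v2-specific method and its v3 counterpart can be selected from a single branch point.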
While reviewing the v2/v3 switches, I suspect there will still be a need to handle backwards-incompatible changes in the metadata of Zarr v3 datasets. This might be something we need to tackle as support gets introduced for future (currently unreleased) versions of the OME-NGFF specification.
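The existing 0.4 to 0.5 transition gives an idea of the kind of metadata change involved. OME-NGFF 0.4 places `multiscales` at the top level of the attributes with a `version` inside each entry, while 0.5 nests it under an `ome` key that also carries the version. The sketch below, with the hypothetical helper `find_multiscales`, reads either layout from an already parsed attributes dictionary. It is an illustration only and says nothing about how future specification versions will be laid out.

```python
def find_multiscales(attributes):
    """Return (version, multiscales) from OME-NGFF 0.4 or 0.5 attributes."""
    ome = attributes.get("ome")
    if isinstance(ome, dict) and "multiscales" in ome:
        # 0.5 layout: metadata is namespaced under "ome".
        return ome.get("version"), ome["multiscales"]
    multiscales = attributes.get("multiscales")
    if multiscales:
        # 0.4 layout: top-level list, version stored on each entry.
        return multiscales[0].get("version"), multiscales
    raise KeyError("no multiscales metadata found")


# Minimal hand-written attribute dictionaries for each layout.
attrs_04 = {"multiscales": [{"version": "0.4", "datasets": [{"path": "0"}]}]}
attrs_05 = {"ome": {"version": "0.5", "multiscales": [{"datasets": [{"path": "0"}]}]}}

version_04, scales_04 = find_multiscales(attrs_04)
version_05, scales_05 = find_multiscales(attrs_05)
print(version_04, version_05)
```

A reader written this way fails loudly on an unknown layout instead of silently returning empty metadata, which makes a later incompatible change easier to spot.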
erindiel
left a comment
I converted the test data generated via glencoesoftware/bioformats2raw#302 using `--ngff-version 0.5 --compression null` with this build of raw2ometiff.
This was successful using default options, `--compression JPEG`, and `--rgb` options. The output of `--split` was also as expected.
Output OME-TIFFs were validated by visualization in FIJI.
Pairs with glencoesoftware/bioformats2raw#290.
This still needs some polishing, HCS support, and tests, so is a draft for now. In the current state, running `CMU-1.svs` through `bioformats2raw --v3 --compress-inner-chunk` and then converting the v3 output with `raw2ometiff --rgb` seems to work as expected.
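The round trip described above can be written out as two command lines. The sketch below only builds and prints them; it does not run either tool. The intermediate and output names (`CMU-1.zarr`, `CMU-1.ome.tiff`) and the function `conversion_commands` are assumptions for illustration, and the positional order (input, then output) follows the usual invocation of both tools. Note that `--v3` is the draft-time flag quoted in this description, whereas the review comments in this thread use `--ngff-version 0.5`.

```python
import shlex


def conversion_commands(source, zarr_dir, tiff_path):
    """Build the two argv lists for the source -> Zarr v3 -> OME-TIFF round trip."""
    # Flags are the ones quoted in the PR description; paths are placeholders.
    to_zarr = ["bioformats2raw", source, zarr_dir, "--v3", "--compress-inner-chunk"]
    to_tiff = ["raw2ometiff", zarr_dir, tiff_path, "--rgb"]
    return to_zarr, to_tiff


to_zarr, to_tiff = conversion_commands("CMU-1.svs", "CMU-1.zarr", "CMU-1.ome.tiff")
for command in (to_zarr, to_tiff):
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(command))
```

Building argv lists rather than a single string keeps paths with spaces safe if the lists are later passed to `subprocess.run`.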