Switch from jzarr to zarr-java when reading v2/0.4 data #152
melissalinkert merged 10 commits into glencoesoftware:master
Still some failing tests.
Requires updating the minimum Java version to 11; see glencoesoftware/bioformats2raw#294.
sbesson left a comment:
With the removal of jzarr in favor of zarr-java, many of the conditionals and version-specific APIs used to handle different inputs are now streamlined to use the zarr-java API. It looks like the only remaining active v2/v3 checks are related to pixel types. I assume this is because the data type enumerations differ between the two versions and there is not much we can do here?
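If the remaining check exists only because the v2 and v3 data type enumerations differ, the version branch can at least be confined to a single mapping step. The sketch below works on the spec-level names (Zarr v2 NumPy-style dtype strings such as "<u2", Zarr v3 data type names such as "uint16") rather than zarr-java's actual enums; the class, the PixelType enum and the method names are all hypothetical.

```java
public class PixelTypeMapping {
  /** Hypothetical pixel type enum mirroring the integer and floating-point types Bio-Formats supports. */
  enum PixelType { INT8, UINT8, INT16, UINT16, INT32, UINT32, FLOAT, DOUBLE }

  /**
   * Maps a Zarr v2 (NumPy-style) dtype string such as "<u2" or "|u1".
   * The first character is the byte order ('<', '>' or '|'); it matters when
   * decoding bytes but not when choosing the pixel type, so it is dropped here.
   */
  static PixelType fromV2(String dtype) {
    if (dtype == null || dtype.length() < 2) {
      throw new IllegalArgumentException("Malformed v2 dtype: " + dtype);
    }
    switch (dtype.substring(1)) {
      case "i1": return PixelType.INT8;
      case "u1": return PixelType.UINT8;
      case "i2": return PixelType.INT16;
      case "u2": return PixelType.UINT16;
      case "i4": return PixelType.INT32;
      case "u4": return PixelType.UINT32;
      case "f4": return PixelType.FLOAT;
      case "f8": return PixelType.DOUBLE;
      default: throw new IllegalArgumentException("Unsupported v2 dtype: " + dtype);
    }
  }

  /** Maps a Zarr v3 data type name such as "uint16" or "float32". */
  static PixelType fromV3(String dataType) {
    if (dataType == null) {
      throw new IllegalArgumentException("Missing v3 data type");
    }
    switch (dataType) {
      case "int8": return PixelType.INT8;
      case "uint8": return PixelType.UINT8;
      case "int16": return PixelType.INT16;
      case "uint16": return PixelType.UINT16;
      case "int32": return PixelType.INT32;
      case "uint32": return PixelType.UINT32;
      case "float32": return PixelType.FLOAT;
      case "float64": return PixelType.DOUBLE;
      default: throw new IllegalArgumentException("Unsupported v3 data type: " + dataType);
    }
  }

  /** The single remaining version branch; downstream code only sees PixelType. */
  static PixelType resolve(boolean isV3, String dataType) {
    return isV3 ? fromV3(dataType) : fromV2(dataType);
  }
}
```

The design choice is simply to keep the two vocabularies apart and normalize early, so that no other code path needs an isV3() check for pixel types.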
In terms of code review, I made two small inline comments/suggestions, but overall the changes look good.
This was tested functionally by converting the three modalities used throughout this v3 support work (brightfield whole slide imaging, fluorescence whole slide imaging, and high-content screening) into Zarr v2 and Zarr v3 datasets using the recently released bioformats2raw 0.12.0-rc2:
```bash
set -x
OUTPUTDIR="v2_default"
mkdir -p ${OUTPUTDIR} && rm -rf ${OUTPUTDIR}/*
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/Leica-1.scn ${OUTPUTDIR}/Leica-1.ome.zarr
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/NIRHTa+001/AS_09125_050116000001_A10f00d0.DIB ${OUTPUTDIR}/NIRHTa+001.ome.zarr
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/LuCa-7color_Scan1.qptiff ${OUTPUTDIR}/LuCa-7color_Scan1.ome.zarr
OUTPUTDIR="v3_default"
mkdir -p ${OUTPUTDIR} && rm -rf ${OUTPUTDIR}/*
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/Leica-1.scn ${OUTPUTDIR}/Leica-1.ome.zarr --ngff-version 0.5
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/NIRHTa+001/AS_09125_050116000001_A10f00d0.DIB ${OUTPUTDIR}/NIRHTa+001.ome.zarr --ngff-version 0.5
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/LuCa-7color_Scan1.qptiff ${OUTPUTDIR}/LuCa-7color_Scan1.ome.zarr --ngff-version 0.5
OUTPUTDIR="v3_compact"
mkdir -p ${OUTPUTDIR} && rm -rf ${OUTPUTDIR}/*
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/Leica-1.scn ${OUTPUTDIR}/Leica-1.ome.zarr --compact --ngff-version 0.5
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/NIRHTa+001/AS_09125_050116000001_A10f00d0.DIB ${OUTPUTDIR}/NIRHTa+001.ome.zarr --compact --ngff-version 0.5
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/LuCa-7color_Scan1.qptiff ${OUTPUTDIR}/LuCa-7color_Scan1.ome.zarr --compact --ngff-version 0.5
```

followed by the conversion of these Zarr datasets to OME-TIFF:
```bash
set -x
OUTPUTDIR="v2_default"
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/Leica-1.ome.zarr ${OUTPUTDIR}/Leica-1.ome.tiff
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/NIRHTa+001.ome.zarr ${OUTPUTDIR}/NIRHTa+001.ome.tiff
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/LuCa-7color_Scan1.ome.zarr ${OUTPUTDIR}/LuCa-7color_Scan1.ome.tiff
OUTPUTDIR="v3_default"
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/Leica-1.ome.zarr ${OUTPUTDIR}/Leica-1.ome.tiff
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/NIRHTa+001.ome.zarr ${OUTPUTDIR}/NIRHTa+001.ome.tiff
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/LuCa-7color_Scan1.ome.zarr ${OUTPUTDIR}/LuCa-7color_Scan1.ome.tiff
OUTPUTDIR="v3_compact"
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/Leica-1.ome.zarr ${OUTPUTDIR}/Leica-1.ome.tiff
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/NIRHTa+001.ome.zarr ${OUTPUTDIR}/NIRHTa+001.ome.tiff
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/LuCa-7color_Scan1.ome.zarr ${OUTPUTDIR}/LuCa-7color_Scan1.ome.tiff
```

The conversion times are comparable:
```
++ OUTPUTDIR=v2_default
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v2_default/Leica-1.ome.zarr v2_default/Leica-1.ome.tiff
real 0m44.424s
user 2m7.105s
sys 0m4.305s
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v2_default/NIRHTa+001.ome.zarr v2_default/NIRHTa+001.ome.tiff
real 22m11.110s
user 2m29.976s
sys 2m2.128s
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v2_default/LuCa-7color_Scan1.ome.zarr v2_default/LuCa-7color_Scan1.ome.tiff
real 0m43.105s
user 1m57.833s
sys 0m3.692s
++ OUTPUTDIR=v3_default
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v3_default/Leica-1.ome.zarr v3_default/Leica-1.ome.tiff
real 1m2.385s
user 2m28.143s
sys 0m4.118s
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v3_default/NIRHTa+001.ome.zarr v3_default/NIRHTa+001.ome.tiff
real 22m32.169s
user 2m52.064s
sys 2m1.268s
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v3_default/LuCa-7color_Scan1.ome.zarr v3_default/LuCa-7color_Scan1.ome.tiff
real 0m55.623s
user 2m14.483s
sys 0m3.273s
++ OUTPUTDIR=v3_compact
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v3_compact/Leica-1.ome.zarr v3_compact/Leica-1.ome.tiff
real 1m3.946s
user 2m27.578s
sys 0m4.166s
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v3_compact/NIRHTa+001.ome.zarr v3_compact/NIRHTa+001.ome.tiff
real 24m8.412s
user 2m56.810s
sys 2m2.914s
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v3_compact/LuCa-7color_Scan1.ome.zarr v3_compact/LuCa-7color_Scan1.ome.tiff
real 0m57.530s
user 2m19.869s
sys 0m3.328s
```
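To put "comparable" into numbers, a small sketch that converts the `real` (wall-clock) times from the log into seconds and reports each v3 run relative to the v2 baseline for the same source. The class and method names are hypothetical; the only inputs are the times copied from the log.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimingComparison {
  // Matches the bash `time` format used in the log, e.g. "22m11.110s".
  private static final Pattern DURATION = Pattern.compile("(\\d+)m(\\d+(?:\\.\\d+)?)s");

  /** Converts a duration such as "22m11.110s" into seconds. */
  static double toSeconds(String duration) {
    Matcher m = DURATION.matcher(duration.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("Unexpected duration: " + duration);
    }
    return Integer.parseInt(m.group(1)) * 60 + Double.parseDouble(m.group(2));
  }

  /** Ratio of a run's wall-clock time to the baseline run's wall-clock time. */
  static double ratio(String run, String baseline) {
    return toSeconds(run) / toSeconds(baseline);
  }

  public static void main(String[] args) {
    // "real" times from the log, ordered v2_default, v3_default, v3_compact.
    Map<String, String[]> real = new LinkedHashMap<>();
    real.put("Leica-1", new String[] {"0m44.424s", "1m2.385s", "1m3.946s"});
    real.put("NIRHTa+001", new String[] {"22m11.110s", "22m32.169s", "24m8.412s"});
    real.put("LuCa-7color_Scan1", new String[] {"0m43.105s", "0m55.623s", "0m57.530s"});
    for (Map.Entry<String, String[]> e : real.entrySet()) {
      String[] t = e.getValue();
      System.out.printf("%-18s v3_default %.2fx  v3_compact %.2fx%n",
          e.getKey(), ratio(t[1], t[0]), ratio(t[2], t[0]));
    }
  }
}
```

On these numbers the v3 runs take roughly 1.02x to 1.44x the v2 wall-clock time, with the largest relative difference on the shorter-running Leica-1 and LuCa-7color_Scan1 conversions.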
The file sizes are also identical:

```
omero@ngff:/mnt/data/seb/raw2ometiff_152$ du -cs v*/*.ome.tiff
4051868 v2_default/Leica-1.ome.tiff
2112872 v2_default/LuCa-7color_Scan1.ome.tiff
2773620 v2_default/NIRHTa+001.ome.tiff
4051868 v3_compact/Leica-1.ome.tiff
2112872 v3_compact/LuCa-7color_Scan1.ome.tiff
2773620 v3_compact/NIRHTa+001.ome.tiff
4051868 v3_default/Leica-1.ome.tiff
2112872 v3_default/LuCa-7color_Scan1.ome.tiff
2773620 v3_default/NIRHTa+001.ome.tiff
26815080 total
```
The generated OME-TIFF files were all imported into OMERO for comparison and visual inspection, which confirmed that the data and metadata are identical.
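The size comparison in the du listing can be automated with a short check that groups lines by file name and confirms every variant directory reports the same value. This is a hypothetical helper, not part of raw2ometiff. Note that du reports disk usage (in 1 KiB units by default with GNU coreutils) rather than exact byte counts, so matching values are a coarse signal, which is why the OMERO comparison is the stronger check.

```java
import java.util.HashMap;
import java.util.Map;

public class SizeCheck {
  /**
   * Parses `du` lines of the form "<size> <variant>/<file>" and returns true
   * when each file name has the same size in every variant directory.
   */
  static boolean sizesMatchAcrossVariants(String duOutput) {
    Map<String, Long> sizeByFile = new HashMap<>();
    for (String line : duOutput.split("\n")) {
      String[] parts = line.trim().split("\\s+");
      if (parts.length != 2 || parts[1].equals("total")) {
        continue; // skip blank lines, the prompt line and the summary line
      }
      long size = Long.parseLong(parts[0]);
      // Strip the variant directory (e.g. "v3_compact/") to get the file name.
      String file = parts[1].substring(parts[1].indexOf('/') + 1);
      Long previous = sizeByFile.putIfAbsent(file, size);
      if (previous != null && previous != size) {
        return false;
      }
    }
    return true;
  }
}
```

For the listing above the check passes, and the reported total of 26815080 is exactly three times the per-variant sum of 8938360.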
```java
if (isV3()) {
Group v3Series = getZarrGroupV3(s.path);
if (v3Series == null) {
if (seriesGroup == null) {
```
As this is the same check as for v2, it can probably be moved outside the if (isV3()) conditional?
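A minimal sketch of the suggested refactor, assuming the only version-specific part is the group lookup: resolve the group through the appropriate call, then perform one shared null check. Group, getZarrGroupV2, register and getSeriesGroup are hypothetical stand-ins (only isV3() and getZarrGroupV3 appear in the diff excerpt), and throwing on a missing group is an assumption, since the body of the original null branch is not visible.

```java
import java.util.HashMap;
import java.util.Map;

public class SeriesLookup {
  /** Hypothetical stand-in for a Zarr group handle. */
  static final class Group {
    final String path;

    Group(String path) {
      this.path = path;
    }
  }

  private final boolean v3;
  // Hypothetical in-memory stores standing in for the v2 and v3 Zarr lookups.
  private final Map<String, Group> v2Groups = new HashMap<>();
  private final Map<String, Group> v3Groups = new HashMap<>();

  SeriesLookup(boolean v3) {
    this.v3 = v3;
  }

  /** Test helper: registers a group in the store for the given version. */
  void register(String path, boolean asV3) {
    (asV3 ? v3Groups : v2Groups).put(path, new Group(path));
  }

  boolean isV3() {
    return v3;
  }

  Group getZarrGroupV2(String path) {
    return v2Groups.get(path);
  }

  Group getZarrGroupV3(String path) {
    return v3Groups.get(path);
  }

  /** Only the lookup depends on the version; the missing-group check is shared. */
  Group getSeriesGroup(String path) {
    Group seriesGroup = isV3() ? getZarrGroupV3(path) : getZarrGroupV2(path);
    if (seriesGroup == null) {
      // Assumption: a missing series group is treated as an error.
      throw new IllegalStateException("No Zarr group found for series path " + path);
    }
    return seriesGroup;
  }
}
```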
```
Requirements
============

Java 8 or later is required.
```
As in https://github.com/glencoesoftware/bioformats2raw/tree/master?tab=readme-ov-file#development-installation, can we also amend the README to update the build requirements?
Exactly. The v2 and v3 https://github.com/zarr-developers/zarr-java/blob/0.1.0/src/main/java/dev/zarr/zarrjava/v2/DataType.java
I tested conversion with existing v2/0.4 datasets in gs-public-zarr-archive: And OpenSlide data converted with the latest bioformats2raw-0.12.0-rc3 with and without All successful conversions opened in FIJI without issue for visualization. CMU-1-Small-Region.zarr failed to convert, but that is also true of the raw2ometiff 0.9.0 release, so it is unrelated to this PR.
It looks like the There were a number of changes in Bio-Formats between 6.12.0 and 7.2.0 that would have impacted SVS image counts. As far as I can tell from our repository configuration data, the correct
Fully agreed with #152 (comment) on what created the inconsistent state of the current or by recreating a new version of the Zarr dataset, deleting the existing one and replacing it altogether.
Since this has been approved by all reviewers, I am merging it, as mentioned in today's staff meeting, so we can proceed with #151.
See glencoesoftware/bioformats2raw#302.
This also includes changes to the minimum Java version, Gradle version, and build matrix, as was done in glencoesoftware/bioformats2raw#294. This was necessary to make the build work with bioformats2raw 0.12.0-rc2, which is needed to ensure that the same zarr-java version is used for both steps of the tests.
There are no new options here, so testing should be a matter of making sure that previously working v2 and v3 input data continues to convert as expected. Following procedures similar to those in #148 and glencoesoftware/bioformats2raw#302, and then verifying that the OME-TIFF files open in QuPath/OMERO/etc., should be sufficient.
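When checking that both input paths still work, it helps to confirm which format a given test dataset actually is. At the Zarr level, a v2 group (OME-NGFF 0.4) carries a `.zgroup` file, while every v3 node (OME-NGFF 0.5) carries a `zarr.json` file. The probe below is a hypothetical triage helper based on that convention, not how raw2ometiff itself detects the version.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ZarrVersionProbe {
  /**
   * Returns 3 if the directory is a Zarr v3 node (zarr.json present),
   * 2 if it is a Zarr v2 group (.zgroup present), and throws otherwise.
   */
  static int detectZarrVersion(Path root) {
    if (Files.isRegularFile(root.resolve("zarr.json"))) {
      return 3;
    }
    if (Files.isRegularFile(root.resolve(".zgroup"))) {
      return 2;
    }
    throw new IllegalArgumentException("Not a Zarr v2 or v3 group: " + root);
  }

  public static void main(String[] args) {
    for (String arg : args) {
      System.out.println(arg + ": Zarr v" + detectZarrVersion(Paths.get(arg)));
    }
  }
}
```

With Java 11 the file can be run directly from source, for example `java ZarrVersionProbe.java v2_default/Leica-1.ome.zarr v3_default/Leica-1.ome.zarr`, which should report v2 and v3 respectively for the datasets produced by the bioformats2raw commands above, assuming each root contains the group metadata file described here.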