
Switch from jzarr to zarr-java when reading v2/0.4 data #152

Merged: melissalinkert merged 10 commits into glencoesoftware:master from melissalinkert:zarr-java-v2 on Apr 8, 2026

@melissalinkert (Member)

See glencoesoftware/bioformats2raw#302.

This also includes changes to the minimum Java version, Gradle version, and build matrix, as was done in glencoesoftware/bioformats2raw#294. This was necessary to make the build work with bioformats2raw 0.12.0-rc2, which is needed to ensure that the same zarr-java version is used for both steps of the tests.

There are no new options here, so testing should be a matter of making sure that previously working v2 and v3 input data continues to convert as expected. Following similar procedures as in #148 and glencoesoftware/bioformats2raw#302, and then verifying that the OME-TIFF files open in QuPath/OMERO/etc. should be sufficient.

@melissalinkert added this to the 0.10.0 milestone Mar 17, 2026

@sbesson (Member) left a comment

With the removal of jzarr in favor of zarr-java, many of the conditionals and version-specific APIs used to handle different inputs are now streamlined to use the zarr-java API. It looks like the only remaining active V2/V3 checks are related to pixel types. I assume this is because the data type enumerations are different and there is not much we can do here?
In terms of code review, I made two small inline comments/suggestions, but overall the changes look good.

This was tested functionally by converting the three modalities tested during this V3 support work (brightfield whole slide imaging, fluorescence whole slide imaging and high-content screening) into Zarr v2 and Zarr v3 datasets using the recently released bioformats2raw 0.12.0-rc2:

```shell
set -x

OUTPUTDIR="v2_default"
mkdir -p ${OUTPUTDIR} && rm -rf ${OUTPUTDIR}/*
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/Leica-1.scn ${OUTPUTDIR}/Leica-1.ome.zarr
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/NIRHTa+001/AS_09125_050116000001_A10f00d0.DIB ${OUTPUTDIR}/NIRHTa+001.ome.zarr
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/LuCa-7color_Scan1.qptiff ${OUTPUTDIR}/LuCa-7color_Scan1.ome.zarr

OUTPUTDIR="v3_default"
mkdir -p ${OUTPUTDIR} && rm -rf ${OUTPUTDIR}/*
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/Leica-1.scn ${OUTPUTDIR}/Leica-1.ome.zarr --ngff-version 0.5
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/NIRHTa+001/AS_09125_050116000001_A10f00d0.DIB ${OUTPUTDIR}/NIRHTa+001.ome.zarr --ngff-version 0.5
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/LuCa-7color_Scan1.qptiff ${OUTPUTDIR}/LuCa-7color_Scan1.ome.zarr --ngff-version 0.5

OUTPUTDIR="v3_compact"
mkdir -p ${OUTPUTDIR} && rm -rf ${OUTPUTDIR}/*
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/Leica-1.scn ${OUTPUTDIR}/Leica-1.ome.zarr --compact --ngff-version 0.5
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/NIRHTa+001/AS_09125_050116000001_A10f00d0.DIB ${OUTPUTDIR}/NIRHTa+001.ome.zarr --compact --ngff-version 0.5
time ./bioformats2raw-0.12.0-rc2/bin/bioformats2raw sources/LuCa-7color_Scan1.qptiff ${OUTPUTDIR}/LuCa-7color_Scan1.ome.zarr --compact --ngff-version 0.5
```

This was followed by the conversion of these Zarr datasets to OME-TIFF:

```shell
set -x

OUTPUTDIR="v2_default"
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/Leica-1.ome.zarr ${OUTPUTDIR}/Leica-1.ome.tiff
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/NIRHTa+001.ome.zarr ${OUTPUTDIR}/NIRHTa+001.ome.tiff
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/LuCa-7color_Scan1.ome.zarr ${OUTPUTDIR}/LuCa-7color_Scan1.ome.tiff

OUTPUTDIR="v3_default"
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/Leica-1.ome.zarr ${OUTPUTDIR}/Leica-1.ome.tiff
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/NIRHTa+001.ome.zarr ${OUTPUTDIR}/NIRHTa+001.ome.tiff
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/LuCa-7color_Scan1.ome.zarr ${OUTPUTDIR}/LuCa-7color_Scan1.ome.tiff

OUTPUTDIR="v3_compact"
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/Leica-1.ome.zarr ${OUTPUTDIR}/Leica-1.ome.tiff
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/NIRHTa+001.ome.zarr ${OUTPUTDIR}/NIRHTa+001.ome.tiff
time ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff ${OUTPUTDIR}/LuCa-7color_Scan1.ome.zarr ${OUTPUTDIR}/LuCa-7color_Scan1.ome.tiff
```
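The two scripts above run the same three inputs through three output variants, so they can also be written as loops. A minimal sketch, assuming the same directory layout and tool paths as above; `convert_all`, `variant_opts`, `BF2RAW` and `RAW2OMETIFF` are hypothetical names introduced here. Unlike the original scripts, it converts each input to OME-TIFF right after writing its Zarr, which changes the order of the runs but not the commands themselves.

```shell
#!/usr/bin/env bash
# Sketch only: the bioformats2raw and raw2ometiff runs above, as loops.
# BF2RAW and RAW2OMETIFF are hypothetical overrides; they default to the
# release directories used in the scripts above.
BF2RAW="${BF2RAW:-./bioformats2raw-0.12.0-rc2/bin/bioformats2raw}"
RAW2OMETIFF="${RAW2OMETIFF:-./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff}"

# "output basename=input file", matching the names used above.
INPUTS=(
  "Leica-1=sources/Leica-1.scn"
  "NIRHTa+001=sources/NIRHTa+001/AS_09125_050116000001_A10f00d0.DIB"
  "LuCa-7color_Scan1=sources/LuCa-7color_Scan1.qptiff"
)

# Extra bioformats2raw options for each output variant (same flags as above).
variant_opts() {
  case "$1" in
    v2_default) echo "" ;;
    v3_default) echo "--ngff-version 0.5" ;;
    v3_compact) echo "--compact --ngff-version 0.5" ;;
    *) echo "unknown variant: $1" >&2; return 1 ;;
  esac
}

convert_all() {
  local variant opts entry name src
  for variant in v2_default v3_default v3_compact; do
    opts=$(variant_opts "$variant") || return 1
    # Same reset as the original scripts: create and empty the output dir.
    mkdir -p "$variant" && rm -rf "${variant:?}"/*
    for entry in "${INPUTS[@]}"; do
      name=${entry%%=*}
      src=${entry#*=}
      # $opts is left unquoted on purpose so it splits into separate flags.
      time "$BF2RAW" "$src" "$variant/$name.ome.zarr" $opts
      time "$RAW2OMETIFF" "$variant/$name.ome.zarr" "$variant/$name.ome.tiff"
    done
  done
}
```

Calling `BF2RAW=echo RAW2OMETIFF=echo convert_all` prints the commands instead of running them, but the output directories are still created and emptied, so such a dry run belongs in a scratch directory.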

The conversion times are comparable, with the v3 conversions taking somewhat longer in wall-clock time:

```
++ OUTPUTDIR=v2_default
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v2_default/Leica-1.ome.zarr v2_default/Leica-1.ome.tiff

real    0m44.424s
user    2m7.105s
sys     0m4.305s
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v2_default/NIRHTa+001.ome.zarr v2_default/NIRHTa+001.ome.tiff

real    22m11.110s
user    2m29.976s
sys     2m2.128s
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v2_default/LuCa-7color_Scan1.ome.zarr v2_default/LuCa-7color_Scan1.ome.tiff

real    0m43.105s
user    1m57.833s
sys     0m3.692s
++ OUTPUTDIR=v3_default
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v3_default/Leica-1.ome.zarr v3_default/Leica-1.ome.tiff

real    1m2.385s
user    2m28.143s
sys     0m4.118s
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v3_default/NIRHTa+001.ome.zarr v3_default/NIRHTa+001.ome.tiff

real    22m32.169s
user    2m52.064s
sys     2m1.268s
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v3_default/LuCa-7color_Scan1.ome.zarr v3_default/LuCa-7color_Scan1.ome.tiff

real    0m55.623s
user    2m14.483s
sys     0m3.273s
++ OUTPUTDIR=v3_compact
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v3_compact/Leica-1.ome.zarr v3_compact/Leica-1.ome.tiff

real    1m3.946s
user    2m27.578s
sys     0m4.166s
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v3_compact/NIRHTa+001.ome.zarr v3_compact/NIRHTa+001.ome.tiff

real    24m8.412s
user    2m56.810s
sys     2m2.914s
++ ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff v3_compact/LuCa-7color_Scan1.ome.zarr v3_compact/LuCa-7color_Scan1.ome.tiff

real    0m57.530s
user    2m19.869s
sys     0m3.328s
```
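To compare runs side by side, the `real` (wall-clock) times in a log like the one above can be tabulated. A minimal sketch, assuming the log was captured from a bash script run with `set -x` and `time`, both of which write to stderr (e.g. `./to_ometiff.sh 2> timings.log`, where both file names are hypothetical); `summarize_times` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: print "output<TAB>seconds" for each raw2ometiff run in a log made of
# xtrace lines ("++ .../raw2ometiff IN OUT") followed by bash `time` output.
summarize_times() {
  awk '
    # xtrace line for a raw2ometiff run: remember the output path (last field)
    /^\++ .*raw2ometiff / { out = $NF; next }
    # bash `time` line such as "real    0m44.424s"
    $1 == "real" && out != "" {
      t = $2
      sub(/s$/, "", t)        # drop the trailing "s"
      split(t, parts, "m")    # parts[1] = minutes, parts[2] = seconds
      printf "%s\t%.1f\n", out, parts[1] * 60 + parts[2]
      out = ""
    }
  ' "$@"
}
```

For the first run in the log above, `summarize_times timings.log` would print `v2_default/Leica-1.ome.tiff` followed by a tab and `44.4`.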

The file sizes are also identical:

```
omero@ngff:/mnt/data/seb/raw2ometiff_152$ du -cs v*/*.ome.tiff
4051868	v2_default/Leica-1.ome.tiff
2112872	v2_default/LuCa-7color_Scan1.ome.tiff
2773620	v2_default/NIRHTa+001.ome.tiff
4051868	v3_compact/Leica-1.ome.tiff
2112872	v3_compact/LuCa-7color_Scan1.ome.tiff
2773620	v3_compact/NIRHTa+001.ome.tiff
4051868	v3_default/Leica-1.ome.tiff
2112872	v3_default/LuCa-7color_Scan1.ome.tiff
2773620	v3_default/NIRHTa+001.ome.tiff
26815080	total
```
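`du` reports disk usage in blocks (1 KiB units by default with GNU coreutils) rather than exact byte counts. A minimal sketch of an exact byte-size comparison, assuming the directory layout above; `same_sizes` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: compare the byte size of every *.ome.tiff in a reference directory
# with the same-named file in each other directory. Returns non-zero if any
# file is missing or differs in size.
same_sizes() {
  local ref="$1" f dir size refsize status=0
  shift
  for f in "$ref"/*.ome.tiff; do
    if [ ! -e "$f" ]; then
      echo "no .ome.tiff files found in $ref" >&2
      return 1
    fi
    refsize=$(wc -c < "$f")          # exact size in bytes
    for dir in "$@"; do
      if ! size=$(wc -c < "$dir/${f##*/}"); then
        status=1                     # missing file; the shell reports the error
        continue
      fi
      if [ "$size" -ne "$refsize" ]; then
        echo "size mismatch: $dir/${f##*/} is $((size)) bytes, $f is $((refsize)) bytes"
        status=1
      fi
    done
  done
  return "$status"
}
```

Running `same_sizes v2_default v3_default v3_compact` compares each OME-TIFF in `v2_default` with its counterparts in the two v3 directories.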

The generated OME-TIFF files were all imported into OMERO for comparison and visual inspection, which confirmed that the data and metadata are identical.

```java
if (isV3()) {
Group v3Series = getZarrGroupV3(s.path);
if (v3Series == null) {
if (seriesGroup == null) {
```

Member: As this is the same check as for v2, could it be moved outside the `if (isV3())` conditional?

Member Author: Done in bab7aaa

```
Requirements
============

Java 8 or later is required.
```

Member: As in https://github.com/glencoesoftware/bioformats2raw/tree/master?tab=readme-ov-file#development-installation, can we also amend the README to update the build requirements?

Member Author: Done in 3f3271e

@melissalinkert (Member Author)

> It looks like the only remaining active V2/V3 checks are related pixel type checks. I assume this is because the data type enumerations are different and there is not much we can do here?

Exactly. The v2 and v3 DataType enums are implemented a bit differently, and the ArrayMetadataBuilder implementations require the DataType from the matching package (not the dev.zarr.zarrjava.core.DataType interface). See:

https://github.com/zarr-developers/zarr-java/blob/0.1.0/src/main/java/dev/zarr/zarrjava/v2/DataType.java
https://github.com/zarr-developers/zarr-java/blob/0.1.0/src/main/java/dev/zarr/zarrjava/v3/DataType.java

@erindiel (Member) commented Apr 3, 2026

I tested conversion with existing v2/0.4 datasets in gs-public-zarr-archive:

```
drwxr-xr-x  6 erindiel  staff         192 Apr  3 15:25 03887B26C1_8bit_lynEGFP.zarr
drwxr-xr-x  8 erindiel  staff         256 Apr  3 15:25 CMU-1-Small-Region.zarr
```

I also tested OpenSlide data converted with the latest bioformats2raw-0.12.0-rc3, with and without --ngff-version 0.5:

```
drwxr-xr-x  5 erindiel  staff         160 Apr  3 15:28 CMU-1-05.zarr
drwxr-xr-x  6 erindiel  staff         192 Apr  3 15:27 CMU-1.zarr
drwxr-xr-x  6 erindiel  staff         192 Apr  3 15:29 Zeiss-5-Uncompressed-05.zarr
drwxr-xr-x  7 erindiel  staff         224 Apr  3 15:28 Zeiss-5-Uncompressed.zarr
```

All successful conversions opened in Fiji for visualization without issue.

CMU-1-Small-Region.zarr failed to convert, but it also fails with the raw2ometiff-0.9.0 release, so the failure is unrelated to this PR.

```
% ./raw2ometiff-0.10.0-SNAPSHOT/bin/raw2ometiff CMU-1-Small-Region.zarr CMU-1-Small-Region.ome.tiff
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.pyramid.PyramidFromDirectoryWriter@524d6d96): java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1
	at picocli.CommandLine.executeUserObject(CommandLine.java:2050)
	at picocli.CommandLine.access$1500(CommandLine.java:148)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
	at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2264)
	at picocli.CommandLine.parseWithHandlers(CommandLine.java:2664)
	at picocli.CommandLine.parseWithHandler(CommandLine.java:2599)
	at picocli.CommandLine.call(CommandLine.java:2875)
	at com.glencoesoftware.pyramid.PyramidFromDirectoryWriter.main(PyramidFromDirectoryWriter.java:569)
Caused by: java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1
	at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
	at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
	at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
	at java.base/java.util.Objects.checkIndex(Objects.java:361)
	at java.base/java.util.ArrayList.get(ArrayList.java:427)
	at ome.xml.model.OME.getImage(OME.java:730)
	at ome.xml.meta.OMEXMLMetadataImpl.getPixelsSizeX(OMEXMLMetadataImpl.java:3069)
	at com.glencoesoftware.pyramid.PyramidFromDirectoryWriter.initialize(PyramidFromDirectoryWriter.java:1192)
	at com.glencoesoftware.pyramid.PyramidFromDirectoryWriter.call(PyramidFromDirectoryWriter.java:630)
	at com.glencoesoftware.pyramid.PyramidFromDirectoryWriter.call(PyramidFromDirectoryWriter.java:110)
	at picocli.CommandLine.executeUserObject(CommandLine.java:2041)
	... 9 more
```

@melissalinkert (Member Author)

> CMU-1-Small-Region.zarr failed to convert, but that is also true of raw2ometiff-0.9.0 release, so it is unrelated to this PR.

It looks like OME/METADATA.ome.xml only records one Image, but there are 3 arrays present in the Zarr. The .zattrs files indicate that array 0 was generated with Bio-Formats 7.2.0 and the other arrays with 6.12.0. Timestamps also show that the OME and 0 subdirectories were updated more recently than the 1 and 2 arrays.

There were a number of changes in Bio-Formats between 6.12.0 and 7.2.0 that would have affected SVS image counts. As far as I can tell from our repository configuration data, the correct Image count for CMU-1-Small-Region.svs is one. Most likely the Zarr dataset was regenerated at some point and synced without specifying the --delete option. We can look into cleaning that up separately; I just wanted to investigate quickly to make sure this wasn't some weird bug in either conversion tool.
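This kind of mismatch can be spotted by comparing the number of Image elements in OME/METADATA.ome.xml with the number of numbered series groups at the root of the Zarr. A minimal sketch, assuming the layout described above (an OME directory plus 0, 1, ... at the top level, so not plate data) and un-prefixed `<Image` elements in the XML; `check_image_count` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: compare the <Image> count in OME/METADATA.ome.xml with the number of
# numbered series directories (0/, 1/, ...) at the root of a local Zarr copy.
# Assumes a non-plate layout and un-prefixed <Image> elements.
check_image_count() {
  local zarr="$1" images series
  # '<Image ' or '<Image>' only, so <ImageRef> elements are not counted.
  images=$(( $(grep -o '<Image[ >]' "$zarr/OME/METADATA.ome.xml" | wc -l) ))
  series=$(( $(find "$zarr" -mindepth 1 -maxdepth 1 -type d -name '[0-9]*' | wc -l) ))
  echo "$zarr: $images Image(s) in OME/METADATA.ome.xml, $series series group(s)"
  [ "$images" -eq "$series" ]
}
```

On a local copy of the dataset described above, this would presumably report 1 Image and 3 series groups and return a non-zero status.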

@sbesson (Member) commented Apr 4, 2026

Fully agreed with #152 (comment) on what created the inconsistent state of the current CMU-1-Small-Region.zarr dataset in the gs-public-zarr-archive AWS S3 bucket. We should be able to fix this inconsistency either by surgically removing the obsolete groups:

```shell
aws s3 rm --recursive s3://gs-public-zarr-archive/CMU-1-Small-Region.zarr/1
aws s3 rm --recursive s3://gs-public-zarr-archive/CMU-1-Small-Region.zarr/2
```

or by recreating a new version of the Zarr dataset, deleting the existing one and replacing it altogether.

@melissalinkert (Member Author)

Since this is approved by all reviewers, merging as mentioned in today's staff meeting so we can proceed with #151.

@melissalinkert merged commit 19535c7 into glencoesoftware:master on Apr 8, 2026 (4 checks passed)