@ziw-liu
Contributor

@ziw-liu ziw-liu commented Jun 25, 2025

Enable writing OME-Zarr 0.5 (zarr v3). Tested output with:

Also includes miscellaneous typing improvements.

Breaking changes:

  • Removed (dysfunctional) option for version 0.1 of OME-Zarr.
  • Added a sharding option to the array creation methods. This breaks call sites that pass all arguments positionally.
  • Removed bytes from available input path types.
  • Replaced the default identity transforms with a scale of 1.0 to work around the schema bug reported in ome/ngff#152 (Add identity to image coordinateTransformations schema). This is breaking if an application relied on the default transform being of type identity.
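
For downstream code that inspected the default transform, a tolerant check could look like the sketch below. It assumes the transforms are available as plain dictionaries in the NGFF `coordinateTransformations` layout; the helper name `is_identity` and the five-axis example are hypothetical and are not part of iohub.

```python
def is_identity(transforms):
    """Return True if a list of coordinate transformations has no effect.

    Hypothetical helper, not part of iohub. It accepts both the old
    default ({"type": "identity"}) and the new one (a scale of all 1.0).
    """
    for t in transforms:
        kind = t.get("type")
        if kind == "identity":
            continue
        if kind == "scale" and all(s == 1.0 for s in t["scale"]):
            continue
        if kind == "translation" and all(v == 0.0 for v in t["translation"]):
            continue
        # Any other transform changes the coordinates.
        return False
    return True


# Assumed example metadata for a 5-axis image.
old_default = [{"type": "identity"}]
new_default = [{"type": "scale", "scale": [1.0, 1.0, 1.0, 1.0, 1.0]}]
```

A check written as `t["type"] == "identity"` passes only for `old_default`, whereas `is_identity` treats both defaults the same way.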

Performance

Warning

Writing sharded arrays with zarr-python is extremely slow. Writing a 128x2000x4000 array with sharding takes more than 20 minutes! Using tensorstore or zarrs-python is >100x faster.

Other observations on the performance hit when switching to the sharded arrays with zarr-python 3:

Other known issues:

@ziw-liu ziw-liu changed the base branch from main to zarr3-dev June 25, 2025 18:25
@ziw-liu ziw-liu added enhancement New feature or request NGFF OME-NGFF (OME-Zarr format) breaking Breaking change labels Jun 25, 2025
@ziw-liu ziw-liu added this to the 0.3.0 milestone Jun 25, 2025
@ziw-liu ziw-liu marked this pull request as ready for review June 27, 2025 21:53
@ziw-liu
Contributor Author

ziw-liu commented Jun 27, 2025

cc @bentaculum

@edyoshikun edyoshikun requested a review from ieivanov June 30, 2025 22:54
Collaborator

@aliddell aliddell left a comment

A question for you, but looks good.

@ziw-liu
Contributor Author

ziw-liu commented Jul 2, 2025

@talonchandler @JoOkuma Here's a daxi2 volume converted so that it has 90x smaller chunks but 10x fewer files:

/hpc/projects/comp.micro/test-iohub/daxi/laser_561.zarr
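
As a rough illustration of how smaller chunks and fewer files can go together, here is a sketch of the arithmetic. The shapes are made-up numbers chosen to reproduce the 90x and 10x ratios, not the layout of the daxi2 volume. It assumes one file per chunk in the unsharded layout and one file per shard in the sharded layout, and it ignores metadata files.

```python
import math


def count_blocks(shape, block_shape):
    """Number of blocks of `block_shape` needed to tile an array of `shape`."""
    return math.prod(math.ceil(s / b) for s, b in zip(shape, block_shape))


# Hypothetical shapes, chosen only to reproduce the ratios.
shape = (100, 1800, 3600)
old_chunks = (1, 1800, 3600)   # unsharded: one file per chunk
new_chunks = (1, 200, 360)     # 90x fewer elements per chunk
new_shards = (10, 1800, 3600)  # sharded: one file per shard

old_files = count_blocks(shape, old_chunks)
new_files = count_blocks(shape, new_shards)
chunks_per_shard = count_blocks(new_shards, new_chunks)
chunk_shrink = math.prod(old_chunks) // math.prod(new_chunks)

print(old_files, new_files, chunks_per_shard, chunk_shrink)
```

With these numbers each shard file packs 900 chunks, which is what allows the chunks to shrink 90x while the file count drops 10x.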

@mattersoflight
Collaborator

mattersoflight commented Jul 2, 2025

@ziw-liu thanks for your careful work with this.
If I remember @aliddell's benchmark correctly, acquire-zarr writes sharded zarr arrays faster than tensorstore and doesn't require many build dependencies.

@aliddell Is the acquire-zarr API a drop-in replacement for the parts of the code that execute the file I/O?

I haven't had a chance to read the code closely, but I am curious whether the zarr layout generation and metadata creation are sufficiently separated from the file I/O that we can improve the performance for sharded arrays using acquire-zarr.

@aliddell
Collaborator

aliddell commented Jul 2, 2025

@mattersoflight acquire-zarr is not a drop-in replacement, because it doesn't provide random-access writing. If iohub had a "streaming mode," it could fit there, but if you treat your Zarr arrays like in-memory arrays, it won't.

@ziw-liu
Contributor Author

ziw-liu commented Jul 2, 2025

Merging into the staging branch (#301). We will handle performance in a future PR.

@ziw-liu ziw-liu merged commit 5382c06 into zarr3-dev Jul 2, 2025
7 checks passed
@ziw-liu ziw-liu deleted the write-ome-zarr-v05 branch July 2, 2025 23:32

3 participants