Skip to content

Conversation

@ziw-liu
Copy link
Contributor

@ziw-liu ziw-liu commented Jul 3, 2025

@ziw-liu ziw-liu added the NGFF OME-NGFF (OME-Zarr format) label Jul 3, 2025
@ziw-liu ziw-liu added this to the 0.3.0 milestone Jul 3, 2025
@ziw-liu ziw-liu marked this pull request as ready for review July 11, 2025 00:30
Copy link
Collaborator

@aliddell aliddell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks OK, just one thing I have a question about.

Defaults to None.
scale : Tuple[float], optional
shards_ratio : tuple[int, ...], optional
TCZYX shards ratio of the plate.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "shards ratio" mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many chunks per shard along each dimension. I think this is easier to use than the exact shard size, since they have to be divisible by the chunk size.

Comment on lines 110 to 111
if output_plate.version == "0.4" and shards_ratio is not None:
raise ValueError("Sharding is not supported in OME-Zarr version 0.4.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not ignore in this case? Based on the documentation, users should not expect an exception here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emitting something here could help catch some input error. But maybe a warning is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ziw-liu
Copy link
Contributor Author

ziw-liu commented Jul 29, 2025

@aliddell test is failing for the recent acquire-zarr release. Do you have a migration guide?

 ERROR tests/ngff/test_ngff.py::test_acquire_zarr_ome_zarr_05 - TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. acquire_zarr.StreamSettings(*, store_path: str | None = None, s3: acquire_zarr.S3Settings | None = None, version: acquire_zarr.ZarrVersion | None = None, max_threads: typing.SupportsInt | None = None, overwrite: bool | None = None, arrays: list | None = None)

@aliddell
Copy link
Collaborator

@aliddell test is failing for the recent acquire-zarr release. Do you have a migration guide?

 ERROR tests/ngff/test_ngff.py::test_acquire_zarr_ome_zarr_05 - TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. acquire_zarr.StreamSettings(*, store_path: str | None = None, s3: acquire_zarr.S3Settings | None = None, version: acquire_zarr.ZarrVersion | None = None, max_threads: typing.SupportsInt | None = None, overwrite: bool | None = None, arrays: list | None = None)

No, but I should write one. I've fixed the fixture and test for now.

@ieivanov
Copy link
Contributor

ieivanov commented Aug 8, 2025

@ziw-liu as you outline in the PR description, here you've worked around some bugs in libraries we depend on. Could you leave notes (here or some other place) on what the code would ideally look like once these bugs are resolved - for example, we should be able to use both tensorstore and zarr python to write arrays.

@ziw-liu
Copy link
Contributor Author

ziw-liu commented Aug 8, 2025

we should be able to use both tensorstore and zarr python to write arrays.

In this specific case it doesn't really matter to the user that process_single_position is using tensorstore. And I doubt zarr-python will be able to match tensorstore's performance anytime soon. Can you elaborate on the benefit of having two internal code paths here?

@ieivanov
Copy link
Contributor

ieivanov commented Aug 8, 2025

I'm not really suggesting two code paths here, I'm just looking to document your though process for the next person who'd carry on this work. For example, would the code be better off if we don't do explicit GC or spawn for all platforms when creating multiprocessing?

ziw-liu and others added 7 commits August 8, 2025 17:19
* Fix tensorstore empty array handling

- Add validation for empty arrays in _save_transformed before tensorstore write
- Skip write operations for empty arrays with warning messages
- Add comprehensive error handling with detailed diagnostics for tensorstore failures
- Improve error messages to include array shapes, sizes, and tensorstore details

This resolves the ValueError: Error aligning dimensions issue when empty arrays
are passed to tensorstore write operations.

* Add empty results check to prevent tensorstore alignment errors

Adds validation in apply_transform_to_tczyx_and_save() to check for empty
results dictionary before calling _save_transformed(). When no valid time
points are available, logs diagnostic message and skips write operation
instead of attempting to write empty arrays to tensorstore, which causes
alignment dimension mismatches.

* Revert "Fix tensorstore empty array handling"

This reverts commit 65c9ddb.

* better handling of output_time_indices

* style
Copy link
Contributor

@ieivanov ieivanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this PR. I've tested that it correctly saves data in zarr v3 format using biahub concatenate in conjunction with czbiohub-sf/biahub#104 and I've also tested that the changes here don't break existing pipelines (specifically biahub deskew) which for now will continue writing in zarr v2.

@ieivanov
Copy link
Contributor

ieivanov commented Sep 4, 2025

I'm seeing that process_single_position only supports batching along the time dimension, I'll try to quickly implement batching along channel dimension as well, otherwise we should throw an error.

@ieivanov
Copy link
Contributor

ieivanov commented Sep 4, 2025

Added an error message, sharding along channel dimensions is not immediately straightforward. First attempt here: https://github.com/czbiohub-sf/iohub/tree/batched_channel_processing, if it's at all possible then apply_transform_to_tczyx_and_save will need more work. I'm good with merging this PR, will update czbiohub-sf/biahub#104 after

@ziw-liu ziw-liu merged commit 2d1c5fb into zarr3-dev Sep 4, 2025
7 checks passed
@ziw-liu ziw-liu deleted the write-sharding branch September 4, 2025 21:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

NGFF OME-NGFF (OME-Zarr format)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants