Parallel writing to shards #311
Conversation
aliddell left a comment
I think this looks OK; just one thing I have a question about.
    Defaults to None.
    scale : Tuple[float], optional
    shards_ratio : tuple[int, ...], optional
        TCZYX shards ratio of the plate.
What does "shards ratio" mean?
How many chunks per shard along each dimension. I think this is easier to use than the exact shard size, since the shard size has to be divisible by the chunk size.
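To make that convention concrete, here is a minimal sketch of the arithmetic described above; the variable names are illustrative, not iohub's API:

```python
# Hypothetical names; only the arithmetic reflects the comment above.
chunks = (1, 1, 16, 256, 256)    # TCZYX chunk shape
shards_ratio = (1, 1, 4, 2, 2)   # chunks per shard along each axis

# The shard shape is the elementwise product, so it is divisible by the
# chunk shape by construction, which is the constraint sharding imposes.
shard_shape = tuple(c * r for c, r in zip(chunks, shards_ratio))
assert all(s % c == 0 for s, c in zip(shard_shape, chunks))
print(shard_shape)  # (1, 1, 64, 512, 512)
```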
iohub/ngff/utils.py (outdated)
    if output_plate.version == "0.4" and shards_ratio is not None:
        raise ValueError("Sharding is not supported in OME-Zarr version 0.4.")
Why not ignore it in this case? Based on the documentation, users should not expect an exception here.
Raising an error here could help catch input mistakes, but maybe a warning is better.
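A warn-and-ignore variant might look like the sketch below; this illustrates the alternative being discussed against the quoted diff, not the merged code:

```python
import warnings

# Sketch only: `output_plate` and `shards_ratio` are as in the quoted
# diff above. Warn about the likely input error instead of raising,
# then proceed as if sharding had not been requested.
if output_plate.version == "0.4" and shards_ratio is not None:
    warnings.warn(
        "shards_ratio is ignored: OME-Zarr version 0.4 does not "
        "support sharding (sharding requires zarr v3)."
    )
    shards_ratio = None
```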
@aliddell the test is failing for the recent acquire-zarr release. Do you have a migration guide?
… API and downsampling behavior
No, but I should write one. I've fixed the fixture and test for now.
@ziw-liu as you outline in the PR description, here you've worked around some bugs in libraries we depend on. Could you leave notes (here or some other place) on what the code would ideally look like once these bugs are resolved? For example, we should be able to use both tensorstore and zarr-python to write arrays.
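As a rough sketch of that ideal end state, the same zarr v3 array could be written with either library. The store path and array layout below are assumptions for illustration, not iohub's actual output:

```python
import numpy as np
import tensorstore as ts
import zarr

data = np.zeros((16, 256, 256), dtype=np.uint16)

# Writing an existing zarr v3 array with zarr-python:
z = zarr.open_array("out.zarr/0", mode="r+")
z[:16, :256, :256] = data

# Writing the same array with tensorstore's zarr3 driver:
arr = ts.open({
    "driver": "zarr3",
    "kvstore": {"driver": "file", "path": "out.zarr/0"},
}).result()
arr[:16, :256, :256].write(data).result()
```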
In this specific case it doesn't really matter to the user that …
I'm not really suggesting two code paths here; I'm just looking to document your thought process for the next person who'd carry on this work. For example, would the code be better off if we didn't do explicit GC, or use spawn on all platforms, when creating multiprocessing workers?
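For context, the pattern under discussion roughly takes the shape below; this is a sketch of the workaround with a hypothetical helper name, not the exact iohub code:

```python
import gc
import multiprocessing as mp

def run_in_workers(func, jobs, n_workers=4):
    """Hypothetical helper illustrating the workaround discussed above."""
    # Explicit GC: drop lingering references (e.g. open store handles)
    # before starting workers.
    gc.collect()
    # Force "spawn" on all platforms so workers don't inherit state from
    # the parent via fork, which can deadlock with some native libraries.
    ctx = mp.get_context("spawn")
    with ctx.Pool(n_workers) as pool:
        return pool.map(func, jobs)
```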
Commits:

* Fix tensorstore empty array handling
  - Add validation for empty arrays in _save_transformed before tensorstore write
  - Skip write operations for empty arrays with warning messages
  - Add comprehensive error handling with detailed diagnostics for tensorstore failures
  - Improve error messages to include array shapes, sizes, and tensorstore details
  This resolves the "ValueError: Error aligning dimensions" issue when empty arrays are passed to tensorstore write operations.
* Add empty results check to prevent tensorstore alignment errors
  Adds validation in apply_transform_to_tczyx_and_save() to check for an empty results dictionary before calling _save_transformed(). When no valid time points are available, it logs a diagnostic message and skips the write operation instead of attempting to write empty arrays to tensorstore, which causes alignment dimension mismatches.
* Revert "Fix tensorstore empty array handling" (reverts commit 65c9ddb)
* better handling of output_time_indices
* style
ieivanov left a comment
I'm happy with this PR. I've tested that it correctly saves data in zarr v3 format using biahub concatenate in conjunction with czbiohub-sf/biahub#104, and I've also tested that the changes here don't break existing pipelines (specifically biahub deskew), which for now will continue writing in zarr v2.
I'm seeing that …
Added an error message; sharding along the channel dimension is not immediately straightforward. First attempt here: https://github.com/czbiohub-sf/iohub/tree/batched_channel_processing. If it's at all possible, then …
* create_empty_plate to expose sharding (a usage sketch follows below).
* apply_transform_to_tczyx_and_save to loop within a shard and multi-process across shards with process_single_position.
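A rough usage sketch of the first item, based on the docstring quoted in the review above; only shards_ratio is confirmed by this PR, and the surrounding argument names and values are illustrative assumptions:

```python
from iohub.ngff.utils import create_empty_plate

# Illustrative call; only `shards_ratio` is confirmed by this PR, the
# other arguments and values are assumptions for the example's sake.
create_empty_plate(
    store_path="output.zarr",
    position_keys=[("A", "1", "0")],
    channel_names=["DAPI", "GFP"],
    shape=(1, 2, 64, 512, 512),     # TCZYX
    chunks=(1, 1, 16, 256, 256),    # chunk shape
    shards_ratio=(1, 1, 4, 2, 2),   # chunks per shard along each axis
)
```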