Please describe your wishes and possible alternatives to achieve the desired result.
#2033 at least makes it so that we are no longer creating arrays with very small chunks unnecessarily but from a performance perspective, mapping dask chunks to chunks on disk is not a good idea. Especially with the codec pipeline in zarr being parallelized, the need for dask workers to be responsible for speeding up io is lessened and thus cramming more parallel reads within a worker is probably sensible. I imagine the same is true for hdf5 even if it isn't parallel i.e., doing more/bigger reads per-dask-chunk.
See https://gist.github.com/lgarrison/ef2922fdb8ca78e8753c39758afedb56 for a related example of this issue where having few python process workers but utilizing the rust-based parallelization of zarrs improves throughput.
See https://github.com/zarrs/zarr_benchmarks/blob/9679f36ca795cce65adc603ae41147324208d3d9/scripts/zarr_dask_python_benchmark_roundtrip.py#L22 for using shards as the chunk size for dask (since shards contain chunks) as a starting point for implementing better defaults.
In general some todos:
Please describe your wishes and possible alternatives to achieve the desired result.
#2033 at least makes it so that we are no longer creating arrays with very small chunks unnecessarily but from a performance perspective, mapping dask chunks to chunks on disk is not a good idea. Especially with the codec pipeline in zarr being parallelized, the need for dask workers to be responsible for speeding up io is lessened and thus cramming more parallel reads within a worker is probably sensible. I imagine the same is true for hdf5 even if it isn't parallel i.e., doing more/bigger reads per-dask-chunk.
See https://gist.github.com/lgarrison/ef2922fdb8ca78e8753c39758afedb56 for a related example of this issue where having few python process workers but utilizing the rust-based parallelization of
zarrsimproves throughput.See https://github.com/zarrs/zarr_benchmarks/blob/9679f36ca795cce65adc603ae41147324208d3d9/scripts/zarr_dask_python_benchmark_roundtrip.py#L22 for using
shardsas the chunk size for dask (since shards contain chunks) as a starting point for implementing better defaults.In general some todos: