make write_zarr respect the public chunks parameter when the input is sparse #2415

### Deliberate parameterization of the output zarr geometry should be possible when X is sparse.

On huge sparse arrays, the `chunks` parameter appears to have no effect. [This block in io.write_zarr](https://github.com/scverse/anndata/blob/b721425edf4eaae64907be7682a81e42fe752d81/src/anndata/_io/zarr.py#L64) only forwards `chunks` when the element is not a `sparse.spmatrix`, so sparse input falls back to auto sharding:
```python
# Excerpt from inside write_zarr; `chunks` comes from the enclosing function.
def callback(
    write_func, store, elem_name: str, elem, *, dataset_kwargs, iospec
) -> None:
    if (
        chunks is not None
        # Sparse matrices are excluded here, so `chunks` is silently dropped.
        and not isinstance(elem, sparse.spmatrix)
        and elem_name.lstrip("/") == "X"
    ):
        dataset_kwargs = dict(dataset_kwargs, chunks=chunks)
    write_func(store, elem_name, elem, dataset_kwargs=dataset_kwargs)
```
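To make the effect of that guard concrete, here is a minimal stand-alone sketch of the same condition. It does not call anndata: `SparseStandIn` is a hypothetical placeholder for `sparse.spmatrix`, and `effective_dataset_kwargs` is an illustrative helper name, not part of any library.

```python
class SparseStandIn:
    """Hypothetical placeholder for scipy's sparse.spmatrix (assumption)."""


def effective_dataset_kwargs(elem_name, elem, chunks, dataset_kwargs):
    """Mirror the guard from the excerpt: chunks is only applied to non-sparse X."""
    if (
        chunks is not None
        and not isinstance(elem, SparseStandIn)
        and elem_name.lstrip("/") == "X"
    ):
        return dict(dataset_kwargs, chunks=chunks)
    return dataset_kwargs


# A dense-like X receives the user's chunks; a sparse-like X does not.
dense_kwargs = effective_dataset_kwargs("/X", [[0.0, 1.0]], (1000, 1000), {})
sparse_kwargs = effective_dataset_kwargs("/X", SparseStandIn(), (1000, 1000), {})

print(dense_kwargs)   # {'chunks': (1000, 1000)}
print(sparse_kwargs)  # {}
```

With the sparse stand-in, the user-supplied value never reaches `dataset_kwargs`, which is the behavior described in this issue.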

The issue is that, empirically, auto sharding often produces tens of millions of inodes when huge sparse arrays are written.
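As a rough illustration of why the geometry matters for inode counts, the sketch below estimates how many stored objects a CSR/CSC group produces. It assumes the sparse matrix is stored as three 1-D arrays (`data`, `indices`, `indptr`), that all three use the same 1-D chunk length, and that each chunk becomes one file; with sharding the same arithmetic applies to the shard length instead. Metadata files are ignored, the function name is hypothetical, and the numbers are made up for illustration rather than measured.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, exact for arbitrarily large ints."""
    return -(-a // b)


def estimate_object_count(nnz: int, n_major: int, chunk_len: int) -> int:
    """Estimate stored objects for a CSR/CSC group (hypothetical helper).

    nnz:       number of stored non-zero values
    n_major:   number of rows (CSR) or columns (CSC)
    chunk_len: 1-D chunk (or shard) length, assumed equal for all arrays
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    data_objects = ceil_div(nnz, chunk_len)
    indices_objects = ceil_div(nnz, chunk_len)
    indptr_objects = ceil_div(n_major + 1, chunk_len)  # indptr has n_major + 1 entries
    return data_objects + indices_objects + indptr_objects


# Illustrative matrix: 10 million rows, 50 billion non-zeros.
small_chunks = estimate_object_count(50_000_000_000, 10_000_000, 10_000)
large_chunks = estimate_object_count(50_000_000_000, 10_000_000, 10_000_000)

print(small_chunks)  # 10001001
print(large_chunks)  # 10002
```

Under these assumptions, raising the chunk length by a factor of 1000 cuts the object count by roughly the same factor, which is the kind of control the `chunks` argument would need to expose for sparse input.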

It appears that even when auto-sharding is turned off, we still can't set the chunk geometry explicitly for sparse stores (let alone the shard factor).

What is the reason for this deliberate dropping of the publicly facing chunks argument?

Autosharding is great, but it should not be the only way to write sparse arrays. Users may need artifacts with fewer chunks because of real-world constraints such as HPC inode quotas or write speed.
Right now the only workaround seems to be to let anndata write whatever store geometry it guesses and then re-export the store manually with zarr-python API calls, effectively writing the entire artifact twice.
