Multiple variables in the same shard file #369

@mjwillson

Hello!

With Zarr v3 sharding, multiple logical chunks can now be stored in a single file, which is great.

Has anyone thought about supporting multiple variables in a single shard file (assuming the variables' shapes are aligned, at least along any dimensions that are chunked)?

We've found it can be a lot more efficient to read data stored this way from cloud storage when there's a consistent set of variables that you want to read all (or most) of as a group. This is likely for the same reason that sharding logical chunks helps: it reduces the number of separate small files that you need to interact with via cloud storage APIs.

This is already possible on an ad hoc basis, e.g. by using xarray's Dataset.to_dataarray to stack multiple variables into a single variable, saving that to zarr, and then undoing the stacking after loading. But it's fundamentally a storage-level optimization, and it would be nice if it could be handled at the storage level in a way that's more transparent to the user.
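For reference, a minimal sketch of that ad hoc workaround might look like the following. The helper names (`stack_variables`, `unstack_variables`, `save_stacked`, `load_stacked`), the `"stacked"` array name and the toy dataset are all made up for illustration; only `Dataset.to_dataarray`, `DataArray.to_dataset`, `Dataset.to_zarr` and `xr.open_zarr` are xarray's own. It assumes every variable shares the same dimensions.

```python
import numpy as np
import xarray as xr


def stack_variables(ds: xr.Dataset, dim: str = "variable") -> xr.DataArray:
    """Stack all data variables into one array along a new dimension."""
    # Dataset.to_dataarray is the current name; older xarray releases
    # only provide the same method as Dataset.to_array.
    to_dataarray = getattr(ds, "to_dataarray", None) or ds.to_array
    return to_dataarray(dim=dim)


def unstack_variables(stacked: xr.DataArray, dim: str = "variable") -> xr.Dataset:
    """Undo stack_variables, splitting the array back into named variables."""
    return stacked.to_dataset(dim=dim)


def save_stacked(ds: xr.Dataset, store, name: str = "stacked") -> None:
    """Write the stacked array to a zarr store (hypothetical helper)."""
    stack_variables(ds).to_dataset(name=name).to_zarr(store, mode="w")


def load_stacked(store, name: str = "stacked") -> xr.Dataset:
    """Read the stacked array back and restore the original variables."""
    return unstack_variables(xr.open_zarr(store)[name])


# Toy dataset: two variables with identical dimensions and shapes.
ds = xr.Dataset(
    {
        "temperature": (("time", "x"), np.arange(12.0).reshape(4, 3)),
        "humidity": (("time", "x"), np.ones((4, 3))),
    }
)

# In-memory round trip; save_stacked / load_stacked do the same via a store.
stacked = stack_variables(ds)
restored = unstack_variables(stacked)
```

Whether the variables actually end up in the same chunk or shard depends on the chunking chosen along the new "variable" dimension, which this sketch leaves at the defaults. Stacking also promotes mixed dtypes to a common one and, as far as I can tell, does not carry per-variable attributes through the round trip, which is part of why this feels like the wrong layer for the optimization.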
