Description
Hello!
With Zarr v3 sharding, multiple logical chunks can now be stored in a single file, which is great.
Has anyone thought about supporting multiple variables in a single shard file (assuming the variables' shapes are aligned, at least along any dimensions that are chunked)?
We've found that reading data stored this way from cloud storage can be a lot more efficient when there's a consistent set of variables that you want to read all (or most) of as a group. This is likely for the same reason that sharding logical chunks helps: it reduces the number of separate small files that you need to interact with via cloud storage APIs.
This is possible on an ad-hoc basis, e.g. by using xarray's `Dataset.to_dataarray` to stack multiple variables into a single variable, saving that to zarr, and then undoing the stacking after loading. But it's fundamentally a storage-level optimization, and it would be nice if it could be handled at the storage level in a way that's more transparent to the user.
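For concreteness, here is a minimal sketch of that ad-hoc workaround. The helper names (`stack_variables`, `unstack_variables`, `write_stacked`, `read_stacked`), the array name `"stacked"`, and the single-chunk layout are all assumptions made for illustration. The zarr helpers are not exercised here and may need adjusting for a given xarray/zarr version.

```python
import numpy as np
import xarray as xr


def stack_variables(ds: xr.Dataset, dim: str = "variable") -> xr.DataArray:
    """Stack every data variable of `ds` along a new leading dimension."""
    # Recent xarray releases name this method `to_dataarray`;
    # older ones call it `to_array`, so fall back if needed.
    to_dataarray = getattr(ds, "to_dataarray", None) or ds.to_array
    return to_dataarray(dim=dim, name="stacked")


def unstack_variables(stacked: xr.DataArray, dim: str = "variable") -> xr.Dataset:
    """Undo `stack_variables`: one data variable per label along `dim`."""
    return stacked.to_dataset(dim=dim)


def write_stacked(ds: xr.Dataset, store, dim: str = "variable") -> None:
    """Hypothetical helper: write the stacked array so chunks span all variables."""
    stacked = stack_variables(ds, dim)
    # Simplification: a single chunk covering the whole array. Real data would
    # use smaller chunk sizes along the non-variable dimensions.
    chunks = (stacked.sizes[dim], *stacked.shape[1:])
    stacked.to_dataset(name="stacked").to_zarr(
        store, mode="w", encoding={"stacked": {"chunks": chunks}}
    )


def read_stacked(store, dim: str = "variable") -> xr.Dataset:
    """Hypothetical helper: read the stacked array back and split it up again."""
    return unstack_variables(xr.open_zarr(store)["stacked"], dim)


# In-memory round trip on a toy dataset with two aligned variables.
ds = xr.Dataset(
    {
        "temperature": (("y", "x"), np.arange(12.0).reshape(4, 3)),
        "pressure": (("y", "x"), np.ones((4, 3))),
    }
)
stacked = stack_variables(ds)
restored = unstack_variables(stacked)
```

Note that stacking coerces all variables to a common dtype, and per-variable attributes are generally not preserved, which is part of why this is less transparent than a storage-level solution would be.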