Description
Is your feature request related to a problem?
In the radio astronomy domain-specific xarray-ms, we construct a DataTree representing partitions of a legacy data format, where each partition contains regular data cubes. As currently implemented, the custom backend supports a partition_chunks kwarg in the BackendEntrypoint.open_datatree method so that it is possible to specify different chunking schemas per partition:
The chunking specification above is specific to a radio astronomy legacy format, but it may be more generally useful to be able to specify per-DataTree node chunking.
Describe the solution you'd like
Currently, BackendEntrypoint.open_datatree passes its chunks kwarg to each Dataset constructor in the DataTree. This is quite coarse-grained, as it applies the same chunking schema to all Datasets in the DataTree.
I propose that the chunks kwarg in BackendEntrypoint.open_datatree support a chunking dictionary per path (i.e. DataTree Node). For example:
import xarray
xdt = xarray.open_datatree(..., chunks={
    "/path/to/node1": {"time": 20, "frequency": 16},
    "/path/to/a/node2": {"time": 10, "frequency": 4},
})
Then, when constructing Datasets in the DataTree, the chunking schema appropriate to the node can be applied.
An entry in the above dictionary need not apply to only a single node: it could also apply its chunking schema to the whole subtree below that node. However, it may be better to make this explicit:
xdt = xarray.open_datatree(..., chunks={
    # Apply to node1 and any node below
    "/path/to/node1/...": {"time": 20, "frequency": 16},
})
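To make the proposed semantics concrete, here is a minimal pure-Python sketch of how a backend might look up the chunking schema for a given node path. This is not existing xarray behaviour: the helper name resolve_node_chunks, the SUBTREE_SUFFIX constant, and the precedence rules (an exact path wins, otherwise the deepest matching subtree entry) are all assumptions made for illustration.

```python
from typing import Mapping, Optional

# Hypothetical marker for "this node and everything below it", following
# the "/path/to/node1/..." spelling suggested in this issue.
SUBTREE_SUFFIX = "/..."

ChunkSpec = Mapping[str, int]


def resolve_node_chunks(
    node_path: str, chunks: Mapping[str, ChunkSpec]
) -> Optional[ChunkSpec]:
    """Return the chunk spec for node_path, or None if no entry matches.

    Assumed precedence: an exact path entry wins; otherwise the deepest
    subtree entry whose root contains the node is used.
    """
    # Exact per-node entry.
    if node_path in chunks:
        return chunks[node_path]

    best_root: Optional[str] = None
    best_spec: Optional[ChunkSpec] = None
    for key, spec in chunks.items():
        if not key.endswith(SUBTREE_SUFFIX):
            continue
        root = key[: -len(SUBTREE_SUFFIX)]
        # Match the subtree root itself or any descendant; the trailing "/"
        # stops "/path/to/node1" from matching "/path/to/node10".
        if node_path == root or node_path.startswith(root + "/"):
            if best_root is None or len(root) > len(best_root):
                best_root, best_spec = root, spec
    return best_spec


# Example mapping combining both styles of entry from this issue.
example_chunks = {
    "/path/to/node1/...": {"time": 20, "frequency": 16},
    "/path/to/a/node2": {"time": 10, "frequency": 4},
}
```

With this mapping, "/path/to/node1" and any node beneath it would resolve to the first schema, "/path/to/a/node2" to the second, and every other node to None, which a backend could treat as "use the default chunking".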
Describe alternatives you've considered
We've implemented a custom partition_chunks kwarg in the BackendEntrypoint.open_datatree method for our legacy data format.
Additional context
No response