A basic default ChunkManager for arrays that report their own chunks #8733
Description
Is your feature request related to a problem?
I'm creating duck arrays for various file-backed data structures of mine that are naturally "chunked", i.e. different parts of the array may live in completely different files.
Using these "chunks" and "strides", algorithms can better decide how to iterate in a convenient manner.
For example, an MP4 file's chunks may be delimited by I-frames, while images stored in a TIFF may be delimited by pages.
So for me, chunks are less useful for parallel computing than for computing locally and choosing an appropriate way to iterate through large arrays (TBs of uncompressed data).
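To make the iteration point concrete, here is a minimal, dependency-free sketch assuming dask-style normalized chunks (one tuple of chunk lengths per axis). The helper `chunk_slices` and the video chunk layout are hypothetical, not an xarray or dask API; the helper turns a `chunks` attribute into the index tuples needed to visit an array one chunk at a time.

```python
from itertools import accumulate, product


def chunk_slices(chunks: tuple[tuple[int, ...], ...]) -> list[tuple[slice, ...]]:
    """Hypothetical helper: one tuple of slices per chunk, in row-major order.

    ``chunks`` holds, for each axis, the length of every chunk along that axis,
    e.g. ((2, 2, 1), (3,)) for a 5x3 array split into three row blocks.
    """
    per_axis = []
    for axis_chunks in chunks:
        # Cumulative offsets: (2, 2, 1) -> stops (2, 4, 5), starts (0, 2, 4)
        stops = list(accumulate(axis_chunks))
        starts = [0] + stops[:-1]
        per_axis.append([slice(a, b) for a, b in zip(starts, stops)])
    # The Cartesian product over axes enumerates every chunk exactly once.
    return list(product(*per_axis))


# Hypothetical video: 72 frames of 480x640, with a new chunk at each I-frame
# (frames 0, 30 and 60).
video_chunks = ((30, 30, 12), (480,), (640,))
for index in chunk_slices(video_chunks):
    # index is e.g. (slice(0, 30), slice(0, 480), slice(0, 640)); reading
    # array[index] would ideally touch the file region of a single chunk.
    print(index)
```

Iterating in this order lets a loop follow the storage layout instead of fighting it, which is the local, non-parallel use of chunks described above.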
Describe the solution you'd like
I think a default ChunkManager could simply implement `compute` as `np.asarray`, serve as the default instance, and act as a catch-all for every chunked array that no other chunk manager claims.
Advanced users could still write their own ChunkManager. As things stand, however, I was unable to use my duck arrays that expose a `chunks` property, because they weren't associated with any chunk manager.
Something as simple as:
```diff
diff --git a/xarray/core/parallelcompat.py b/xarray/core/parallelcompat.py
index c009ef48..bf500abb 100644
--- a/xarray/core/parallelcompat.py
+++ b/xarray/core/parallelcompat.py
@@ -681,3 +681,26 @@ class ChunkManagerEntrypoint(ABC, Generic[T_ChunkedArray]):
         cubed.store
         """
         raise NotImplementedError()
+
+
+class DefaultChunkManager(ChunkManagerEntrypoint):
+    def __init__(self) -> None:
+        self.array_cls = None
+
+    def is_chunked_array(self, data: Any) -> bool:
+        return is_duck_array(data) and hasattr(data, "chunks")
+
+    def chunks(self, data: T_ChunkedArray) -> T_NormalizedChunks:
+        return data.chunks
+
+    def compute(self, *data: T_ChunkedArray | Any, **kwargs) -> tuple[np.ndarray, ...]:
+        return tuple(np.asarray(d) for d in data)
+
+    def normalize_chunks(self, *args, **kwargs):
+        raise NotImplementedError()
+
+    def from_array(self, *args, **kwargs):
+        raise NotImplementedError()
+
+    def apply_gufunc(self, *args, **kwargs):
+        raise NotImplementedError()
```
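One design question is ordering: dask and cubed arrays also have a `chunks` attribute, so a catch-all should presumably only apply after the library-specific managers have had a chance to claim an array. If I read xarray's dispatch correctly, it currently raises when more than one manager recognises an array, so simply registering a catch-all alongside the others would likely conflict. The plain-Python sketch below illustrates the intended fallback order; `SpecificManager`, `FallbackManager` and `pick_manager` are hypothetical names, not xarray's dispatch code, and the fallback check is simplified to `hasattr(data, "chunks")` so it runs without numpy or xarray.

```python
from __future__ import annotations

from typing import Any


class SpecificManager:
    """Stand-in for a library-specific manager (dask, cubed, ...): claims only its own type."""

    def __init__(self, name: str, array_cls: type) -> None:
        self.name = name
        self.array_cls = array_cls

    def is_chunked_array(self, data: Any) -> bool:
        return isinstance(data, self.array_cls)


class FallbackManager:
    """Hypothetical catch-all: claims any object that reports its own chunks."""

    name = "default"

    def is_chunked_array(self, data: Any) -> bool:
        # Simplified: the proposed DefaultChunkManager also requires is_duck_array(data).
        return hasattr(data, "chunks")


def pick_manager(
    data: Any, specific: list[SpecificManager], fallback: FallbackManager
) -> SpecificManager | FallbackManager:
    # Specific managers are tried first, so a dask-like array keeps its own
    # manager even though it also has a .chunks attribute.
    for manager in specific:
        if manager.is_chunked_array(data):
            return manager
    # Only arrays that no specific manager claims fall through to the catch-all.
    if fallback.is_chunked_array(data):
        return fallback
    raise TypeError(f"{type(data).__name__} is not a chunked array")


# Toy array types, for illustration only.
class DaskLike:
    chunks = ((2, 2),)


class FileBacked:
    chunks = ((3, 3),)


managers = [SpecificManager("dask-like", DaskLike)]
print(pick_manager(DaskLike(), managers, FallbackManager()).name)  # dask-like
print(pick_manager(FileBacked(), managers, FallbackManager()).name)  # default
```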
Describe alternatives you've considered
I created my own chunk manager with its own chunk manager entry point, which is rather tedious.
Additional context
This seems to be related to #7019.