
Summary: Reading Zarr to a cupy-backed Dataset #16

Open
@TomAugspurger


cupy-xarray had an in-progress PR to enable going from Zarr to a CuPy-backed xarray Dataset. That PR was somewhat complicated to implement and restricted users to the kvikio GDSStore. GDSStore is great, especially if you have GPUDirect Storage enabled on your system, but it's limited to local file systems, so you couldn't use other Zarr storage providers like obstore, fsspec, or icechunk.

Since that PR was started, zarr-python 3.x has been released with native support for reading data directly into device (GPU) memory. With two small changes to xarray (lazy indexing for CuPy arrays, and reading coordinates into host memory), we're able to support reading from Zarr into a CuPy-backed xarray Dataset in a way that should feel very natural to users:

>>> import xarray as xr, zarr
>>> zarr.config.enable_gpu()
>>> ds = xr.open_dataset("dataset.zarr", engine="zarr")
>>> print(type(ds.air.data))
<class 'cupy.ndarray'>
