2 changes: 1 addition & 1 deletion conftest.py
@@ -11,14 +11,14 @@
import pytest
import xarray as xr
import zarr
from obspec_utils import ObjectStoreRegistry
from obstore.store import LocalStore
from xarray.core.variable import Variable

# Local imports
from virtualizarr.manifests import ChunkManifest, ManifestArray
from virtualizarr.manifests.manifest import join
from virtualizarr.manifests.utils import create_v3_array_metadata
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr.utils import ceildiv


5 changes: 2 additions & 3 deletions docs/api/developer.md
@@ -15,9 +15,8 @@ See the page on data structures for more information.

## Registry

::: virtualizarr.registry.Url
[Urls][virtualizarr.registry.Url] should be parseable by [urllib.parse.urlparse][].
::: virtualizarr.registry.ObjectStoreRegistry
!!! note
    `virtualizarr.registry.ObjectStoreRegistry` has been deprecated. Please use [obspec_utils.ObjectStoreRegistry][] instead.
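One common way to keep an old import path working while steering users to the new one is a thin alias that emits a `DeprecationWarning`. The sketch below is purely illustrative: `NewRegistry`, `OldRegistry`, `deprecated_alias`, and the `old_pkg`/`new_pkg` names are hypothetical stand-ins, and this is not necessarily how VirtualiZarr implements its deprecation.

```python
import warnings


class NewRegistry:
    """Hypothetical class standing in for the one at the new import location."""


def deprecated_alias(new_cls, old_name, new_name):
    """Return a subclass of new_cls that warns whenever it is instantiated."""

    class _Deprecated(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_name} is deprecated; use {new_name} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            super().__init__(*args, **kwargs)

    _Deprecated.__name__ = new_cls.__name__
    return _Deprecated


# Hypothetical old name kept alive as a warning alias.
OldRegistry = deprecated_alias(NewRegistry, "old_pkg.Registry", "new_pkg.Registry")

# Capture warnings so the behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    instance = OldRegistry()
```

Because the alias subclasses the new class, existing `isinstance` checks against the new class keep passing during the transition.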

## Array API

1 change: 0 additions & 1 deletion docs/api/serialization.md
@@ -2,5 +2,4 @@

::: virtualizarr.accessor.VirtualiZarrDatasetAccessor.to_icechunk
::: virtualizarr.accessor.VirtualiZarrDatasetAccessor.to_kerchunk

::: virtualizarr.accessor.VirtualiZarrDataTreeAccessor.to_icechunk
10 changes: 6 additions & 4 deletions docs/custom_parsers.md
@@ -9,11 +9,12 @@ This is advanced material intended for 3rd-party developers, and assumes you hav

## What is a VirtualiZarr parser?

All VirtualiZarr parsers are simply callables that accept the URL pointing to a data source and a [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] that may contain instantiated [ObjectStores][obstore.store.ObjectStore] that can read from that URL, and return an instance of the [`virtualizarr.manifests.ManifestStore`][] class containing information about the contents of the data source.
All VirtualiZarr parsers are simply callables that accept the URL pointing to a data source and an [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] that may contain instantiated [ObjectStores][obstore.store.ObjectStore] that can read from that URL, and return an instance of the [`virtualizarr.manifests.ManifestStore`][] class containing information about the contents of the data source.

```python
from obspec_utils import ObjectStoreRegistry

from virtualizarr.manifests import ManifestStore
from virtualizarr.registry import ObjectStoreRegistry


def custom_parser(url: str, registry: ObjectStoreRegistry) -> ManifestStore:
@@ -234,10 +235,11 @@ For example we could test the ability of VirtualiZarr's in-built [`HDFParser`][v

```python
import xarray.testing as xrt
from obspec_utils import ObjectStoreRegistry
from obstore.store import LocalStore

from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from obstore.store import LocalStore


project_directory = "/Users/user/my-project"
project_url = f"file://{project_directory}"
5 changes: 3 additions & 2 deletions docs/faq.md
@@ -65,10 +65,11 @@ In general once the Icechunk specification reaches a stable v1.0, we would recom
No - you can simply open the Kerchunk-formatted references you already have into VirtualiZarr directly. Then you can manipulate them, or re-save them into a new format, such as [Icechunk](https://icechunk.io/):

```python
from obstore.store import LocalStore
from obspec_utils import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr.parsers import KerchunkJSONParser, KerchunkParquetParser
from obstore.store import LocalStore

project_dir="/Users/user/project-dir"
project_url=f"file://{project_dir}"
5 changes: 3 additions & 2 deletions docs/index.md
@@ -36,9 +36,10 @@ First, import the necessary functions and classes:
import icechunk
import obstore

from obspec_utils import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
```

Zarr can emit a lot of warnings about Numcodecs not being included in the Zarr version 3
@@ -67,7 +68,7 @@ path = "NEX-GDDP-CMIP6/ACCESS-CM2/ssp126/r1i1p1f1/tasmax/tasmax_day_ACCESS-CM2_s
store = obstore.store.from_url(bucket, region="us-west-2", skip_signature=True)
```

We also need to create an [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] that
We also need to create an [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] that
maps the URL structure to the ObjectStore.
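
To make the idea of mapping a URL onto a store concrete, here is a minimal, dependency-free sketch. `ToyRegistry` and its `register`/`resolve` methods are hypothetical names chosen for illustration; this is not the `obspec_utils` implementation, and the string `"fake-s3-store"` stands in for a real ObjectStore. The sketch assumes stores are keyed by URL scheme and netloc.

```python
from urllib.parse import urlparse


class ToyRegistry:
    """Hypothetical stand-in for a store registry, keyed by (scheme, netloc)."""

    def __init__(self, stores=None):
        self._stores = {}
        for url, store in (stores or {}).items():
            self.register(url, store)

    @staticmethod
    def _key(url):
        # Only the scheme and netloc identify a store; the path is ignored.
        parsed = urlparse(url)
        return (parsed.scheme, parsed.netloc)

    def register(self, url, store):
        self._stores[self._key(url)] = store

    def resolve(self, url):
        # Return the matching store plus the path relative to its root.
        key = self._key(url)
        if key not in self._stores:
            raise ValueError(f"no store registered for {url!r}")
        return self._stores[key], urlparse(url).path.lstrip("/")


registry = ToyRegistry({"s3://nex-gddp-cmip6": "fake-s3-store"})
store, path = registry.resolve("s3://nex-gddp-cmip6/NEX-GDDP-CMIP6/file.nc")
```

Resolving a URL whose scheme and netloc were never registered raises a `ValueError` in this sketch, which mirrors the need to register a store for every location you intend to read from.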

```python exec="on" source="above" session="homepage"
14 changes: 7 additions & 7 deletions docs/migration_guide.md
@@ -21,19 +21,19 @@ vds = open_virtual_dataset("data1.nc")
```

To provide a more extensible and reliable API, VirtualiZarr V2 requires more explicit configuration by the user.
You now must pass in a valid [Parser][virtualizarr.parsers.typing.Parser] and a [virtualizarr.registry.ObjectStoreRegistry][] to [virtualizarr.open_virtual_dataset][].
You now must pass in a valid [Parser][virtualizarr.parsers.typing.Parser] and an [obspec_utils.ObjectStoreRegistry][] to [virtualizarr.open_virtual_dataset][].
This change adds a bit more verbosity, but is intended to make virtualizing datasets more robust. It is most common for the
[ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] to contain one or more [ObjectStores][obstore.store.ObjectStore]
for reading the original data, but some parsers may accept an empty [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry].
[ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] to contain one or more [ObjectStores][obstore.store.ObjectStore]
for reading the original data, but some parsers may accept an empty [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry].

=== "S3 Store"

```python exec="on" source="material-block" session="migration" result="code"
from obstore.store import S3Store
from obspec_utils import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

bucket = "nex-gddp-cmip6"
store = S3Store(
@@ -57,10 +57,10 @@ for reading the original data, but some parsers may accept an empty [ObjectStore


from obstore.store import LocalStore
from obspec_utils import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

from pathlib import Path

@@ -116,15 +116,15 @@ vds.vz.to_icechunk(icechunk_store)
In VirtualiZarr V1, if you wanted to access the underlying chunks of a dataset, you first had to write the references to disk. From there you could read those references back into Xarray and access the chunks like you would with a normal Xarray dataset.

In V2 you can now **directly read the chunks from a Parser into Xarray without writing them to disk first**. 🤯
Since each `Parser` is now responsible for creating a [ManifestStore][virtualizarr.manifests.ManifestStore] and the [ManifestStore][virtualizarr.manifests.ManifestStore] has the ability to fetch data through any [ObjectStore][obstore.store.ObjectStore] in the [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry]. You
Since each `Parser` is now responsible for creating a [ManifestStore][virtualizarr.manifests.ManifestStore] and the [ManifestStore][virtualizarr.manifests.ManifestStore] has the ability to fetch data through any [ObjectStore][obstore.store.ObjectStore] in the [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry], you
can load data using the [ManifestStore][virtualizarr.manifests.ManifestStore] via either Zarr or Xarray. Here's an example using Xarray:

```python exec="on" source="material-block" session="migration" result="code"
import xarray as xr
from obstore.store import S3Store
from obspec_utils import ObjectStoreRegistry

from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

bucket = "nex-gddp-cmip6"
store = S3Store(
14 changes: 7 additions & 7 deletions docs/usage.md
@@ -24,7 +24,7 @@ that can access your data. Available ObjectStores are described in the [obstore

from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from obspec_utils import ObjectStoreRegistry

bucket = "s3://nex-gddp-cmip6"
path = "NEX-GDDP-CMIP6/ACCESS-CM2/ssp126/r1i1p1f1/tasmax/tasmax_day_ACCESS-CM2_ssp126_r1i1p1f1_gn_2015_v2.0.nc"
@@ -42,7 +42,7 @@ that can access your data. Available ObjectStores are described in the [obstore

from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from obspec_utils import ObjectStoreRegistry

bucket = "gs://data-bucket"
path = "file-path/data.nc"
@@ -55,13 +55,13 @@ that can access your data. Available ObjectStores are described in the [obstore
=== "Azure"

```python

import xarray as xr
from obspec_utils import ObjectStoreRegistry
from obstore.store import from_url


from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

bucket = "abfs://data-container"
path = "file-path/data.nc"
@@ -77,10 +77,10 @@ that can access your data. Available ObjectStores are described in the [obstore

import xarray as xr
from obstore.store import from_url
from obspec_utils import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

# This example uses a NetCDF file of CMIP6 from ESGF.
bucket = 'https://esgf-data.ucar.edu'
@@ -96,10 +96,10 @@ that can access your data. Available ObjectStores are described in the [obstore

import xarray as xr
from obstore.store import S3Store
from obspec_utils import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

endpoint = "https://nyu1.osn.mghpcc.org"
access_key_id = "<access_key_id>"
@@ -124,10 +124,10 @@ that can access your data. Available ObjectStores are described in the [obstore

import xarray as xr
from obstore.store import LocalStore
from obspec_utils import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

from pathlib import Path

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -121,6 +121,7 @@ plugins:
- https://numpy.org/doc/stable/objects.inv
- https://numcodecs.readthedocs.io/en/stable/objects.inv
- https://zarr.readthedocs.io/en/stable/objects.inv
- https://obspec-utils.readthedocs.io/en/stable/objects.inv
- https://developmentseed.org/obstore/latest/objects.inv
- https://filesystem-spec.readthedocs.io/en/latest/objects.inv
- https://requests.readthedocs.io/en/latest/objects.inv
1 change: 1 addition & 0 deletions pyproject.toml
@@ -29,6 +29,7 @@ dependencies = [
"packaging",
"zarr>=3.1.0",
"obstore>=0.5.1",
"obspec_utils>=0.4.0",
]

# Dependency sets under optional-dependencies are available via PyPI
4 changes: 2 additions & 2 deletions virtualizarr/manifests/store.py
@@ -5,6 +5,7 @@
from typing import TYPE_CHECKING, Literal, TypeAlias
from urllib.parse import urlparse

from obspec_utils import ObjectStoreRegistry
from zarr.abc.store import (
ByteRequest,
OffsetByteRequest,
@@ -18,7 +19,6 @@
from virtualizarr.manifests.array import ManifestArray
from virtualizarr.manifests.group import ManifestGroup
from virtualizarr.manifests.utils import parse_manifest_index
from virtualizarr.registry import ObjectStoreRegistry

if TYPE_CHECKING:
from obstore.store import (
@@ -93,7 +93,7 @@ class ManifestStore(Store):
Root group of the store.
Contains group metadata, [ManifestArrays][virtualizarr.manifests.ManifestArray], and any subgroups.
registry : ObjectStoreRegistry
[ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] that maps the URL scheme and netloc to [ObjectStore][obstore.store.ObjectStore] instances,
[ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] that maps the URL scheme and netloc to [ObjectStore][obstore.store.ObjectStore] instances,
allowing ManifestStores to read from different ObjectStore instances.

Warnings
5 changes: 2 additions & 3 deletions virtualizarr/parsers/dmrpp.py
@@ -5,6 +5,7 @@
from xml.etree import ElementTree as ET

import numpy as np
from obspec_utils import ObjectStoreRegistry, ObstoreReader
from obstore.store import ObjectStore

from virtualizarr.manifests import (
@@ -15,9 +16,7 @@
)
from virtualizarr.manifests.utils import create_v3_array_metadata
from virtualizarr.parsers.utils import encode_cf_fill_value
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr.types import ChunkKey
from virtualizarr.utils import ObstoreReader


class DMRPPParser:
@@ -54,7 +53,7 @@ def __call__(
url
The URL of the input DMR++ file (e.g., "s3://bucket/file.dmrpp").
registry
An [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] for resolving urls and reading data.
An [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] for resolving urls and reading data.

Returns
-------
5 changes: 3 additions & 2 deletions virtualizarr/parsers/fits.py
@@ -1,9 +1,10 @@
from pathlib import Path
from typing import Iterable, Optional

from obspec_utils import ObjectStoreRegistry

from virtualizarr.manifests import ManifestStore
from virtualizarr.parsers.kerchunk.translator import manifestgroup_from_kerchunk_refs
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr.types.kerchunk import KerchunkStoreRefs


@@ -45,7 +46,7 @@ def __call__(
url
The URL of the input FITS file (e.g., "s3://bucket/file.fits").
registry
An [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] for resolving urls and reading data.
An [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] for resolving urls and reading data.

Returns
-------
6 changes: 3 additions & 3 deletions virtualizarr/parsers/hdf/hdf.py
@@ -8,6 +8,7 @@
)

import numpy as np
from obspec_utils import ObjectStoreRegistry, ObstoreReader

from virtualizarr.codecs import zarr_codec_config_to_v3
from virtualizarr.manifests import (
@@ -20,9 +21,8 @@
from virtualizarr.manifests.utils import create_v3_array_metadata
from virtualizarr.parsers.hdf.filters import codecs_from_dataset
from virtualizarr.parsers.utils import encode_cf_fill_value
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr.types import ChunkKey
from virtualizarr.utils import ObstoreReader, soft_import
from virtualizarr.utils import soft_import

h5py = soft_import("h5py", "reading hdf files", strict=False)

@@ -169,7 +169,7 @@ def __call__(
url
The URL of the input HDF5/NetCDF4 file (e.g., `"s3://bucket/store.zarr"`).
registry
An [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] for resolving urls and reading data.
An [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] for resolving urls and reading data.

Returns
-------
4 changes: 2 additions & 2 deletions virtualizarr/parsers/kerchunk/json.py
@@ -1,10 +1,10 @@
from collections.abc import Iterable

import ujson
from obspec_utils import ObjectStoreRegistry

from virtualizarr.manifests import ManifestStore
from virtualizarr.parsers.kerchunk.translator import manifestgroup_from_kerchunk_refs
from virtualizarr.registry import ObjectStoreRegistry


class KerchunkJSONParser:
@@ -46,7 +46,7 @@ def __call__(
url
The URL of the input Kerchunk JSON (e.g., "s3://bucket/kerchunk.json").
registry
An [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] for resolving urls and reading data.
An [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] for resolving urls and reading data.

Returns
-------
5 changes: 3 additions & 2 deletions virtualizarr/parsers/kerchunk/parquet.py
@@ -5,9 +5,10 @@
from dataclasses import dataclass, field
from typing import TYPE_CHECKING

from obspec_utils import ObjectStoreRegistry

from virtualizarr.manifests import ManifestStore
from virtualizarr.parsers.kerchunk.translator import manifestgroup_from_kerchunk_refs
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr.types.kerchunk import (
KerchunkStoreRefs,
)
@@ -68,7 +69,7 @@ def __call__(
url
The URL of the input parquet directory (e.g., "s3://bucket/my-kerchunk-references.parq").
registry
An [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] for resolving urls and reading data.
An [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] for resolving urls and reading data.

Returns
-------