
Commit b50e303

Merge pull request #27 from mlcast-community/fix/detect-zarr-v3-format
fix: detect Zarr v3 format from store files
Parents: c888681 + a0ac4d8 (commit b50e303)

2 files changed

Lines changed: 31 additions & 1 deletion


CHANGELOG.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [Unreleased](https://github.com/mlcast-community/mlcast-dataset-validator)
+
+### Fixed
+
+- Detect Zarr v3 format from store files (`zarr.json`) instead of relying on `getattr(ds, "zarr_format", 2)`, which always defaulted to v2 and caused v3 stores to incorrectly fail the consolidated metadata check [\#27](https://github.com/mlcast-community/mlcast-dataset-validator/pull/27), @franchg
+
 ## [v0.2.0](https://github.com/mlcast-community/mlcast-dataset-validator/releases/tag/v0.2.0)
 
 This release makes the validator easier to use from Python and the specs defined in the validator easier to access. This is done by allowing direct calls to validation functions with `xr.Dataset` input, by introducing a CLI argument to print a selected spec to the terminal, and by adding CI rendering of the specs to HTML, deployed to GitHub Pages as linkable, readable spec docs.
```
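The bug named in the Fixed entry comes from how `getattr` handles a missing attribute: if the object has no attribute called `zarr_format`, the default value is returned regardless of what is in the store. A minimal sketch reproduces this without xarray; `FakeDataset` and the store path are hypothetical stand-ins, not xarray's `Dataset` or a real store, and only mimic an object whose `encoding["source"]` points at a v3 store.

```python
class FakeDataset:
    """Hypothetical stand-in for an xarray Dataset opened from a Zarr v3 store."""

    def __init__(self, source):
        # The validator reads the store path from ds.encoding["source"].
        self.encoding = {"source": source}


ds = FakeDataset("/data/example-v3.zarr")  # hypothetical v3 store path

# The pre-fix check: no attribute named zarr_format exists, so the default wins.
old_format = getattr(ds, "zarr_format", 2)
print(old_format)  # 2, whatever format the store actually uses
```

Because the fallback never looks at the store, every dataset was treated as v2, which is why the fix inspects the store files instead.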

mlcast_dataset_validator/checks/global_attributes/zarr_format.py

Lines changed: 25 additions & 1 deletion
```diff
@@ -41,6 +41,30 @@ def has_consolidated_metadata(ds, storage_options=None):
     return fs.exists(f"{store_root}/.zmetadata")
 
 
+def _detect_zarr_format(ds, storage_options=None):
+    """Detect Zarr format version from the store on disk.
+
+    Zarr v3 stores have a ``zarr.json`` file at the root, while v2 stores
+    have ``.zgroup``. xarray does not expose the format version as a
+    dataset attribute, so we inspect the store directly.
+    """
+    store_path = ds.encoding.get("source")
+    if store_path is None:
+        return 2  # cannot determine, assume v2
+
+    if storage_options is None:
+        storage_options = ds.encoding.get("storage_options")
+
+    fs, _, paths = fsspec.get_fs_token_paths(
+        store_path, storage_options=storage_options
+    )
+    store_root = paths[0].rstrip("/")
+
+    if fs.exists(f"{store_root}/zarr.json"):
+        return 3
+    return 2
+
+
 @log_function_call
 def check_zarr_format(
     ds: xr.Dataset,
@@ -55,7 +79,7 @@ def check_zarr_format(
     if storage_options is None:
         storage_options = ds.encoding.get("storage_options")
 
-    zarr_format = getattr(ds, "zarr_format", 2)  # Default to Zarr v2
+    zarr_format = _detect_zarr_format(ds, storage_options)
     if zarr_format in allowed_versions:
         report.add(
             SECTION_ID,
```
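For a local directory store, the same rule (a root-level `zarr.json` means v3, anything else is treated as v2) can be sketched with the standard library alone. The helper `detect_local_zarr_format` below is hypothetical and illustrative only: it swaps `fsspec` for `os.path`, so it handles local paths only, whereas the committed `_detect_zarr_format` goes through `fsspec.get_fs_token_paths` so that remote stores and `storage_options` also work.

```python
import os
import tempfile


def detect_local_zarr_format(encoding):
    """Hypothetical local-only sketch of _detect_zarr_format.

    `encoding` plays the role of ds.encoding; only its "source" key is used.
    """
    store_path = encoding.get("source")
    if store_path is None:
        return 2  # cannot determine, assume v2 (same fallback as the commit)

    store_root = store_path.rstrip("/")
    # Zarr v3 keeps root metadata in zarr.json; v2 stores have .zgroup instead.
    if os.path.exists(os.path.join(store_root, "zarr.json")):
        return 3
    return 2


# Build two throwaway stores to exercise both branches.
with tempfile.TemporaryDirectory() as tmp:
    v3_store = os.path.join(tmp, "v3.zarr")
    v2_store = os.path.join(tmp, "v2.zarr")
    os.makedirs(v3_store)
    os.makedirs(v2_store)
    open(os.path.join(v3_store, "zarr.json"), "w").close()
    open(os.path.join(v2_store, ".zgroup"), "w").close()

    v3_result = detect_local_zarr_format({"source": v3_store})
    v2_result = detect_local_zarr_format({"source": v2_store})

no_source_result = detect_local_zarr_format({})

print(v3_result, v2_result, no_source_result)  # 3 2 2
```

As in the commit, a store with neither `zarr.json` nor `.zgroup` is still reported as v2; returning a distinct "unknown" value would be a stricter alternative.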
