Commit 82dcbdd

committed
docs: make the docs better
1 parent 6467bf8 commit 82dcbdd

2 files changed

+21
-8
lines changed

docs/index.md

Lines changed: 20 additions & 7 deletions
@@ -52,6 +52,11 @@ print(spec.model_dump())

 More examples can be found in the [usage guide](usage_zarr_v2.md).

+## Installation
+
+`pip install -U pydantic-zarr`
+
+
 ### Limitations

 #### No array data operations
@@ -61,11 +66,6 @@ This library only provides tools to represent the *layout* of Zarr groups and ar

 This library supports [version 2](https://zarr.readthedocs.io/en/stable/spec/v2.html) of the Zarr format, with partial support for [Zarr v3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html). Progress towards complete support for Zarr v3 is tracked by [this issue](https://github.com/d-v-b/pydantic-zarr/issues/3).

-
-## Installation
-
-`pip install -U pydantic-zarr`
-
 ## Design

 A Zarr group can be modeled as an object with two properties:
@@ -79,12 +79,25 @@ Note the use of the term "modeled": Zarr arrays are useful because they store N-

 In `pydantic-zarr`, Zarr groups are modeled by the `GroupSpec` class, which is a [`Pydantic model`](https://docs.pydantic.dev/latest/concepts/models/) with two fields:

-- `GroupSpec.attributes`: either a `Mapping` or a `pydantic.BaseModel`.
-- `GroupSpec.members`: a mapping with string keys and values that must be `GroupSpec` or `ArraySpec` instances.
+- `attributes`: either a `Mapping` or a `pydantic.BaseModel`.
+- `members`: either a mapping with string keys and values that must be `GroupSpec` or `ArraySpec` instances, or the value `None`. The use of nullability is explained in its own [section](#nullable-members).

 Zarr arrays are represented by the `ArraySpec` class, which has a similar `attributes` field, as well as fields for all the Zarr array properties (`dtype`, `shape`, `chunks`, etc).

 `GroupSpec` and `ArraySpec` are both [generic models](https://docs.pydantic.dev/1.10/usage/models/#generic-models). `GroupSpec` takes two type parameters, the first specializing the type of `GroupSpec.attributes`, and the second specializing the type of the *values* of `GroupSpec.members` (the keys of `GroupSpec.members` are always strings). `ArraySpec` only takes one type parameter, which specializes the type of `ArraySpec.attributes`.

 Examples using this generic typing functionality can be found in the [usage guide](usage_zarr_v2.md#using-generic-types).

+### Nullable `members`
+
+When a Zarr group has no members, a `GroupSpec` model of that Zarr group will have its `members` attribute set to the empty dict `{}`. But there are scenarios where the members of a Zarr group are unknown:
+
+- Some Zarr storage backends do not support directory listing, in which case it is possible to access a Zarr group and inspect its attributes, but impossible to discover its members. So the members of such a Zarr group are unknown.
+- Traversing a large, deeply nested Zarr group on high-latency storage can be slow. This can be mitigated by only partially traversing the hierarchy, e.g. only inspecting the root group and N subgroups. This defines a sub-hierarchy of the full hierarchy; leaf groups of this subtree by definition did not have their members checked, and so their members are unknown.
+- A Zarr hierarchy can be represented as a mapping `M` from paths to nodes (array or group). In this case, if `M["key"]` is a model of a Zarr group `G`, then `M["key/subkey"]` would encode a member of `G`. Since the key structure of the mapping `M` is doing the work of encoding the members of `G`, there is no value in `G` having a members attribute that claims anything about the members of `G`, and so `G.members` should be modeled as unknown.
+
+To handle these cases, `pydantic-zarr` allows the `members` attribute of a `GroupSpec` to be `None`.
+
+## Standardization
+
+The Zarr specifications do not define a model of the Zarr hierarchy. `pydantic-zarr` is an implementation of a particular model that can be found formalized in this [specification document](https://github.com/d-v-b/zeps/blob/zom/draft/ZEP0006.md), which has been proposed for inclusion in the Zarr specifications. You can find the discussion of that proposal in [this pull request](https://github.com/zarr-developers/zeps/pull/46).
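
The distinction drawn in the "Nullable `members`" section, between a group that is known to be empty and a group whose members are unknown, can be illustrated with a small standard-library sketch. The `Group` dataclass and the `flatten` helper below are hypothetical stand-ins written for this illustration; they are not the `GroupSpec` class or any function from `pydantic-zarr`, and arrays are left out to keep the example short. The sketch also shows the third scenario: once a hierarchy is stored as a mapping from paths to nodes, the keys carry the membership information, so each stored group records its members as unknown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Group:
    """Hypothetical stand-in for a group model (not pydantic-zarr's GroupSpec)."""

    attributes: dict[str, Any] = field(default_factory=dict)
    # {} means "known to have no members"; None means "members unknown".
    members: Optional[dict[str, Group]] = None


def flatten(node: Group, path: str = "") -> dict[str, Group]:
    """Encode a hierarchy as a mapping from paths to nodes.

    The keys of the result encode membership, so every stored group
    has its own members set to None (unknown).
    """
    flat = {path: Group(attributes=node.attributes, members=None)}
    for name, child in (node.members or {}).items():
        child_path = f"{path}/{name}" if path else name
        flat.update(flatten(child, child_path))
    return flat


empty = Group(members={})      # a group known to be empty
unknown = Group(members=None)  # a group whose members were never checked

tree = Group(
    attributes={"a": 1},
    members={"sub": Group(members={"leaf": Group(members={})})},
)
flat = flatten(tree)  # keys: "", "sub", "sub/leaf"
```

In this sketch the root group is stored under the empty path `""`, which is a choice made for the example rather than something the documentation above prescribes.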

docs/usage_zarr_v2.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@

 The `GroupSpec` and `ArraySpec` classes represent Zarr v2 groups and arrays, respectively. To create an instance of a `GroupSpec` or `ArraySpec` from an existing Zarr group or array, pass the Zarr group / array to the `.from_zarr` method defined on the `GroupSpec` / `ArraySpec` classes. This will result in a `pydantic-zarr` model of the Zarr object.

-Note that `GroupSpec.from_zarr(zarr_group)` will traverse the entire hierarchy under `zarr_group`. Future versions of this library may introduce a limit on the depth of this traversal: see [#2](https://github.com/d-v-b/pydantic-zarr/issues/2).
+> By default, `GroupSpec.from_zarr(zarr_group)` will traverse the entire hierarchy under `zarr_group`. This can be extremely slow for a large Zarr group on high-latency storage. To limit the depth of traversal, use the `depth` keyword argument, e.g. `GroupSpec.from_zarr(zarr_group, depth=1)`.

 Note that `from_zarr` will *not* read the data inside an array.

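
The idea behind a depth-limited traversal can be sketched without any Zarr dependency. The `model_group` function below is a hypothetical helper over plain nested dicts, not the implementation of `GroupSpec.from_zarr`. It assumes, for the purposes of the sketch only, that a negative depth means "no limit" and that a depth of 0 means the members of the current group are not inspected and are therefore recorded as `None`.

```python
from __future__ import annotations

from typing import Any


def model_group(node: dict[str, Any], depth: int = -1) -> dict[str, Any]:
    """Build a plain-dict model of a nested in-memory group.

    `node` is a hypothetical group of the form
    {"attributes": {...}, "members": {name: child_group, ...}}.
    Assumed conventions: depth < 0 traverses everything; depth == 0 stops
    and records the members of the current group as None (unknown).
    """
    if depth == 0:
        members = None  # traversal stopped here, so members are unknown
    else:
        members = {
            name: model_group(child, depth - 1)
            for name, child in node["members"].items()
        }
    return {"attributes": node["attributes"], "members": members}


root = {
    "attributes": {},
    "members": {
        "a": {
            "attributes": {"x": 1},
            "members": {"b": {"attributes": {}, "members": {}}},
        }
    },
}

full = model_group(root)              # every group is visited
shallow = model_group(root, depth=1)  # only the root's direct members are visited
```

With `depth=1`, the group `a` appears in the model with its attributes, but its own members are `None` because they were never listed; in the full traversal, the innermost group `b` has `members == {}` because it was inspected and found empty.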