Commit 0b423c3

Merge pull request #28 from janelia-cellmap/default_chunks_fix
Default chunks fix
2 parents fbe124c + 82dcbdd commit 0b423c3

8 files changed

+395
-107
lines changed

.github/workflows/test.yml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ['3.10', '3.11', '3.12']
+        python-version: ['3.9', '3.10', '3.11', '3.12']
     steps:
       - uses: actions/checkout@v4
       - name: Install dependencies

docs/index.md

Lines changed: 33 additions & 18 deletions
@@ -6,18 +6,20 @@ Static typing and runtime validation for Zarr hierachies.
 
 ## Overview
 
-`pydantic-zarr` expresses data stored in the [zarr](https://zarr.readthedocs.io/en/stable/) format with [Pydantic](https://docs.pydantic.dev/1.10/). Specifically, `pydantic-zarr` encodes Zarr groups and arrays as [Pydantic models](https://docs.pydantic.dev/1.10/usage/models/). Programmers can use these models to formalize the structure of Zarr hierarchies, and apply type-checking and runtime validation to Zarr data.
+`pydantic-zarr` expresses data stored in the [Zarr](https://zarr.readthedocs.io/en/stable/) format with [Pydantic](https://docs.pydantic.dev/1.10/). Specifically, `pydantic-zarr` encodes Zarr groups and arrays as [Pydantic models](https://docs.pydantic.dev/1.10/usage/models/). These models are useful for formalizing the structure of Zarr hierarchies, type-checking Zarr hierarchies, and runtime validation for Zarr-based data.
 
 
 ```python
 import zarr
 from pydantic_zarr.v2 import GroupSpec
 
+# create a Zarr group
 group = zarr.group(path='foo')
+# put an array inside the group
 array = zarr.create(store = group.store, path='foo/bar', shape=10, dtype='uint8')
 array.attrs.put({'metadata': 'hello'})
 
-# this is a pydantic model
+# create a pydantic model to model the Zarr group
 spec = GroupSpec.from_zarr(group)
 print(spec.model_dump())
 """
@@ -48,41 +50,54 @@ print(spec.model_dump())
 """
 ```
 
-Important note: this library only provides tools to represent the *layout* of Zarr groups and arrays, and the structure of their attributes. It performs no type checking or runtime validation of the multidimensional array data contained inside Zarr arrays.
+More examples can be found in the [usage guide](usage_zarr_v2.md).
 
 ## Installation
 
 `pip install -U pydantic-zarr`
 
-## Design
 
-A Zarr group can be schematized as two elements:
+### Limitations
 
+#### No array data operations
+This library only provides tools to represent the *layout* of Zarr groups and arrays, and the structure of their attributes. `pydantic-zarr` performs no type checking or runtime validation of the multidimensional array data contained *inside* Zarr arrays, and `pydantic-zarr` does not contain any tools for efficiently reading or writing Zarr arrays.
 
-- `attributes`: A dict-like object with string keys and JSON-serializable values.
-- `members`: A dict-like object with string keys and values that are other Zarr groups, or Zarr arrays.
+#### Supported Zarr versions
 
-A Zarr array can be schematized similarly, but without the `members` property, and with a set of arrays-specific properties like `shape`, `dtype`, etc.
+This library supports [version 2](https://zarr.readthedocs.io/en/stable/spec/v2.html) of the Zarr format, with partial support for [Zarr v3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html). Progress towards complete support for Zarr v3 is tracked by [this issue](https://github.com/d-v-b/pydantic-zarr/issues/3).
 
-Note the use of the term "schematized": Zarr arrays also represent N-dimensional array data, but `pydantic-zarr` does not treat that data as part of the "schema" of a Zarr array.
+## Design
 
-Accordingly, in `pydantic-zarr`, Zarr groups are encoded by the `GroupSpec` class with two fields:
+A Zarr group can be modeled as an object with two properties:
 
+- `attributes`: A dict-like object, with keys that are strings and values that are JSON-serializable.
+- `members`: A dict-like object, with keys that are strings and values that are other Zarr groups, or Zarr arrays.
 
-- `GroupSpec.attributes`: either a `Mapping` or a `pydantic.BaseModel`.
-- `GroupSpec.members`: a mapping with string keys and values that must be `GroupSpec` or `ArraySpec` instances.
+A Zarr array can be modeled similarly, but without the `members` property (because Zarr arrays cannot contain Zarr groups or arrays), and with a set of array-specific properties like `shape`, `dtype`, etc.
 
-Zarr arrays are represented by the `ArraySpec` class, which has a similar `attributes` field, as well as fields for all the Zarr array properties (`dtype`, `shape`, `chunks`, etc).
+Note the use of the term "modeled": Zarr arrays are useful because they store N-dimensional array data, but `pydantic-zarr` does not treat that data as part of the "model" of a Zarr array.
+
+In `pydantic-zarr`, Zarr groups are modeled by the `GroupSpec` class, which is a [`Pydantic model`](https://docs.pydantic.dev/latest/concepts/models/) with two fields:
 
+- `attributes`: either a `Mapping` or a `pydantic.BaseModel`.
+- `members`: either a mapping with string keys and values that must be `GroupSpec` or `ArraySpec` instances, or the value `None`. The use of nullability is explained in its own [section](#nullable-members).
+
+Zarr arrays are represented by the `ArraySpec` class, which has a similar `attributes` field, as well as fields for all the Zarr array properties (`dtype`, `shape`, `chunks`, etc).
 
 `GroupSpec` and `ArraySpec` are both [generic models](https://docs.pydantic.dev/1.10/usage/models/#generic-models). `GroupSpec` takes two type parameters, the first specializing the type of `GroupSpec.attributes`, and the second specializing the type of the *values* of `GroupSpec.members` (the keys of `GroupSpec.members` are always strings). `ArraySpec` only takes one type parameter, which specializes the type of `ArraySpec.attributes`.
 
-Examples using this generic typing functionality can be found in the [usage guide](usage.md#using-generic-types).
+Examples using this generic typing functionality can be found in the [usage guide](usage_zarr_v2.md#using-generic-types).
 
-## Supported Zarr versions
+### Nullable `members`
 
-This library supports [version 2](https://zarr.readthedocs.io/en/stable/spec/v2.html) of the Zarr format, with partial support for [Zarr v3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html). Progress towards complete support for Zarr v3 is tracked by [this issue](https://github.com/d-v-b/pydantic-zarr/issues/3).
+When a Zarr group has no members, a `GroupSpec` model of that Zarr group will have its `members` attribute set to the empty dict `{}`. But there are scenarios where the members of a Zarr group are unknown:
+
+- Some Zarr storage backends do not support directory listing, in which case it is possible to access a Zarr group and inspect its attributes, but impossible to discover its members. So the members of such a Zarr group are unknown.
+- Traversing a deeply nested large Zarr group on high latency storage can be slow. This can be mitigated by only partially traversing the hierarchy, e.g. only inspecting the root group and N subgroups. This defines a sub-hierarchy of the full hierarchy; leaf groups of this subtree by definition did not have their members checked, and so their members are unknown.
+- A Zarr hierarchy can be represented as a mapping `M` from paths to nodes (array or group). In this case, if `M["key"]` is a model of a Zarr group `G`, then `M["key/subkey"]` would encode a member of `G`. Since the key structure of the mapping `M` is doing the work of encoding the members of `G`, there is no value in `G` having a members attribute that claims anything about the members of `G`, and so `G.members` should be modeled as unknown.
+
+To handle these cases, `pydantic-zarr` allows the `members` attribute of a `GroupSpec` to be `None`.
 
-## Supported Pydantic versions
+## Standardization
 
-This library is based on Pydantic version 2.
+The Zarr specifications do not define a model of the Zarr hierarchy. `pydantic-zarr` is an implementation of a particular model that can be found formalized in this [specification document](https://github.com/d-v-b/zeps/blob/zom/draft/ZEP0006.md), which has been proposed for inclusion in the Zarr specifications. You can find the discussion of that proposal in [this pull request](https://github.com/zarr-developers/zeps/pull/46).
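
Note on the nullable `members` text added in `docs/index.md`: the third scenario (a hierarchy stored as a mapping `M` from paths to nodes) may be easier to follow with a small sketch. The code below uses only the standard library. `GroupNode`, `ArrayNode`, and `flatten` are hypothetical stand-ins invented for illustration and are not the `pydantic-zarr` classes or API. The path convention (`""` for the root, `/`-joined child names) is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class ArrayNode:
    """Hypothetical stand-in for an array model (not pydantic-zarr's ArraySpec)."""
    shape: tuple
    dtype: str
    attributes: dict = field(default_factory=dict)


@dataclass
class GroupNode:
    """Hypothetical stand-in for a group model (not pydantic-zarr's GroupSpec)."""
    attributes: dict = field(default_factory=dict)
    # {} means "known to have no members"; None means "members unknown".
    members: Optional[Dict[str, Union["GroupNode", ArrayNode]]] = field(
        default_factory=dict
    )


def flatten(node, path=""):
    """Return a mapping from path to node.

    The keys of the mapping encode membership, so every group stored in
    the mapping carries members=None rather than repeating its children.
    """
    if isinstance(node, ArrayNode):
        return {path: node}
    flat = {path: GroupNode(attributes=node.attributes, members=None)}
    for name, child in (node.members or {}).items():
        flat.update(flatten(child, f"{path}/{name}"))
    return flat


root = GroupNode(
    attributes={"foo": 10},
    members={"a": ArrayNode(shape=(10,), dtype="uint8"), "g": GroupNode()},
)
flat = flatten(root)
print(sorted(flat))
#> ['', '/a', '/g']
print(flat[""].members)
#> None
```

In this sketch the empty subgroup `g` and the root both end up with `members=None` in the flat form, because the flat mapping, not the node, is the source of truth about membership.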

docs/usage_zarr_v2.md

Lines changed: 28 additions & 17 deletions
@@ -6,7 +6,7 @@
 
 The `GroupSpec` and `ArraySpec` classes represent Zarr v2 groups and arrays, respectively. To create an instance of a `GroupSpec` or `ArraySpec` from an existing Zarr group or array, pass the Zarr group / array to the `.from_zarr` method defined on the `GroupSpec` / `ArraySpec` classes. This will result in a `pydantic-zarr` model of the Zarr object.
 
-Note that `GroupSpec.from_zarr(zarr_group)` will traverse the entire hierarchy under `zarr_group`. Future versions of this library may introduce a limit on the depth of this traversal: see [#2](https://github.com/d-v-b/pydantic-zarr/issues/2).
+> By default, `GroupSpec.from_zarr(zarr_group)` will traverse the entire hierarchy under `zarr_group`. This can be extremely slow if used on an extensive Zarr group on high-latency storage. To limit the depth of traversal, use the `depth` keyword argument, e.g. `GroupSpec.from_zarr(zarr_group, depth=1)`.
 
 Note that `from_zarr` will *not* read the data inside an array.
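
Note on the `depth` keyword described above: a depth-limited traversal leaves the groups at the cut-off with unknown members. The sketch below illustrates that idea with plain nested dicts and the standard library only. `model_group` and the `{"attrs": ..., "members": ...}` layout are hypothetical. The convention that `depth=-1` means "no limit" and `depth=0` means "do not list members" is an assumption of this sketch, and the exact semantics of `depth` in `GroupSpec.from_zarr` may differ.

```python
def model_group(tree, depth=-1):
    """Model a nested-dict "group" down to `depth` levels.

    `tree` is a hypothetical stand-in for a Zarr group:
    {"attrs": {...}, "members": {name: subtree, ...}}.
    """
    if depth == 0:
        # Members were never listed, so they are unknown (None), not empty ({}).
        return {"attributes": tree["attrs"], "members": None}
    members = {
        name: model_group(child, depth - 1)
        for name, child in tree["members"].items()
    }
    return {"attributes": tree["attrs"], "members": members}


hierarchy = {
    "attrs": {"level": 0},
    "members": {
        "a": {
            "attrs": {"level": 1},
            "members": {"b": {"attrs": {"level": 2}, "members": {}}},
        }
    },
}

shallow = model_group(hierarchy, depth=1)
full = model_group(hierarchy)

print(shallow["members"]["a"]["members"])
#> None
print(full["members"]["a"]["members"]["b"]["members"])
#> {}
```

The distinction between `None` and `{}` in the two printed values is the same one the nullable `members` documentation in this commit draws.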
 
@@ -78,7 +78,7 @@ print(dict(group2['bar'].attrs))
 
 ### Creating from an array
 
-The `ArraySpec` class has a `from_array` static method that takes a numpy-array-like object and returns an `ArraySpec` with `shape` and `dtype` fields matching those of the array-like object.
+The `ArraySpec` class has a `from_array` static method that takes an array-like object and returns an `ArraySpec` with `shape` and `dtype` fields matching those of the array-like object.
 
 ```python
 from pydantic_zarr.v2 import ArraySpec
@@ -241,53 +241,64 @@ print(GroupSpec.from_flat(tree).model_dump())
 
 The `like` method works by converting both input models to `dict` via `pydantic.BaseModel.model_dump`, and comparing the `dict` representation of the models. This means that instances of two different subclasses of `GroupSpec`, which would not be considered equal according to the `==` operator, will be considered `like` if and only if they serialize to identical `dict` instances.
 
-The `like` method also takes keyword arguments `include` and `exclude`, which results in attributes being explicitly included or excluded from the model comparison. So it's possible to use `like` to check if two `ArraySpec` instances have the same `shape` and `dtype` by calling `array_a.like(array_b, include={'shape', 'dtype'})`. This is useful if you don't care about the compressor or filters and just want to ensure that you can safely write an in-memory array to a Zarr array.
+The `like` method takes keyword arguments `include` and `exclude`, which determine the attributes included or excluded from the model comparison. So it's possible to use `like` to check if two `ArraySpec` instances have the same `shape`, `dtype` and `chunks` by calling `array_a.like(array_b, include={'shape', 'dtype', 'chunks'})`. This is useful if you don't care about the compressor or filters and just want to ensure that you can safely write an in-memory array to a Zarr array, which depends just on the two arrays having matching `shape`, `dtype`, and `chunks` attributes.
 
 ```python
 from pydantic_zarr.v2 import ArraySpec, GroupSpec
 import zarr
 arr_a = ArraySpec(shape=(1,), dtype='uint8', chunks=(1,))
-arr_b = ArraySpec(shape=(2,), dtype='uint8', chunks=(1,)) # array with different shape
+# make an array with a different shape
+arr_b = ArraySpec(shape=(2,), dtype='uint8', chunks=(1,))
 
-print(arr_a.like(arr_b)) # False, because of mismatched shape
+# Returns False, because of mismatched shape
+print(arr_a.like(arr_b))
 #> False
 
-print(arr_a.like(arr_b, exclude={'shape'})) # True, because we exclude shape.
+# Returns True, because we exclude shape.
+print(arr_a.like(arr_b, exclude={'shape'}))
 #> True
 
 # `ArraySpec.like` will convert a zarr.Array to ArraySpec
 store = zarr.MemoryStore()
-arr_a_stored = arr_a.to_zarr(store, path='arr_a') # this is a zarr.Array
+# This is a zarr.Array
+arr_a_stored = arr_a.to_zarr(store, path='arr_a')
 
-print(arr_a.like(arr_a_stored)) # arr_a is like the zarr.Array version of itself
+# arr_a is like the zarr.Array version of itself
+print(arr_a.like(arr_a_stored))
 #> True
 
-print(arr_b.like(arr_a_stored)) # False, because of mismatched shape
+# Returns False, because of mismatched shape
+print(arr_b.like(arr_a_stored))
 #> False
 
-print(arr_b.like(arr_a_stored, exclude={'shape'})) # True, because we exclude shape.
+# Returns True, because we exclude shape.
+print(arr_b.like(arr_a_stored, exclude={'shape'}))
 #> True
 
-# the same thing thing for groups
+# The same thing, but for groups
 g_a = GroupSpec(attributes={'foo': 10}, members={'a': arr_a, 'b': arr_b})
 g_b = GroupSpec(attributes={'foo': 11}, members={'a': arr_a, 'b': arr_b})
 
-print(g_a.like(g_a)) # g_a is like itself
+# g_a is like itself
+print(g_a.like(g_a))
 #> True
 
-print(g_a.like(g_b)) # False, because of mismatched attributes
+# Returns False, because of mismatched attributes
+print(g_a.like(g_b))
 #> False
 
-print(g_a.like(g_b, exclude={'attributes'})) # True, because we ignore attributes
+# Returns True, because we ignore attributes
+print(g_a.like(g_b, exclude={'attributes'}))
 #> True
 
-print(g_a.like(g_a.to_zarr(store, path='g_a'))) # g_a is like its zarr.Group counterpart
+# g_a is like its zarr.Group counterpart
+print(g_a.like(g_a.to_zarr(store, path='g_a')))
 #> True
 ```
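
Note on the comparison rule described for `like`: since the documentation says the method compares the `dict` form of both models, the `include` / `exclude` behaviour can be approximated with plain dicts. The sketch below is only an illustration using the standard library. `dict_like` is a hypothetical helper, not part of `pydantic-zarr`, and it assumes `include` and `exclude` filter top-level keys only.

```python
def dict_like(a, b, include=None, exclude=None):
    """Compare two dicts on a selected subset of top-level keys.

    Hypothetical helper: `include` restricts the comparison to the given
    keys, `exclude` drops keys from it. Both default to "no filtering".
    """
    keys = set(a) | set(b)
    if include is not None:
        keys &= set(include)
    if exclude is not None:
        keys -= set(exclude)
    left = {k: a[k] for k in keys if k in a}
    right = {k: b[k] for k in keys if k in b}
    return left == right


# Plain-dict stand-ins for the serialized form of two array models.
arr_a = {"shape": (1,), "dtype": "uint8", "chunks": (1,), "compressor": None}
arr_b = {"shape": (2,), "dtype": "uint8", "chunks": (1,), "compressor": None}

print(dict_like(arr_a, arr_b))
#> False
print(dict_like(arr_a, arr_b, exclude={"shape"}))
#> True
print(dict_like(arr_a, arr_b, include={"dtype", "chunks"}))
#> True
```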
 
 ## Using generic types
 
-The following examples demonstrate how to specialize `GroupSpec` and `ArraySpec` with type parameters. By specializing `GroupSpec` or `ArraySpec` in this way, python type checkers and Pydantic can type-check elements of a Zarr hierarchy.
+This example shows how to specialize `GroupSpec` and `ArraySpec` with type parameters. By specializing `GroupSpec` or `ArraySpec` in this way, python type checkers and Pydantic can type-check elements of a Zarr hierarchy.
 
 ```python
 import sys
@@ -324,7 +335,7 @@ print(SpecificAttrsGroup(attributes={'a': 100, 'b': 100}))
 #> zarr_version=2 attributes={'a': 100, 'b': 100} members={}
 
 # a Zarr group that only contains arrays -- no subgroups!
-# we re-use the Tattributes type variable defined in pydantic_zarr.core
+# we re-use the TAttr type variable defined in pydantic_zarr.core
 ArraysOnlyGroup = GroupSpec[TAttr, ArraySpec]
 
 try:

poetry.lock

Lines changed: 16 additions & 2 deletions
Some generated files are not rendered by default.

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ python = "^3.9"
 zarr = "^2.14.2"
 pydantic = "^2.0.0"
 typing-extensions = {version = "^4.7.1", python = "<3.12"}
+eval-type-backport = "^0.1.3"
 
 [tool.poetry.group.dev.dependencies]
 pytest = "^7.3.1"

src/pydantic_zarr/core.py

Lines changed: 1 addition & 1 deletion
@@ -1,11 +1,11 @@
 from __future__ import annotations
+from typing_extensions import TypeAlias
 from typing import (
     Any,
     Dict,
     Literal,
     Mapping,
     Set,
-    TypeAlias,
     Union,
 )
 from pydantic import BaseModel, ConfigDict