Continue on v3 PR by meggart · Pull Request #226 · JuliaIO/Zarr.jl

meggart · 2025-11-28T10:45:54Z

This is my branch based on the changes by @mkitti and @lazarusA. The main change is that I removed the FormattedStore concept again for now and instead put the task of transforming chunk CartesianIndices into string keys into the hands of the calling function.

Looking back, I think it was not a good decision, when I started the package, to put some of the CartesianIndex logic into the storage part of the package, because this is really an encoding step and is independent of the storage backend.

This is not fully cleaned up yet, but I think the goal should be for storage backend functions like getindex, setindex!, etc. to operate only on string keys, which is why I renamed the CartesianIndex-related functions to indicate that additional information is needed to parse them.
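A minimal sketch of that split, assuming a hypothetical in-memory `DictStore` and a hypothetical caller-side `chunkkey` helper (neither is Zarr.jl's actual API): the store only sees string keys and bytes, while the caller turns a 1-based Julia `CartesianIndex` into a 0-based chunk key. The `sep`/`prefix` keywords mirror the fields in this PR's `ChunkEncoding`; the Julia/C dimension-order reversal is deliberately left out.

```julia
# Hypothetical sketch: names are illustrative, not Zarr.jl's actual API.
abstract type AbstractStore end

# A store that only knows about string keys and byte vectors.
struct DictStore <: AbstractStore
    data::Dict{String, Vector{UInt8}}
end
DictStore() = DictStore(Dict{String, Vector{UInt8}}())

# Low-level IO: no knowledge of Zarr versions, CartesianIndex, or separators.
Base.getindex(s::DictStore, key::AbstractString) = get(s.data, key, nothing)
function Base.setindex!(s::DictStore, value::Vector{UInt8}, key::AbstractString)
    s.data[key] = value
    return s
end

# Encoding step, done by the caller: 1-based Julia chunk index -> string key.
# Assumes the v3 default layout ("c" prefix, '/' separator) unless told otherwise.
function chunkkey(ci::CartesianIndex; sep::Char = '/', prefix::Bool = true)
    parts = string.(Tuple(ci) .- 1)               # Zarr chunk coordinates are 0-based
    isempty(parts) && return prefix ? "c" : "0"   # 0-d arrays have a single chunk
    return prefix ? join(("c", parts...), sep) : join(parts, sep)
end

store = DictStore()
store[chunkkey(CartesianIndex(1, 2))] = UInt8[1, 2, 3]   # stored under "c/0/1"
```

Because the store never parses keys, any backend (directory, zip, S3, HTTP) only has to implement string-keyed reads and writes.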

Also, for helper functions like writeattrs, I would definitely prefer that we explicitly pass the Zarr version information instead of wrapping stores in v2 and v3 wrappers. @mkitti, could you live with such a solution? If not, we could also return to using a FormattedStore type and attach the way chunks are encoded to the storage type, but I would prefer that AbstractStore remain a very low-level abstraction for basic IO operations, independent of Zarr versions and encoding options.
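A minimal sketch of passing the version explicitly, assuming a singleton `ZarrFormat{V}` type like the one used in this PR's `default_sep` methods; `attrs_key` and `joinkey` are hypothetical helpers. Dispatch on the format value picks where attributes live: a separate `.zattrs` document in v2, the `attributes` member of `zarr.json` in v3.

```julia
# Hypothetical sketch: the Zarr version travels as an explicit argument,
# so the store itself stays version-agnostic.
struct ZarrFormat{V} end
ZarrFormat(v::Integer) = ZarrFormat{Int(v)}()

# Store keys always use '/', regardless of the OS path separator.
joinkey(path::AbstractString, name::AbstractString) =
    isempty(path) ? String(name) : string(rstrip(path, '/'), '/', name)

# v2 keeps user attributes in ".zattrs"; v3 keeps them inside "zarr.json".
attrs_key(path::AbstractString, ::ZarrFormat{2}) = joinkey(path, ".zattrs")
attrs_key(path::AbstractString, ::ZarrFormat{3}) = joinkey(path, "zarr.json")
```

Note that in v3 writing attributes means a read-modify-write of the whole `zarr.json`, since it also holds the array or group metadata; a `writeattrs` built on this would need a separate method per format anyway, which is why dispatching on an explicit argument is a natural fit.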

meggart · 2025-11-28T10:46:40Z

@lazarusA Do you know if I can easily make this PR against yours so my changes are easier to compare?

lazarusA · 2025-11-28T10:54:37Z

@lazarusA Do you know if I can easily make this PR against yours so my changes are easier to compare?

I think there is no need. The git diffs were enough for me to follow what you did.

mkitti · 2025-11-28T13:39:51Z

In Zarr v3, we can have a V2 chunk-key-encoding:
https://zarr-specs.readthedocs.io/en/latest/v3/chunk-key-encodings/v2/index.html

That is, we can have a situation where the array or group information is stored in zarr.json while the chunks are stored under 0.0.0 instead of c/0/0/0. We cannot derive this from the Zarr format alone; the chunk-key-encoding is independent of the Zarr format.
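To illustrate that the chunk key follows the `chunk_key_encoding` entry of `zarr.json` rather than `zarr_format`, here is a minimal sketch with a hypothetical `chunk_key` function operating on the parsed JSON object. It assumes 0-based chunk grid coordinates in metadata order and covers only the two encodings defined in the v3 specification ("default" and "v2").

```julia
# Hypothetical sketch: the key is chosen by the "chunk_key_encoding" metadata,
# not by zarr_format. Coordinates are 0-based chunk grid coordinates.
function chunk_key(encoding::AbstractDict, coords::NTuple{N, Int}) where {N}
    name = encoding["name"]
    config = get(encoding, "configuration", Dict{String, Any}())
    parts = string.(coords)
    if name == "default"
        sep = get(config, "separator", "/")       # "c" prefix, "/" by default
        return join(("c", parts...), sep)
    elseif name == "v2"
        sep = get(config, "separator", ".")       # no prefix, "." by default
        return isempty(parts) ? "0" : join(parts, sep)
    else
        error("unsupported chunk_key_encoding: $name")
    end
end

# The same v3 array coordinates, two different on-disk layouts:
chunk_key(Dict("name" => "default"), (0, 0, 0))                                   # "c/0/0/0"
chunk_key(Dict("name" => "v2", "configuration" => Dict("separator" => ".")), (0, 0, 0))  # "0.0.0"
```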

mkitti · 2025-11-28T13:48:19Z

One application is transitioning a V2 array to a V3 array. You can do this by just adding a zarr.json to an existing V2 array, without moving the chunks, by using a V2 chunk-key-encoding in the zarr.json.
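A minimal sketch of the chunk-key part of such a transition, using a hypothetical `v2_chunk_key_encoding` helper that reads a parsed `.zarray` object and returns the matching v3 `chunk_key_encoding` entry. It assumes `dimension_separator` may be missing or `null` in v2 metadata, in which case "." applies.

```julia
# Hypothetical helper: build the v3 "chunk_key_encoding" entry that lets a new
# zarr.json point at chunks already written by Zarr v2.
function v2_chunk_key_encoding(zarray::AbstractDict)
    # "dimension_separator" is optional in v2 (and may be null); "." is the default.
    sep = something(get(zarray, "dimension_separator", nothing), ".")
    sep in (".", "/") || throw(ArgumentError("invalid dimension_separator: $sep"))
    return Dict{String, Any}(
        "name" => "v2",
        "configuration" => Dict{String, Any}("separator" => sep),
    )
end
```

A complete conversion would also have to translate `dtype`, `compressor`, `filters`, and `order` into v3 `data_type` and codecs; this sketch covers only the chunk keys, which is the piece that lets existing chunk files stay where they are.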

There are also other possible chunk-key-encoding extensions. For example, I have proposed a suffix chunk-key-encoding zarr-extension where chunks may have a file extension:
zarr-developers/zarr-extensions#28

Chunks with file extensions are interesting because the chunks themselves could be PNG files or even (Geo)TIFFs.

mkitti

We need to abstract ChunkKeyEncoding out because V3 arrays can use a V2 chunk key encoding, and there are also chunk-key-encoding extensions.

mkitti · 2025-11-28T13:58:10Z

src/chunkencoding.jl

@@ -0,0 +1,27 @@
+
+struct ChunkEncoding


I think we should call this ChunkKeyEncoding to match the name in the V3 specification:
https://zarr-specs.readthedocs.io/en/latest/v3/chunk-key-encodings/index.html

We will need either an AbstractChunkKeyEncoding type or to make ChunkKeyEncoding abstract.

We will need the following chunk key encodings:

V3ChunkKeyEncoding - the default chunk key encoding in Zarr v3; uses "/" as the dimension separator by default, but "." is also allowed

V2ChunkKeyEncoding - the chunk key encoding used by Zarr v2; uses "." as the dimension separator by default, but "/" is also allowed

SuffixChunkKeyEncoding - adds a file extension to a base chunk key encoding

There is at least one more ChunkKeyEncoding extension that I am aware of.
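A minimal sketch of what an extensible hierarchy could look like, assuming an abstract `AbstractChunkKeyEncoding` supertype and a hypothetical `encode_chunk_key` generic function. The `SuffixChunkKeyEncoding` field names are assumptions, not the proposal's schema. The "c" prefix lives in the V3 method instead of a generic `prefix` field.

```julia
abstract type AbstractChunkKeyEncoding end

# Zarr v3 "default" encoding: "c" prefix, separator '/' (or '.').
struct V3ChunkKeyEncoding <: AbstractChunkKeyEncoding
    separator::Char
    function V3ChunkKeyEncoding(separator::Char = '/')
        separator in ('/', '.') || throw(ArgumentError("separator must be '/' or '.'"))
        return new(separator)
    end
end

# Zarr v3 "v2" encoding: no prefix, separator '.' (or '/').
struct V2ChunkKeyEncoding <: AbstractChunkKeyEncoding
    separator::Char
    function V2ChunkKeyEncoding(separator::Char = '.')
        separator in ('/', '.') || throw(ArgumentError("separator must be '/' or '.'"))
        return new(separator)
    end
end

# Proposed extension: wraps a base encoding and appends a file extension.
# Field names are assumptions for illustration.
struct SuffixChunkKeyEncoding{E <: AbstractChunkKeyEncoding} <: AbstractChunkKeyEncoding
    base::E
    suffix::String
end

# Each encoding maps 0-based chunk grid coordinates to a store key.
encode_chunk_key(e::V3ChunkKeyEncoding, coords::Tuple) =
    join(("c", string.(coords)...), e.separator)
encode_chunk_key(e::V2ChunkKeyEncoding, coords::Tuple) =
    isempty(coords) ? "0" : join(string.(coords), e.separator)
encode_chunk_key(e::SuffixChunkKeyEncoding, coords::Tuple) =
    encode_chunk_key(e.base, coords) * e.suffix
```

For example, `SuffixChunkKeyEncoding(V2ChunkKeyEncoding('/'), ".png")` maps chunk `(3, 4)` to `"3/4.png"`. A further extension would only need a new subtype plus one `encode_chunk_key` method.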

mkitti · 2025-11-28T13:58:58Z

src/chunkencoding.jl

+
+struct ChunkEncoding
+    sep::Char
+    prefix::Bool


Prefix is really a property of the chunk-key-encoding type.

mkitti · 2025-11-28T14:01:30Z

src/chunkencoding.jl

+default_sep(::ZarrFormat{2}) = DS2
+default_sep(::ZarrFormat{3}) = DS3
+default_prefix(::ZarrFormat{2}) = false
+default_prefix(::ZarrFormat{3}) = true


Prefixes are not a general property of chunk key encodings.

mkitti · 2025-11-28T14:14:21Z

The other chunk-key-encoding that has been proposed is FanOut: zarr-developers/zarr-extensions#31

mkitti

I think I now see better what you are trying to do with the chunk-key-encoding, but this is going to be problematic in the long run if we do not make it extensible.

Regarding the V3 metadata structure, I think getting it right is going to require much more extensive changes. It should be quite different from the V2 metadata model, since codecs can take on a tree-like structure. The sharding codec, for example, can have multiple nested levels. There is really no distinction between filters and compressors; both are just codecs.
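A minimal sketch of why a flat compressor/filters model does not fit, assuming a hypothetical `AbstractCodec` hierarchy: transpose, bytes, gzip, and crc32c are all plain codecs, while the sharding codec carries its own inner pipelines for chunks and for the shard index, so pipelines can nest. The hypothetical `pipeline_depth` function measures that nesting.

```julia
# Hypothetical sketch: every transformation is a codec, and the sharding codec
# holds nested codec pipelines of its own.
abstract type AbstractCodec end

struct TransposeCodec <: AbstractCodec
    order::Vector{Int}                    # permutation of dimensions
end
struct BytesCodec <: AbstractCodec
    endian::Symbol                        # :little or :big
end
struct GzipCodec <: AbstractCodec
    level::Int
end
struct Crc32cCodec <: AbstractCodec end
struct ShardingCodec <: AbstractCodec
    chunk_shape::Vector{Int}
    codecs::Vector{AbstractCodec}         # applied to each inner chunk
    index_codecs::Vector{AbstractCodec}   # applied to the shard index
end

# Nesting depth: 1 for a flat pipeline, plus 1 for each level of sharding.
pipeline_depth(codecs::Vector{<:AbstractCodec}) = 1 + maximum(codec_depth, codecs; init = 0)
codec_depth(::AbstractCodec) = 0
codec_depth(c::ShardingCodec) = max(pipeline_depth(c.codecs), pipeline_depth(c.index_codecs))
```

A shard whose inner chunks are themselves sharded gives a pipeline depth of 3, which a single `compressor` field plus a `filters` list cannot express.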

mkitti · 2025-11-28T15:32:11Z

src/metadata.jl

+    shape::Base.RefValue{NTuple{N, Int}}
+    chunks::NTuple{N, Int}
+    dtype::String  # data_type in v3
+    compressor::C


If we are going to implement V3 metadata properly, we could probably remove this field and filters and replace them with codecs.

mkitti · 2025-11-28T15:32:26Z

src/metadata.jl

+    node_type::String
+    shape::Base.RefValue{NTuple{N, Int}}
+    chunks::NTuple{N, Int}
+    dtype::String  # data_type in v3


Might as well call this data_type.

mkitti · 2025-11-28T15:33:05Z

src/metadata.jl

+    dtype::String  # data_type in v3
+    compressor::C
+    fill_value::Union{T, Nothing}
+    order::Char


We can also remove order; in v3 this is handled by the codec pipeline (the bytes codec, together with the transpose codec) rather than by a metadata field.

mkitti · 2025-11-28T15:33:21Z

src/metadata.jl

+    compressor::C
+    fill_value::Union{T, Nothing}
+    order::Char
+    filters::F  # not yet supported


This can be removed and replaced with codecs.
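A minimal sketch of the metadata struct after these review suggestions, assuming codecs are kept as their raw JSON objects for now; `ArrayMetadataV3` and `codec_names` are hypothetical names. `dtype` becomes `data_type`, and `compressor`, `filters`, and `order` collapse into one ordered `codecs` list, with the dimension order expressed by a transpose codec.

```julia
# Hypothetical sketch of v3 array metadata following the review comments.
struct ArrayMetadataV3{T, N}
    shape::Base.RefValue{NTuple{N, Int}}
    chunks::NTuple{N, Int}
    data_type::String                     # renamed from `dtype`
    fill_value::Union{T, Nothing}
    codecs::Vector{Dict{String, Any}}     # replaces compressor, filters, and order
end

codec_names(md::ArrayMetadataV3) = [c["name"] for c in md.codecs]

md = ArrayMetadataV3{Float32, 2}(
    Ref((100, 100)),
    (10, 10),
    "float32",
    0.0f0,
    [
        Dict{String, Any}("name" => "transpose", "configuration" => Dict("order" => [1, 0])),
        Dict{String, Any}("name" => "bytes", "configuration" => Dict("endian" => "little")),
        Dict{String, Any}("name" => "gzip", "configuration" => Dict("level" => 5)),
    ],
)
```

Keeping the codecs as raw dictionaries is only a stopgap; a typed codec tree would eventually replace `Vector{Dict{String, Any}}`.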

mkitti · 2025-12-02T21:34:15Z

Hey guys, any thoughts? Should we try to meet on Thursday?

mkitti · 2025-12-11T14:21:24Z

Please just merge this. It would be easier to discuss changes in more focused pull requests.

mkitti and others added 30 commits March 11, 2025 22:11

Add dimension separator as a type parameter

0345a1c

Fix ZipStore constructor

61786e3

Fix ConsolidatedStore

cbb23ce

Fix S3Store constructor

e4630a9

Add version as a type parameter

b9e175f

Check metadata for dimension_separator and zarr_format

3624376

Implement VersionStorage wrapper rather than modifying AbstractStorage

2b3bbb2

Fix ConslidatedStore wrapper around HTTP

5f35ebf

This reduces the test diff

Add getproperty forwarding from VersionedStorage

c685387

This also reduces the test diff

Add some tests for propertynames

8d5606d

Add Storage/versionstore.jl

a6fcc2b

Add VersionedStorage param change constructors

f6883f8

Add V2 chunk encoding support

3cf746d

Fix Base.UInt8 constructor for ASCIIChar

d218dc2

Add ZstdCompressor

6f722b5

fix typo

865dac7

Prototype Zarr v3 support

6d7dc21

Modify tutorial to match current storage display

b394457

Ensure configuration key exists

8e71a33

Change VersionedStore to FormattedStore

08288fd

Merge pull request #1 from mkitti/mkitti-formatted-store

5bb7358

Change VersionedStore to FormattedStore

Merge branch 'mkitti-dimension-separator-type-parameter' into mkitti-…

020b3dd

…v3-prototype

Add {get,write}attrs for FormattedStore{3}

0046e14

Add separator function for V2ChunkKeyEncoding

34afb27

Fix formattedstore, add writemetadata

514ba87

Attempt to allow for Zarr v3 array creation

4ce5895

TODO: Fix Zarr v3 type strings

Fix Zarr v3 array creation

3298a5c

Implement CRC32c Zarr v3 codec

646ba9c

Merge branch 'master' into mkitti-v3-prototype

d4217fb

Fix spelling of Evaluate in comment

07352f3

mkitti and others added 15 commits August 27, 2025 15:43

Fix default chunk_key_encoding

42b2519

Merge branch 'master' into mkitti-v3-prototype

9722a1a

dont

32da023

adds type AbstractMetadata

1df9efe

dispatch

db8a08c

fix tests

136470e

py v3 baseline

616f563

julia version

df7cbf4

claude's sharding version, debug, integrate now

f369106

offset nbytes order

c3ba31e

Merge branch 'master' into continue_v3_prototype

7913fbc

Merge branch 'master' into continue_v3_prototype

b741d64

Move chunk encoding logic away from storage but into metadata

f888a15

fix v2 tests

7f8b536

remove FormattedStore

a44fc98

mkitti requested changes Nov 28, 2025

4 participants