Skip to content

Conversation

@meggart
Copy link
Collaborator

@meggart meggart commented Nov 28, 2025

This is my branch based on the changes by @mkitti and @lazarusA. The main change I did is that I removed the concept of the FormattedStore again for now and rather put the task of transforming chunk Cartesian Indices into string into the hand of the calling function.

I think when I started the package it was not a good decision to put some of the Cartesian Index logic into the Storage part of the package because this is rather an encoding step and independent of the storage backend.

This is not yet super cleaned up, but I think the goal should be to have storage backend function like getindex, setindex! etc only operate on string keys, which is why I renamed the CartesianIndex-related functions to indicate that additional information is needed to parse them.

Also for helper functions like writeattrs which definitely I would prefer we explicitly pass the zarr version information instead of wrapping stores into v2 and v3. @mkitti could you live with such a solution? If not we could as well return to using some FormattedStore type and attach the way chunks are encoded to the storage type, but somehow I would prefer if AbstractStore remained a very low-level abstraction for dealing with basic IO operations, independent of Zarr versions and encoding options.

mkitti and others added 30 commits March 11, 2025 22:11
This reduces the test diff
This also reduces the test diff
Change VersionedStore to FormattedStore
TODO: Fix Zarr v3 type strings
@meggart
Copy link
Collaborator Author

meggart commented Nov 28, 2025

@lazarusA Do you know if I can easily make this PR against yours so my changes are more easy to compare?

@lazarusA
Copy link
Contributor

@lazarusA Do you know if I can easily make this PR against yours so my changes are more easy to compare?

I think there is not need. git diffs were enough for me to follow what you did.

@mkitti
Copy link
Member

mkitti commented Nov 28, 2025

In Zarr v3, we can have a V2 chunk-key-encoding:
https://zarr-specs.readthedocs.io/en/latest/v3/chunk-key-encodings/v2/index.html

That is we can have a situation where the arrayvor group information is stored in zarr.json while the chunks are stored in 0.0.0 instead of c/0/0/0. We cannot derive this from the zarr format alone. The chunk-key-encoding is independent from the zarr format.

@mkitti
Copy link
Member

mkitti commented Nov 28, 2025

One application is transitioning a V2 array to a V3 array. You can do this by just adding a zarr.json to an existing V2 array without moving the chunks by using a V2 chunk-key-encoding in the zarr json.

There are also other possible chunk-key-encoding extensions. For example, I have proposed a suffix chunk-key-encoding zarr-extension where chunks may have a file extension:
zarr-developers/zarr-extensions#28

Chunks with file extensions are interesting because the chunks themselves could be PNG files or even (Geo)TIFFs.

Copy link
Member

@mkitti mkitti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to abstract ChunkKeyEncoding out because V3 arrays can use a V2 chunk key encoding, also there are chunk-key-encoding extensions.

@@ -0,0 +1,27 @@

struct ChunkEncoding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think we should call this ChunkKeyEncoding to match the name in the V3 specification:
    https://zarr-specs.readthedocs.io/en/latest/v3/chunk-key-encodings/index.html
  2. We will need either AbstractChunkKeyEncoding or make ChunkKeyEncoding abstract.
  3. We will need the following chunk key encodings:
  • V3ChunkKeyEncoding - this is the default chunk key encoding in Zarr v3, uses dimension separator / by default but can be .
  • V2ChunkKeyEncoding- this is the default chunk key encoding in v2, uses dimension separator . by default but can be /
  • SuffixChunkKeyEncoding - this adds a file extension to a base chunk encoding
  • There is at least one more ChunkKeyEncoding extension I am aware of


struct ChunkEncoding
sep::Char
prefix::Bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefix is really a property of the type of chunk-key-encoding

default_sep(::ZarrFormat{2}) = DS2
default_sep(::ZarrFormat{3}) = DS3
default_prefix(::ZarrFormat{2}) = false
default_prefix(::ZarrFormat{3}) = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefixes are not a general property of chunk key encodings.

@mkitti
Copy link
Member

mkitti commented Nov 28, 2025

The other chunk-key-encoding that has bee proposed is FanOut: zarr-developers/zarr-extensions#31

Copy link
Member

@mkitti mkitti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see better what are you trying to do with the chunk-key-encoding, but I think this is going to be problematic in the long run if we do not make this extensible.

Regarding the V3 Metadata structure, I think this is going to require much more extensive changes to get correct. It should be quite different from the V2 Metadata model since codecs can take on a tree-like structure. The sharding codec for example can have multiple nested levels. There is really no distinction between filters and compressors.

shape::Base.RefValue{NTuple{N, Int}}
chunks::NTuple{N, Int}
dtype::String # data_type in v3
compressor::C
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to implement V3 metadata properly, we could probably remove this and filters and replace with Codecs.

node_type::String
shape::Base.RefValue{NTuple{N, Int}}
chunks::NTuple{N, Int}
dtype::String # data_type in v3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well call this data_type.

dtype::String # data_type in v3
compressor::C
fill_value::Union{T, Nothing}
order::Char
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also remove order this is in the bytes codec now.

compressor::C
fill_value::Union{T, Nothing}
order::Char
filters::F # not yet supported
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed and replaced with codecs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants