-
Notifications
You must be signed in to change notification settings - Fork 29
Continue on v3 PR #226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Continue on v3 PR #226
Conversation
This reduces the test diff
This also reduces the test diff
Change VersionedStore to FormattedStore
TODO: Fix Zarr v3 type strings
|
@lazarusA Do you know if I can easily make this PR against yours so my changes are more easy to compare? |
I think there is not need. |
|
In Zarr v3, we can have a V2 chunk-key-encoding: That is we can have a situation where the arrayvor group information is stored in zarr.json while the chunks are stored in |
|
One application is transitioning a V2 array to a V3 array. You can do this by just adding a zarr.json to an existing V2 array without moving the chunks by using a V2 chunk-key-encoding in the zarr json. There are also other possible chunk-key-encoding extensions. For example, I have proposed a suffix chunk-key-encoding zarr-extension where chunks may have a file extension: Chunks with file extensions are interesting because the chunks themselves could be PNG files or even (Geo)TIFFs. |
mkitti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to abstract ChunkKeyEncoding out because V3 arrays can use a V2 chunk key encoding, also there are chunk-key-encoding extensions.
| @@ -0,0 +1,27 @@ | |||
|
|
|||
| struct ChunkEncoding | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think we should call this
ChunkKeyEncodingto match the name in the V3 specification:
https://zarr-specs.readthedocs.io/en/latest/v3/chunk-key-encodings/index.html - We will need either
AbstractChunkKeyEncodingor makeChunkKeyEncodingabstract. - We will need the following chunk key encodings:
- V3ChunkKeyEncoding - this is the default chunk key encoding in Zarr v3, uses dimension separator
/by default but can be. - V2ChunkKeyEncoding- this is the default chunk key encoding in v2, uses dimension separator
.by default but can be/ - SuffixChunkKeyEncoding - this adds a file extension to a base chunk encoding
- There is at least one more ChunkKeyEncoding extension I am aware of
|
|
||
| struct ChunkEncoding | ||
| sep::Char | ||
| prefix::Bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefix is really a property of the type of chunk-key-encoding
| default_sep(::ZarrFormat{2}) = DS2 | ||
| default_sep(::ZarrFormat{3}) = DS3 | ||
| default_prefix(::ZarrFormat{2}) = false | ||
| default_prefix(::ZarrFormat{3}) = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefixes are not a general property of chunk key encodings.
|
The other chunk-key-encoding that has bee proposed is FanOut: zarr-developers/zarr-extensions#31 |
mkitti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I see better what are you trying to do with the chunk-key-encoding, but I think this is going to be problematic in the long run if we do not make this extensible.
Regarding the V3 Metadata structure, I think this is going to require much more extensive changes to get correct. It should be quite different from the V2 Metadata model since codecs can take on a tree-like structure. The sharding codec for example can have multiple nested levels. There is really no distinction between filters and compressors.
| shape::Base.RefValue{NTuple{N, Int}} | ||
| chunks::NTuple{N, Int} | ||
| dtype::String # data_type in v3 | ||
| compressor::C |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to implement V3 metadata properly, we could probably remove this and filters and replace with Codecs.
| node_type::String | ||
| shape::Base.RefValue{NTuple{N, Int}} | ||
| chunks::NTuple{N, Int} | ||
| dtype::String # data_type in v3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well call this data_type.
| dtype::String # data_type in v3 | ||
| compressor::C | ||
| fill_value::Union{T, Nothing} | ||
| order::Char |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also remove order this is in the bytes codec now.
| compressor::C | ||
| fill_value::Union{T, Nothing} | ||
| order::Char | ||
| filters::F # not yet supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be removed and replaced with codecs
This is my branch based on the changes by @mkitti and @lazarusA. The main change I did is that I removed the concept of the FormattedStore again for now and rather put the task of transforming chunk Cartesian Indices into string into the hand of the calling function.
I think when I started the package it was not a good decision to put some of the Cartesian Index logic into the Storage part of the package because this is rather an encoding step and independent of the storage backend.
This is not yet super cleaned up, but I think the goal should be to have storage backend function like
getindex,setindex!etc only operate on string keys, which is why I renamed the CartesianIndex-related functions to indicate that additional information is needed to parse them.Also for helper functions like
writeattrswhich definitely I would prefer we explicitly pass the zarr version information instead of wrapping stores into v2 and v3. @mkitti could you live with such a solution? If not we could as well return to using someFormattedStoretype and attach the way chunks are encoded to the storage type, but somehow I would prefer ifAbstractStoreremained a very low-level abstraction for dealing with basic IO operations, independent of Zarr versions and encoding options.