-
Notifications
You must be signed in to change notification settings - Fork 12
Check sharding bounds #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check sharding bounds #6
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should also be checked in ArrayMetadata? Imaginge someone uses teh ArrayMetadata constructure directly or creates a faulty zarr.json that zarr-java reads.
Good idea @normanrz, I moved the checks into the ArrayMetadata constructor. Then, they will be called in all three cases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Maybe as a follow-up, you could go through some of the other validations we have in zarr-python:
https://github.com/zarr-developers/zarr-python/blob/v3/src/zarr/codecs/pipeline.py#L471
https://github.com/zarr-developers/zarr-python/blob/v3/src/zarr/codecs/transpose.py#L49
https://github.com/zarr-developers/zarr-python/blob/v3/src/zarr/core/metadata.py#L213
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @brokkoli71, as we are exploring how user would specific shard sizes via options, it is really useful that the ArrayMetadataBuilder
API include some validation of the various chunk sizes.
In addition to the test case introduced in this PR, can I also offer sbesson@bf3f08f which should test the validation in the non-sharding scenario. The first two parametrized tests should pass with the current HEAD of zarr-java (as per the mismatching number of dimensions) and the two additional tests are now covered by the extra checks introduced in this PR
Add unit test for invalid chunk bounds
After including the latest release for a round of functional testing, I realised that the containment check introduced in this PR enforces that for every dimension
This constraint is critical when writing multiscale groups as defined in the OME-NGFF specification where the shape of each array is reduced alongside 2 or more dimensions and the chunk/shard shape might be kept constant. Taking the concrete example of one of the Zarr dataset converted as part of the OME NGFF 2024 challenge using zarr-python, the 11th resolution of the multiscale group has the following array metadata {
"chunk_grid": {
"configuration": {
"chunk_shape": [
1,
1,
1,
4096,
4096
]
},
"name": "regular"
},
"chunk_key_encoding": {
"name": "default"
},
"codecs": [
{
"configuration": {
"chunk_shape": [
1,
1,
1,
1024,
1024
],
"codecs": [
{
"configuration": {
"endian": "little"
},
"name": "bytes"
},
{
"configuration": {
"blocksize": 0,
"clevel": 5,
"cname": "zstd",
"shuffle": "shuffle",
"typesize": 2
},
"name": "blosc"
}
],
"index_codecs": [
{
"configuration": {
"endian": "little"
},
"name": "bytes"
},
{
"name": "crc32c"
}
]
},
"name": "sharding_indexed"
}
],
"data_type": "uint16",
"dimension_names": [
"t",
"c",
"z",
"y",
"x"
],
"fill_value": 0,
"node_type": "array",
"shape": [
1,
1,
1,
91,
141
],
"zarr_format": 3
} With this change, such metadata cannot be written using zarr-java. This feels at odds with the Zarr v3 specification where I see two main constraints around shape sizes:
|
I think that constraint is too strict. The array size is independent of the chunk size (except that they need to have the same number of dimensions). For sharding, inner chunk sizes need to evenly divide the outer chunk sizes, which implies inner_chunk[i] <= outer_chunk[i]. |
fixes #5
this PR implements checks if the dataset shape, shard shape, (nested shard shapes) and chunk shape are compatible with each other