Add default compressors to config #2470
I am a little confused: in which cases should we use VLenBytes and in which cases VLenUTF8? Or do we need both at the same time?
Also, do we need a default compressor, or are default filters sufficient?
# Conflicts: # tests/test_v2.py
jhamman
left a comment
Thanks for working on this @brokkoli71. I think @normanrz is also going to review this but I wanted to also bring up an additional point.
In Zarr2 we told people to set zarr.storage.default_compressor = SomeCompressor(). This was simple but also an odd way to manage config. What we have now is much better. However, I wonder if we should do something to catch folks trying to set the default_compressor variable. Thoughts?
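One way such assignments could be caught, as a minimal standard-library sketch: a module can swap its own class for a `types.ModuleType` subclass that overrides `__setattr__`. The names `GuardedModule` and the stand-in `storage` module below are hypothetical, and this is not a description of what zarr does.

```python
import types
import warnings


class GuardedModule(types.ModuleType):
    """Module subclass that warns when a retired attribute is assigned."""

    # Hypothetical set of names that users assigned directly in the old API.
    _retired = frozenset({"default_compressor"})

    def __setattr__(self, name, value):
        if name in self._retired:
            warnings.warn(
                f"Setting {self.__name__}.{name} has no effect; "
                "set the default through the config object instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return  # drop the assignment so stale state is never stored
        super().__setattr__(name, value)


# Stand-in for a real module. Inside a package one would instead write
# `sys.modules[__name__].__class__ = GuardedModule` at the end of the module.
storage = GuardedModule("storage")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    storage.default_compressor = object()  # triggers the warning
    storage.other_setting = 1              # ordinary attributes still work
```

Raising an error instead of warning would be stricter; a warning keeps old scripts running while still pointing users at the config system.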
dstansby
left a comment
Thanks! Couple of things:
- Print statements need removing
- The config naming could probably be improved
This would also be a good opportunity to update or add user-facing documentation on which default compressors are used for which types of data. Is that something you could add in this PR?
I think we should also have defaults for v3:
zstd isn't in the spec yet: zarr-developers/zarr-specs#256
Thanks for your feedback, I will integrate it this week 👍🏼
normanrz
left a comment
Thanks! Could you please go through the docstrings again and add some info about the default compressor, filters, codecs? Then, I think this is good to go.
dstansby
left a comment
I've left lots of small requests for changes, mainly around the docstrings. In general they are great and clear, but I think it's worth fixing a lot of the little issues while we're here.
I left most of the docstring comments in asynchronous.py, but they also apply to the other files that have updated docstrings.
src/zarr/api/asynchronous.py
Outdated
this collection specify the transformation from array values to stored bytes.
V3 only. V2 arrays should use `filters` and `compressor` instead.
If no codecs are provided, default codecs will be used:
- For numeric arrays, the default is `BytesCodec` and `ZstdCodec`.
Can we also document the default compression level (and any other parameters) here?
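One way to keep documented parameters from drifting away from the real defaults is to render the docstring bullet from the defaults table itself. The sketch below assumes a hypothetical `DEFAULT_CODECS` table and `describe_defaults` helper; only the "numeric" category and the two codecs come from the docstring under review, and every parameter value is a placeholder rather than zarr's actual default.

```python
# Hypothetical defaults table. The "numeric" category and the bytes/zstd pair
# come from the docstring under review; the parameter values are placeholders.
DEFAULT_CODECS = {
    "numeric": [
        {"name": "bytes", "configuration": {"endian": "little"}},
        {"name": "zstd", "configuration": {"level": 0, "checksum": False}},
    ],
}


def describe_defaults(defaults):
    """Render one docstring bullet per category, including codec parameters."""
    lines = []
    for category, codecs in defaults.items():
        parts = []
        for codec in codecs:
            # Format each parameter as key=value so levels and flags are visible.
            params = ", ".join(
                f"{key}={value!r}" for key, value in codec["configuration"].items()
            )
            parts.append(f"{codec['name']}({params})")
        joined = " and ".join(parts)
        lines.append(f"- For {category} arrays, the default is {joined}.")
    return "\n".join(lines)


print(describe_defaults(DEFAULT_CODECS))
```

With this approach the compression level and any other parameters show up in the documentation automatically whenever the table changes.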
Co-authored-by: David Stansby <dstansby@gmail.com>
Head branch was pushed to by a user without write access
# Conflicts: # tests/test_config.py
is there a reason why none of the default compressors / codecs have a configuration?
and a second question, why aren't strings / bytes compressed with
This PR adds default compressors for zarr_format=2 to zarr.config.
Fixes #2267
Should _get_default_array_bytes_codec for zarr_format=3 also be configurable in zarr.config?
TODO: