Conversation
dtype: string;
astype?: string;
zarr v3 data types can be a string or a JSON object with type {name: string, configuration: object}. See an example here
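As a sketch of those two forms (the `DataType` alias and `dtypeName` helper below are my own illustration, not names from the spec or zarrita):

```typescript
// Assumed shapes, per the zarr v3 spec's description: a data type is either
// a plain identifier string or an extension object with name/configuration.
type DataTypeObject = { name: string; configuration?: Record<string, unknown> };
type DataType = string | DataTypeObject;

// Hypothetical helper: recover the identifier from either form.
function dtypeName(dt: DataType): string {
  return typeof dt === "string" ? dt : dt.name;
}

const simple: DataType = "float32";
// Hypothetical extension type, just to show the object form:
const extended: DataType = { name: "datetime", configuration: { unit: "ns" } };
```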
This one, I think, needs to be restricted to number types. Can numeric types also have that format? Is that how endianness is stored?
ah I see in the spec:
> Each data type is associated with an identifier, which can be used in metadata documents to refer to the data type. For the data types defined in this specification, the identifier is a simple ASCII string. However, extensions may use any JSON value to identify a data type.
and endianness is specified as a bytes codec, I'm inferring.
yeah the bytes codec sets the endianness of the encoded data, but for decoded data, endianness is up to the implementation
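A minimal illustration of that split: the `bytes` codec metadata fixes the byte order of the encoded data, while a JS implementation can honor it on decode via `DataView`'s explicit endianness flag (the metadata snippet follows the v3 codec spec's `{name, configuration}` shape):

```typescript
// Encoded-side endianness lives in the bytes codec's configuration:
const codecs = [{ name: "bytes", configuration: { endian: "little" } }];

// Decoded-side: DataView lets the implementation read that byte order
// explicitly, regardless of the platform's native endianness.
const buf = new ArrayBuffer(4);
new DataView(buf).setFloat32(0, 1.5, true); // write little-endian
const value = new DataView(buf).getFloat32(0, true); // read little-endian
```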
#TypedArrayOut: TypedArrayConstructor<A>
constructor(configuration: FixedScaleOffsetConfig) {
  const { data_type } = coerce_dtype(configuration.dtype);
testing on v3 has made me realize that coerce_dtype is really only meant for v2 data type strings, so this will need to be adjusted.
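For illustration, one rough shape the adjustment could take; the mapping table and `toV3Name` are my assumptions, not zarrita's actual API:

```typescript
// Hypothetical sketch: normalize a zarr v2 dtype string like "<f4"
// to a v3-style identifier like "float32".
const V2_TO_V3: Record<string, string> = {
  i1: "int8", i2: "int16", i4: "int32",
  u1: "uint8", u2: "uint16", u4: "uint32",
  f4: "float32", f8: "float64",
};

function toV3Name(dtype: string): string {
  // already a v3 identifier? pass it through
  if (Object.values(V2_TO_V3).includes(dtype)) return dtype;
  // v2 strings carry a byte-order prefix: <, >, or |
  const stripped = dtype.replace(/^[<>|]/, "");
  const name = V2_TO_V3[stripped];
  if (!name) throw new Error(`unsupported dtype: ${dtype}`);
  return name;
}
```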
I think resolving the v2 vs. v3 issues may be trickier than I'm prepared to take on right now. Would you be willing to accept v2-only support @manzt?
Very interested in this codec, also for v3. Any progress on merging this one?
Force-pushed from 02099c7 to 1facac0
Thanks to a lot of community involvement, we made a new codec in zarr-extensions called
The codec must halt if any value isn't covered by rounding, the out-of-range mode, or the lookup table. This, combined with the explicit lookup table, can complicate performance in interpreted languages. @manzt what effort is needed to get this into zarrita? The scale-offset part is really easy, but the casting part might warrant some performance considerations. I have a Rust implementation here that could be compiled to wasm, but I don't know the Rust + JS interop story at all, so no clue whether that work is worth the performance benefit. Casting is the tricky part of the scale-offset transformation; the actual scaling + offsetting is simple (but we also have a codec for that).
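For reference, the easy part can be sketched in a few lines (a minimal sketch mirroring the fixed-scale-offset arithmetic; the function names and the int16 target type are my choices, and rounding-mode/overflow handling is omitted):

```typescript
// encode: y = round((x - offset) * scale), cast to the target integer type
// decode: x = y / scale + offset
function encode(x: Float64Array, offset: number, scale: number): Int16Array {
  const out = new Int16Array(x.length);
  for (let i = 0; i < x.length; i++) {
    out[i] = Math.round((x[i] - offset) * scale);
  }
  return out;
}

function decode(y: Int16Array, offset: number, scale: number): Float64Array {
  const out = new Float64Array(y.length);
  for (let i = 0; i < y.length; i++) {
    out[i] = y[i] / scale + offset;
  }
  return out;
}
```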
|
@kleinschmidt are you interested in working on a zarr v3 implementation of the scale offset functionality?
I'd be up for adding a pure JS version in zarrita proper (at least initially). Folks can dynamically swap in codecs with
It's not currently a priority for us, so not for the foreseeable future.
sounds good, I'm happy to take this up |
This implementation is my (i.e. a TypeScript non-knower's) attempt to pattern match the
other codecs here along with @manzt's suggestion in
manzt/numcodecs.js#49 (comment).
One thing I noticed is that the `ArrayArrayCodec` type has a single type parameter (I think?). Does that imply that all array-array codecs must output the same type that they accept as input? If so, I think that's probably too restrictive; the Python fixedscaleoffset codec explicitly supports encoding to a different type than the input uses, and my intended use case is to decode int16-quantized floats.
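One way to lift that restriction would be a second type parameter. A hypothetical shape, not zarrita's actual interface (the `ArrayToArrayCodec` name and `Quantize` example are mine):

```typescript
// Separate input and output parameters let an array->array codec
// change element type between encoded and decoded representations.
interface ArrayToArrayCodec<In, Out> {
  encode(data: In): Out;
  decode(data: Out): In;
}

// e.g. quantizing float32 data down to int16 and back:
class Quantize implements ArrayToArrayCodec<Float32Array, Int16Array> {
  constructor(private scale: number) {}
  encode(data: Float32Array): Int16Array {
    return Int16Array.from(data, (v) => Math.round(v * this.scale));
  }
  decode(data: Int16Array): Float32Array {
    return Float32Array.from(data, (v) => v / this.scale);
  }
}
```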