Implement fixedscaleoffset codec #312

Open
kleinschmidt wants to merge 5 commits into manzt:main from kleinschmidt:dfk/decode-fixedoffsetscale

Conversation

kleinschmidt commented Oct 31, 2025

This implementation is my (i.e. a TypeScript non-knower's) attempt to pattern match the
other codecs here along with @manzt's suggestion in
manzt/numcodecs.js#49 (comment).

One thing I noticed is that the ArrayArrayCodec type has a single type
parameter (I think?). Does that imply that all array-array codecs must output
the same type that they accept as input? If so, I think that's probably too
restrictive; the python fixedscaleoffset codec explicitly supports encoding to a
different type than the input uses, and my intended use case is to decode
int16-quantized floats.
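
For context, the transform in question can be sketched roughly as below, following the Python numcodecs behavior: encoding computes round((x - offset) * scale) and stores the result as `astype`, and decoding computes x / scale + offset as `dtype`. The function names and the fixed Float32Array/Int16Array pairing are illustrative assumptions, not the code in this PR.

```typescript
// Illustrative sketch only: function names and array types are assumptions,
// not the implementation proposed in this PR.

// Encode floats to int16: round((x - offset) * scale).
function encodeFixedScaleOffset(
  values: Float32Array,
  offset: number,
  scale: number,
): Int16Array {
  const out = new Int16Array(values.length);
  for (let i = 0; i < values.length; i++) {
    // Math.round rounds halves toward +Infinity, whereas numpy rounds halves
    // to even. Values outside the int16 range wrap on assignment.
    out[i] = Math.round((values[i] - offset) * scale);
  }
  return out;
}

// Decode int16 back to floats: x / scale + offset.
// Note that the output type differs from the input type.
function decodeFixedScaleOffset(
  encoded: Int16Array,
  offset: number,
  scale: number,
): Float32Array {
  const out = new Float32Array(encoded.length);
  for (let i = 0; i < encoded.length; i++) {
    out[i] = encoded[i] / scale + offset;
  }
  return out;
}
```

The decode direction is the int16-quantized-floats case: the codec accepts one typed array type and returns another, which is what a single type parameter would struggle to express.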


changeset-bot bot commented Oct 31, 2025

⚠️ No Changeset found

Latest commit: 6e0129b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Comment on lines +7 to +8
dtype: string;
astype?: string;

zarr v3 data types can be a string or a JSON object with type {name: string, configuration: object}. See an example here
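
A minimal sketch of what accepting both forms could look like. The type and helper names here are hypothetical, not zarrita's actual definitions, and the object example uses a made-up identifier.

```typescript
// Hypothetical types for illustration; not zarrita's actual definitions.
type DataTypeMetadata =
  | string
  | { name: string; configuration: Record<string, unknown> };

// Return the identifier regardless of which form the metadata uses.
function dataTypeName(dtype: DataTypeMetadata): string {
  return typeof dtype === "string" ? dtype : dtype.name;
}

// A core data type is a plain string; an extension type may be an object.
const core: DataTypeMetadata = "int16";
const extension: DataTypeMetadata = {
  name: "example.custom_type", // made-up identifier
  configuration: {},
};
```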

Author

this one i think needs to be restricted to number types; can numeric types also have that format? is that how endianness is stored?

Author

ah I see in the spec:

> Each data type is associated with an identifier, which can be used in metadata documents to refer to the data type. For the data types defined in this specification, the identifier is a simple ASCII string. However, extensions may use any JSON value to identify a data type.

Author

and endianness is specified as a bytes codec, I'm inferring.

yeah the bytes codec sets the endianness of the encoded data, but for decoded data, endianness is up to the implementation
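
As a rough illustration of that split: the byte order from a `bytes` codec configuration (for example `endian: "little"`) only matters when reading the encoded bytes, and the decoded typed array uses whatever order the platform prefers. `decodeInt16` is a hypothetical helper name, not a zarrita function.

```typescript
// Hypothetical helper: decode raw bytes as int16 using the byte order
// declared for the encoded data. The returned Int16Array is stored in the
// platform's native byte order.
function decodeInt16(bytes: Uint8Array, endian: "little" | "big"): Int16Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Int16Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    // The second argument to getInt16 is true for little-endian reads.
    out[i] = view.getInt16(i * 2, endian === "little");
  }
  return out;
}
```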

@kleinschmidt kleinschmidt changed the title from "Implement decode-only fixedscaleoffset codec" to "Implement fixedscaleoffset codec" Nov 1, 2025
#TypedArrayOut: TypedArrayConstructor<A>

constructor(configuration: FixedScaleOffsetConfig) {
  const { data_type } = coerce_dtype(configuration.dtype);
Author

testing on v3 has made me realize that coerce_dtype is really only meant for v2 data type strings, so this will need to be adjusted.

@kleinschmidt
Author

I think resolving the v2 vs. v3 issues may be trickier than I'm prepared to take on right now. Would you be willing to accept v2-only support @manzt ?

The numcodecs.zarr3 python functions generate a different ID (numcodecs.fixedscaleoffset vs. fixedscaleoffset in v2) so leaving this as-is will simply error.

@emmanuelmathot

Very interested in this codec, also for v3. Any progress on merging this one?
cc @ahocevar

d-v-b commented Apr 1, 2026

the fixedscaleoffset codec should not be used for zarr v3 data because it does not define how values from one data type should be cast to values of another data type. the old zarr v2 fixedscaleoffset codec basically just used numpy's casting behavior, but that's not really workable in zarr v3 because zarr v3 has a formal data type model, and we don't want to be dependent on the runtime behavior of numpy for something that spans multiple programming languages.

thanks to a lot of community involvement we made a new codec in zarr-extensions called cast_value that is narrowly scoped to managing casting between ints and floats. the tl;dr is that the codec configuration contains a declaration of:

  • the target data type
  • a rounding mode
  • a mode for handling out-of-range values
  • an explicit input scalar : output scalar lookup table (necessary for non-numeric values like NaN)

The codec must halt if any value isn't covered by rounding, the out-of-range mode, or the lookup table. This, combined with the explicit lookup table, can complicate performance in interpreted languages.
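
To make the shape of that work concrete, here is a minimal float64-to-int16 sketch of the per-element checks described above. The option names (`rounding`, `outOfRange`, `scalarMap`) and their values are placeholders chosen for illustration; they are not the field names from the cast_value specification.

```typescript
// Placeholder option names; NOT the cast_value spec's configuration keys.
interface CastOptions {
  rounding: "nearest-even" | "towards-zero";
  outOfRange?: "clamp"; // when unset, out-of-range values are an error
  scalarMap?: Map<number, number>; // explicit input -> output scalars (e.g. NaN)
}

// Round to the nearest integer, breaking ties toward the even neighbor.
function roundHalfEven(x: number): number {
  const floor = Math.floor(x);
  const diff = x - floor;
  if (diff < 0.5) return floor;
  if (diff > 0.5) return floor + 1;
  return floor % 2 === 0 ? floor : floor + 1;
}

function castToInt16(values: Float64Array, opts: CastOptions): Int16Array {
  const out = new Int16Array(values.length);
  for (let i = 0; i < values.length; i++) {
    const v = values[i];
    // The lookup table is consulted first; Map keys match NaN to NaN.
    const mapped = opts.scalarMap?.get(v);
    if (mapped !== undefined) {
      out[i] = mapped;
      continue;
    }
    // NaN and infinities have no integer equivalent, so halt if unmapped.
    if (!Number.isFinite(v)) throw new Error(`no mapping for ${v}`);
    let r = opts.rounding === "nearest-even" ? roundHalfEven(v) : Math.trunc(v);
    if (r < -32768 || r > 32767) {
      if (opts.outOfRange !== "clamp") {
        throw new Error(`${v} is out of range for int16`);
      }
      r = Math.min(32767, Math.max(-32768, r));
    }
    out[i] = r;
  }
  return out;
}
```

The per-element map lookup and branching are the part that tends to be slow in an interpreted loop, compared with a plain typed-array copy.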

@manzt what effort is needed to get this into zarrita? the scale-offset part is really easy but the casting part might warrant some performance considerations. I have a rust implementation here that could be compiled to wasm, but I don't know the rust + js interop story at all, so no clue if that work is worth the performance benefit.

casting is the tricky part of the scale-offset transformation. the actual scaling + offsetting is simple (but we also have a codec for that).

d-v-b commented Apr 12, 2026

@kleinschmidt are you interested in working on a zarr v3 implementation of the scale offset functionality?

Owner

manzt commented Apr 12, 2026

> @manzt what effort is needed to get this into zarrita? the scale-offset part is really easy but the casting part might warrant some performance considerations. I have a rust implementation here that could be compiled to wasm, but I don't know the rust + js interop story at all, so no clue if that work is worth the performance benefit.

I'd be up for adding a pure JS version in zarrita proper (at least initially). Folks can dynamically swap in codecs with zarr.registry.set, which I think satisfies the case where one wants a faster implementation (at the cost of a much larger bundle size with WASM).
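
The swap pattern might look something like the sketch below. The `registry` here is a local stand-in Map so the example runs on its own, on the assumption that zarr.registry maps a codec ID to an async loader; the codec ID, the `ExampleCodec` shape, and both loaders are hypothetical, so check zarrita's documentation for the real entry type.

```typescript
// Local stand-in for illustration; NOT zarrita's registry or codec interface.
interface ExampleCodec {
  label: string;
  decode(data: Float32Array): Float32Array;
}
type CodecLoader = () => Promise<ExampleCodec>;
const registry = new Map<string, CodecLoader>();

// Default pure-JS implementation, registered up front.
const loadPureJs: CodecLoader = async () => ({
  label: "pure-js",
  decode: (data) => data.map((x) => x / 10),
});
registry.set("example.scale", loadPureJs);

// A user who wants a faster build (for example one backed by WASM) replaces
// the entry under the same ID. The loader only runs when the codec is needed,
// so the larger bundle is fetched lazily.
const loadFast: CodecLoader = async () => ({
  label: "fast",
  decode: (data) => data.map((x) => x / 10),
});
registry.set("example.scale", loadFast);
```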

@kleinschmidt
Author

> @kleinschmidt are you interested in working on a zarr v3 implementation of the scale offset functionality?

It's not a priority for us at the moment, so not for the foreseeable future.

d-v-b commented Apr 12, 2026

sounds good, I'm happy to take this up
