
additional fields in codec metadata #23

@d-v-b

The following parameters are currently exposed in codec metadata, i.e., the JSON form of the codec when it is written to zarr.json / .zarray:

Current metadata

| field | type | required? | notes |
| --- | --- | --- | --- |
| zero_level | int | yes | the value in the source data that corresponds to 0 detections |
| conversion_gain | float | yes | the intensity of a single observed event in the source data |
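For concreteness, here is a rough sketch of what those two fields look like in each metadata flavor. The codec name `"anscombe"` and the numeric values are placeholders I made up for illustration. Only the envelope shapes (`name` plus `configuration` for Zarr V3, a flat dict keyed by `id` for V2) come from the Zarr conventions.

```python
import json

# Hypothetical example values; the codec name "anscombe" is an assumption.
config = {"zero_level": 100, "conversion_gain": 2.5}

# Zarr V3 (zarr.json): a codec entry is a "name" plus a "configuration" object.
v3_codec = {"name": "anscombe", "configuration": config}

# Zarr V2 (.zarray): numcodecs-style entries are flat and keyed by "id".
v2_codec = {"id": "anscombe", **config}

print(json.dumps(v3_codec, indent=2))
print(json.dumps(v2_codec, indent=2))
```

Any field added to the codec metadata would land inside `configuration` for V3 and alongside `id` for V2.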

"true" metadata

But this codec takes more parameters than that. Here are the remaining parameters that fully specify the codec:

| field | type | required? | notes |
| --- | --- | --- | --- |
| decoded_dtype | Zarr data type identifier | probably not | the data type of the input data. Unclear if this is really needed, because it can be inferred from the type of the input array, and the codec can contain logic that validates certain constraints (e.g., raising an error if an incompatible data type is used). The Zarr V3 codec API defines a procedure a codec can use for statically checking its compatibility with an arbitrary ndarray, so maybe we could put that logic there. |
| encoded_dtype | Zarr data type identifier | yes | the data type of the array generated by the encode operation. This controls the amount of quantization. Right now this is hard-coded to uint8. Conceivably users might want to control this, but I don't have an intuition for that. |
| input_max | int, default is 32767 | unclear | the max value in the input. Used in make_anscombe_lookup. This depends on the decoded_dtype parameter, because for integer inputs, input_max can't be larger than the largest value in the range of values defined for decoded_dtype. |
| beta | float, [0, 1] inclusive | yes | used to generate the lookup table from input values to Anscombe-transformed values. I don't have an intuition for this parameter yet, but I know that it's data-dependent and necessary for defining the lookup table, and thus effectively part of the codec configuration. |
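To make the combined set of fields concrete, here is a minimal sketch of a configuration object holding both the current and the proposed parameters. The class name `AnscombeConfig`, the `to_dict` method, and the choice to omit `decoded_dtype` when it is unset are all assumptions for illustration, not the codec's actual API.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class AnscombeConfig:  # hypothetical name, not the real codec class
    zero_level: int
    conversion_gain: float
    beta: float
    encoded_dtype: str = "uint8"  # currently hard-coded to uint8
    input_max: int = 32767  # default listed in the table above
    decoded_dtype: Optional[str] = None  # None means "infer from the input array"

    def __post_init__(self) -> None:
        # beta is documented as a float in [0, 1] inclusive.
        if not 0.0 <= self.beta <= 1.0:
            raise ValueError(f"beta must be in [0, 1], got {self.beta}")

    def to_dict(self) -> dict:
        # Drop decoded_dtype when unset so the JSON only carries explicit choices.
        d = asdict(self)
        if d["decoded_dtype"] is None:
            del d["decoded_dtype"]
        return d


cfg = AnscombeConfig(zero_level=100, conversion_gain=2.5, beta=0.5)
print(cfg.to_dict())
```

Leaving `decoded_dtype` optional here mirrors the "probably not" in the table. If it turns out to be required, it would just lose its default.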

key questions

  • what would break if we don't use decoded_dtype as a parameter to encode?
  • should users specify the encoded_dtype?
  • is input_max dependent on the maximum observed value in the user's data, or the maximum possible value, given the data type?
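The last question can be illustrated with a small sketch that contrasts the two candidate meanings of input_max. The helper names `int_dtype_max` and `check_input_max` and the sample data are hypothetical. The only constraint taken from the discussion above is that input_max can't exceed the largest value representable by an integer decoded_dtype.

```python
import re


def int_dtype_max(dtype: str) -> int:
    """Largest value representable by an integer dtype name such as 'int16' or 'uint8'."""
    m = re.fullmatch(r"(u?)int(8|16|32|64)", dtype)
    if m is None:
        raise ValueError(f"not an integer dtype: {dtype!r}")
    bits = int(m.group(2))
    return 2**bits - 1 if m.group(1) == "u" else 2 ** (bits - 1) - 1


def check_input_max(input_max: int, decoded_dtype: str) -> None:
    """Raise if input_max exceeds what decoded_dtype can hold."""
    limit = int_dtype_max(decoded_dtype)
    if input_max > limit:
        raise ValueError(f"input_max={input_max} exceeds {decoded_dtype} max {limit}")


data = [3, 120, 4095]  # hypothetical pixel values
observed_max = max(data)  # interpretation 1: max observed in the user's data
possible_max = int_dtype_max("int16")  # interpretation 2: max allowed by the dtype

check_input_max(observed_max, "int16")  # passes under either interpretation
print(observed_max, possible_max)
```

Note that the current default of 32767 equals the int16 maximum, which is at least consistent with the "maximum possible value" reading for int16 inputs.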
