Problem
Input data with scale_factor/add_offset attributes (zarr v2 format) loses encoding during conversion, resulting in uint16 → float64 data type promotion.
Root Cause
- Original data in the Zarr sample service has scale_factor: 0.0001, add_offset: -0.1, dtype: "<u2" (uint16)
- The conversion process doesn't detect these v2 attributes, so it never creates the zarr v3 numcodecs.fixedscaleoffset codec
- Encoding propagation in create_measurements_encoding() overwrites the encoding with a simple compressor, losing the scale/offset configuration
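The missing detection step could look roughly like the sketch below. It is not the eopf_geozarr implementation: the helper name `scale_offset_codec_params` and the layout of the returned dictionary are hypothetical, and it only assumes that the v2 attributes arrive as a plain dict. It maps `scale_factor`/`add_offset` to the `offset` and `scale` parameters a FixedScaleOffset codec expects, so they can be carried through the encoding step instead of being dropped.

```python
from typing import Any, Mapping, Optional


def scale_offset_codec_params(
    attrs: Mapping[str, Any], stored_dtype: str
) -> Optional[dict]:
    """Hypothetical helper: derive FixedScaleOffset parameters from v2 attributes.

    Returns None when the array carries neither attribute, so the caller
    can fall back to its default encoding.
    """
    if "scale_factor" not in attrs and "add_offset" not in attrs:
        return None

    # CF defaults: a missing scale_factor is 1, a missing add_offset is 0.
    scale_factor = float(attrs.get("scale_factor", 1.0))
    add_offset = float(attrs.get("add_offset", 0.0))
    if scale_factor == 0.0:
        raise ValueError("scale_factor must be non-zero")

    return {
        "offset": add_offset,         # codec offset equals the CF add_offset
        "scale": 1.0 / scale_factor,  # codec scale is the inverse of scale_factor
        "astype": stored_dtype,       # dtype of the values as stored on disk
    }


# Values taken from the sample data described above.
params = scale_offset_codec_params(
    {"scale_factor": 0.0001, "add_offset": -0.1}, stored_dtype="<u2"
)
print(params)
```

Returning `None` for arrays without either attribute keeps the existing compressor-only path untouched for data that was never scaled.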
Expected Behavior
Automatically convert zarr v2 scale/offset attributes to zarr v3 FixedScaleOffset codec:
numcodecs.zarr3.FixedScaleOffset(
    offset=-0.1,
    scale=10000,  # 1/0.0001
    dtype='uint16',
    astype='uint16',
)
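To check that `scale=10000` and `offset=-0.1` are the right parameters, here is a small pure-Python sketch of the arithmetic. It assumes the documented numcodecs FixedScaleOffset convention (encode as `(x - offset) * scale`, rounded to an integer; decode as `x / scale + offset`) and the usual CF decoding `raw * scale_factor + add_offset`. The function names are illustrative only.

```python
# v2 attributes from the sample data
SCALE_FACTOR = 0.0001
ADD_OFFSET = -0.1

# Equivalent FixedScaleOffset parameters
OFFSET = ADD_OFFSET
SCALE = 1.0 / SCALE_FACTOR


def cf_decode(raw: int) -> float:
    """CF-style decoding driven by the v2 attributes."""
    return raw * SCALE_FACTOR + ADD_OFFSET


def codec_decode(raw: int) -> float:
    """Decoding as performed by a fixed scale/offset codec."""
    return raw / SCALE + OFFSET


def codec_encode(value: float) -> int:
    """Encoding as performed by a fixed scale/offset codec."""
    return int(round((value - OFFSET) * SCALE))


samples = [0, 1, 1000, 1234, 10000, 65535]
for raw in samples:
    # Both decoding paths agree up to floating-point rounding.
    assert abs(cf_decode(raw) - codec_decode(raw)) < 1e-9
    # Decoding then encoding returns the original uint16 value.
    assert codec_encode(codec_decode(raw)) == raw
```

Because the two decoding paths agree, a codec built from the attributes stores the same uint16 values that the v2 array already holds.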
Files Affected
src/eopf_geozarr/s2_optimization/s2_multiscale.py (lines 319-329)
src/eopf_geozarr/conversion/geozarr.py (encoding functions)
Impact
- Data type inflation (uint16 → float64)
- Loss of compression efficiency
- Incorrect data representation in output zarr v3 files
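The size impact of the dtype promotion is simple arithmetic, sketched below. The 10980 x 10980 shape is only an illustrative assumption for a single full-resolution band, and the figures are uncompressed sizes; actual on-disk sizes depend on the compressor and the data.

```python
import struct

# Standard item sizes in bytes for the two dtypes involved.
UINT16_BYTES = struct.calcsize("<H")   # "<u2"
FLOAT64_BYTES = struct.calcsize("<d")  # "<f8"

# Illustrative array shape (assumption, not taken from the issue).
shape = (10980, 10980)
n_pixels = shape[0] * shape[1]

uint16_size = n_pixels * UINT16_BYTES
float64_size = n_pixels * FLOAT64_BYTES
inflation = float64_size / uint16_size

print(f"uint16:  {uint16_size / 1e6:.1f} MB uncompressed")
print(f"float64: {float64_size / 1e6:.1f} MB uncompressed")
print(f"inflation factor: {inflation:.0f}x")
```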
Notes
- For some reason, xarray still decodes the data properly, probably because it reads the scale_factor/add_offset attributes and applies the v2-style decoding to the v3 data.
cc @vincentsarago