Implement lossy compression via quantization #330

@davidhassell

Lossy compression via quantization was introduced in CF-1.12 (section 8.4). We need to implement this in cfdm.

Requirements

  1. Read quantization metadata from a dataset, store it in memory, and write it back out to disk.

  2. Allow quantization metadata stored in memory to be modified.

    • But this should not be too easy, because changing the metadata will not change the actual quantization of the data. The use case is correcting mistakes in the quantization metadata.

  3. Quantize data during the process of writing to disk, creating the correct quantization metadata in the output dataset.

    • But only if the data are not already quantized, as determined by the absence of quantization metadata. We should not allow the quantization of quantized data.
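
The write-time rule in requirement 3 could be sketched roughly as follows. This is only an illustration and not cfdm code: the function name `should_quantize` and the use of plain dicts (or `None`) for the metadata and for the write instruction are assumptions made for the example.

```python
def should_quantize(existing_metadata, quantize_on_write):
    """Decide whether data should be quantized during a write.

    Both arguments are hypothetical stand-ins: a dict (or None) of
    quantization metadata already attached to the data, and a dict
    (or None) holding an instruction to quantize at write time.
    """
    if quantize_on_write is None:
        # Nothing was requested, so the data are written unchanged and
        # any existing quantization metadata is simply passed through.
        return False

    if existing_metadata:
        # Requirement 3: existing metadata means the data are already
        # quantized, and quantizing them again is not allowed.
        raise ValueError(
            "Data are already quantized; refusing to quantize them again"
        )

    return True
```

Under these assumptions, data with no metadata and no instruction are written as-is, data with an instruction but no metadata are quantized, and data with both raise an error.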

Proposal

  • A new cfdm.Quantization class to store quantization metadata.
  • New methods to get/set/del quantization metadata:
    • get_quantization, _set_quantization, _del_quantization. The set/del methods are underscored to help warn that their use will not result in a change in the data, and may result in the quantization becoming inconsistent with the data.
  • New methods to get/set/del an instruction to quantize the data when writing it to disk with cfdm.write. These methods do nothing until the time of writing:
    • get_quantize_on_write, set_quantize_on_write, del_quantize_on_write. These methods do not need to be underscored, since there is no risk of inconsistencies.
    • If quantization metadata already exists then set_quantize_on_write will fail.
    • If a quantize-on-write instruction already exists then _set_quantization will fail.
  • The actual quantization of data is handled trivially (from our point of view) by the netCDF4 Python library.
    • I notice that netCDF4 Python cannot yet handle the CF-allowed DigitRound quantization. I presume that this is because netCDF-C doesn't handle it, but I haven't checked on this.
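
To make the intended behaviour of the proposed methods concrete, here is a minimal sketch under stated assumptions. The method names are those listed above; everything else is hypothetical: the stand-in class `QuantizableVariable`, the use of plain dicts for the metadata and the instruction, and the helper `netcdf4_quantize_kwargs`. The helper assumes the CF algorithm names are `bitgroom`, `bitround`, `granular_bitround` and `digitround`, and maps them onto the `quantize_mode` and `significant_digits` keywords of netCDF4 Python's `createVariable`. How cfdm.write would actually pass these on is not decided here.

```python
class QuantizableVariable:
    """Hypothetical stand-in for a cfdm construct that has data."""

    def __init__(self):
        self._quantization = None       # metadata for existing quantization
        self._quantize_on_write = None  # instruction for a future write

    def get_quantization(self, default=None):
        if self._quantization is None:
            return default
        return self._quantization

    def _set_quantization(self, metadata):
        # Underscored: this changes the metadata only, never the data.
        if self._quantize_on_write is not None:
            raise ValueError("A quantize-on-write instruction already exists")
        self._quantization = dict(metadata)

    def _del_quantization(self):
        old, self._quantization = self._quantization, None
        return old

    def get_quantize_on_write(self, default=None):
        if self._quantize_on_write is None:
            return default
        return self._quantize_on_write

    def set_quantize_on_write(self, instruction):
        # Does nothing to the data until the time of writing.
        if self._quantization is not None:
            raise ValueError("Quantization metadata already exists")
        self._quantize_on_write = dict(instruction)

    def del_quantize_on_write(self):
        old, self._quantize_on_write = self._quantize_on_write, None
        return old


# Assumed CF algorithm name -> netCDF4 Python quantize_mode.
# "digitround" is deliberately absent (see the note above).
NETCDF4_QUANTIZE_MODES = {
    "bitgroom": "BitGroom",
    "bitround": "BitRound",
    "granular_bitround": "GranularBitRound",
}


def netcdf4_quantize_kwargs(algorithm, level):
    """Return createVariable keywords for a CF algorithm name.

    For BitRound, netCDF4 Python interprets significant_digits as a
    number of significant bits rather than decimal digits.
    """
    try:
        mode = NETCDF4_QUANTIZE_MODES[algorithm]
    except KeyError:
        raise ValueError(
            f"netCDF4 cannot apply quantization algorithm {algorithm!r}"
        ) from None
    return {"quantize_mode": mode, "significant_digits": level}
```

With this sketch, calling `set_quantize_on_write` on a variable that already has quantization metadata fails, as does calling `_set_quantization` on a variable that already has a quantize-on-write instruction, so the two states can never coexist.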

PR to follow imminently.

(Pinging @czender, just so you're aware of other implementations out there)
