Closed
Labels
compression, dataset read (Relating to reading datasets), dataset write (Relating to writing datasets), enhancement (New feature or request), new in CF-1.12 (New in CF-1.12)
Description
Lossy compression via quantization was introduced in CF-1.12 (section 8.4). We need to implement this in cfdm.
Requirements
- Read quantization metadata from a dataset, store it in memory, and write it back out to disk.
- Allow quantization metadata stored in memory to be modified.
  - But this should not be too easy, because changing the metadata will not change the actual quantization of the data. The use case is correcting mistakes in the quantization metadata.
- Quantize data during the process of writing to disk, creating the correct quantization metadata in the output dataset.
  - But only if the data are not already quantized, as determined by the absence of quantization metadata. We should not allow the quantization of quantized data.
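The write-time rule in the last requirement could be sketched roughly as follows. This is only an illustration, not cfdm code: the function name `output_quantization`, the use of plain dicts, and the example keys (`algorithm`, `quantization_nsb`) are all assumptions made for the sketch.

```python
from typing import Optional, Tuple


def output_quantization(
    existing: Optional[dict], instruction: Optional[dict]
) -> Tuple[Optional[dict], bool]:
    """Decide what quantization metadata to write, and whether to quantize.

    `existing` is metadata already attached to the data (hypothetical dict);
    `instruction` is a pending request to quantize on write (also hypothetical).
    Returns (metadata_to_write, quantize_now).
    """
    if existing is not None and instruction is not None:
        # Existing metadata means the data are already quantized.
        raise ValueError("Data are already quantized; refusing to quantize again")
    if existing is not None:
        # Round trip: write the stored metadata back out, data untouched.
        return dict(existing), False
    if instruction is not None:
        # Quantize now and describe it with metadata built from the instruction.
        return dict(instruction), True
    # Nothing to do: unquantized data stay unquantized.
    return None, False


# Example: unquantized data with a pending instruction get quantized on write.
metadata, quantize_now = output_quantization(
    None, {"algorithm": "bitround", "quantization_nsb": 8}
)
```

Treating "metadata present" as the only signal that data are quantized keeps the check cheap, at the cost of trusting that the metadata are right, which is why the in-memory metadata need to be correctable.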
Proposal
- A new `cfdm.Quantization` class to store quantization metadata.
- New methods to get/set/del quantization metadata: `get_quantization`, `_set_quantization`, `_del_quantization`. The set/del methods are underscored to help warn that their use will not result in a change in the data, and may result in the quantization becoming inconsistent with the data.
- New methods to get/set/del an instruction to quantize the data when writing it to disk with `cfdm.write`. These methods do nothing until the time of writing: `get_quantize_on_write`, `set_quantize_on_write`, `del_quantize_on_write`. These methods do not need to be underscored, since there is no risk of inconsistencies.
  - If quantization metadata already exists then `set_quantize_on_write` will fail.
  - If a quantize-on-write instruction already exists then `_set_quantization` will fail.
- The actual quantization of data is handled trivially (from our point of view) by the `netCDF4` Python library.
  - I notice that `netCDF4` Python cannot yet handle the CF-allowed DigitRound quantization. I presume that this is because netCDF-C doesn't handle it, but I haven't checked on this.
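To make the mutual exclusion between the two sets of methods concrete, here is a minimal sketch. The method names come from the proposal above, but the class `QuantizationHolder`, the argument types (plain dicts rather than `cfdm.Quantization` instances), and the return values are assumptions for illustration only; the eventual cfdm signatures may well differ.

```python
class QuantizationHolder:
    """Hypothetical stand-in for a cfdm construct; not cfdm's own code."""

    def __init__(self):
        self._quantization = None  # metadata describing already-quantized data
        self._quantize_on_write = None  # pending instruction, acted on at write time

    def get_quantization(self):
        return self._quantization

    def _set_quantization(self, metadata):
        # Underscored: this changes the metadata only, never the data.
        if self._quantize_on_write is not None:
            raise ValueError("A quantize-on-write instruction already exists")
        self._quantization = dict(metadata)

    def _del_quantization(self):
        removed, self._quantization = self._quantization, None
        return removed

    def get_quantize_on_write(self):
        return self._quantize_on_write

    def set_quantize_on_write(self, instruction):
        # Existing metadata means the data are already quantized.
        if self._quantization is not None:
            raise ValueError("Quantization metadata already exists")
        self._quantize_on_write = dict(instruction)

    def del_quantize_on_write(self):
        removed, self._quantize_on_write = self._quantize_on_write, None
        return removed


# Example: a pending instruction blocks setting metadata by hand.
holder = QuantizationHolder()
holder.set_quantize_on_write({"algorithm": "bitround", "quantization_nsb": 8})
try:
    holder._set_quantization({"algorithm": "bitgroom", "quantization_nsd": 3})
    blocked = False
except ValueError:
    blocked = True
```

With both guards in place an object can never hold metadata and a pending instruction at the same time, so the writer has no ambiguous case to resolve.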
PR to follow imminently.
(Pinging @czender, just so you're aware of other implementations out there)
sadielbartholomew