Description
The current way of handling uncertainties in yadg is not optimal for the following reasons:
- we always store absolute uncertainties for each datapoint in each `data_var`, sometimes at each `uts`, leading to a huge amount of duplicate data
- the provenance of the uncertainty is unclear: how is it determined? Is it the `str->float` rounding error? Is it the `int->float` scale factor? Is it from instrument spec sheets?
- other properties of the uncertainty (what is the distribution - normal, rectangular?; what is the coverage factor?) are not recorded
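To make the two conversion-related sources concrete, here is a minimal sketch in plain Python. It is not yadg's actual implementation: the helper names `sigfig_uncertainty` and `scaling_uncertainty` are hypothetical, and the rules "one unit in the last printed digit" and "one count" are assumptions made for illustration.

```python
from decimal import Decimal


def sigfig_uncertainty(text: str) -> tuple[float, float]:
    """Return (value, uncertainty) for a numeric string.

    Assumption: the uncertainty is one unit in the last printed digit.
    """
    dec = Decimal(text)
    exponent = dec.as_tuple().exponent  # e.g. -3 for "1.230"
    return float(dec), float(Decimal(1).scaleb(exponent))


def scaling_uncertainty(raw: int, scale: float) -> tuple[float, float]:
    """Return (value, uncertainty) for an integer count and a scale factor.

    Assumption: the uncertainty is one count, i.e. the scale factor itself.
    """
    return raw * scale, abs(scale)


print(sigfig_uncertainty("1.230"))     # (1.23, 0.001)
print(scaling_uncertainty(2048, 0.5))  # (1024.0, 0.5)
```

Under these assumptions the `int->float` uncertainty is the same for every datapoint that shares a scale factor, so storing it once per datapoint is where much of the duplication comes from.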
There is very little prior art on how to systematically annotate this:
- The NetCDF CF Metadata Standards propose annotating the nominal value as follows:

      float q(time) ;
        q:standard_name = "specific_humidity" ;
        q:units = "g/g" ;
        q:ancillary_variables = "q_error_limit q_detection_limit" ;
      float q_error_limit(time)
        q_error_limit:standard_name = "specific_humidity standard_error" ;
        q_error_limit:units = "g/g" ;

  The only additional piece of metadata discussed is a `standard_error_multiplier` attribute of the standard error ancillary variable (here `q_error_limit`), which is the coverage factor.
- In NeXus, at NIAC2014 the following uncertainty proposal was not accepted:

      NXroot
        NXentry
          NXdata
            @signal="data"
            @data_axes="xy"
            @data_uncertainty="esd"
            @esd_uncertainty_components="esd_uncertainties"
            data: float[300, 300]
            xy: float[300, 300]
            esd: float[300, 300]
            esd_uncertainties:NXuncertainty
              electronic : float[300, 300]
                @basis="Johnson noise"
              counting_statistics: float[300, 300]
                @basis="shot noise"
              secondary_standard: float[300, 300]
                @basis="esd"

  The `NXuncertainty` class definition is (no longer?) available.
- Also in NeXus, the `NXaberration` and `NXem` definitions may contain an `uncertainty NX_FLOAT` and an `uncertainty_model NX_CHAR` attribute.
- The FAIRMAT NeXus reserves the `_errors` suffix for uncertainty data. Note that "The dimensions of the `FIELDNAME_errors` field must match the dimensions of the corresponding `FIELDNAME` field."
- In the h5rdmtoolbox, an uncertainty dataset can be attached to its parent via `field.ancillary_datasets`, but there is no further convention.
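For reference, the CF-style link between a nominal variable and its standard error can be resolved mechanically. The sketch below uses a plain dict as a stand-in for a NetCDF dataset, so it does not depend on any NetCDF library. The helper `find_standard_error`, the coverage factor of 2, and the fallback multiplier of 1 when the attribute is absent are all assumptions for illustration.

```python
# Dict-based stand-in for a NetCDF dataset: variable name -> attributes.
dataset = {
    "q": {
        "standard_name": "specific_humidity",
        "units": "g/g",
        "ancillary_variables": "q_error_limit q_detection_limit",
    },
    "q_error_limit": {
        "standard_name": "specific_humidity standard_error",
        "units": "g/g",
        "standard_error_multiplier": 2,  # made-up coverage factor
    },
}


def find_standard_error(dataset, name):
    """Return (ancillary name, coverage factor) for `name`, or None."""
    attrs = dataset[name]
    # ancillary_variables is a space-separated list of variable names.
    for anc in attrs.get("ancillary_variables", "").split():
        anc_attrs = dataset.get(anc, {})  # tolerate missing ancillaries
        if anc_attrs.get("standard_name", "").endswith(" standard_error"):
            # Assumption: fall back to a multiplier of 1 if not given.
            return anc, anc_attrs.get("standard_error_multiplier", 1)
    return None


print(find_standard_error(dataset, "q"))  # ('q_error_limit', 2)
```

Note that nothing in this lookup tells the reader where the uncertainty comes from or how it is distributed, which is the gap discussed here.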
Since there's no standard way of annotating uncertainty metadata in either NetCDF or NeXus, it looks like we'll have to roll our own:
- annotate the `nominal` variable using its `ancillary_variables`; the name of the ancillary is not strictly regulated, but `uncertainty` is preferred to `error`:

      float val(uts) ;
        val.units = "..."
        val.ancillary_variables = "val_uncertainty ..."

  As the space `" "` character is used as separator in the `ancillary_variables` field, nominal variables with whitespace in their names have to be clobbered.

- annotate the `uncertainty` variable to indicate it's an uncertainty using the NetCDF conventions, attach other metadata:

      float val_uncertainty(...) ;
        val_uncertainty.units = "..."
        val_uncertainty.standard_name = "val standard_error"
        val_uncertainty.standard_error_multiplier = 1
        val_uncertainty.comment = "..."
        val_uncertainty.references = "..."
        val_uncertainty.yadg_uncertainty_absolute = {0, 1}
        val_uncertainty.yadg_uncertainty_distribution = {"normal", "rectangular", ...}
        val_uncertainty.yadg_uncertainty_source = {"sigfig", "scaling", "datasheets", "explicit", ...}

  Here, we introduce the following three yadg-specific metadata fields:

  - The `yadg_uncertainty_absolute` indicates whether the uncertainty is absolute (1) or relative (0). If the uncertainty is relative, the `val.units` should be dimensionless (%, ppm or similar).
  - The `yadg_uncertainty_distribution` indicates whether the underlying distribution is normal (most common) or rectangular (e.g. from rounding); further options to be defined later as necessary.
  - The `yadg_uncertainty_source` indicates the origin of the uncertainty, where `sigfig` means `str->float` conversion, `scaling` means `int->float` conversion, `datasheets` means from datasheets programmed into yadg, and `explicit` means explicitly specified in the source data.

  The other "standard" NetCDF metadata has its usual meaning. The `comment` and `references` fields in particular can be used to provide more information about where the uncertainty determination comes from.
- The