Description
I've spent some time tabulating various raster data models and how each of them could be supported by a conversion tool. Supporting the full range will require major adjustments to the CLI, but would greatly improve the tool's accuracy, correctness, explicitness and applicability.
It should be possible to do this incrementally.
The present behaviour, expressed in the proposed terms, is:
`--semantics point_center_strict --transfer assign_centers --agg <stat>` (when applicable) `--out value`
The "gaps" this currently produces between sample points (whenever the DGGS is finer than the raster) can be avoided by first upsampling the raster with interpolation, including nearest neighbour.
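To make the present behaviour and its gaps concrete, here is a minimal sketch of `assign_centers` binning. It assumes a hypothetical square grid (`cell_index`) as a stand-in for a real DGGS indexer, and a y-up raster with its origin at (0, 0); the function names are illustrative, not the tool's API.

```python
import math
from collections import defaultdict


def cell_index(x, y, cell_size):
    """Hypothetical stand-in for a DGGS indexer: a square grid of side cell_size."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def assign_centers(raster, pixel_size, cell_size):
    """Bin each pixel-centre value into the target cell containing that centre.

    raster is a list of rows; row 0 starts at y = 0 and y grows upwards.
    Returns {cell_index: [values]}; cells containing no pixel centre are absent.
    """
    bins = defaultdict(list)
    for r, row in enumerate(raster):
        for c, value in enumerate(row):
            # Centre of pixel (r, c) in map coordinates.
            x = (c + 0.5) * pixel_size
            y = (r + 0.5) * pixel_size
            bins[cell_index(x, y, cell_size)].append(value)
    return dict(bins)


raster = [[1, 2], [3, 4]]
# Finer target grid: the 2 x 2 raster extent spans 16 target cells.
fine = assign_centers(raster, pixel_size=1.0, cell_size=0.5)
# Coarser target grid: every centre falls in one cell, so an --agg is needed.
coarse = assign_centers(raster, pixel_size=1.0, cell_size=2.0)
```

With the finer grid only 4 of the 16 target cells receive a value (the rest are the gaps), while the coarser grid collects all four values in one cell, which is where `--agg list` or a statistic applies.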
Raster→DGGS semantics × transfer operators
Core CLI
- `--semantics <name>`: what a raster cell value means
- `--transfer <name>`: how values are mapped to DGGS cells
- `--out <schema>`: output schema (scalar value vs fractions/histogram/etc.)
- `--agg <stat>`: aggregation/statistic (when applicable)
- `--area-model ...`: how areas/overlaps are computed (when applicable)
- `--nodata-*` / `--valid-coverage-threshold`: nodata and partial-coverage behaviour
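A minimal `argparse` sketch of the core flags above, assuming Python and the option values listed in this proposal. Defaulting to `point_center_strict` + `assign_centers` (so existing invocations keep today's behaviour) is an assumption, as is leaving `--nodata-policy` without a default; nothing here is an existing CLI.

```python
import argparse

# Option values transcribed from this proposal; this is not an existing CLI.
SEMANTICS = ["point_center_strict", "point_sample_field", "cell_average",
             "piecewise_constant", "fraction_cover", "count_total",
             "density", "event_indicator"]
TRANSFERS = ["assign_centers", "sample_nn", "sample_interp",
             "overlay_weighted", "overlay_mode", "mass_preserve"]
OUTPUTS = ["value", "fractions", "histogram", "list"]


def unit_interval(text):
    """argparse type for --valid-coverage-threshold: a float in [0, 1]."""
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError(f"{text} is not within [0, 1]")
    return value


def build_parser():
    parser = argparse.ArgumentParser(description="Raster to DGGS (sketch)")
    # Assumed defaults that reproduce the present behaviour.
    parser.add_argument("--semantics", choices=SEMANTICS, default="point_center_strict")
    parser.add_argument("--transfer", choices=TRANSFERS, default="assign_centers")
    parser.add_argument("--out", choices=OUTPUTS, default="value")
    parser.add_argument("--agg", default=None, help="statistic, when applicable")
    parser.add_argument("--interp", default=None,
                        help="e.g. bilinear, cubic, lanczos (sample_interp only)")
    parser.add_argument("--area-model", choices=["planar", "geodesic", "sphere"])
    # Left unset: the proposal does not fix a default nodata policy.
    parser.add_argument("--nodata-policy", choices=["skip", "emit"])
    parser.add_argument("--valid-coverage-threshold", type=unit_interval, default=0.0)
    return parser
```

For example, `build_parser().parse_args(["--semantics", "cell_average", "--transfer", "overlay_weighted", "--agg", "mean", "--area-model", "geodesic"])` yields a namespace with `area_model == "geodesic"`.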
Output schemas (--out)
- `value` (default): one scalar value per DGGS cell.
- `fractions`: convenience output for categorical rasters; emits area fractions per class for each DGGS cell.
  - Conceptually this is a special case / alias of `histogram` with: categorical bins = classes, weighting = overlap area, and typically `--normalize-by target_area`.
  - Useful defaults/companions: `--classes {auto,<list>}`, `--sparse true`, `--nodata-is-class {true,false}`, `--normalize-by {target_area,valid_overlap}`.
- `histogram`: general "distribution" output per DGGS cell. Can represent:
  - categorical histogram: per-class area fractions or counts, and/or
  - numeric histogram: values binned into ranges for continuous rasters (e.g., elevation bins).
  - Typical companion flags:
    - `--histogram-type {categorical,numeric}`
    - if `numeric`: `--bins <e0,e1,...,en>` or `--bin-width <w> --bin-origin <x0>`
    - `--hist-weight {area,count}` (often `area` for `overlay_*`, `count` for `sample_*`)
    - `--normalize-by {target_area,valid_overlap}` when `--hist-weight area`
- `list`: emit all contributing samples/values (primarily for `point_center_strict` coarsening workflows).
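The relationship between `fractions` and `histogram` can be sketched for a single DGGS cell. This assumes the overlap area between each raster pixel and the cell has already been computed (per `--area-model`); `categorical_histogram` and `numeric_bin` are hypothetical helpers. A numeric histogram is treated here as a categorical one whose classes are bin indices.

```python
import bisect
from collections import defaultdict


def categorical_histogram(pieces, target_area, normalize_by="target_area",
                          nodata=None, nodata_is_class=False):
    """Area-weighted categorical histogram for one DGGS cell.

    pieces: iterable of (class_value, overlap_area), one per overlapping pixel.
    Returns {class_value: fraction}.
    """
    weights = defaultdict(float)
    for value, area in pieces:
        if value == nodata and not nodata_is_class:
            continue  # nodata contributes nothing unless it is treated as a class
        weights[value] += area
    if normalize_by == "target_area":
        denom = target_area
    elif normalize_by == "valid_overlap":
        denom = sum(weights.values())
    else:
        raise ValueError(f"unknown normalize_by: {normalize_by}")
    if denom == 0:
        return {}
    return {k: w / denom for k, w in weights.items()}


def numeric_bin(value, edges):
    """Index of the half-open bin [e_i, e_i+1) containing value; None if outside."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect.bisect_right(edges, value) - 1


# Class 0 is nodata and covers 1.0 of the 5.0 area units of this cell.
pieces = [(1, 3.0), (2, 1.0), (0, 1.0)]
by_target = categorical_histogram(pieces, 5.0, "target_area", nodata=0)
by_valid = categorical_histogram(pieces, 5.0, "valid_overlap", nodata=0)
# Numeric histogram: map elevations to bins, then reuse the categorical path.
edges = [0, 100, 200, 300]
elevations = [(50.0, 2.0), (150.0, 2.0), (250.0, 1.0)]
elev_hist = categorical_histogram([(numeric_bin(v, edges), a) for v, a in elevations], 5.0)
```

Here `by_target` is `{1: 0.6, 2: 0.2}` (the nodata share is simply missing), whereas `by_valid` rescales the valid classes to `{1: 0.75, 2: 0.25}`, which is the practical difference between the two `--normalize-by` choices.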
Equivalence note:
`--out fractions` is equivalent to something like:
`--out histogram --histogram-type categorical --hist-weight area --normalize-by target_area`
(keeping `fractions` is mainly for usability/readability).
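If `fractions` is kept purely as an alias, the CLI could expand it before dispatch. This is a sketch under the assumption that options arrive as a dict keyed by flag name (`normalize_by` for `--normalize-by`, and so on) and that any explicitly supplied companion flag overrides the alias default; `expand_out_alias` is a hypothetical name.

```python
# Expansion taken from the equivalence note above.
FRACTIONS_EQUIVALENT = {
    "out": "histogram",
    "histogram_type": "categorical",
    "hist_weight": "area",
    "normalize_by": "target_area",
}


def expand_out_alias(options):
    """Rewrite --out fractions into the equivalent histogram options.

    Explicit user choices (e.g. normalize_by="valid_overlap") are kept.
    """
    if options.get("out") != "fractions":
        return dict(options)
    expanded = dict(FRACTIONS_EQUIVALENT)
    for key, value in options.items():
        if key != "out" and value is not None:
            expanded[key] = value
    return expanded
```

Expanding early means only one code path (`histogram`) needs to implement the area-weighted categorical logic.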
Area/overlap policy flags (used by `overlay_*` and `mass_preserve`)
- `--area-model {planar,geodesic,sphere}`
- `--area-crs {native,EPSG:XXXX,auto_equal_area}` (when `planar`)
- `--ellipsoid {WGS84,GRS80}` (when `geodesic`)
- `--sphere {authalic,radius:<m>}` (when `sphere`)
- `--normalize-by {valid_overlap,target_area}`
- `--valid-coverage-threshold <0..1>`
- `--nodata-is-class {true,false}` (categorical overlays)
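Why the area model matters for lon/lat rasters can be shown with the closed-form area of a lon/lat-aligned cell on a sphere, $A = R^2\,|\Delta\lambda|\,|\sin\varphi_1 - \sin\varphi_0|$. The sketch below covers only `sphere` (using an approximate WGS84 authalic radius) and `planar`; an ellipsoidal `geodesic` area would need a geodesic library such as pyproj's `Geod`. All function names are hypothetical.

```python
import math

# Approximate radius of the WGS84 authalic (equal-area) sphere, in metres.
AUTHALIC_RADIUS_M = 6_371_007.181


def sphere_pixel_area(lon0, lat0, lon1, lat1, radius=AUTHALIC_RADIUS_M):
    """Area in m^2 of a lon/lat-aligned pixel on a sphere (angles in degrees)."""
    dlon = math.radians(abs(lon1 - lon0))
    dsin = abs(math.sin(math.radians(lat1)) - math.sin(math.radians(lat0)))
    return radius ** 2 * dlon * dsin


def planar_pixel_area(x0, y0, x1, y1):
    """Area in (CRS units)^2 of an axis-aligned pixel in the native CRS."""
    return abs(x1 - x0) * abs(y1 - y0)


def passes_coverage(valid_overlap_area, target_area, threshold):
    """--valid-coverage-threshold check: emit only if enough of the cell is valid."""
    return target_area > 0 and valid_overlap_area / target_area >= threshold


equator = sphere_pixel_area(0, 0, 1, 1)   # 1 degree x 1 degree at the equator
north = sphere_pixel_area(0, 60, 1, 61)   # the same pixel size at 60 degrees N
```

On the sphere, a 1°×1° pixel at 60°N has roughly half the area of one at the equator, while `planar_pixel_area` in native degrees reports 1 for both. That is why the table recommends a geodesic (or equal-area) model whenever the raster CRS is geographic.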
Note
The following table is made with assistance from an LLM and may carry a few errors I haven't yet picked up.
Semantics × transfer/operator matrix
Legend:
- ✓ appropriate/common,
- △ possible (caveats)
- ✗ inappropriate (breaks semantics)
| Raster semantics (`--semantics`) | Concise meaning | Examples | `assign_centers` (`--transfer assign_centers`) | `sample_nn` (`--transfer sample_nn`) | `sample_interp` (`--transfer sample_interp --interp ...`) | `overlay_weighted` (`--transfer overlay_weighted`) | `overlay_mode` (`--transfer overlay_mode`) | `mass_preserve` (`--transfer mass_preserve`) | Area policy needed? | Typical `--out` / `--agg` | Recommended defaults (incl. area & nodata) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| `point_center_strict` | Value applies only at the pixel centre; no implied value elsewhere | "Observation grid" / samples on a lattice; conceptual sensor samples | ✓ (centre→DGGS binning; gaps expected if DGGS finer) | ✗ | ✗ | ✗ | ✗ | ✗ | No | `--out value`; if DGGS coarser: `--agg {mean,min,max,median,mode,list}` | `--transfer assign_centers --out value`; coarser DGGS: `--agg list` or a user-chosen stat; `--nodata-policy {skip,emit}` |
| `point_sample_field` | Samples of a continuous field (reconstructable) | DEM (often treated this way), modeled temperature/pressure surfaces | △ | ✓ | ✓ (preferred) | △ (changes support to areal) | ✗ | ✗ | Only if using overlay | `--out value`; `--agg mean` etc. when coarsening | `--transfer sample_interp --interp bilinear --out value`; if overlay chosen: require user acknowledgement or switch semantics |
| `cell_average` | Value is the average over the pixel area (block support) | Climate "mean over grid cell", averaged concentration grids | ✗ | △ | ✗ | ✓ (preferred) | ✗ | ✗ | Yes | `--out value --agg mean` | `--transfer overlay_weighted --agg mean --normalize-by valid_overlap`; if raster CRS is geographic: `--area-model geodesic --ellipsoid WGS84`; else `--area-model planar --area-crs native`; optional `--valid-coverage-threshold` |
| `piecewise_constant` | Uniform value across the pixel area ("pixel-as-polygon"), often categorical | Land cover class, soil class, zone IDs, masks | ✗ | △ | ✗ | ✓ | ✓ (preferred for single class) | ✗ | Yes (for `overlay_*`) | Classes: `--out value --agg mode` or `--out fractions`; numeric: `--agg mean` | Eco‑ISEA3H categorical centroid (target-centre): `--transfer sample_nn --out value` (carries nulls). Eco‑ISEA3H fraction: `--transfer overlay_weighted --out fractions --normalize-by target_area`. Eco‑ISEA3H mode: `--transfer overlay_mode --out value --nodata-is-class false --valid-coverage-threshold 0.2` (threshold configurable). Area model: geodesic (lon/lat), else planar. |
| `fraction_cover` | Value is the proportion of a class/material in the pixel | % tree cover, % impervious, fractional water/snow | ✗ | △ | △ | ✓ (preferred) | ✗ | ✗ | Yes | `--out value --agg mean` (or `--out histogram`) | `--transfer overlay_weighted --agg mean --normalize-by valid_overlap`; area model geodesic for lon/lat; optional `--valid-coverage-threshold` |
| `count_total` | Value is a total in the cell (extensive; sums must be conserved) | Population count per cell, emissions totals, incident counts | ✗ | ✗ | ✗ | △ (only if it truly redistributes totals; otherwise wrong) | ✗ | ✓ (preferred) | Yes | `--out value --agg sum` | `--transfer mass_preserve --agg sum`; area model geodesic (lon/lat), else planar; `--valid-coverage-threshold` often 0 (but consider AOI masking rules) |
| `density` | Per-area intensity (intensive) | People/km², biomass density, W/m², rainfall rate | ✗ | △ | △ | ✓ (preferred when density is an areal average) | ✗ | △ (only if converting density↔count explicitly using area) | Yes (for overlay/mass) | `--out value --agg mean` | `--transfer overlay_weighted --agg mean --normalize-by valid_overlap`; if the user requests mass preservation, require an explicit conversion mode: `--convert density_to_count` etc. |
| `event_indicator` | Presence/absence or event count binned to cells | Fire detected (0/1), lightning presence; event counts | △ (presence variant only) | △ (presence) / ✗ (counts) | ✗ | △ | △ | ✓ (counts) | Yes (for overlay/mass) | Presence: `--out value` (any/all) or `fractions`; counts: `--agg sum` | Presence: `overlay_mode` or `overlay_weighted --out fractions` + thresholding; counts: `mass_preserve --agg sum` |
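The matrix could also drive CLI validation directly. A minimal sketch, assuming ✗ combinations are rejected outright and △ combinations require an explicit acknowledgement (mirroring the "require user acknowledgement" note for `point_sample_field`); the names `check_compatibility` and `IncompatibleTransfer` are hypothetical. Split ratings such as `event_indicator` + `sample_nn` ("△ presence / ✗ counts") are simplified to △ here.

```python
# Columns of the matrix, in table order.
TRANSFERS = ("assign_centers", "sample_nn", "sample_interp",
             "overlay_weighted", "overlay_mode", "mass_preserve")

# "ok" = ✓, "caveat" = △, "no" = ✗; rows transcribed from the table above.
MATRIX = {
    "point_center_strict": ("ok", "no", "no", "no", "no", "no"),
    "point_sample_field": ("caveat", "ok", "ok", "caveat", "no", "no"),
    "cell_average": ("no", "caveat", "no", "ok", "no", "no"),
    "piecewise_constant": ("no", "caveat", "no", "ok", "ok", "no"),
    "fraction_cover": ("no", "caveat", "caveat", "ok", "no", "no"),
    "count_total": ("no", "no", "no", "caveat", "no", "ok"),
    "density": ("no", "caveat", "caveat", "ok", "no", "caveat"),
    "event_indicator": ("caveat", "caveat", "no", "caveat", "caveat", "ok"),
}


class IncompatibleTransfer(ValueError):
    """Raised when a transfer operator conflicts with the declared semantics."""


def check_compatibility(semantics, transfer, acknowledge_caveats=False):
    """Return the matrix rating; raise for ✗, and for △ unless acknowledged."""
    rating = dict(zip(TRANSFERS, MATRIX[semantics]))[transfer]
    if rating == "no":
        raise IncompatibleTransfer(f"{transfer} breaks {semantics} semantics")
    if rating == "caveat" and not acknowledge_caveats:
        raise IncompatibleTransfer(
            f"{transfer} with {semantics} has caveats; acknowledge to proceed")
    return rating
```

Keeping the matrix as data rather than as scattered `if` statements makes it easy to correct individual cells as errors in the table are found.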
Transfer operator definitions (CLI)
- `--transfer assign_centers`
  For each raster pixel, compute its pixel-centre coordinate → DGGS index; emit one value per input pixel. Produces a sparse DGGS result (gaps) when the DGGS resolution is finer than the raster's.
- `--transfer sample_nn`
  For each DGGS cell, sample the raster at the DGGS cell centre using nearest neighbour.
- `--transfer sample_interp --interp {bilinear,cubic,lanczos,...}`
  For each DGGS cell, sample the raster at the DGGS cell centre with interpolation (continuous fields).
- `--transfer overlay_weighted`
  For each DGGS cell, compute overlap-weighted outputs (means, fractions, histograms). Requires `--area-model ...`.
- `--transfer overlay_mode`
  For each DGGS cell, compute the class with the maximum overlap area. Typically paired with `--valid-coverage-threshold` and `--nodata-is-class`. Requires `--area-model ...`.
- `--transfer mass_preserve`
  Redistribute each raster cell's total into DGGS cells in proportion to overlap area (or potentially ancillary weights). Conserves sums. Requires `--area-model ...`.
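The three overlap-based operators differ mainly in how they combine the same overlap areas. A minimal sketch, assuming the pixel/cell overlaps have already been computed under the chosen `--area-model`; the function names are hypothetical, and tie-breaking in `overlay_mode` (first class reaching the maximum wins) is an assumption the proposal does not specify.

```python
from collections import defaultdict


def overlay_weighted_mean(overlaps, nodata=None):
    """overlaps: list of (value, overlap_area) for one DGGS cell.

    Area-weighted mean over the valid overlap (--normalize-by valid_overlap).
    """
    valid = [(v, a) for v, a in overlaps if v != nodata]
    total = sum(a for _, a in valid)
    if total == 0:
        return None
    return sum(v * a for v, a in valid) / total


def overlay_mode(overlaps, target_area, threshold=0.0, nodata=None,
                 nodata_is_class=False):
    """Class with the largest overlap area, or None below the coverage threshold.

    target_area is assumed to be positive.
    """
    weights = defaultdict(float)
    for value, area in overlaps:
        if value == nodata and not nodata_is_class:
            continue
        weights[value] += area
    covered = sum(weights.values())
    if not weights or covered / target_area < threshold:
        return None
    return max(weights, key=weights.get)


def mass_preserve(pixels):
    """pixels: list of (total, pixel_area, {target_id: overlap_area}).

    Each pixel total is split in proportion to the share of the pixel's area
    falling in each target, so sums are conserved when targets tile the pixel.
    """
    out = defaultdict(float)
    for total, pixel_area, overlaps in pixels:
        for target, area in overlaps.items():
            out[target] += total * area / pixel_area
    return dict(out)


mean = overlay_weighted_mean([(10.0, 1.0), (20.0, 3.0)])
mode = overlay_mode([(1, 0.5), (2, 0.25), (0, 0.25)], 1.0, threshold=0.2, nodata=0)
sparse = overlay_mode([(1, 0.1), (0, 0.9)], 1.0, threshold=0.2, nodata=0)
totals = mass_preserve([(100.0, 4.0, {"a": 1.0, "b": 3.0}),
                        (40.0, 4.0, {"b": 4.0})])
```

Here `mean` is 17.5, `mode` is class 1, `sparse` is `None` because only 10% of the cell is valid data, and `totals` sums to the input total of 140, which is the property `count_total` rasters need.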
Eco‑ISEA3H strategies expressed in flags (examples from #17)
Categorical / Centroid (target-centre sampling):
`--semantics piecewise_constant --transfer sample_nn --out value --nodata-policy emit`
Categorical / Fraction:
`--semantics piecewise_constant --transfer overlay_weighted --out fractions --normalize-by target_area --area-model geodesic`
Categorical / Mode with coverage threshold (their 0.2):
`--semantics piecewise_constant --transfer overlay_mode --out value --valid-coverage-threshold 0.2 --nodata-is-class false --area-model geodesic`
Continuous / Centroid:
- strict source-centre binning: `--semantics point_center_strict --transfer assign_centers --out value`
- target-centre sampling (more typical "centroid sampling"): `--semantics point_sample_field --transfer sample_interp --interp bilinear --out value`
Continuous / Mean (area-weighted):
`--semantics cell_average --transfer overlay_weighted --out value --agg mean --area-model geodesic --normalize-by valid_overlap`
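Target-centre sampling (`sample_nn` and `sample_interp --interp bilinear`) evaluates the raster at each DGGS cell centre instead of binning pixel centres, so a finer DGGS gets a value in every cell. A minimal sketch, assuming the DGGS centre has already been projected into the raster's CRS as `(x, y)`, a y-up raster with origin (0, 0), at least 2 × 2 pixels, and edge clamping between the outermost pixel centres and the raster edge (edge handling is an assumption; real tools differ). Function names are hypothetical.

```python
import math


def sample_nn(raster, pixel_size, x, y):
    """Value of the pixel whose footprint contains (x, y); None outside the raster."""
    c, r = math.floor(x / pixel_size), math.floor(y / pixel_size)
    if 0 <= r < len(raster) and 0 <= c < len(raster[0]):
        return raster[r][c]
    return None


def sample_bilinear(raster, pixel_size, x, y):
    """Bilinear interpolation between the four pixel centres around (x, y)."""
    rows, cols = len(raster), len(raster[0])
    if not (0 <= x < cols * pixel_size and 0 <= y < rows * pixel_size):
        return None
    # Continuous pixel coordinates in which integers fall on pixel centres,
    # clamped so points near the edge reuse the outermost pixels.
    u = min(max(x / pixel_size - 0.5, 0.0), cols - 1)
    v = min(max(y / pixel_size - 0.5, 0.0), rows - 1)
    c0, r0 = min(int(u), cols - 2), min(int(v), rows - 2)
    fx, fy = u - c0, v - r0
    low = raster[r0][c0] * (1 - fx) + raster[r0][c0 + 1] * fx
    high = raster[r0 + 1][c0] * (1 - fx) + raster[r0 + 1][c0 + 1] * fx
    return low * (1 - fy) + high * fy


dem = [[0.0, 10.0], [20.0, 30.0]]
# A target-cell centre at the shared corner of all four pixels.
nn_value = sample_nn(dem, 1.0, 1.0, 1.0)
interp_value = sample_bilinear(dem, 1.0, 1.0, 1.0)
```

At the shared corner, `sample_nn` picks a single pixel (30.0) while bilinear interpolation blends all four centres (15.0), which is why the table prefers `sample_interp` for `point_sample_field` and only tolerates `sample_nn` for categorical data.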
Originally posted by @alpha-beta-soup in #15 (comment)