
Raster semantics × transfer operators #50

@alpha-beta-soup

I've spent some time tabulating various raster data models and how these can apply comprehensively to a conversion tool. Supporting a full range will require major adjustments to the CLI, but would massively improve the accuracy, correctness, explicitness and applicability of the tool.

It should be possible to do this incrementally.

The present behaviour, expressed in the proposed terms, is:

  • --semantics point_center_strict
  • --transfer assign_centers
  • --agg <stat> (when applicable)
  • --out value

The "gaps" this currently produces between sample points (empty DGGS cells when the DGGS resolution is finer than the raster) can be avoided by first upsampling the raster with interpolation, including nearest-neighbour.
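A minimal sketch of why the gaps appear and how nearest-neighbour upsampling removes them. It assumes a plain square grid of side `cell` as a hypothetical stand-in for DGGS cells (real DGGS cells are hexagons or triangles with their own indexing), a raster with its origin at (0, 0), and illustrative function names that are not part of the tool.

```python
import math


def assign_centers(values, pixel_size, cell):
    """Bin each pixel centre into the square target cell that contains it.

    Returns {(col_index, row_index): [values...]}; target cells that receive
    no pixel centre are simply absent (the "gaps").
    """
    out = {}
    for r, row in enumerate(values):
        for c, v in enumerate(row):
            x = (c + 0.5) * pixel_size  # pixel-centre x
            y = (r + 0.5) * pixel_size  # pixel-centre y
            key = (math.floor(x / cell), math.floor(y / cell))
            out.setdefault(key, []).append(v)
    return out


def upsample_nn(values, factor):
    """Nearest-neighbour upsampling by an integer factor: each pixel becomes factor x factor."""
    return [[v for v in row for _ in range(factor)]
            for row in values for _ in range(factor)]


raster = [[1, 2], [3, 4]]  # 2 x 2 pixels of size 1.0 -> 16 target cells of size 0.5
sparse = assign_centers(raster, pixel_size=1.0, cell=0.5)
dense = assign_centers(upsample_nn(raster, 2), pixel_size=0.5, cell=0.5)
print(len(sparse), len(dense))  # 4 16
```

In this toy setup, once the upsampled pixel size matches the target cell size every target cell receives exactly one pixel centre, and nearest-neighbour upsampling reproduces the original values rather than inventing new ones.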

Raster→DGGS semantics × transfer operators

Core CLI

  • --semantics <name> : what a raster cell value means
  • --transfer <name> : how values are mapped to DGGS cells
  • --out <schema> : output schema (scalar value vs fractions/histogram/etc.)
  • --agg <stat> : aggregation/statistic (when applicable)
  • --area-model ... : how areas/overlaps are computed (when applicable)
  • --nodata-* / --valid-coverage-threshold : nodata + partial coverage behavior
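The proposed core flags could be declared roughly as below with Python's standard `argparse`. This is a hypothetical sketch, not an existing interface: the program name, the subset of flags, the choice lists (taken from the names used in this issue) and the defaults (chosen to reproduce the present behaviour) are all assumptions.

```python
import argparse

SEMANTICS = ["point_center_strict", "point_sample_field", "cell_average",
             "piecewise_constant", "fraction_cover", "count_total",
             "density", "event_indicator"]
TRANSFERS = ["assign_centers", "sample_nn", "sample_interp",
             "overlay_weighted", "overlay_mode", "mass_preserve"]
# Operators described in this issue as requiring --area-model.
AREA_TRANSFERS = {"overlay_weighted", "overlay_mode", "mass_preserve"}


def unit_interval(text):
    """argparse type for --valid-coverage-threshold: a float in [0, 1]."""
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError(f"expected a value in [0, 1], got {value}")
    return value


def build_parser():
    p = argparse.ArgumentParser(prog="raster-dggs-sketch")  # hypothetical name
    # Defaults reproduce the present behaviour described above.
    p.add_argument("--semantics", choices=SEMANTICS, default="point_center_strict")
    p.add_argument("--transfer", choices=TRANSFERS, default="assign_centers")
    p.add_argument("--out", choices=["value", "fractions", "histogram", "list"],
                   default="value")
    p.add_argument("--agg", choices=["mean", "min", "max", "median", "mode", "sum", "list"])
    p.add_argument("--interp", choices=["bilinear", "cubic", "lanczos"])
    p.add_argument("--area-model", choices=["planar", "geodesic", "sphere"])
    p.add_argument("--normalize-by", choices=["valid_overlap", "target_area"])
    p.add_argument("--valid-coverage-threshold", type=unit_interval, default=0.0)
    p.add_argument("--nodata-policy", choices=["skip", "emit"])
    return p


def parse(argv=None):
    """Parse argv, then enforce cross-flag rules that argparse cannot express alone."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.transfer in AREA_TRANSFERS and args.area_model is None:
        parser.error(f"--transfer {args.transfer} requires --area-model")  # exits
    return args


args = parse(["--semantics", "cell_average", "--transfer", "overlay_weighted",
              "--agg", "mean", "--area-model", "geodesic", "--normalize-by", "valid_overlap"])
print(args.transfer, args.area_model)  # overlay_weighted geodesic
```

Keeping the cross-flag checks in a separate step after `parse_args` leaves room for the semantics-aware validation the matrix below implies.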

Output schemas (--out)

  • value (default): one scalar value per DGGS cell.
  • fractions: convenience output for categorical rasters that emits area fractions per class for each DGGS cell.
    • Conceptually this is a special case / alias of histogram with: categorical bins = classes, weighting = overlap area, and typically --normalize-by target_area.
    • Useful defaults/companions: --classes {auto,<list>}, --sparse true, --nodata-is-class {true,false}, --normalize-by {target_area,valid_overlap}.
  • histogram: general “distribution” output per DGGS cell. Can represent:
    • categorical histogram: per-class area fractions or counts, and/or
    • numeric histogram: values binned into ranges for continuous rasters (e.g., elevation bins).
    • Typical companion flags:
      • --histogram-type {categorical,numeric}
      • if numeric: --bins <e0,e1,...,en> or --bin-width <w> --bin-origin <x0>
      • --hist-weight {area,count} (often area for overlay_*, count for sample_*)
      • --normalize-by {target_area,valid_overlap} when hist-weight=area
  • list: emit all contributing samples/values (primarily for point_center_strict coarsening workflows).
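The numeric histogram option can be sketched as follows: turn --bin-width/--bin-origin into edges, then accumulate an area- or count-weighted histogram for one DGGS cell. The input format (a list of (value, weight) pairs, where the weight is the overlap area or 1 per sample) and the helper names are assumptions for illustration.

```python
import bisect
import math


def bin_edges(bin_width, bin_origin, lo, hi):
    """Edges e0..en for --bin-width/--bin-origin, covering data in [lo, hi]."""
    first = bin_origin + math.floor((lo - bin_origin) / bin_width) * bin_width
    edges = [first]
    while edges[-1] <= hi:  # keep going until hi falls strictly inside the last bin
        edges.append(edges[-1] + bin_width)
    return edges


def numeric_histogram(samples, edges, normalize_by=None, target_area=None):
    """Histogram over half-open bins [e_i, e_{i+1}).

    samples: (value, weight) pairs; weight = overlap area (--hist-weight area)
    or 1.0 per sample (--hist-weight count). Values outside the edges are dropped.
    """
    hist = [0.0] * (len(edges) - 1)
    for value, weight in samples:
        i = bisect.bisect_right(edges, value) - 1
        if 0 <= i < len(hist):
            hist[i] += weight
    if normalize_by == "valid_overlap":  # fractions of the valid (non-nodata) overlap
        total = sum(hist)
        return [h / total for h in hist] if total else hist
    if normalize_by == "target_area":  # fractions of the whole DGGS cell
        return [h / target_area for h in hist]
    return hist


# One DGGS cell of area 5.0: 4.0 units overlap valid elevations, 1.0 is nodata.
edges = bin_edges(bin_width=100, bin_origin=0, lo=120, hi=260)
samples = [(120, 2.0), (250, 1.0), (260, 1.0)]
print(edges)                                                  # [100, 200, 300]
print(numeric_histogram(samples, edges, "valid_overlap"))     # [0.5, 0.5]
print(numeric_histogram(samples, edges, "target_area", 5.0))  # [0.4, 0.4]
```

The two normalisations differ exactly when part of the cell is nodata: valid_overlap sums to 1 over the valid part, while target_area sums to the valid coverage fraction (0.8 here).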

Equivalence note:
--out fractions is equivalent to something like:
--out histogram --histogram-type categorical --hist-weight area --normalize-by target_area
(keeping fractions is mainly for usability/readability).
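That equivalence can be checked with a minimal sketch for a single DGGS cell, assuming the overlay step has already produced (class, overlap_area) pairs; the function names and that input format are hypothetical.

```python
from collections import defaultdict


def categorical_histogram(overlaps, hist_weight="area", normalize_by=None,
                          target_area=None, nodata=None, nodata_is_class=False):
    """Per-class weights for one DGGS cell; only classes present are emitted (sparse)."""
    hist = defaultdict(float)
    for cls, area in overlaps:
        if cls == nodata and not nodata_is_class:
            continue  # nodata is dropped unless it is treated as a class
        hist[cls] += area if hist_weight == "area" else 1.0
    if normalize_by == "target_area":
        return {k: v / target_area for k, v in hist.items()}
    if normalize_by == "valid_overlap":
        total = sum(hist.values())
        return {k: v / total for k, v in hist.items()} if total else dict(hist)
    return dict(hist)


def fractions(overlaps, target_area, **kwargs):
    """--out fractions: an area-weighted categorical histogram normalised by target area."""
    return categorical_histogram(overlaps, hist_weight="area", normalize_by="target_area",
                                 target_area=target_area, **kwargs)


# Land-cover pixels overlapping a DGGS cell of area 5.0; class 0 is nodata.
overlaps = [(11, 3.0), (21, 1.0), (11, 0.5), (0, 0.5)]
print(fractions(overlaps, 5.0, nodata=0))  # {11: 0.7, 21: 0.2}
```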

Area/overlap policy flags (used by overlay_* and mass_preserve)

  • --area-model {planar,geodesic,sphere}
  • --area-crs {native,EPSG:XXXX,auto_equal_area} (when planar)
  • --ellipsoid {WGS84,GRS80} (when geodesic)
  • --sphere {authalic,radius:<m>} (when sphere)
  • --normalize-by {valid_overlap,target_area}
  • --valid-coverage-threshold <0..1>
  • --nodata-is-class {true,false} (categorical overlays)
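Under --area-model sphere, the area of a lon/lat pixel has a closed form, A = R^2 * dlon * (sin(lat1) - sin(lat0)) with dlon in radians, which makes explicit the latitude dependence that a planar model in native lon/lat would ignore. A minimal sketch, assuming --sphere authalic means the WGS84 authalic radius (about 6,371,007.2 m) and that --valid-coverage-threshold compares valid overlap area against the target cell area; both interpretations are assumptions about the proposal.

```python
import math

# WGS84 authalic radius: the sphere with the same surface area as the ellipsoid (metres).
AUTHALIC_RADIUS_WGS84 = 6_371_007.181


def lonlat_pixel_area_sphere(lon0, lat0, lon1, lat1, radius=AUTHALIC_RADIUS_WGS84):
    """Area in m^2 of the lon/lat rectangle [lon0, lon1] x [lat0, lat1] on a sphere."""
    dlon = math.radians(lon1 - lon0)
    return radius ** 2 * dlon * (math.sin(math.radians(lat1)) - math.sin(math.radians(lat0)))


def passes_coverage(valid_overlap_area, target_area, threshold):
    """True if the valid (non-nodata) overlap covers at least `threshold` of the target cell."""
    return valid_overlap_area / target_area >= threshold


# A 1-degree pixel at the equator vs one at 60 degrees N: same raster cell size, very different area.
equator = lonlat_pixel_area_sphere(0, 0, 1, 1)
north = lonlat_pixel_area_sphere(0, 60, 1, 61)
print(round(equator / 1e6), round(north / 1e6))  # areas in km^2
```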

Note

The following table was made with assistance from an LLM and may contain a few errors I haven't yet picked up.

Semantics × transfer/operator matrix

Legend:

  • ✓ appropriate/common
  • △ possible (caveats)
  • ✗ inappropriate (breaks semantics)

Each --semantics value below lists its concise meaning, examples, fit with the transfer operators (assign_centers, sample_nn, sample_interp --interp ..., overlay_weighted, overlay_mode, mass_preserve), whether an area policy is needed, typical --out / --agg, and recommended defaults (incl. area & nodata).

  • point_center_strict
    • Meaning: value applies only at the pixel centre; no value is implied elsewhere.
    • Examples: "observation grid" / samples on a lattice; conceptual sensor samples.
    • Transfer fit: assign_centers ✓ (centre→DGGS binning; gaps expected if the DGGS is finer).
    • Area policy needed? No.
    • Typical --out / --agg: --out value; if the DGGS is coarser: --agg {mean|min|max|median|mode|list}.
    • Recommended defaults: --transfer assign_centers --out value; coarser DGGS: --agg list or a user-chosen stat; --nodata-policy {skip|emit}.
  • point_sample_field
    • Meaning: samples of a continuous field (reconstructable).
    • Examples: DEM (often treated this way), modeled temperature/pressure surfaces.
    • Transfer fit: sample_interp ✓ (preferred); △ (changes support to areal).
    • Area policy needed? Only if using overlay.
    • Typical --out / --agg: --out value; --agg mean etc. when coarsening.
    • Recommended defaults: --transfer sample_interp --interp bilinear --out value; if overlay is chosen: require user acknowledgement or switch semantics.
  • cell_average
    • Meaning: value is the average over the pixel area (block support).
    • Examples: climate "mean over grid cell", averaged concentration grids.
    • Transfer fit: overlay_weighted ✓ (preferred).
    • Area policy needed? Yes.
    • Typical --out / --agg: --out value --agg mean.
    • Recommended defaults: --transfer overlay_weighted --agg mean --normalize-by valid_overlap; if the raster CRS is geographic: --area-model geodesic --ellipsoid WGS84, else --area-model planar --area-crs native; optional --valid-coverage-threshold.
  • piecewise_constant
    • Meaning: uniform value across the pixel area ("pixel-as-polygon"), often categorical.
    • Examples: land cover class, soil class, zone IDs, masks.
    • Transfer fit: ✓ (preferred for single class).
    • Area policy needed? Yes (for overlay_*).
    • Typical --out / --agg: classes: --out value --agg mode OR --out fractions; numeric: --agg mean.
    • Recommended defaults:
      • Eco‑ISEA3H categorical centroid (target-centre): --transfer sample_nn --out value (carries nulls).
      • Eco‑ISEA3H fraction: --transfer overlay_weighted --out fractions --normalize-by target_area.
      • Eco‑ISEA3H mode: --transfer overlay_mode --out value --nodata-is-class false --valid-coverage-threshold 0.2 (threshold configurable).
      • Area model: geodesic for lon/lat, else planar.
  • fraction_cover
    • Meaning: value is the proportion of a class/material in the pixel.
    • Examples: % tree cover, % impervious, fractional water/snow.
    • Transfer fit: overlay_weighted ✓ (preferred).
    • Area policy needed? Yes.
    • Typical --out / --agg: --out value --agg mean (or --out histogram).
    • Recommended defaults: --transfer overlay_weighted --agg mean --normalize-by valid_overlap; area model geodesic for lon/lat; optional --valid-coverage-threshold.
  • count_total
    • Meaning: value is a total in the cell (extensive; sums must be conserved).
    • Examples: population count per cell, emissions totals, incident counts.
    • Transfer fit: △ (only if it truly redistributes totals; otherwise wrong); mass_preserve ✓ (preferred).
    • Area policy needed? Yes.
    • Typical --out / --agg: --out value --agg sum.
    • Recommended defaults: --transfer mass_preserve --agg sum; area model geodesic (lon/lat) else planar; --valid-coverage-threshold often 0 (but consider AOI masking rules).
  • density
    • Meaning: per-area intensity (intensive).
    • Examples: people/km², biomass density, W/m², rainfall rate.
    • Transfer fit: overlay_weighted ✓ (preferred when density is an areal average); mass_preserve △ (only if converting density↔count explicitly using area).
    • Area policy needed? Yes (for overlay/mass).
    • Typical --out / --agg: --out value --agg mean.
    • Recommended defaults: --transfer overlay_weighted --agg mean --normalize-by valid_overlap; if the user requests mass_preserve, require an explicit conversion mode: --convert density_to_count etc.
  • event_indicator
    • Meaning: presence/absence or event count binned to cells.
    • Examples: fire detected (0/1), lightning presence; event counts.
    • Transfer fit: △ (presence variant only); △ (presence) / ✗ (counts); mass_preserve ✓ (counts).
    • Area policy needed? Yes (for overlay/mass).
    • Typical --out / --agg: presence: --out value (any/all) or fractions; counts: --agg sum.
    • Recommended defaults: presence: overlay_mode, or overlay_weighted --out fractions plus thresholding; counts: mass_preserve --agg sum.
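One way a tool could use this matrix is to warn when a --semantics/--transfer pairing departs from the recommended defaults (compare "require user acknowledgement" for point_sample_field). A hypothetical sketch: the lookup is transcribed from the Recommended defaults entries only, so the other ✓/△ marks are not encoded, and the names are illustrative.

```python
# Recommended transfer(s) per semantics, transcribed from the "Recommended defaults" entries.
RECOMMENDED = {
    "point_center_strict": ("assign_centers",),
    "point_sample_field": ("sample_interp",),
    "cell_average": ("overlay_weighted",),
    "piecewise_constant": ("sample_nn", "overlay_weighted", "overlay_mode"),
    "fraction_cover": ("overlay_weighted",),
    "count_total": ("mass_preserve",),
    "density": ("overlay_weighted",),
    "event_indicator": ("overlay_mode", "overlay_weighted", "mass_preserve"),
}


def check_combination(semantics, transfer):
    """Return None for a recommended pairing, otherwise a warning message."""
    options = RECOMMENDED[semantics]
    if transfer in options:
        return None
    return (f"--transfer {transfer} is not a recommended default for "
            f"--semantics {semantics} (recommended: {', '.join(options)}); "
            "confirm this is intended")


print(check_combination("point_sample_field", "overlay_weighted"))
```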

Transfer operator definitions (CLI)

  • --transfer assign_centers
    For each raster pixel, compute its pixel-centre coordinate → DGGS index; emit one value per input pixel. Produces a sparse DGGS result (gaps) when the DGGS resolution is finer than the raster.

  • --transfer sample_nn
    For each DGGS cell, sample raster at the DGGS cell centre using nearest-neighbour.

  • --transfer sample_interp --interp {bilinear,cubic,lanczos,...}
    For each DGGS cell, sample raster at DGGS centre with interpolation (continuous fields).

  • --transfer overlay_weighted
    For each DGGS cell, compute overlap-weighted outputs (means, fractions, histograms). Requires --area-model ....

  • --transfer overlay_mode
    For each DGGS cell, compute the class with the maximum overlap area. Typically paired with --valid-coverage-threshold and --nodata-is-class. Requires --area-model ....

  • --transfer mass_preserve
    Redistribute each raster cell total into DGGS cells proportional to overlap area (or potentially ancillary weights). Conserves sums. Requires --area-model ....
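The two target-centre sampling operators (sample_nn and sample_interp with bilinear) can be sketched as follows. Assumptions: pixel (row r, column c) covers [c*s, (c+1)*s) x [r*s, (r+1)*s) with origin (0, 0) and rows increasing with y (no geotransform or north-up flip); DGGS cell centres are supplied as planar (x, y) points (a real implementation would project cell centroids into the raster CRS); edge pixels are replicated for bilinear; names are hypothetical.

```python
import math


def sample_nn(raster, s, x, y):
    """--transfer sample_nn: value of the pixel containing (x, y), i.e. the nearest pixel centre."""
    c, r = math.floor(x / s), math.floor(y / s)
    if 0 <= r < len(raster) and 0 <= c < len(raster[0]):
        return raster[r][c]
    return None  # outside the raster: nodata


def sample_bilinear(raster, s, x, y):
    """--transfer sample_interp --interp bilinear: blend the 4 nearest pixel centres."""
    nrows, ncols = len(raster), len(raster[0])
    u, v = x / s - 0.5, y / s - 0.5  # position in pixel-centre index space
    if not (-0.5 <= u <= ncols - 0.5 and -0.5 <= v <= nrows - 0.5):
        return None
    c0, r0 = math.floor(u), math.floor(v)
    fu, fv = u - c0, v - r0

    def px(r, c):  # clamp to the raster edge (edge replication)
        return raster[min(max(r, 0), nrows - 1)][min(max(c, 0), ncols - 1)]

    top = px(r0, c0) * (1 - fu) + px(r0, c0 + 1) * fu
    bottom = px(r0 + 1, c0) * (1 - fu) + px(r0 + 1, c0 + 1) * fu
    return top * (1 - fv) + bottom * fv


def transfer_sample(raster, s, cell_centres, method="nn"):
    """Map each DGGS cell centre (x, y) to one sampled value."""
    sampler = sample_nn if method == "nn" else sample_bilinear
    return [sampler(raster, s, x, y) for x, y in cell_centres]


raster = [[0.0, 10.0], [20.0, 30.0]]
centres = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(transfer_sample(raster, 1.0, centres, "nn"))        # [0.0, 10.0, 20.0, 30.0]
print(transfer_sample(raster, 1.0, centres, "bilinear"))  # [0.0, 10.0, 10.0, 15.0]
```

The point (1.0, 1.0) sits on a shared pixel corner: nearest-neighbour has to break the tie by convention (here, the higher-index pixel), whereas bilinear blends all four neighbours, which is why it suits point_sample_field but not categorical rasters.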

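The two overlay operators share one ingredient: the overlap area between each pixel and the target cell. A minimal planar sketch, assuming axis-aligned rectangles (x0, y0, x1, y1) for both pixels and target cells (real DGGS cells are polygons, and overlaps would be computed under the chosen --area-model), a raster with its origin at (0, 0), and coverage measured as counted overlap divided by target cell area. All names are hypothetical.

```python
from collections import defaultdict


def overlap_area(a, b):
    """Planar overlap area of two axis-aligned rectangles (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def pixel_overlaps(raster, s, cell):
    """Yield (value, overlap_area) for every pixel of size s that intersects `cell`."""
    for r, row in enumerate(raster):
        for c, v in enumerate(row):
            a = overlap_area((c * s, r * s, (c + 1) * s, (r + 1) * s), cell)
            if a > 0:
                yield v, a


def cell_area(cell):
    return (cell[2] - cell[0]) * (cell[3] - cell[1])


def overlay_weighted_mean(raster, s, cell, nodata=None, threshold=0.0):
    """Area-weighted mean with --normalize-by valid_overlap; None below the coverage threshold."""
    pairs = [(v, a) for v, a in pixel_overlaps(raster, s, cell) if v != nodata]
    valid = sum(a for _, a in pairs)
    if valid == 0 or valid / cell_area(cell) < threshold:
        return None
    return sum(v * a for v, a in pairs) / valid


def overlay_mode(raster, s, cell, nodata=None, nodata_is_class=False, threshold=0.0):
    """Class with the largest overlap area; ties go to the class seen first."""
    areas = defaultdict(float)
    for v, a in pixel_overlaps(raster, s, cell):
        if v == nodata and not nodata_is_class:
            continue
        areas[v] += a
    if not areas or sum(areas.values()) / cell_area(cell) < threshold:
        return None
    return max(areas, key=areas.get)


raster = [[1, 2], [1, -1]]  # pixel size 1.0; -1 is nodata
print(overlay_weighted_mean(raster, 1.0, (0, 0, 2, 2), nodata=-1))  # (1 + 2 + 1) / 3
print(overlay_mode(raster, 1.0, (0, 0, 2, 2), nodata=-1))           # 1
```

Calling overlay_mode with threshold=0.2 and nodata_is_class=False mirrors the Eco‑ISEA3H mode settings (--valid-coverage-threshold 0.2 --nodata-is-class false).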

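mass_preserve differs from overlay_weighted in what it divides by: each pixel total is split in proportion to overlap area over the pixel's own area, so totals are conserved when the target cells tile the raster. A minimal planar sketch under the same kind of assumptions (axis-aligned rectangles, origin at (0, 0), hypothetical names); ancillary weights are not modelled.

```python
def overlap_area(a, b):
    """Planar overlap area of two axis-aligned rectangles (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def mass_preserve(raster, s, cells, nodata=None):
    """Split each pixel total across target cells by overlap / pixel area (--agg sum)."""
    pixel_area = s * s
    out = [0.0] * len(cells)
    for r, row in enumerate(raster):
        for c, total in enumerate(row):
            if total == nodata:
                continue
            px = (c * s, r * s, (c + 1) * s, (r + 1) * s)
            for i, cell in enumerate(cells):
                out[i] += total * overlap_area(px, cell) / pixel_area
    return out


# 2 x 2 raster of counts; a 3 x 3 target grid offset by half a pixel covers it completely.
counts = [[10, 20], [30, 40]]
edges = [-0.5, 0.5, 1.5, 2.5]
cells = [(edges[i], edges[j], edges[i + 1], edges[j + 1]) for j in range(3) for i in range(3)]
result = mass_preserve(counts, 1.0, cells)
print(sum(result), result[4])  # 100.0 25.0
```

Any share of a pixel that falls outside every target cell is not redistributed, which is where the AOI masking rules mentioned for count_total matter.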
Eco‑ISEA3H strategies expressed in flags (examples from #17)

Categorical / Centroid (target-centre sampling):

  • --semantics piecewise_constant --transfer sample_nn --out value --nodata-policy emit

Categorical / Fraction:

  • --semantics piecewise_constant --transfer overlay_weighted --out fractions --normalize-by target_area --area-model geodesic

Categorical / Mode with coverage threshold (their 0.2):

  • --semantics piecewise_constant --transfer overlay_mode --out value --valid-coverage-threshold 0.2 --nodata-is-class false --area-model geodesic

Continuous / Centroid:

  • strict source-centre binning:
    --semantics point_center_strict --transfer assign_centers --out value
  • target-centre sampling (more typical “centroid sampling”):
    --semantics point_sample_field --transfer sample_interp --interp bilinear --out value

Continuous / Mean (area-weighted):

  • --semantics cell_average --transfer overlay_weighted --out value --agg mean --area-model geodesic --normalize-by valid_overlap

Originally posted by @alpha-beta-soup in #15 (comment)

Labels: enhancement (New feature or request)