Categorical & continuous sampling methods #17

@alpha-beta-soup


Mechenich, M.F., Žliobaitė, I. Eco-ISEA3H, a machine learning ready spatial database for ecometric and species distribution modeling. Sci Data 10, 77 (2023). https://doi.org/10.1038/s41597-023-01966-x

This paper has details of various sampling strategies employed for indexing raster data.

Categorical

  • Centroid: record the categorical value occurring at each cell centroid. Nulls are carried over.
  • Fraction: record the proportion of each cell's area covered by each categorical value. There would be a fraction attribute for each class for each cell. (A sparse data structure could help manage this.)
  • Mode: as it says on the tin, except that a null value is used where the fraction attributes sum to less than 0.2 of the cell's area. (I think this is probably wrong; it leads to data loss for cells on the edge of nodata areas. Perhaps there should be a switch for whether null is a valid modal value, or the threshold, e.g. 0.2, should be a parameter.)
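
To make the fraction and mode cases concrete, here is a minimal sketch in plain Python. It assumes the overlap between each raster pixel and the cell has already been computed and is passed in as `(class_value, overlap_area)` pairs, with `None` standing in for nodata. The function names and the `min_coverage` parameter are hypothetical and are not taken from the paper or from this tool; `min_coverage` is the threshold-as-parameter idea suggested above.

```python
from collections import defaultdict


def class_fractions(overlaps, cell_area):
    """Return {class: fraction of the cell's area}, skipping nodata (None).

    Only classes actually present in the cell get a key, so the result
    is sparse rather than one attribute per possible class.
    """
    totals = defaultdict(float)
    for value, area in overlaps:
        if value is not None:
            totals[value] += area
    return {value: area / cell_area for value, area in totals.items()}


def modal_class(fractions, min_coverage=0.2):
    """Return the class with the largest fraction.

    Returns None when the valid fractions sum to less than `min_coverage`
    of the cell (hypothetical parameter; 0.2 mirrors the paper's rule).
    """
    if not fractions or sum(fractions.values()) < min_coverage:
        return None
    return max(fractions, key=fractions.get)


# A cell half covered by forest, a quarter by water, a quarter nodata.
interior = class_fractions(
    [("forest", 0.5), ("water", 0.25), (None, 0.25)], cell_area=1.0
)

# A cell on the edge of a nodata area: only one eighth has valid data.
edge = class_fractions([("forest", 0.125), (None, 0.875)], cell_area=1.0)

interior_mode = modal_class(interior)
edge_mode_default = modal_class(edge)
edge_mode_relaxed = modal_class(edge, min_coverage=0.0)
```

With the default threshold the edge cell's mode becomes null, which is the data loss described above; setting `min_coverage=0.0` keeps `forest` for that cell.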

Continuous

  • Centroid: as above.
  • Mean: area-weighted arithmetic mean. The authors are careful to do the conversion operations in each dataset's native coordinate reference system. For data in authalic (equal-area) coordinate reference systems, the area-weighted mean reduces to the simple mean. For data in WGS84, they calculate the area of each pixel and use that as a weight when calculating the mean. See #14 (VRT warping method causing spatial inconsistencies) for other discussion on how we currently handle reprojection issues; it may need revision.
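
The WGS84 case can be sketched as follows. This assumes a spherical Earth and pixels of equal longitude width, so that a pixel's area is proportional to the difference of the sines of its bounding latitudes. That is an illustrative approximation, not necessarily how the authors compute pixel size, and the function names are hypothetical. A fuller version would also scale each weight by the share of the pixel that falls inside the cell.

```python
import math


def pixel_area_weight(lat_south, lat_north):
    """Relative area of a lat/lon pixel of fixed longitude width on a sphere.

    Area is proportional to sin(lat_north) - sin(lat_south); the constant
    factor (radius squared times longitude width) cancels in a weighted mean.
    """
    return math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))


def area_weighted_mean(values, weights):
    """Weighted arithmetic mean, ignoring nodata (None) values.

    Returns None if there is no valid value with non-zero weight.
    """
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total = sum(w for _, w in pairs)
    if total == 0:
        return None
    return sum(v * w for v, w in pairs) / total


# Two one-degree pixels: one at the equator, one at 60 degrees north.
values = [10.0, 20.0]
weights = [pixel_area_weight(0, 1), pixel_area_weight(60, 61)]

weighted = area_weighted_mean(values, weights)
simple = area_weighted_mean(values, [1.0, 1.0])
```

Because the pixel at 60° N covers roughly half the ground area of the equatorial one, the weighted mean is pulled toward the equatorial value, while equal weights (the equal-area case) give the simple mean.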

This issue should be closed when this tool is capable of reproducing all of these cases.

Labels: enhancement (new feature or request)
