Open
Labels: enhancement (New feature or request)
Description
Mechenich, M.F., Žliobaitė, I. Eco-ISEA3H, a machine learning ready spatial database for ecometric and species distribution modeling. Sci Data 10, 77 (2023). https://doi.org/10.1038/s41597-023-01966-x
This paper has details of various sampling strategies employed for indexing raster data.
Categorical
- Centroid: record the categorical value occurring at each cell centroid. Nulls are carried over.
- Fraction: record the proportion of each cell's area covered by each categorical value. There would be a fraction attribute for each class for each cell. (A sparse data structure could help manage this.)
- Mode: as it says on the tin, but a null value is used in cases where the fraction attributes sum to less than 0.2 of the cell's area. (I think this is probably wrong; it leads to data loss for cells on the edge of nodata areas. Perhaps there should be a switch for whether null is a valid modal value, or the threshold (0.2 here) should be exposed as a parameter.)
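
A minimal sketch of the Fraction and Mode strategies described above, to make the proposed switch and threshold concrete. This is not the paper's implementation or this tool's API: `class_fractions`, `modal_class`, `min_coverage` and `null_competes` are hypothetical names, and the input is assumed to be a list of `(value, overlap_area)` pairs for the pixels intersecting one cell, with `None` standing for nodata. A plain dict serves as the sparse structure, since it stores only the classes actually present in a cell.

```python
from collections import defaultdict


def class_fractions(pixels, cell_area):
    """Return {class_value: fraction of cell area} for one cell.

    pixels: iterable of (value, overlap_area) pairs; value None means nodata.
    Only classes present in the cell get a key (sparse representation).
    """
    totals = defaultdict(float)
    for value, area in pixels:
        if value is not None:
            totals[value] += area
    return {value: area / cell_area for value, area in totals.items()}


def modal_class(fractions, min_coverage=0.2, null_competes=False):
    """Return the class covering the largest share of the cell, or None.

    min_coverage: if the class fractions sum to less than this, return None
        (0.2 is the threshold described for the paper; 0.0 disables it).
    null_competes: hypothetical switch; when True, the uncovered remainder of
        the cell is treated as a nodata class that can win the vote, and
        min_coverage is ignored.
    """
    covered = sum(fractions.values())
    if null_competes:
        candidates = dict(fractions)
        candidates[None] = max(0.0, 1.0 - covered)
        # Ties go to a data class because None is inserted last.
        return max(candidates, key=candidates.get)
    if not fractions or covered < min_coverage:
        return None
    return max(fractions, key=fractions.get)


# A cell on the edge of a nodata area: only 15% of it has data.
edge_cell = class_fractions([("forest", 1.5), (None, 8.5)], cell_area=10.0)
print(modal_class(edge_cell))                    # below the 0.2 threshold
print(modal_class(edge_cell, min_coverage=0.0))  # threshold disabled
```

With the default threshold the edge cell comes back null, which is the data loss noted above; setting `min_coverage=0.0` keeps the class that is present.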
Continuous
- Centroid: as above.
- Mean: area-weighted arithmetic mean. The authors are careful to do the conversion operations in the native coordinate reference system. For data in authalic (equal-area) coordinate reference systems, the area-weighted mean is the simple mean. But for data in WGS84, they calculate the size of each pixel and use that as a weight when calculating the mean. See #14 ("VRT warping method causing spatial inconsistencies") for other discussion of how we currently handle reprojection issues; that approach may need revision.
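
A rough sketch of the area-weighted mean for a latitude/longitude grid, under stated assumptions. It approximates the Earth as a sphere, so the area of a pixel is proportional to its longitude width times the difference of the sines of its bounding latitudes; the paper may compute pixel sizes differently. It also treats every pixel as lying wholly inside the cell, so partial overlaps are ignored. `pixel_area_weight` and `area_weighted_mean` are hypothetical names, not part of this tool.

```python
import math


def pixel_area_weight(lat_south, lat_north, lon_width_deg, radius=1.0):
    """Area of a lat/lon pixel on a sphere (spherical approximation).

    Uses A = R^2 * dlon * (sin(lat_north) - sin(lat_south)), angles in degrees
    on input. With radius=1.0 the result is a relative weight, which is all
    the weighted mean needs.
    """
    dlon = math.radians(lon_width_deg)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return radius ** 2 * dlon * band


def area_weighted_mean(values, weights):
    """Weighted arithmetic mean, skipping nodata (None) values.

    Returns None if no valid pixel contributes.
    """
    numerator = 0.0
    denominator = 0.0
    for value, weight in zip(values, weights):
        if value is None:
            continue
        numerator += value * weight
        denominator += weight
    return numerator / denominator if denominator > 0 else None


# Two 1-degree pixels: one at the equator, one at 60 degrees north.
weights = [
    pixel_area_weight(0.0, 1.0, 1.0),
    pixel_area_weight(60.0, 61.0, 1.0),
]
print(area_weighted_mean([10.0, 20.0], weights))  # pulled toward the equatorial pixel
print(area_weighted_mean([10.0, 20.0], [1.0, 1.0]))  # equal areas: simple mean
```

With equal weights, as in an equal-area grid, the function reduces to the simple mean, which matches the authalic case described above.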
This issue should be closed when this tool is capable of reproducing all of these cases.