Proper dataset management is crucial for geospatial ML.
Nearby pixels are highly correlated. Random splits cause spatial data leakage, leading to over-optimistic validation metrics.
Solution: Use block-based splitting.
graph TD
Raw[Raw Map] --> Grid[Grid Blocks]
Grid --> Split{Spatial Split}
Split --> Train[Train Blocks]
Split --> Val[Val Blocks]
Satellite imagery is often dominated by "Background" or "Water" classes.
Strategies:
- Oversampling: Sample patches centered on rare classes.
- Loss Weighting: Inversely proportional to class frequency.