This repository provides a pipeline for generating spatial signature predictions using satellite imagery data. The pipeline integrates computer vision models for feature embedding creation and an XGBoost model for classification, producing predictions for each image chip.
- Python >= 3.10
- PyTorch == 2.5.1
- torchvision == 0.19
- CUDA == 12.6
To create the required Conda environment, run:
conda env create -f environment.yml

To execute the spatial signature prediction pipeline, use the following command:
pipeline.spatial_sig_prediction(
    geo_path="../eo/data/example/london_25_25_grid_clipped.geojson",
    vrt_file="/satellite_demoland/data/mosaic_cube/vrt_allbands/2017_combined.vrt",
    xgb_weights="../eo/data/weights/xgb_model_25_latlonh6_feb25_weighted.bin",
    model_weights="/satellite_demoland/models/satlas/weights/satlas-model-v1-lowres.pth",
    output_path="../eo/predictions/test_london_h6.parquet",
    h3_resolution=6
)

- geo_path: Path to the GeoJSON or Parquet file containing the spatial grid.
- vrt_file: Path to the VRT file containing satellite imagery.
- xgb_weights: Path to the trained XGBoost model weights (.bin file).
- model_weights: Path to the deep learning model weights (Satlas). The weights can be downloaded from: https://huggingface.co/allenai/satlas-pretrain/blob/6e7d6eb1804162733c485a3f542fdc85a2addc55/satlas-model-v1-lowres.pth
- output_path: Path where the Parquet file with predictions will be saved.
- h3_resolution: H3 resolution level for hexagonal spatial aggregation in predictions.
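
Because the call takes several file paths, it can be useful to check them before starting a long run. The following is a minimal sketch of such a check; `check_pipeline_args` is a hypothetical helper that is not part of this repository. It only verifies that the four input files exist, that the output directory exists (an assumption: the pipeline may create it itself), and that `h3_resolution` lies in the range H3 supports (0 to 15).

```python
from pathlib import Path


def check_pipeline_args(geo_path, vrt_file, xgb_weights, model_weights,
                        output_path, h3_resolution):
    """Hypothetical pre-flight check; not part of this repository.

    Returns a list of human-readable problems (empty if none were found).
    """
    problems = []

    # The four inputs described above must point at existing files.
    inputs = {
        "geo_path": geo_path,
        "vrt_file": vrt_file,
        "xgb_weights": xgb_weights,
        "model_weights": model_weights,
    }
    for name, value in inputs.items():
        if not Path(value).is_file():
            problems.append(f"{name}: file not found: {value}")

    # Assumption: the directory for the output Parquet file must already exist.
    if not Path(output_path).parent.is_dir():
        problems.append(f"output_path: directory not found: {Path(output_path).parent}")

    # H3 defines resolutions 0 (coarsest) through 15 (finest).
    if not isinstance(h3_resolution, int) or not 0 <= h3_resolution <= 15:
        problems.append(f"h3_resolution: expected an integer in 0..15, got {h3_resolution!r}")

    return problems
```

If the returned list is empty, the arguments passed these basic checks and can be handed to `pipeline.spatial_sig_prediction` unchanged.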
The notebook notebooks/run_pipeline.ipynb shows how to run the pipeline and use additional functions, such as plotting example images.
The pipeline generates a Parquet file containing spatial signature predictions indexed by grid cells.
prediction_file.columns = ['id', 'prediction', 'probabilities', 'lon_h3', 'lat_h3', 'geometry']

- lon_h3 and lat_h3: Centroid coordinates of the hexagon at the specified h3_resolution.
- prediction: Predicted class label.
- probabilities: Confidence scores for each class.
- geometry: Geospatial representation of the grid cell.
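
To illustrate how the prediction and probabilities columns might be used together, here is a small sketch that flags low-confidence grid cells. It assumes that probabilities holds one score per class, ordered by class index; this layout is an assumption and should be checked against the actual output. The rows are made-up examples with only four classes for brevity, and `top_class` and `low_confidence_ids` are hypothetical helpers.

```python
def top_class(probabilities):
    """Return (class_index, score) for the highest-scoring class."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return best, probabilities[best]


def low_confidence_ids(rows, threshold=0.5):
    """Return the ids of rows whose best class score is below `threshold`."""
    return [
        row["id"]
        for row in rows
        if top_class(row["probabilities"])[1] < threshold
    ]


# Made-up rows mimicking the documented columns (four classes for brevity).
rows = [
    {"id": "a", "prediction": 2, "probabilities": [0.05, 0.05, 0.80, 0.10]},
    {"id": "b", "prediction": 0, "probabilities": [0.40, 0.30, 0.20, 0.10]},
]

flagged = low_confidence_ids(rows)  # only row "b" falls below 0.5
```

The threshold of 0.5 is arbitrary; a suitable value depends on how the scores are calibrated.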
The model assigns each grid cell to one of the following spatial classes:
class_labels = {
    'Accessible suburbia': 0,
    'Connected residential neighbourhoods': 1,
    'Countryside agriculture': 2,
    'Dense residential neighbourhoods': 3,
    'Dense urban neighbourhoods': 4,
    'Disconnected suburbia': 5,
    'Gridded residential quarters': 6,
    'Open sprawl': 7,
    'Urban buffer': 8,
    'Urbanity': 9,
    'Warehouse/Park land': 10,
    'Wild countryside': 11
}
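
Since class_labels maps names to integers, turning predictions back into readable names requires the inverse mapping. The sketch below builds it and counts how often each class occurs; it assumes the prediction column stores these integer codes, and `index_to_label` and `label_counts` are illustrative names rather than part of the repository.

```python
from collections import Counter

class_labels = {
    'Accessible suburbia': 0,
    'Connected residential neighbourhoods': 1,
    'Countryside agriculture': 2,
    'Dense residential neighbourhoods': 3,
    'Dense urban neighbourhoods': 4,
    'Disconnected suburbia': 5,
    'Gridded residential quarters': 6,
    'Open sprawl': 7,
    'Urban buffer': 8,
    'Urbanity': 9,
    'Warehouse/Park land': 10,
    'Wild countryside': 11,
}

# Invert the mapping so integer codes can be looked up by value.
index_to_label = {index: label for label, index in class_labels.items()}


def label_counts(predictions):
    """Count occurrences of each class name in a sequence of integer codes."""
    return Counter(index_to_label[int(p)] for p in predictions)


# Made-up predictions for illustration.
counts = label_counts([9, 9, 2, 11])
```

With a DataFrame loaded from the output file, the same dictionary could be passed to a column-wise mapping to add a human-readable label column.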