A Python pipeline for producing landslide susceptibility maps from GIS conditioning factors (slope, elevation, lithology, distance to streams, …) and a landslide inventory. The model uses logistic regression and reports both statistical inference (statsmodels: coefficients, p-values, odds ratios, McFadden pseudo-R²) and predictive performance (scikit-learn: ROC/AUC, cross-validation, confusion matrix). Output is a GeoTIFF probability surface plus a 5-class susceptibility map.
pip install -r requirements.txt

System dependency: `rasterio`, `geopandas`, and `fiona` require a working GDAL installation (e.g. `apt install gdal-bin libgdal-dev` on Debian/Ubuntu, or use the conda-forge channel: `conda install -c conda-forge gdal rasterio geopandas`).
python main.py --demo

This generates a small synthetic study area under `data/demo/` (200 × 200 pixels, 7 conditioning factors, 150 landslide presence points) and runs the full pipeline end-to-end. It produces:
- `outputs/susceptibility.tif`: landslide probability per pixel (0–1)
- `outputs/susceptibility_classes.tif`: 5 classes (Very Low … Very High)
- `outputs/stats_report.txt`: coefficients, p-values, odds ratios, pseudo-R²
- `figures/roc.png`: ROC curve with AUC
- `figures/feature_importance.png`: standardized coefficient bar chart
On the synthetic data the model typically achieves AUC ≈ 0.85–0.95.
python main.py \
--rasters /path/to/factors_folder \
--inventory /path/to/landslides.shp \
--categorical lithology,landuse \
--buffer 500 \
--classes 5

- Conditioning factor rasters: one GeoTIFF per factor in a single folder. All rasters must share the same CRS, transform, width, and height. The filename (without extension) becomes the factor name.
- Landslide inventory: shapefile or GeoJSON of points or polygons (polygons are converted to centroids). Must have a defined CRS.
- Categorical factors: pass via `--categorical` as a comma-separated list of factor names (matching the GeoTIFF basenames). These are one-hot encoded; the remaining factors are standardized.
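The alignment requirement above can be checked before any modelling. The sketch below is a hypothetical helper, not the code in `io_raster.py`: `factor_name`, `check_alignment`, and the `metas` dictionary layout are assumptions made for illustration. With rasterio, the same four values are available as the `crs`, `transform`, `width`, and `height` attributes of an opened dataset.

```python
from pathlib import Path


def factor_name(path):
    """Factor name = filename without extension ('slope.tif' -> 'slope')."""
    return Path(path).stem


def check_alignment(metas):
    """Raise ValueError if any raster's grid differs from the first one.

    `metas` maps factor name -> dict with the keys 'crs', 'transform',
    'width', 'height' (a hypothetical structure for this sketch).
    Returns the name of the raster used as the reference grid.
    """
    keys = ("crs", "transform", "width", "height")
    names = list(metas)
    if not names:
        raise ValueError("no rasters found")
    ref_name = names[0]
    ref = metas[ref_name]
    for name in names[1:]:
        for key in keys:
            if metas[name][key] != ref[key]:
                raise ValueError(
                    f"{name}: {key} differs from {ref_name} "
                    f"({metas[name][key]!r} != {ref[key]!r})"
                )
    return ref_name


# Illustrative metadata only; the CRS and transform values are made up.
grid = {
    "crs": "EPSG:32633",
    "transform": (30.0, 0.0, 500000.0, 0.0, -30.0, 4650000.0),
    "width": 200,
    "height": 200,
}
metas = {
    factor_name("factors/slope.tif"): dict(grid),
    factor_name("factors/elevation.tif"): dict(grid),
}
check_alignment(metas)  # passes silently when all grids match
```

Comparing against the first raster keeps the error message specific: it names the offending factor and the property that differs, which is usually enough to decide whether to resample or reproject that layer.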
| Flag | Default | Purpose |
|---|---|---|
| `--demo` | off | Generate synthetic data and run end-to-end |
| `--rasters` | – | Folder of conditioning factor GeoTIFFs |
| `--inventory` | – | Path to landslide inventory file |
| `--categorical` | `""` | Comma-separated names of categorical factors |
| `--out-dir` | `outputs/` | Where to write GeoTIFFs and stats report |
| `--fig-dir` | `figures/` | Where to write ROC + importance plots |
| `--buffer` | 500 | Exclusion buffer (CRS units) for absence sampling |
| `--test-size` | 0.25 | Held-out fraction |
| `--cv-folds` | 5 | Number of folds for stratified k-fold AUC cross-validation |
| `--classes` | 5 | Number of susceptibility classes |
| `--class-method` | `quantile` | `quantile` or `jenks` (Natural Breaks) |
| `--seed` | 42 | RNG seed |
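For readers who want to script against these options, the table can be mirrored with `argparse`. This is a minimal sketch under stated assumptions: the flag names and defaults come from the table, but the argument types (for example `float` for `--buffer`) and the helper `split_categorical` are guesses, and the actual `main.py` may declare things differently.

```python
import argparse


def build_parser():
    """Declare the flags listed in the table; types are assumptions."""
    p = argparse.ArgumentParser(description="Landslide susceptibility mapping")
    p.add_argument("--demo", action="store_true")
    p.add_argument("--rasters", default=None)
    p.add_argument("--inventory", default=None)
    p.add_argument("--categorical", default="")
    p.add_argument("--out-dir", default="outputs/")
    p.add_argument("--fig-dir", default="figures/")
    p.add_argument("--buffer", type=float, default=500)
    p.add_argument("--test-size", type=float, default=0.25)
    p.add_argument("--cv-folds", type=int, default=5)
    p.add_argument("--classes", type=int, default=5)
    p.add_argument("--class-method", choices=["quantile", "jenks"],
                   default="quantile")
    p.add_argument("--seed", type=int, default=42)
    return p


def split_categorical(value):
    """Turn 'lithology,landuse' into ['lithology', 'landuse']; '' -> []."""
    return [name.strip() for name in value.split(",") if name.strip()]


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args([
    "--rasters", "factors",
    "--inventory", "landslides.shp",
    "--categorical", "lithology,landuse",
])
categorical = split_categorical(args.categorical)
```

Note that `argparse` converts dashes to underscores, so `--out-dir` is read back as `args.out_dir` and `--class-method` as `args.class_method`.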
.
├── main.py # CLI entry point
├── requirements.txt
├── src/landslide/
│ ├── io_raster.py # raster stack load / write / nodata mask
│ ├── io_inventory.py # inventory load + CRS reprojection
│ ├── sampling.py # presence/absence point sampling
│ ├── features.py # extraction, OHE, scaling, train/test split
│ ├── model_stats.py # statsmodels Logit (inference)
│ ├── model_sklearn.py # sklearn LR (prediction + CV)
│ ├── predict.py # pixel-wise prediction → GeoTIFF
│ ├── classify.py # quantile / Jenks susceptibility classes
│ ├── plots.py # ROC + feature importance plots
│ └── synthetic.py # demo data generator
├── data/demo/ # written by `--demo`
├── figures/ # ROC + importance plots
└── outputs/ # susceptibility.tif, classes.tif, report
- Load every GeoTIFF in `--rasters` into a single (bands, H, W) stack and validate alignment.
- Load the inventory and reproject it to the raster CRS; convert polygon geometries to centroids.
- Sample absence points uniformly from valid pixels, excluding a buffer around presence locations (default 500 CRS units), 1:1 with presences.
- Extract raster values at every sample location; drop rows that hit any nodata pixel.
- Encode categorical factors with one-hot encoding and standardize numerics; produce a stratified 75/25 train/test split.
- Fit `statsmodels.Logit` on the training set and write `stats_report.txt` with the full summary, tidy coefficient table, odds ratios, and McFadden pseudo-R².
- Fit `sklearn.LogisticRegression` (L2-regularized) on the training set; report 5-fold CV AUC, held-out accuracy, AUC, and confusion matrix.
- Plot the ROC curve and the standardized-coefficient bar chart.
- Predict landslide probability for every valid pixel in the study area and write `susceptibility.tif`.
- Classify the surface into 5 susceptibility levels (quantile or Jenks Natural Breaks) and write `susceptibility_classes.tif`.
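The final classification step can be illustrated with the quantile method. The sketch below is not the code in `classify.py`; the function name `classify_quantile`, the use of NaN for nodata, and the class value 0 for nodata pixels are assumptions. Jenks Natural Breaks is not shown.

```python
import numpy as np


def classify_quantile(prob, n_classes=5):
    """Assign classes 1..n_classes by quantile; NaN (nodata) pixels get 0."""
    valid = ~np.isnan(prob)
    # Interior break points, e.g. the 20/40/60/80th percentiles for 5 classes.
    qs = np.linspace(0, 1, n_classes + 1)[1:-1]
    breaks = np.quantile(prob[valid], qs)
    classes = np.zeros(prob.shape, dtype=np.uint8)
    # digitize returns 0..n_classes-1 for the bins; shift to 1..n_classes.
    classes[valid] = np.digitize(prob[valid], breaks, right=True) + 1
    return classes, breaks


# Random surface standing in for a predicted probability raster.
rng = np.random.default_rng(42)
prob = rng.random((200, 200))
prob[0, :10] = np.nan  # pretend these pixels are nodata
classes, breaks = classify_quantile(prob, n_classes=5)
```

Because the breaks are quantiles of the valid pixels, each class covers roughly the same number of pixels regardless of how the probabilities are distributed, which is why the resulting classes are a ranking rather than an absolute hazard level.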
- Odds ratios (`exp(coef)`) describe how the odds of a landslide change per 1-σ increase in a standardized factor (or the effect of a category vs. the reference class).
- AUC is the primary skill metric: 0.7–0.8 = acceptable, 0.8–0.9 = excellent, > 0.9 = outstanding (Hosmer & Lemeshow).
- Susceptibility classes are relative — they rank pixels within a single study area and should not be compared across areas without recalibration.
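A short worked example may help when reading the odds ratios in `stats_report.txt`. The coefficients below are made-up illustrative values, not output from the pipeline, and `odds_ratio` and `shift_probability` are hypothetical helper names.

```python
import math

# Hypothetical coefficients for two standardized factors (illustrative only).
coefs = {"slope": 0.80, "dist_to_streams": -0.35}


def odds_ratio(coef):
    """Multiplicative change in the odds per 1-sigma increase."""
    return math.exp(coef)


def shift_probability(p, coef, delta=1.0):
    """Probability after moving one factor by `delta` sigma, starting at p."""
    odds = p / (1.0 - p) * math.exp(coef * delta)
    return odds / (1.0 + odds)


for name, b in coefs.items():
    print(f"{name}: OR = {odds_ratio(b):.2f}")

# Starting from p = 0.5, a +1 sigma change in slope gives p of about 0.69.
p_after = shift_probability(0.5, coefs["slope"])
```

With these values the odds ratio for slope is about 2.23, meaning the odds (not the probability) roughly double per standard deviation, while an odds ratio below 1, about 0.70 for distance to streams here, indicates that the odds fall as the factor increases.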
- Logistic regression assumes linear effects on the log-odds scale and independence between samples — spatial autocorrelation is not modelled.
- The default 1:1 absence:presence ratio is a common but arbitrary choice; results are sensitive to absence sampling strategy.
- The synthetic generator draws its landslide signal from a logistic model, which matches the classifier's own assumptions, so AUC on the demo data overstates real-world performance.
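To experiment with the sensitivity to absence sampling noted above, the sampling step can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the code in `sampling.py`: the function `sample_absences`, its `ratio` parameter, and the use of pixel indices with a buffer expressed in pixels (buffer in CRS units divided by pixel size) are all hypothetical.

```python
import numpy as np


def sample_absences(valid_mask, presence_rc, buffer_px, ratio=1.0, seed=42):
    """Draw absence pixels uniformly from valid pixels outside the buffer.

    valid_mask  : (H, W) bool array, True where every factor has data
    presence_rc : (N, 2) int array of presence (row, col) indices
    buffer_px   : exclusion radius in pixels (assumed unit for this sketch)
    ratio       : absences per presence (1.0 gives the 1:1 default)
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(valid_mask)
    cand = np.column_stack([rows, cols])
    # Squared distance from every candidate to every presence: (M, N) array.
    d2 = ((cand[:, None, :] - presence_rc[None, :, :]) ** 2).sum(axis=2)
    # Keep candidates whose nearest presence lies beyond the buffer.
    cand = cand[d2.min(axis=1) > buffer_px ** 2]
    n = int(round(ratio * len(presence_rc)))
    if n > len(cand):
        raise ValueError("not enough candidate pixels outside the buffer")
    idx = rng.choice(len(cand), size=n, replace=False)
    return cand[idx]


# Small illustrative grid: top five rows are nodata, three presence pixels.
valid = np.ones((50, 50), dtype=bool)
valid[:5, :] = False
presence = np.array([[10, 10], [25, 25], [40, 40]])
absences = sample_absences(valid, presence, buffer_px=5, ratio=2.0)
```

The brute-force distance matrix is fine for a small grid like this one, but it grows with pixels × presences, so a large study area would need a spatial index or a distance transform instead. Rerunning with different `ratio` and `seed` values gives a quick sense of how much the fitted coefficients depend on the absence sample.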