Logistic Regression — Landslide Susceptibility Mapping

A Python pipeline for producing landslide susceptibility maps from GIS conditioning factors (slope, elevation, lithology, distance to streams, …) and a landslide inventory. The model uses logistic regression and reports both statistical inference (statsmodels: coefficients, p-values, odds ratios, McFadden pseudo-R²) and predictive performance (scikit-learn: ROC/AUC, cross-validation, confusion matrix). Output is a GeoTIFF probability surface plus a 5-class susceptibility map.

Installation

pip install -r requirements.txt

System dependency: rasterio, geopandas, and fiona require a working GDAL installation (e.g. apt install gdal-bin libgdal-dev on Debian/Ubuntu, or use the conda-forge channel: conda install -c conda-forge gdal rasterio geopandas).

Quick start (synthetic demo)

python main.py --demo

This generates a small synthetic study area under data/demo/ (200 × 200 pixels, 7 conditioning factors, 150 landslide presence points) and runs the full pipeline end-to-end. It produces:

outputs/susceptibility.tif — landslide probability per pixel (0–1)
outputs/susceptibility_classes.tif — 5 classes (Very Low … Very High)
outputs/stats_report.txt — coefficients, p-values, odds ratios, pseudo-R²
figures/roc.png — ROC curve with AUC
figures/feature_importance.png — standardized coefficient bar chart

On the synthetic data the model typically achieves AUC ≈ 0.85–0.95.
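
For readers new to the metric, AUC can be read as the probability that a randomly chosen landslide sample receives a higher predicted probability than a randomly chosen non-landslide sample. The sketch below computes it by brute-force pair counting. The function name `auc_by_pairs` and the toy labels and scores are illustrative assumptions, not part of this repository, and the pipeline itself relies on scikit-learn for the real calculation.

```python
def auc_by_pairs(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = landslide, 0 = no landslide
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
demo_auc = auc_by_pairs(labels, scores)  # 8 of 9 pairs are ordered correctly
```

The pairwise loop is quadratic in the sample count, so it is only meant to make the definition concrete.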

Real-data usage

python main.py \
    --rasters /path/to/factors_folder \
    --inventory /path/to/landslides.shp \
    --categorical lithology,landuse \
    --buffer 500 \
    --classes 5

Input data requirements

Conditioning factor rasters — one GeoTIFF per factor in a single folder. All rasters must share the same CRS, transform, width, and height. The filename (without extension) becomes the factor name.
Landslide inventory — shapefile or GeoJSON of points or polygons (polygons are converted to centroids). Must have a defined CRS.
Categorical factors — pass via --categorical as a comma-separated list of factor names (matching the GeoTIFF basenames). These are one-hot encoded; remaining factors are standardized.
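
The alignment requirement above can be checked before any modelling starts. The following is a minimal sketch, assuming each raster's grid has already been summarized as CRS, affine transform, width, and height (with rasterio these would typically come from the dataset attributes of the same names). `GridMeta`, `find_misaligned`, and the example values are hypothetical and do not reflect the actual code in `io_raster.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridMeta:
    """Grid description of one raster (hypothetical container for this sketch)."""
    crs: str
    transform: tuple  # six affine coefficients (a, b, c, d, e, f)
    width: int
    height: int


def find_misaligned(grids):
    """Return {factor_name: [mismatched fields]} relative to the first raster by name."""
    names = sorted(grids)
    if not names:
        return {}
    ref = grids[names[0]]
    problems = {}
    for name in names[1:]:
        grid = grids[name]
        bad = [field for field in ("crs", "transform", "width", "height")
               if getattr(grid, field) != getattr(ref, field)]
        if bad:
            problems[name] = bad
    return problems


# Made-up grids: lithology is in a different CRS and has fewer rows
utm_transform = (30.0, 0.0, 500000.0, 0.0, -30.0, 4600000.0)
grids = {
    "slope": GridMeta("EPSG:32633", utm_transform, 200, 200),
    "elevation": GridMeta("EPSG:32633", utm_transform, 200, 200),
    "lithology": GridMeta("EPSG:4326", utm_transform, 200, 180),
}
problems = find_misaligned(grids)
```

Rasters flagged this way would need to be reprojected or resampled onto a common grid before being passed to `--rasters`.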

CLI options

Flag	Default	Purpose
`--demo`	off	Generate synthetic data and run end-to-end
`--rasters`	–	Folder of conditioning factor GeoTIFFs
`--inventory`	–	Path to landslide inventory file
`--categorical`	`""`	Comma-separated names of categorical factors
`--out-dir`	`outputs/`	Where to write GeoTIFFs and stats report
`--fig-dir`	`figures/`	Where to write ROC + importance plots
`--buffer`	`500`	Exclusion buffer (CRS units) for absence sampling
`--test-size`	`0.25`	Fraction of samples held out for testing
`--cv-folds`	`5`	Number of folds for stratified k-fold cross-validation of AUC
`--classes`	`5`	Number of susceptibility classes
`--class-method`	`quantile`	`quantile` or `jenks` (Natural Breaks)
`--seed`	`42`	RNG seed
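
As an illustration of how the flags and defaults in the table fit together, here is a minimal `argparse` sketch. The flag names and defaults are taken from the table. The argument types, the help strings, and the `build_parser` function are assumptions, and the real `main.py` may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (types are assumed)."""
    parser = argparse.ArgumentParser(description="Landslide susceptibility mapping")
    parser.add_argument("--demo", action="store_true",
                        help="Generate synthetic data and run end-to-end")
    parser.add_argument("--rasters", default=None,
                        help="Folder of conditioning factor GeoTIFFs")
    parser.add_argument("--inventory", default=None,
                        help="Path to landslide inventory file")
    parser.add_argument("--categorical", default="",
                        help="Comma-separated names of categorical factors")
    parser.add_argument("--out-dir", default="outputs/")
    parser.add_argument("--fig-dir", default="figures/")
    parser.add_argument("--buffer", type=float, default=500,
                        help="Exclusion buffer in CRS units")
    parser.add_argument("--test-size", type=float, default=0.25)
    parser.add_argument("--cv-folds", type=int, default=5)
    parser.add_argument("--classes", type=int, default=5)
    parser.add_argument("--class-method", choices=["quantile", "jenks"],
                        default="quantile")
    parser.add_argument("--seed", type=int, default=42)
    return parser


# Parse an example command line instead of sys.argv
args = build_parser().parse_args(
    ["--rasters", "factors", "--inventory", "landslides.shp",
     "--categorical", "lithology,landuse"]
)
# An empty --categorical string yields an empty list
categorical = [name for name in args.categorical.split(",") if name]
```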

Project layout

.
├── main.py                       # CLI entry point
├── requirements.txt
├── src/landslide/
│   ├── io_raster.py              # raster stack load / write / nodata mask
│   ├── io_inventory.py           # inventory load + CRS reprojection
│   ├── sampling.py               # presence/absence point sampling
│   ├── features.py               # extraction, OHE, scaling, train/test split
│   ├── model_stats.py            # statsmodels Logit (inference)
│   ├── model_sklearn.py          # sklearn LR (prediction + CV)
│   ├── predict.py                # pixel-wise prediction → GeoTIFF
│   ├── classify.py               # quantile / Jenks susceptibility classes
│   ├── plots.py                  # ROC + feature importance plots
│   └── synthetic.py              # demo data generator
├── data/demo/                    # written by `--demo`
├── figures/                      # ROC + importance plots
└── outputs/                      # susceptibility.tif, susceptibility_classes.tif, stats_report.txt

How it works

Load every GeoTIFF in --rasters into a single (bands, H, W) stack and validate alignment.
Load the inventory and reproject to the raster CRS; convert polygon geometries to centroids.
Sample absence points uniformly at random from valid pixels that lie outside a buffer around presence locations (default 500 CRS units), drawing one absence per presence (1:1).
Extract raster values at every sample location; drop rows that hit any nodata pixel.
Encode categorical factors with one-hot encoding and standardize numerics; produce a stratified 75/25 train/test split.
Fit statsmodels.Logit on the training set → write stats_report.txt with the full summary, tidy coefficient table, odds-ratios, and McFadden pseudo-R².
Fit sklearn.LogisticRegression (L2-regularized) on the training set; report 5-fold cross-validated AUC, plus accuracy, AUC, and the confusion matrix on the held-out test set.
Plot ROC curve and standardized-coefficient bar chart.
Predict landslide probability for every valid pixel in the study area and write susceptibility.tif.
Classify the surface into susceptibility levels (5 by default, using quantile or Jenks Natural Breaks) and write susceptibility_classes.tif.
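
The absence-sampling step is the least standard part of the workflow above, so a small sketch may help. It works in pixel coordinates on a boolean validity mask and assumes the buffer has already been converted from CRS units to pixels. The function `sample_absences`, its parameters, and the toy grid are hypothetical and are not taken from `sampling.py`.

```python
import numpy as np


def sample_absences(valid_mask, presence_rc, buffer_px, n_samples, seed=42):
    """Draw absence pixels uniformly from valid cells outside the buffer.

    valid_mask  : (H, W) bool array, True where every factor has data
    presence_rc : (N, 2) int array of presence (row, col) indices
    buffer_px   : exclusion radius in pixels (CRS units / pixel size)
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(valid_mask)
    candidates = np.column_stack([rows, cols])
    # Squared distance from every candidate pixel to every presence point
    diff = candidates[:, None, :] - presence_rc[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    # Keep candidates whose nearest presence is farther than the buffer
    allowed = candidates[d2.min(axis=1) > buffer_px ** 2]
    if len(allowed) < n_samples:
        raise ValueError("buffer leaves too few candidate pixels")
    pick = rng.choice(len(allowed), size=n_samples, replace=False)
    return allowed[pick]


# Toy grid: the top five rows are nodata, three presence pixels
valid = np.ones((50, 50), dtype=bool)
valid[:5, :] = False
presences = np.array([[10, 10], [25, 40], [40, 20]])
# 1:1 ratio: one absence per presence
absences = sample_absences(valid, presences, buffer_px=5, n_samples=len(presences))
```

The all-pairs distance matrix is fine for a demo-sized grid but grows with pixels × presences, so a large study area would likely need a spatial index or a rasterized buffer mask instead.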

Interpretation notes

Odds ratios (exp(coef)) describe how the odds of a landslide change per 1-σ increase in a standardized factor (or the effect of a category vs. the reference class).
AUC is the primary skill metric: 0.7–0.8 = acceptable, 0.8–0.9 = excellent, > 0.9 = outstanding (Hosmer & Lemeshow).
Susceptibility classes are relative — they rank pixels within a single study area and should not be compared across areas without recalibration.
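
Two of the notes above are easy to verify numerically. The sketch below converts a coefficient to an odds ratio, then shows why quantile classes are relative: the same probability can land in the top class in one area and the bottom class in another. The helper names and the two made-up probability surfaces are assumptions for illustration, and Jenks breaks are not covered.

```python
import math

import numpy as np


def odds_ratio(coef):
    """Multiplicative change in the odds per 1-sigma increase of a standardized factor."""
    return math.exp(coef)


def quantile_classes(prob, n_classes=5):
    """Assign classes 1..n_classes using equal-count breaks of this array only."""
    inner_breaks = np.quantile(prob, np.linspace(0, 1, n_classes + 1)[1:-1])
    return np.digitize(prob, inner_breaks) + 1, inner_breaks


# A coefficient of ln(2), about 0.693, doubles the odds; its negative halves them
doubling = odds_ratio(math.log(2))
halving = odds_ratio(-math.log(2))

# Two hypothetical study areas: A is low-hazard overall, B is high-hazard overall
area_a = np.linspace(0.0, 0.5, 101)
area_b = np.linspace(0.5, 1.0, 101)
classes_a, breaks_a = quantile_classes(area_a)
classes_b, breaks_b = quantile_classes(area_b)

# The probability 0.5 is the maximum of area A and the minimum of area B
class_of_half_in_a = int(classes_a[-1])  # highest class in area A
class_of_half_in_b = int(classes_b[0])   # lowest class in area B
```

Because the breaks are computed per study area, a "Very High" pixel in one map is not directly comparable to a "Very High" pixel in another.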

Limitations

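Logistic regression assumes linear effects on the log-odds scale and independence between samples. Spatial autocorrelation is not modelled, so standard errors and p-values may be optimistic, and a random train/test split may inflate AUC when nearby samples fall on both sides of the split.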
The default 1:1 absence:presence ratio is a common but arbitrary choice; results are sensitive to absence sampling strategy.
The synthetic generator draws its landslide labels from a logistic model, so the demo data match the model's assumptions exactly and AUC on demo data overstates real-world performance.
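
To make the last point concrete, the sketch below draws labels from a logistic model, which is the situation in which logistic regression is correctly specified by construction. The coefficients, intercept, sample size, and the function `simulate_labels` are all hypothetical and are not taken from `synthetic.py`.

```python
import numpy as np


def simulate_labels(n, coefs, intercept, seed=42):
    """Draw standardized factors and Bernoulli labels from a logistic model."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, len(coefs)))    # standardized factors
    logit = intercept + X @ np.asarray(coefs)   # linear on the log-odds scale
    p = 1.0 / (1.0 + np.exp(-logit))            # true landslide probability
    y = (rng.random(n) < p).astype(int)         # 1 = landslide, 0 = none
    return X, y, p


# Hypothetical coefficients for three factors
X, y, p = simulate_labels(2000, coefs=[1.5, -1.0, 0.5], intercept=0.0)
mean_p_presence = p[y == 1].mean()
mean_p_absence = p[y == 0].mean()
```

Real terrain rarely follows such a clean log-linear relationship, which is one reason the demo AUC should be read as an upper bound rather than an expectation.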
