Satellite-Based Deforestation & Land-Cover Change Analysis
Kalpitiya Peninsula, Sri Lanka · 2016–2025
- Overview
- Study Area
- Pipeline Architecture
- Outputs & Visualisations
- Repository Structure
- Getting Started
- Dependencies
- License
TerraShift Sri Lanka is a fully reproducible, machine-learning-driven remote sensing pipeline that ingests raw Landsat 8 & 9 Level-2 Surface Reflectance imagery and produces publication-grade environmental analytics, all from a single Jupyter notebook.
| Stage | What it does |
|---|---|
| Ingest | Discovers and loads 10 Landsat scenes spanning 2016–2025 |
| Cloud Mask | Removes contaminated pixels using QA-Pixel confidence flags |
| Feature Engineering | Derives 11 spectral features: B2–B7, NDVI, NDWI, NDBI, NBR, EVI |
| Land Masking | Builds a persistent NDWI water mask + GPS-to-UTM Kalpitiya ROI |
| Classification | Trains dual Random Forest classifiers (11-feature + 6-band baseline) |
| Validation | Cross-scene validation on 3 unseen dates (accuracy and Cohen's κ) |
| GIS Export | Writes GeoTIFFs with embedded RGBA colormaps for QGIS/ArcGIS |
| Change Detection | Pixel-wise class transitions → loss/gain/net hectares per interval |
| Forecasting | Triple-model ensemble + 95% CI projected to 2030 |
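The Cloud Mask stage is summarised above only as "QA-Pixel confidence flags". A minimal NumPy sketch of that kind of test, assuming the published Landsat 8/9 Collection 2 QA_PIXEL bit layout (bit 0 fill, 1 dilated cloud, 2 cirrus, 3 cloud, 4 cloud shadow, bits 8–9 cloud confidence). Which flags and which confidence cut-off (`max_cloud_conf`) the notebook actually uses is an assumption, `clear_mask` is a hypothetical helper, and the sample QA codes are illustrative values built from that bit layout.

```python
import numpy as np

# Landsat 8/9 Collection 2 QA_PIXEL bit positions (assumed selection of flags)
FILL, DILATED_CLOUD, CIRRUS, CLOUD, CLOUD_SHADOW = 0, 1, 2, 3, 4
CLOUD_CONF_SHIFT = 8   # bits 8-9: 0 none, 1 low, 2 medium, 3 high


def clear_mask(qa: np.ndarray, max_cloud_conf: int = 1) -> np.ndarray:
    """True where a pixel has no fill/cloud/cirrus/shadow flag and low cloud confidence."""
    qa = qa.astype(np.uint16)
    flagged = np.zeros(qa.shape, dtype=bool)
    for bit in (FILL, DILATED_CLOUD, CIRRUS, CLOUD, CLOUD_SHADOW):
        flagged |= ((qa >> bit) & 1).astype(bool)
    cloud_conf = (qa >> CLOUD_CONF_SHIFT) & 0b11
    return ~flagged & (cloud_conf <= max_cloud_conf)


# clear land, high-confidence cloud, fill, medium cloud confidence
qa = np.array([21824, 22280, 1, 22080], dtype=np.uint16)
print(clear_mask(qa))   # [ True False False False]

# Masked pixels become NaN so later per-scene statistics can skip them
band = np.array([0.12, 0.30, 0.00, 0.25])
band_clear = np.where(clear_mask(qa), band, np.nan)
```

Raising `max_cloud_conf` to 2 keeps medium-confidence pixels, trading coverage for contamination risk.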
| Property | Value |
|---|---|
| Location | Kalpitiya Peninsula, Northwest Sri Lanka |
| Bounding Box | Lat 8.00°–8.65° N · Lon 79.68°–79.90° E |
| WRS-2 Path/Row | 142 / 054 |
| Satellite | Landsat 8 (LC08) & Landsat 9 (LC09) · Collection 2 · Level-2 SR |
| Temporal Coverage | February 2016 – March 2025 · 10 scenes |
| Bands | B2–B7 (Surface Reflectance) + QA-Pixel cloud mask |
| Native Resolution | 30 m per pixel |
The Kalpitiya Peninsula is a unique coastal ecosystem under heavy land-use pressure: shrimp aquaculture expansion, salt pan development, mangrove clearance, and urbanisation all compete within a narrow strip of coastline. The pipeline isolates this ROI by transforming a GPS bounding box into the raster's UTM coordinates with pyproj and combining it with a persistent NDWI water mask, which excludes the Indian Ocean and Puttalam Lagoon from all statistics.
Five indices are computed per scene and written to individual out/<index>/ directories:
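```python
# B* = surface reflectance in 0-1 (Collection 2 Level-2: DN * 0.0000275 - 0.2)
NDVI = (B5 - B4) / (B5 + B4)                            # Vegetation health & canopy density
NDWI = (B3 - B5) / (B3 + B5)                            # Water bodies & coastal inundation
NDBI = (B6 - B5) / (B6 + B5)                            # Built-up & impervious surfaces (SWIR1 vs NIR)
NBR  = (B5 - B7) / (B5 + B7)                            # Burn scar & bare soil detection
EVI  = 2.5 * (B5 - B4) / (B5 + 6 * B4 - 7.5 * B2 + 1)   # Enhanced canopy signal (reduces soil noise)
```

```python
import numpy as np
from pyproj import Transformer
from rasterio.transform import rowcol

# Persistent water = NDWI > 0.15 in >60% of all scenes (ndwi_stack shape: scenes x H x W)
LAND_MASK = (ndwi_stack > 0.15).mean(axis=0) <= 0.60   # shape: (H, W) boolean
# GPS (WGS84) → projected CRS → raster pixel row/col
transformer = Transformer.from_crs("EPSG:4326", raster_crs, always_xy=True)
x_min, y_min = transformer.transform(79.68, 8.00)          # SW corner (lon, lat)
x_max, y_max = transformer.transform(79.90, 8.65)          # NE corner (lon, lat)
row_max, col_min = rowcol(land_transform, x_min, y_min)    # south edge = larger row index
row_min, col_max = rowcol(land_transform, x_max, y_max)
ROI_MASK = np.zeros(LAND_MASK.shape, dtype=bool)
ROI_MASK[row_min:row_max + 1, col_min:col_max + 1] = True
KALP_MASK = LAND_MASK & ROI_MASK   # only land pixels inside the peninsula
```

Saved to out/stats/land_mask.tif and out/stats/kalpitiya_mask.tif.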
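The index formulas above assume the bands are already scaled to reflectance and ignore zero denominators. A minimal NumPy sketch that handles both, assuming the standard Collection 2 Level-2 surface reflectance scale factor (0.0000275) and offset (-0.2); the helper names `to_reflectance`, `normalized_difference` and `spectral_indices` are hypothetical, not the notebook's.

```python
import numpy as np

SCALE, OFFSET = 0.0000275, -0.2   # Collection 2 Level-2 SR scaling


def to_reflectance(dn: np.ndarray) -> np.ndarray:
    """Convert Level-2 SR digital numbers to surface reflectance (float32)."""
    return dn.astype(np.float32) * SCALE + OFFSET


def normalized_difference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """(a - b) / (a + b), returning NaN where the denominator is zero."""
    denom = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (a - b) / denom, np.nan)


def spectral_indices(b2, b3, b4, b5, b6, b7) -> dict:
    """Compute the five indices from reflectance arrays of equal shape."""
    evi_denom = b5 + 6 * b4 - 7.5 * b2 + 1
    with np.errstate(divide="ignore", invalid="ignore"):
        evi = np.where(evi_denom != 0, 2.5 * (b5 - b4) / evi_denom, np.nan)
    return {
        "NDVI": normalized_difference(b5, b4),
        "NDWI": normalized_difference(b3, b5),
        "NDBI": normalized_difference(b6, b5),
        "NBR": normalized_difference(b5, b7),
        "EVI": evi,
    }


# One vegetated pixel (illustrative reflectance values for B2..B7)
idx = spectral_indices(*(np.array([v]) for v in (0.04, 0.07, 0.05, 0.35, 0.20, 0.10)))
print(round(float(idx["NDVI"][0]), 3))   # 0.75
```

Returning NaN rather than 0 keeps undefined pixels out of later means and histograms when `np.nanmean`-style reductions are used.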
Pseudo-labels are generated from spectral-index thresholds on the earliest scene, then two independent classifiers are trained:
| Model | Features | Purpose |
|---|---|---|
| Standard RF | B2–B7 + NDVI + NDWI + NDBI + NBR + EVI (11 total) | Primary classifier |
| Bands-only RF | B2–B7 only (6 total) | Honest generalisation baseline |
Both are validated on 3 completely unseen scenes from other acquisition dates, recording accuracy and Cohen's κ per scene and overall.
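A minimal sketch of the pseudo-label → dual Random Forest → cross-scene scoring loop, run on synthetic pixels. The threshold rules, class reflectance means, and helpers (`pseudo_labels`, `synthetic_scene`, `train_and_validate`) are hypothetical; the README does not give the notebook's actual thresholds. It also assumes that validation labels for an unseen scene come from the same threshold rules applied to that scene.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.preprocessing import StandardScaler

WATER, VEG, BARE, BUILT = 0, 1, 2, 3
# Illustrative mean reflectance for B2..B7 per class (not measured values)
CLASS_MEANS = np.array([
    [0.06, 0.08, 0.05, 0.03, 0.02, 0.02],   # water
    [0.04, 0.07, 0.05, 0.35, 0.20, 0.10],   # vegetation
    [0.10, 0.14, 0.18, 0.25, 0.30, 0.25],   # bare soil
    [0.12, 0.13, 0.15, 0.20, 0.28, 0.24],   # built-up
])


def nd(a, b):
    return (a - b) / (a + b)


def pseudo_labels(ndvi, ndwi, ndbi):
    """Hypothetical threshold rules; later assignments take precedence."""
    labels = np.full(ndvi.shape, BARE, dtype=np.int8)
    labels[ndbi > 0.12] = BUILT
    labels[ndvi > 0.4] = VEG
    labels[ndwi > 0.15] = WATER
    return labels


def synthetic_scene(rng, n, offset=0.0):
    """Simulated pixels -> (n, 11) features [B2..B7, NDVI, NDWI, NDBI, NBR, EVI] and labels."""
    bands = CLASS_MEANS[rng.integers(0, 4, n)] + rng.normal(0, 0.005, (n, 6)) + offset
    b2, b3, b4, b5, b6, b7 = bands.T
    evi = 2.5 * (b5 - b4) / (b5 + 6 * b4 - 7.5 * b2 + 1)
    X = np.column_stack([bands, nd(b5, b4), nd(b3, b5), nd(b6, b5), nd(b5, b7), evi])
    return X, pseudo_labels(X[:, 6], X[:, 7], X[:, 8])


def train_and_validate(X_tr, y_tr, X_te, y_te, cols):
    """Fit StandardScaler + RF on one scene, then score on an unseen scene."""
    scaler = StandardScaler().fit(X_tr[:, cols])
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(scaler.transform(X_tr[:, cols]), y_tr)
    pred = rf.predict(scaler.transform(X_te[:, cols]))
    return accuracy_score(y_te, pred), cohen_kappa_score(y_te, pred)


rng = np.random.default_rng(42)
X_train, y_train = synthetic_scene(rng, 4000)              # stands in for the earliest scene
X_test, y_test = synthetic_scene(rng, 2000, offset=0.005)  # stands in for an unseen date
results = {}
for name, cols in {"standard (11 features)": slice(0, 11),
                   "bands-only (6 features)": slice(0, 6)}.items():
    results[name] = train_and_validate(X_train, y_train, X_test, y_test, cols)
    print(f"{name}: accuracy={results[name][0]:.3f}, kappa={results[name][1]:.3f}")
```

Because the synthetic classes are cleanly separated, both models score highly here; the numbers say nothing about the real Kalpitiya scenes.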
Serialised models are saved to out/models/:
```text
rf_model.pkl        # Standard 11-feature Random Forest
scaler.pkl          # StandardScaler for standard model
rf_bands_only.pkl   # Bands-only 6-feature Random Forest
scaler_bands.pkl    # StandardScaler for bands-only model
```
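Applying a serialised model to a new scene means flattening the feature raster to one row per pixel, scaling, predicting, and reshaping back. A sketch, assuming the .pkl files hold a fitted StandardScaler and RandomForestClassifier written with `joblib.dump` (as the file names suggest). `classify_stack` and `NODATA = 255` are hypothetical, and a toy 2-feature model stands in for the real 11-feature one.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

NODATA = 255  # hypothetical class value for masked / cloudy pixels


def classify_stack(stack, model, scaler):
    """Classify a (n_features, H, W) array pixel by pixel; NaN pixels get NODATA."""
    n_feat, h, w = stack.shape
    pixels = stack.reshape(n_feat, -1).T            # (H*W, n_features)
    valid = ~np.isnan(pixels).any(axis=1)
    out = np.full(h * w, NODATA, dtype=np.uint8)
    if valid.any():
        out[valid] = model.predict(scaler.transform(pixels[valid]))
    return out.reshape(h, w)


# Toy 2-feature model standing in for rf_model.pkl / scaler.pkl
X = np.column_stack([np.linspace(-3, 3, 200), np.zeros(200)])
y = (X[:, 0] > 0).astype(np.uint8)
scaler = StandardScaler().fit(X)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(scaler.transform(X), y)

with tempfile.TemporaryDirectory() as tmp:
    joblib.dump(model, Path(tmp) / "rf_model.pkl")
    joblib.dump(scaler, Path(tmp) / "scaler.pkl")
    model = joblib.load(Path(tmp) / "rf_model.pkl")
    scaler = joblib.load(Path(tmp) / "scaler.pkl")

stack = np.array([[[2.0, -2.0], [np.nan, 3.0]],    # feature 0
                  [[0.0, 0.0], [0.0, 0.0]]])       # feature 1
print(classify_stack(stack, model, scaler))
```

Predicting only on valid pixels avoids passing NaN to scikit-learn, which would raise an error.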
Every scene is classified and written as an LZW-compressed GeoTIFF with an RGBA colormap embedded directly in the file, so it opens in the correct colours in QGIS/ArcGIS with zero configuration:

```python
# dst: single-band rasterio dataset opened in "w" mode
dst.write_colormap(1, {
    0: (33, 113, 181, 255),   # Water: blue
    1: (35, 139, 69, 255),    # Vegetation: green
    2: (212, 168, 67, 255),   # Bare Soil: yellow
    3: (203, 24, 29, 255),    # Built-Up: red
})
```

Saved to out/classmaps/<YYYY_MM>_classmap.tif.
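The embedded colormap is simply a lookup table from class code to RGBA. The same lookup can be done in NumPy, for example to draw a class map with the identical palette in matplotlib; whether the notebook renders its maps this way is an assumption, and `classmap_to_rgba` is a hypothetical helper.

```python
import numpy as np

# Same class -> RGBA mapping as the colormap written with dst.write_colormap
COLORMAP = {
    0: (33, 113, 181, 255),   # Water
    1: (35, 139, 69, 255),    # Vegetation
    2: (212, 168, 67, 255),   # Bare Soil
    3: (203, 24, 29, 255),    # Built-Up
}


def classmap_to_rgba(classmap: np.ndarray) -> np.ndarray:
    """Vectorised palette lookup: (H, W) uint8 class codes -> (H, W, 4) uint8 RGBA."""
    lut = np.zeros((256, 4), dtype=np.uint8)   # codes without an entry stay transparent
    for code, rgba in COLORMAP.items():
        lut[code] = rgba
    return lut[classmap]


classmap = np.array([[0, 1], [3, 255]], dtype=np.uint8)
rgba = classmap_to_rgba(classmap)
print(rgba.shape)   # (2, 2, 4)
```

Passing `rgba` to `matplotlib.pyplot.imshow` shows the same colours as the GeoTIFF in QGIS, with unmapped codes left transparent.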
For every consecutive scene pair, the pipeline computes pixel-wise class transitions:
| Output Column | Description |
|---|---|
| lost_ha | Vegetation converted to another class (hectares) |
| gained_ha | Vegetation recovered from another class (hectares) |
| net_ha | Net change (negative = net deforestation) |
| loss_rate | Annualised loss velocity (ha/year) |
| gain_rate | Annualised regrowth velocity (ha/year) |
Results are written to out/stats/sequential_changes.csv and out/stats/transition_matrix.csv.
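The table lists the output columns but not how they follow from two class maps. A minimal NumPy sketch, assuming vegetation is class 1 (as in the embedded colormap), classes 0–3, a 30 m pixel (0.09 ha), and `years` as the elapsed time between the two acquisitions (e.g. days / 365.25). The function names are hypothetical.

```python
import numpy as np

PIXEL_AREA_HA = 30 * 30 / 10_000   # one 30 m Landsat pixel = 0.09 ha
VEG, N_CLASSES = 1, 4


def transition_matrix(before, after, valid):
    """Pixel counts: rows = class in the earlier scene, columns = class in the later one."""
    idx = before[valid].astype(np.int64) * N_CLASSES + after[valid]
    return np.bincount(idx, minlength=N_CLASSES ** 2).reshape(N_CLASSES, N_CLASSES)


def vegetation_change(before, after, valid, years):
    """Loss/gain/net vegetation area (ha) and annualised rates (ha/year)."""
    m = transition_matrix(before, after, valid)
    lost_ha = (m[VEG].sum() - m[VEG, VEG]) * PIXEL_AREA_HA      # vegetation -> other
    gained_ha = (m[:, VEG].sum() - m[VEG, VEG]) * PIXEL_AREA_HA  # other -> vegetation
    return {
        "lost_ha": lost_ha,
        "gained_ha": gained_ha,
        "net_ha": gained_ha - lost_ha,   # negative = net deforestation
        "loss_rate": lost_ha / years,
        "gain_rate": gained_ha / years,
    }


before = np.array([[1, 1, 1], [1, 2, 0]])
after = np.array([[1, 2, 3], [1, 1, 0]])
valid = np.ones(before.shape, dtype=bool)   # e.g. KALP_MASK in the pipeline
change = vegetation_change(before, after, valid, years=2.0)
print(change["net_ha"])
```

Dividing each row of the count matrix by its row sum (`m / m.sum(axis=1, keepdims=True)`) gives the class-to-class transition probabilities used for a heatmap.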
Three complementary regression models are fitted to the land-only NDVI time-series:
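```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
lin  = LinearRegression()                                        # captures long-term slope
poly = make_pipeline(PolynomialFeatures(2), LinearRegression())  # captures acceleration curves
gbr  = GradientBoostingRegressor(n_estimators=100)               # captures step-wise shifts
ensemble = (lin.predict(X_f) + poly.predict(X_f) + gbr.predict(X_f)) / 3  # fitted models, future years X_f
```

A 95% confidence interval is derived from the residual standard error using Student's t-distribution:

$$
\mathrm{CI}_{\text{margin}} = t_{\text{crit}} \cdot s_e \cdot \sqrt{\frac{1}{n} + \frac{(x_{\text{future}} - \bar{x})^2}{SS_x}}
$$

where $t_{\text{crit}}$ is the two-sided 95% Student-t critical value, $s_e$ is the residual standard error, $n$ is the number of observations in the time series, $\bar{x}$ is the mean of the observed years, and $SS_x = \sum_i (x_i - \bar{x})^2$.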
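One way to evaluate the margin formula, assuming $s_e$ comes from a simple linear fit of NDVI against year with $n - 2$ degrees of freedom. The README does not say which model's residuals are used or whether the band is centred on the ensemble mean; centring it on the ensemble prediction instead of the linear one is a one-line change. `linear_forecast_ci` is a hypothetical helper and the series is synthetic.

```python
import numpy as np
from scipy import stats


def linear_forecast_ci(x, y, x_future, level=0.95):
    """OLS trend forecast with a confidence band for the mean response."""
    x, y, x_future = (np.asarray(a, dtype=float) for a in (x, y, x_future))
    n = x.size
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    s_e = np.sqrt(np.sum(residuals ** 2) / (n - 2))        # residual standard error
    ss_x = np.sum((x - x.mean()) ** 2)
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 2)        # two-sided critical value
    margin = t_crit * s_e * np.sqrt(1 / n + (x_future - x.mean()) ** 2 / ss_x)
    y_hat = intercept + slope * x_future
    return y_hat, y_hat - margin, y_hat + margin


years = np.arange(2016, 2026, dtype=float)                          # synthetic
ndvi = 0.50 - 0.01 * (years - 2016) + 0.005 * (-1) ** np.arange(10)  # synthetic
y_hat, lower, upper = linear_forecast_ci(years, ndvi, [2026, 2030])
print(upper - lower)   # wider at 2030, further from the mean year
```

The $(x_{\text{future}} - \bar{x})^2$ term is why the band widens the further the forecast moves from the centre of the observed years.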
All figures are saved to out/graphs/ at ≥ 130 dpi.
| File | What it shows |
|---|---|
| timelineDataCollection.png | Scene acquisition timeline: which dates were captured and when |
| cloudCoverAnalysis.png | Per-scene cloud cover percentage bar chart |
| trueColorImagesGrid.png | RGB true-colour preview grid across all 10 scenes |
| ndviImagesGrid.png | NDVI spatial grid across all scenes (Red-Yellow-Green colormap) |
| ndviHistGraph.png | Overlapping NDVI density distributions, tracking spectral shifts over time |
| presistentLandMark.png | Persistent land mask visualisation (green = land, white = masked water) |
| rfAccuracyEval.png | 3-panel ML dashboard: Confusion Matrix · Cross-scene accuracy · Gini Feature Importances |
| landCoverTrend.png | Stacked bar chart of all 4 land-cover classes across every scene (hectares) |
| beforeVsAfter.png | Side-by-side baseline (2016) vs. final (2025) classified spatial maps |
| transitionMatrix.png | Class-to-class transition probability heatmap |
| deforestrationRates.png | Annualised deforestation vs. regrowth bar chart (ha/year per interval) |
| summeryPred.png | 4-panel summary: NDVI trend · Dense vegetation % · Stacked land-cover · Hectare trajectories to 2030 |
GeoTIFFs in out/classmaps/ carry embedded colour tables, so they render correctly in QGIS with no extra setup.
```text
terraShiftSriLanka/
│
├── retrain.ipynb            ← MASTER PIPELINE: run this end-to-end
├── catogirizingData.py      ← moves raw .TIF files from uncategorized/ into data/raw/<YYYY_MM>/
├── getFileNames.py          ← lists all filenames in uncategorized/ to landsat_filenames.txt
├── landsat_filenames.txt    ← inventory of raw Landsat filenames
├── README.md
├── LICENSE
├── .gitignore
│
├── uncategorized/           ← drop raw downloaded .TIF files here, then run catogirizingData.py
│
├── data/
│   └── raw/                 ← organised scenes (git-ignored)
│       ├── 2016_02/
│       ├── 2016_03/
│       └── ...
│
└── out/                     ← all pipeline outputs
    ├── classmaps/           ← classified GeoTIFFs with embedded RGBA colormaps
    ├── change/              ← pairwise change-detection rasters
    ├── graphs/              ← 12 PNG visualisations (tracked in git)
    ├── models/              ← rf_model.pkl · rf_bands_only.pkl · scaler.pkl · scaler_bands.pkl
    ├── stats/               ← sequential_changes.csv · transition_matrix.csv · land_mask.tif
    ├── ndvi/ ndwi/ ndbi/    ← spectral index rasters (git-ignored)
    ├── nbr/ evi/            ← additional index rasters (git-ignored)
    └── stacked/             ← multi-band stacked TIFFs (git-ignored)
```
```bash
git clone https://github.com/heyisula/terraShiftSriLanka.git
cd terraShiftSriLanka
pip install rasterio numpy pandas scikit-learn matplotlib seaborn scipy joblib pyproj tqdm
```

Drop your USGS Level-2 .TIF files into uncategorized/, then run:

```bash
python catogirizingData.py
```

This automatically moves every file into the correct data/raw/<YYYY_MM>/ folder based on the date embedded in the filename.

To get a list of all raw files first:

```bash
python getFileNames.py   # writes landsat_filenames.txt
```

Then run the full pipeline:

```bash
# Open in Jupyter and click Run All
jupyter notebook retrain.ipynb
# OR open in VS Code and click ▶▶ Run All Cells
```

All outputs are written automatically to out/.
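catogirizingData.py files each scene by the date in its filename. A sketch of that idea, assuming USGS Collection 2 product naming, where the fourth underscore-separated field is the acquisition date (YYYYMMDD) and the fifth is the processing date, which must not be used for filing. This is not the repository's script; `scene_folder` and `organise` are hypothetical names and the example filenames are illustrative.

```python
import re
import shutil
from pathlib import Path

# LXSS_LLLL_PPPRRR_YYYYMMDD_yyyymmdd_CC_TX_<band>.TIF (acquisition date = 4th field)
ACQ_DATE = re.compile(r"^L[COTEM]\d{2}_\w{4}_\d{6}_(\d{4})(\d{2})\d{2}_")


def scene_folder(filename: str) -> str:
    """Return 'YYYY_MM' from the acquisition date in a Landsat C2 filename."""
    match = ACQ_DATE.match(filename)
    if match is None:
        raise ValueError(f"not a Landsat Collection 2 filename: {filename}")
    return f"{match.group(1)}_{match.group(2)}"


def organise(src=Path("uncategorized"), dst=Path("data/raw")):
    """Move every .TIF in src into dst/<YYYY_MM>/."""
    for tif in sorted(src.glob("*.TIF")):
        target = dst / scene_folder(tif.name)
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(tif), target / tif.name)


print(scene_folder("LC08_L2SP_142054_20160215_20200907_02_T1_SR_B4.TIF"))   # 2016_02
```

Anchoring the pattern on the sensor prefix and path/row field makes stray files fail loudly instead of landing in a wrong folder.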
| Package | Role |
|---|---|
| rasterio | GeoTIFF I/O, CRS handling, embedded colormap writing |
| numpy | Array operations, masking, feature matrix construction |
| pandas | Time-series tables, CSV export |
| scikit-learn | Random Forest, StandardScaler, accuracy metrics |
| scipy | Student-t CI, linear regression statistics |
| matplotlib + seaborn | All charts, dashboards, and spatial maps |
| pyproj | GPS WGS84 → UTM coordinate transformation |
| joblib | Model serialisation (.pkl files) |
Based on the 10-scene Landsat analysis spanning 2016–2025 over the Kalpitiya Peninsula:
| Finding | Detail |
|---|---|
| NDVI Declining Trend | Mean land-only NDVI shows a statistically significant negative slope, confirming progressive vegetation loss over the decade |
| Vegetation → Bare Soil | The dominant transition is forest/scrub converting to bare soil, driven primarily by shrimp aquaculture pond expansion and salt pan development |
| Built-Up Acceleration | Built-up area (urbanisation) shows a rising trajectory, concentrated particularly along the southern approach roads to Kalpitiya town |
| Ash/Grey Pixels on Land | Pixels that appear grey in classified maps are not ocean: they are shrimp ponds, salt pans, and brackish mudflats flagged by the persistent water mask (NDWI > 0.15 in >60% of scenes) and correctly excluded from vegetation statistics |
| Seasonal Oscillation | February scenes consistently show higher NDVI than March scenes of the same year, reflecting the tail of Sri Lanka's north-east monsoon, which keeps scrub vegetation green longer into Q1 |
| Model Accuracy | The standard 11-feature RF outperforms the bands-only baseline by a meaningful margin in Cohen's κ, confirming that engineered spectral indices (especially NDVI and NDWI) carry genuine predictive information beyond raw reflectance |
| 2030 Forecast | The ensemble model projects continued NDVI decline. The 95% confidence interval widens significantly beyond 2027, reflecting uncertainty in the pace of aquaculture and urban expansion |
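A "statistically significant negative slope" is typically established with a t-test on an ordinary least-squares slope. A sketch using `scipy.stats.linregress`, assuming the input is mean land-only NDVI per scene against decimal year and a 5% significance level; the README does not state which test the notebook uses, `ndvi_trend` is a hypothetical helper, and both series below are synthetic, not results from this study.

```python
import numpy as np
from scipy import stats


def ndvi_trend(decimal_years, mean_ndvi, alpha=0.05):
    """OLS slope of mean NDVI vs time, with a two-sided t-test on the slope."""
    res = stats.linregress(decimal_years, mean_ndvi)
    return {
        "slope_per_year": float(res.slope),
        "p_value": float(res.pvalue),
        "significant_decline": bool(res.slope < 0 and res.pvalue < alpha),
    }


years = np.arange(2016, 2026, dtype=float)
wiggle = (-1.0) ** np.arange(10)                              # alternating +1 / -1
declining = 0.50 - 0.01 * (years - 2016) + 0.002 * wiggle     # synthetic
flat = 0.40 + 0.01 * wiggle                                   # synthetic
print(ndvi_trend(years, declining)["significant_decline"])    # True
print(ndvi_trend(years, flat)["significant_decline"])         # False
```

With only 10 scenes and a February/March seasonal swing, the residual variance is large relative to the slope, so a visible dip alone is not enough to claim significance.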
All images below are generated by retrain.ipynb and stored in out/graphs/.
Acquisition dates for all 10 Landsat scenes across the 2016–2025 study period.
Per-scene cloud cover percentage; cloud-contaminated pixels are automatically masked.
RGB true-colour composite previews across all 10 scenes, showing visible land-cover change over time.
NDVI spatial maps across all 10 scenes (Red = bare/water · Yellow = sparse vegetation · Green = dense canopy).
Overlapping NDVI density distributions per scene; the leftward shift of the green peak indicates progressive canopy thinning.
Persistent land mask (green = confirmed land, white = permanently masked water). Shrimp ponds, salt pans and mudflats are correctly masked out to prevent skewing vegetation statistics.
3-panel ML dashboard: Confusion Matrix (in-scene) · Cross-scene accuracy comparison (Standard vs Bands-only RF) · Gini Feature Importances. NDVI and NDWI are consistently the top two most informative features.
Side-by-side classified maps: baseline (earliest scene) vs. final (2025). Green = Vegetation · Yellow = Bare Soil · Red = Built-Up · Blue = Water · Grey = Masked.
Stacked bar chart showing the real-world hectare composition of all 4 land-cover classes across all scenes.
Class-to-class transition probability heatmap, revealing the dominant conversion pathways (e.g. Vegetation → Bare Soil).
Annualised deforestation loss (Crimson) vs. regrowth gain (Teal) in hectares per year for every consecutive scene interval.
4-panel summary dashboard: smoothed NDVI trend with ensemble forecast to 2030 + 95% CI · Dense vegetation (%) over time · Stacked land-cover composition · Absolute hectare trajectories.
Released under the MIT License. See LICENSE for details.
If you use this repository in a publication or academic work, please cite:
```bibtex
@misc{heyisula2026,
  author       = {Isula Dissanayake},
  title        = {TerraShift Sri Lanka: Kalpitiya Deforestation and Land-Cover Change Analysis (2016-2025)},
  year         = {2026},
  howpublished = {\url{https://github.com/heyisula/terraShiftSriLanka}},
  note         = {GitHub repository}
}
```

Made with satellite imagery, Python and remote sensing science.

⭐ Star this repo if you found it useful!