heyisula/terraShiftSriLanka

🌿 TerraShift Sri Lanka

Satellite-Based Deforestation & Land-Cover Change Analysis

Kalpitiya Peninsula, Sri Lanka · 2016 – 2025


Python Jupyter scikit-learn Rasterio SciPy License: MIT


๐ŸŒ Overview

TerraShift Sri Lanka is a fully reproducible, machine-learning-driven remote sensing pipeline that ingests raw Landsat 8 & 9 Level-2 Surface Reflectance imagery and produces publication-grade environmental analytics, all from a single Jupyter notebook.

| Stage | What it does |
| --- | --- |
| 🛰️ Ingest | Discovers and loads 10 multi-year Landsat scenes (2016 → 2025) |
| ☁️ Cloud Mask | Removes contaminated pixels using QA-Pixel confidence flags |
| 🧮 Feature Engineering | Derives 11 spectral features: B2–B7, NDVI, NDWI, NDBI, NBR, EVI |
| 🗺️ Land Masking | Builds a persistent NDWI water mask + GPS-to-UTM Kalpitiya ROI |
| 🤖 Classification | Trains dual Random Forest classifiers (11-feature + 6-band baseline) |
| ✅ Validation | Cross-scene validation on 3 unseen dates → Accuracy + Cohen's κ |
| 📤 GIS Export | Writes GeoTIFFs with native embedded RGBA colormaps for QGIS/ArcGIS |
| 🔄 Change Detection | Pixel-wise class transitions → loss/gain/net hectares per interval |
| 📈 Forecasting | Triple-model ensemble + 95% CI projected to 2030 |
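
The cloud-mask stage can be sketched with plain NumPy bit tests. The bit positions follow the Landsat Collection 2 QA_PIXEL layout. Which flags the notebook actually tests is not stated here, so the flag selection and the helper names `contaminated_mask` and `apply_mask` are assumptions made for illustration.

```python
import numpy as np

# Landsat Collection 2 QA_PIXEL bit positions
FILL, DILATED_CLOUD, CIRRUS, CLOUD, CLOUD_SHADOW = 0, 1, 2, 3, 4
CLOUD_CONF_SHIFT = 8   # bits 8-9: cloud confidence (0 none, 1 low, 2 medium, 3 high)


def contaminated_mask(qa: np.ndarray) -> np.ndarray:
    """Return True where a pixel is fill, cloud, cirrus, or cloud shadow."""
    qa = qa.astype(np.uint16)
    flag_bits = ((1 << FILL) | (1 << DILATED_CLOUD) | (1 << CIRRUS)
                 | (1 << CLOUD) | (1 << CLOUD_SHADOW))
    flagged = (qa & flag_bits) != 0
    high_conf_cloud = ((qa >> CLOUD_CONF_SHIFT) & 0b11) == 3
    return flagged | high_conf_cloud


def apply_mask(band: np.ndarray, qa: np.ndarray) -> np.ndarray:
    """Replace contaminated pixels with NaN so later statistics ignore them."""
    out = band.astype(np.float32)          # astype returns a copy
    out[contaminated_mask(qa)] = np.nan
    return out
```

Writing NaN instead of dropping pixels keeps every band on the same grid, so later steps can stack scenes and use NaN-aware reductions.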

๐Ÿ›ฐ๏ธ Study Area

| Property | Value |
| --- | --- |
| Location | Kalpitiya Peninsula, Northwest Sri Lanka |
| Bounding Box | Lat 8.00° – 8.65° N · Lon 79.68° – 79.90° E |
| WRS-2 Path/Row | 142 / 054 |
| Satellite | Landsat 8 (LC08) & Landsat 9 (LC09) · Collection 2 · Level-2 SR |
| Temporal Coverage | February 2016 → March 2025 · 10 scenes |
| Bands | B2–B7 (Surface Reflectance) + QA-Pixel cloud mask |
| Native Resolution | 30 m per pixel |

The Kalpitiya Peninsula is a unique coastal ecosystem under heavy land-use pressure: shrimp aquaculture expansion, salt pan development, mangrove clearance, and urbanisation all compete within a narrow strip of coastline. The pipeline isolates this ROI using a pyproj GPS-to-UTM bounding-box transformation combined with a persistent NDWI water mask, which excludes the Indian Ocean and Puttalam Lagoon from all statistics.


🔬 Pipeline Architecture

1 · Spectral Index Computation

Five indices are computed per scene and written to individual out/<index>/ directories:

NDVI = (B5 - B4) / (B5 + B4)               # Vegetation health & canopy density
NDWI = (B3 - B5) / (B3 + B5)               # Water bodies & coastal inundation
NDBI = (B6 - B5) / (B6 + B5)               # Built-up & impervious surfaces
NBR  = (B5 - B7) / (B5 + B7)               # Burn scar & bare soil detection
EVI  = 2.5 * (B5-B4)/(B5+6*B4-7.5*B2+1)   # Enhanced canopy signal (reduces soil noise)
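
As a minimal sketch, the five indices can be computed with NumPy as below. It assumes the bands have already been converted to surface reflectance (the scale and offset are the standard Collection 2 Level-2 values) and writes NDBI in its usual SWIR-minus-NIR form. The function names are illustrative and not taken from the notebook.

```python
import numpy as np

SR_SCALE, SR_OFFSET = 0.0000275, -0.2   # Collection 2 Level-2 surface-reflectance scaling


def to_reflectance(dn: np.ndarray) -> np.ndarray:
    """Convert Level-2 digital numbers to surface reflectance."""
    return dn.astype(np.float32) * SR_SCALE + SR_OFFSET


def normalised_difference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """(a - b) / (a + b), with NaN where the denominator is zero."""
    a = a.astype(np.float32)
    b = b.astype(np.float32)
    denom = a + b
    out = np.full(denom.shape, np.nan, dtype=np.float32)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def spectral_indices(b2, b3, b4, b5, b6, b7):
    """Return the five indices as a dict of arrays (inputs are reflectance)."""
    evi_denom = b5 + 6.0 * b4 - 7.5 * b2 + 1.0
    evi = np.full(evi_denom.shape, np.nan, dtype=np.float32)
    np.divide(2.5 * (b5 - b4), evi_denom, out=evi, where=evi_denom != 0)
    return {
        "ndvi": normalised_difference(b5, b4),
        "ndwi": normalised_difference(b3, b5),
        "ndbi": normalised_difference(b6, b5),
        "nbr": normalised_difference(b5, b7),
        "evi": evi,
    }
```

The EVI constants assume reflectance on a 0–1 scale, which is why the scaling step comes first.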

2 · Persistent Land Mask + Kalpitiya ROI

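from pyproj import Transformer
from rasterio.transform import rowcol

# Persistent water = NDWI > 0.15 in more than 60% of scenes; everything else is land
# (assumes ndwi_across_all_scenes is a stack of shape (n_scenes, H, W))
water_fraction = (ndwi_across_all_scenes > 0.15).mean(axis=0)
LAND_MASK = water_fraction <= 0.60          # shape: (H, W) boolean

# GPS (WGS84) → projected CRS → raster pixel row/col
transformer = Transformer.from_crs("EPSG:4326", raster_crs, always_xy=True)
x_min, y_min = transformer.transform(79.68, 8.00)         # south-west corner (lon, lat)
row_max, col_min = rowcol(land_transform, x_min, y_min)   # rows increase southwards

KALP_MASK = LAND_MASK & ROI_MASK   # only land pixels inside the peninsula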

Saved to out/stats/land_mask.tif and out/stats/kalpitiya_mask.tif.
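
A small NumPy-only sketch of the two masks, using the 0.15 NDWI threshold and 60% rule quoted in this README. The function names, the NaN handling for cloud-masked pixels, and the pixel-coordinate rectangle are assumptions made for illustration. In the pipeline the rectangle bounds would come from the GPS-to-UTM transformation.

```python
import numpy as np


def persistent_land_mask(ndwi_stack: np.ndarray, ndwi_thresh: float = 0.15,
                         water_share: float = 0.60) -> np.ndarray:
    """True where a pixel is NOT water in more than `water_share` of scenes.

    ndwi_stack has shape (n_scenes, H, W); NaN (cloud-masked) values are ignored.
    Pixels with no valid observation at all are treated as land.
    """
    valid = ~np.isnan(ndwi_stack)
    water_hits = np.sum((ndwi_stack > ndwi_thresh) & valid, axis=0)
    n_valid = valid.sum(axis=0)
    water_fraction = np.divide(water_hits, n_valid,
                               out=np.zeros(n_valid.shape, dtype=float),
                               where=n_valid > 0)
    return water_fraction <= water_share


def roi_mask(shape, row_top, row_bottom, col_left, col_right) -> np.ndarray:
    """Boolean rectangle in pixel coordinates (bounds inclusive)."""
    mask = np.zeros(shape, dtype=bool)
    mask[row_top:row_bottom + 1, col_left:col_right + 1] = True
    return mask
```

Combining the two with a logical AND, as in `KALP_MASK = LAND_MASK & ROI_MASK`, leaves only land pixels inside the rectangle.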

3 · Dual Random Forest Classification

Pseudo-labels are generated from spectral-index thresholds on the earliest scene, then two independent classifiers are trained:

| Model | Features | Purpose |
| --- | --- | --- |
| Standard RF | B2–B7 + NDVI + NDWI + NDBI + NBR + EVI (11 total) | Primary classifier |
| Bands-only RF | B2–B7 only (6 total) | Honest generalisation baseline |
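
Threshold-based pseudo-labelling can be sketched as follows. The class codes match the embedded colormap (0 Water, 1 Vegetation, 2 Bare Soil, 3 Built-Up), but the threshold values and the rule order are hypothetical placeholders, since the README does not list the ones used in the notebook.

```python
import numpy as np

WATER, VEGETATION, BARE_SOIL, BUILT_UP = 0, 1, 2, 3   # class codes used in the colormap


def pseudo_labels(ndvi, ndwi, ndbi, ndvi_veg=0.40, ndwi_water=0.15, ndbi_built=0.10):
    """Assign a class per pixel from index thresholds; -1 marks unlabelled pixels.

    The default thresholds are hypothetical, not the notebook's values.
    """
    labels = np.full(ndvi.shape, BARE_SOIL, dtype=np.int8)   # default class
    labels[ndbi > ndbi_built] = BUILT_UP
    labels[ndvi > ndvi_veg] = VEGETATION
    labels[ndwi > ndwi_water] = WATER                        # water rule is applied last
    labels[np.isnan(ndvi) | np.isnan(ndwi) | np.isnan(ndbi)] = -1
    return labels
```

Later rules overwrite earlier ones, so the order encodes priority. Pixels labelled -1 would be dropped before training.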

Both are validated on 3 completely unseen scenes (cross-scene validation), recording accuracy and Cohen's κ per scene and overall.
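
For reference, both metrics follow directly from a confusion matrix: accuracy is the observed agreement p_o, and Cohen's κ corrects it for the agreement p_e expected by chance. The sketch below is a NumPy illustration with hypothetical helper names. scikit-learn's `accuracy_score` and `cohen_kappa_score` compute the same quantities.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=4):
    """Rows = reference class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def accuracy_and_kappa(cm):
    """Overall accuracy p_o and Cohen's kappa = (p_o - p_e) / (1 - p_e)."""
    total = cm.sum()
    p_o = np.trace(cm) / total
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")
    return p_o, kappa
```

Because κ discounts chance agreement, it is a stricter score than accuracy when one class dominates the scene.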

Serialised models are saved to out/models/:

rf_model.pkl          # Standard 11-feature Random Forest
scaler.pkl            # StandardScaler for standard model
rf_bands_only.pkl     # Bands-only 6-feature Random Forest
scaler_bands.pkl      # StandardScaler for bands-only model

4 · GIS-Ready Classification Export

Every scene is classified and written as an LZW-compressed GeoTIFF with a native RGBA colormap embedded directly in the file, so it opens in the correct colours in QGIS/ArcGIS with zero configuration:

dst.write_colormap(1, {
    0: (33,  113, 181, 255),   # 💧 Water      → Blue
    1: (35,  139,  69, 255),   # 🌿 Vegetation → Green
    2: (212, 168,  67, 255),   # 🟡 Bare Soil  → Yellow
    3: (203,  24,  29, 255),   # 🔴 Built-Up   → Red
})

Saved to out/classmaps/<YYYY_MM>_classmap.tif.

5 · Sequential Change Detection

For every consecutive scene pair, the pipeline computes pixel-wise class transitions:

| Output Column | Description |
| --- | --- |
| lost_ha | Vegetation converted to another class (hectares) |
| gained_ha | Vegetation recovered from another class (hectares) |
| net_ha | Net change; negative = net deforestation |
| loss_rate | Annualised loss velocity (ha / year) |
| gain_rate | Annualised regrowth velocity (ha / year) |

Results → out/stats/sequential_changes.csv and out/stats/transition_matrix.csv.
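
The columns above can be derived from two classified rasters as sketched below. At 30 m resolution each pixel covers 0.09 ha. The function names and the optional `valid` mask argument are assumptions made for illustration, not the notebook's code.

```python
import numpy as np

PIXEL_HA = 30 * 30 / 10_000      # one 30 m pixel = 900 m² = 0.09 ha
VEGETATION = 1


def transition_counts(before, after, n_classes=4, valid=None):
    """Pixel counts per (from_class, to_class) pair."""
    if valid is None:
        valid = np.ones(before.shape, dtype=bool)
    pair_index = before[valid].astype(np.int64) * n_classes + after[valid]
    counts = np.bincount(pair_index, minlength=n_classes**2)
    return counts.reshape(n_classes, n_classes)


def vegetation_change(before, after, years, valid=None):
    """Lost, gained, and net vegetation in hectares, plus annualised rates."""
    t = transition_counts(before, after, valid=valid)
    lost_ha = (t[VEGETATION].sum() - t[VEGETATION, VEGETATION]) * PIXEL_HA
    gained_ha = (t[:, VEGETATION].sum() - t[VEGETATION, VEGETATION]) * PIXEL_HA
    return {
        "lost_ha": lost_ha,
        "gained_ha": gained_ha,
        "net_ha": gained_ha - lost_ha,
        "loss_rate": lost_ha / years,
        "gain_rate": gained_ha / years,
    }
```

Passing the Kalpitiya land mask as `valid` would restrict the counts to land pixels inside the ROI. Dividing each row of the count matrix by its sum gives transition probabilities.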

6 · Ensemble Trend Forecasting to 2030

Three complementary regression models are fitted to the land-only NDVI time-series:

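from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

lin  = LinearRegression()                                        # captures long-term slope
poly = make_pipeline(PolynomialFeatures(2), LinearRegression())  # captures acceleration curves
gbr  = GradientBoostingRegressor(n_estimators=100)               # captures step-wise shifts

ensemble = (lin.predict(X_f) + poly.predict(X_f) + gbr.predict(X_f)) / 3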

A 95% confidence interval is derived from residual standard error using Student's t-distribution:

CI_margin = t_crit · sₑ · √(1/n + (x_future − x̄)² / SSₓ)
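
A sketch of that margin for a straight-line fit, using SciPy for the t critical value. How the notebook pairs the interval with the three-model ensemble is not stated here, so applying it to a simple least-squares line is an assumption, as is the function name.

```python
import numpy as np
from scipy import stats


def linear_trend_ci(x, y, x_future, level=0.95):
    """Fit y = a + b*x and return (prediction, margin) at each x_future."""
    x, y, x_future = (np.asarray(v, dtype=float) for v in (x, y, x_future))
    n = x.size
    x_bar = x.mean()
    ss_x = np.sum((x - x_bar) ** 2)                        # SSx
    slope = np.sum((x - x_bar) * (y - y.mean())) / ss_x
    intercept = y.mean() - slope * x_bar
    residuals = y - (intercept + slope * x)
    s_e = np.sqrt(np.sum(residuals ** 2) / (n - 2))        # residual standard error
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)    # two-sided critical value
    margin = t_crit * s_e * np.sqrt(1 / n + (x_future - x_bar) ** 2 / ss_x)
    return intercept + slope * x_future, margin
```

The (x_future − x̄)² term makes the margin grow with distance from the centre of the observed years, so the band widens as the forecast moves further out.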

📊 Outputs & Visualisations

All figures are in out/graphs/ at ≥ 130 dpi.

| File | What it shows |
| --- | --- |
| timelineDataCollection.png | Scene acquisition timeline: which dates were captured and when |
| cloudCoverAnalysis.png | Per-scene cloud cover percentage bar chart |
| trueColorImagesGrid.png | RGB true-colour preview grid across all 10 scenes |
| ndviImagesGrid.png | NDVI spatial grid across all scenes (Red-Yellow-Green colormap) |
| ndviHistGraph.png | Overlapping NDVI density distributions; tracks spectral shifts over time |
| presistentLandMark.png | Persistent land mask visualisation (green = land, white = masked water) |
| rfAccuracyEval.png | 3-panel ML dashboard: Confusion Matrix · Cross-scene accuracy · Gini Feature Importances |
| landCoverTrend.png | Stacked bar chart of all 4 land-cover classes across every scene (hectares) |
| beforeVsAfter.png | Side-by-side baseline (2016) vs. final (2025) classified spatial maps |
| transitionMatrix.png | Class-to-class transition probability heatmap |
| deforestrationRates.png | Annualised deforestation vs. regrowth bar chart (ha/year per interval) |
| summeryPred.png | 4-panel summary: NDVI trend · Dense vegetation % · Stacked land-cover · Hectare trajectories to 2030 |

GeoTIFFs in out/classmaps/ carry embedded colour tables, so they render correctly in QGIS with no extra setup.


๐Ÿ“ Repository Structure

terraShiftSriLanka/
│
├── 📓 retrain.ipynb              ← MASTER PIPELINE: run this end-to-end
├── 🐍 catogirizingData.py        ← moves raw .TIF files from uncategorized/ into data/raw/<YYYY_MM>/
├── 🐍 getFileNames.py            ← lists all filenames in uncategorized/ to landsat_filenames.txt
├── 📄 landsat_filenames.txt      ← inventory of raw Landsat filenames
├── 📄 README.md
├── 📄 LICENSE
├── ⚙️  .gitignore
│
├── uncategorized/                ← drop raw downloaded .TIF files here, then run catogirizingData.py
│
├── data/
│   └── raw/                      ← organised scenes (git-ignored)
│       ├── 2016_02/
│       ├── 2016_03/
│       └── ...
│
└── out/                          ← all pipeline outputs
    ├── classmaps/                ← classified GeoTIFFs with embedded RGBA colormaps
    ├── change/                   ← pairwise change-detection rasters
    ├── graphs/                   ← 12 PNG visualisations (tracked in git)
    ├── models/                   ← rf_model.pkl · rf_bands_only.pkl · scaler.pkl · scaler_bands.pkl
    ├── stats/                    ← sequential_changes.csv · transition_matrix.csv · land_mask.tif
    ├── ndvi/  ndwi/  ndbi/       ← spectral index rasters (git-ignored)
    ├── nbr/   evi/               ← additional index rasters (git-ignored)
    └── stacked/                  ← multi-band stacked TIFFs (git-ignored)

🚀 Getting Started

1. Clone

git clone https://github.com/heyisula/terraShiftSriLanka.git
cd terraShiftSriLanka

2. Install dependencies

pip install rasterio numpy pandas scikit-learn matplotlib seaborn scipy joblib pyproj tqdm

3. Organise raw Landsat scenes

Drop your USGS Level-2 .TIF files into uncategorized/, then run:

python catogirizingData.py

This automatically moves every file into the correct data/raw/<YYYY_MM>/ folder based on the date embedded in the filename.

To get a list of all raw files first:

python getFileNames.py    # writes landsat_filenames.txt
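
The date-based sorting in this step can be sketched as below. It assumes the standard Landsat Collection 2 product-ID layout, in which the acquisition date is the fourth underscore-separated field. The regular expression and the function names are illustrative and are not copied from catogirizingData.py.

```python
import re
import shutil
from pathlib import Path
from typing import Optional

# Landsat Collection 2 product ID: LXSS_LLLL_PPPRRR_YYYYMMDD_yyyymmdd_CC_TX_...
ACQ_DATE = re.compile(r"^L[A-Z]\d{2}_[A-Z0-9]{4}_\d{6}_(\d{4})(\d{2})\d{2}_")


def scene_folder(filename: str) -> Optional[str]:
    """Return 'YYYY_MM' from the acquisition date, or None if the name does not match."""
    match = ACQ_DATE.match(filename)
    return f"{match.group(1)}_{match.group(2)}" if match else None


def organise(src: Path, dst_root: Path) -> int:
    """Move every .TIF in `src` into dst_root/<YYYY_MM>/ and return the number moved."""
    moved = 0
    for tif in sorted(src.glob("*.TIF")):
        folder = scene_folder(tif.name)
        if folder is None:
            continue                      # leave unrecognised files in place
        target = dst_root / folder
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(tif), str(target / tif.name))
        moved += 1
    return moved
```

With this layout, `organise(Path("uncategorized"), Path("data/raw"))` would place each band file next to the other bands of the same month.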

4. Run the pipeline

# Open in Jupyter and click Run All
jupyter notebook retrain.ipynb

# OR open in VS Code and click ▶▶ Run All Cells

All outputs are written automatically to out/.


📦 Dependencies

| Package | Role |
| --- | --- |
| rasterio | GeoTIFF I/O, CRS handling, embedded colormap writing |
| numpy | Array operations, masking, feature matrix construction |
| pandas | Time-series tables, CSV export |
| scikit-learn | Random Forest, StandardScaler, accuracy metrics |
| scipy | Student-t CI, linear regression statistics |
| matplotlib + seaborn | All charts, dashboards, and spatial maps |
| pyproj | GPS WGS84 → UTM coordinate transformation |
| joblib | Model serialisation (.pkl files) |

๐Ÿ” Key Findings & Insights

Based on the 10-scene Landsat analysis spanning 2016 โ†’ 2025 over the Kalpitiya Peninsula:

| Finding | Detail |
| --- | --- |
| 📉 NDVI Declining Trend | Mean land-only NDVI shows a statistically significant negative slope, confirming progressive vegetation loss over the decade |
| 🌿 Vegetation → Bare Soil | The dominant transition class is forest/scrub converting to bare soil, driven primarily by shrimp aquaculture pond expansion and salt pan development |
| 🔴 Built-Up Acceleration | Built-up area (urbanisation) shows a rising trajectory, particularly concentrated in the southern approach roads to Kalpitiya town |
| 🩶 Ash/Grey Pixels on Land | Pixels appearing grey in classified maps are not ocean; they are shrimp ponds, salt pans, and brackish mudflats flagged by the persistent water mask (NDWI > 0.15 in >60% of scenes) and correctly excluded from vegetation statistics |
| 📊 Seasonal Oscillation | February scenes consistently show higher NDVI than March scenes of the same year, reflecting the tail of Sri Lanka's north-east monsoon keeping scrub vegetation green longer into Q1 |
| 🤖 Model Accuracy | The Standard 11-feature RF outperforms the Bands-only baseline by a meaningful margin in Cohen's κ, confirming that engineered spectral indices (especially NDVI and NDWI) carry genuine predictive information beyond raw reflectance |
| 🔮 2030 Forecast | The ensemble model projects continued NDVI decline. The 95% confidence interval widens significantly beyond 2027, reflecting uncertainty in the pace of aquaculture and urban expansion |

๐Ÿ–ผ๏ธ Visual Results Gallery

All images below are generated by retrain.ipynb and stored in out/graphs/.

๐Ÿ—“๏ธ Scene Timeline & Cloud Cover

Scene Timeline Acquisition dates for all 10 Landsat scenes across the 2016โ€“2025 study period.

Cloud Cover Per-scene cloud cover percentage โ€” scenes with high cloud fraction are automatically masked.


๐ŸŒˆ True Colour & NDVI Grids

True Colour Grid RGB true-colour composite previews across all 10 scenes โ€” shows visible land-cover change over time.

NDVI Grid NDVI spatial maps across all 10 scenes (Red = bare/water ยท Yellow = sparse vegetation ยท Green = dense canopy).


๐Ÿ“Š NDVI Spectral Distribution

NDVI Histogram Overlapping NDVI density distributions per scene โ€” the leftward shift of the green peak confirms progressive canopy thinning.


๐Ÿ—บ๏ธ Persistent Land Mask

Land Mask Persistent land mask (green = confirmed land, white = permanently masked water). Shrimp ponds, salt pans and mudflats are correctly masked out to prevent skewing vegetation statistics.


๐Ÿค– Machine Learning Evaluation

RF Accuracy Dashboard 3-panel ML dashboard: Confusion Matrix (in-scene) ยท Cross-scene accuracy comparison (Standard vs Bands-only RF) ยท Gini Feature Importances. NDVI and NDWI are consistently the top two most informative features.


๐Ÿ—บ๏ธ Before vs. After Classification Maps

Before vs After Side-by-side classified maps: baseline (earliest scene) vs. final (2025). Green = Vegetation ยท Yellow = Bare Soil ยท Red = Built-Up ยท Blue = Water ยท Grey = Masked.


๐Ÿ“ˆ Land Cover Trend & Change

Land Cover Trend Stacked bar chart showing the real-world hectare composition of all 4 land-cover classes across all scenes.

Transition Matrix Class-to-class transition probability heatmap โ€” reveals the dominant conversion pathways (e.g. Vegetation โ†’ Bare Soil).

Deforestation Rates Annualised deforestation loss (Crimson) vs. regrowth gain (Teal) in hectares per year for every consecutive scene interval.


๐Ÿ”ฎ Forecasting Dashboard

Summary & Forecast 4-panel summary dashboard: smoothed NDVI trend with ensemble forecast to 2030 + 95% CI ยท Dense vegetation (%) over time ยท Stacked land-cover composition ยท Absolute hectare trajectories.


📄 License & Citation

Released under the MIT License; see LICENSE for details.

If you use this repository in a publication or academic work, please cite:

@misc{heyisula2026,
  author       = {Isula Dissanayake},
  title        = {TerraShift Sri Lanka: Kalpitiya Deforestation and Land-Cover Change Analysis (2016-2025)},
  year         = {2026},
  howpublished = {\url{https://github.com/heyisula/terraShiftSriLanka}},
  note         = {GitHub repository}
}

Made with satellite imagery, Python, and remote sensing science

⭐ Star this repo if you found it useful!

About

TerraShiftSriLanka analyzes land cover changes in a selected region of Sri Lanka using satellite imagery.
