saiga143/citysegmentdeprivation

City Segment Morphological Deprivation (CSMD) Model

This repository contains the analysis pipeline for the City Segment Morphological Deprivation (CSMD) model, which accompanies a paper accepted in principle at Nature Cities.

The CSMD model maps morphologically deprived city segments across cities in Africa, Asia, and Latin America and the Caribbean, using City Segments v1 (CIESIN), IDEABench v2 labels, and Random Forest modelling.

Repository status: Accepted in principle at Nature Cities. The repository is being prepared for final publication. Large processed outputs (RF model, prediction GeoPackages, derived datasets) are hosted on Zenodo; raw source datasets remain with their original providers.

[Figure 1: CSMD overview]


Workflow Overview

| Step | Description |
| --- | --- |
| A. Preprocessing & labelling | Standardise city-segment data; assign DUA-overlap training labels from IDEABench v2 |
| B. RF training & LOCO validation | VSURF variable selection; Random Forest training; leave-one-city-out validation |
| C. Global prediction / application | Apply final RF model to 5 000+ cities across the Global South |
| D. Comparative alignment | Contextual triangulation against SSI, Million Neighborhoods, and WRI Urban Land Use |
| E. Manuscript figures & tables | Generate all published figures and summary tables |
| F. Revision 2 coverage & omission analysis | UCDB / GHS-POP coverage quantification in response to reviewer requests |
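
The leave-one-city-out (LOCO) validation in step B holds out every segment from one benchmark city per fold and trains on the remaining cities. The splitting logic can be sketched without any dependencies as follows. The function name `loco_splits` and the toy city names are illustrative assumptions, not the repository's own code.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def loco_splits(cities: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_city, train_indices, test_indices), one fold per city."""
    by_city: Dict[str, List[int]] = defaultdict(list)
    for idx, city in enumerate(cities):
        by_city[city].append(idx)
    for held_out in sorted(by_city):
        test_idx = by_city[held_out]
        # Training rows are all rows that belong to any other city.
        train_idx = [i for c, idxs in by_city.items() if c != held_out for i in idxs]
        yield held_out, sorted(train_idx), test_idx


# Toy example: one city label per segment row (hypothetical city names).
segment_city = ["city_a", "city_a", "city_b", "city_c", "city_b"]
folds = list(loco_splits(segment_city))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

With scikit-learn available, `sklearn.model_selection.LeaveOneGroupOut` produces equivalent folds when the city identifier is passed as `groups`. Whether the training scripts use that class is not stated here.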

Repository Structure

citysegmentdeprivation/
├── 1_preprocessing/          # City-segment standardisation and label creation
├── 2_modelling/
│   ├── 01_training/          # VSURF, RF training, LOCO validation
│   └── 02_application/       # Global RF application and city-level summaries
├── 3_comparitive_analysis/   # SSI, MN, WRI comparison scripts and outputs
│                             #   (note: folder name retains historical spelling)
├── 4_Figures_Tables/         # Manuscript figure and summary-table notebooks
├── notebooks/
│   └── revision2_coverage/   # Post-revision UCDB/GHS-POP coverage notebooks
├── outputs/
│   ├── tables/revision2/     # Final and intermediate omission CSVs
│   └── figures/revision2/    # Revised coverage/omission figures
├── docs/                     # Extended documentation (model decisions, data sources, etc.)
├── environment/              # Conda environment, pip requirements, R packages
├── zenodo/                   # Zenodo deposit manifest
└── data/                     # Small committed reference files

Key folders

1_preprocessing/ — Prepares standardised city-segment features and benchmark-labelled training CSVs from IDEABench v2.

2_modelling/ — RF training (01_training/) and global application to 5 000+ cities (02_application/). City-level summary statistics and the final city deprivation CSV are committed under 02_application/summary_statistics/.

3_comparitive_analysis/ — Contextual alignment with SSI, Million Neighborhoods (MN), and WRI Urban Land Use datasets. The folder name preserves the historical spelling used throughout the project.

4_Figures_Tables/ — Notebooks that generate all manuscript figures and global summary tables from committed intermediate CSVs.

notebooks/revision2_coverage/ — Five notebooks added in Revision 2 to quantify UCDB/GHS-POP coverage and omissions. See notebooks/revision2_coverage/README.md for execution order and data requirements.

outputs/ — Final committed outputs: figures (outputs/figures/revision2/) and tables (outputs/tables/revision2/). Intermediate joins are under outputs/tables/revision2/intermediate/.

docs/ — Extended documentation including predictor definitions, model decisions, data-source citations, and a code-to-figure map.

environment/ — Reproducible environment files (environment.yml, requirements.txt, r_packages.R).

zenodo/ — Manifest of files deposited on Zenodo (DOI: 10.5281/zenodo.20486977).


Environment Setup

Conda is recommended, particularly for geospatial dependencies (geopandas, rasterio, fiona).

conda env create -f environment/environment.yml
conda activate csmd

Optional pip fallback (may require manual installation of geospatial system libraries):

pip install -r environment/requirements.txt

R packages (for VSURF variable selection):

Rscript environment/r_packages.R
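
After creating the environment, a quick check that the geospatial packages named above can be located may save time before running the notebooks. The sketch below uses only the standard library. The helper name `check_modules` is an assumption for illustration, and the module list covers only the three packages mentioned in this section.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to True if it can be located for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Geospatial packages named in this README; extend the list as needed.
status = check_modules(["geopandas", "rasterio", "fiona"])
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

This only confirms that the modules are discoverable. It does not verify that the underlying system libraries (for example GDAL) load correctly.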

Key Model Details

| Property | Value |
| --- | --- |
| Training labels | IDEABench v2, 8 cities |
| Label rule | DUA spatial overlap ≥ 0.30 |
| Final predictors (8) | i5_par_area, i1_pop_area, B_AVG_SEG, i9_roads_par, i6_paru_area, PARU_A_SEG, B_CV_SEG, REGION_CODE / REG1_GHSL |
| Validation | Leave-one-city-out (LOCO) |
| Decision threshold | p(DUA) ≥ 0.40 |
| Comparative analyses | Contextual alignment / triangulation with SSI, MN, WRI (not strict ground-truth validation) |

See docs/predictor_definitions.md and docs/model_decisions.md for full detail.
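
The two cut-offs listed above act at different stages: the 0.30 overlap rule turns reference polygons into training labels, while the 0.40 threshold turns predicted probabilities into classes. The sketch below illustrates both. It assumes the overlap is measured as the share of a segment's area covered by DUA polygons, which this README does not spell out, and the function names are hypothetical.

```python
LABEL_OVERLAP_MIN = 0.30   # "DUA spatial overlap >= 0.30" (training label rule)
DECISION_THRESHOLD = 0.40  # "p(DUA) >= 0.40" (prediction rule)


def training_label(dua_overlap_area: float, segment_area: float) -> int:
    """Return 1 (DUA) if the overlap share meets the label rule, else 0.

    Assumption: overlap share = overlap area / segment area.
    """
    if segment_area <= 0:
        raise ValueError("segment_area must be positive")
    return int(dua_overlap_area / segment_area >= LABEL_OVERLAP_MIN)


def predicted_class(p_dua: float) -> int:
    """Return 1 (predicted DUA) if the RF probability meets the decision threshold."""
    return int(p_dua >= DECISION_THRESHOLD)


print(training_label(45.0, 100.0))  # overlap share 0.45
print(training_label(10.0, 100.0))  # overlap share 0.10
print(predicted_class(0.55))
print(predicted_class(0.25))
```

See docs/model_decisions.md for the authors' rationale behind both values.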


Data Availability

| What | Where |
| --- | --- |
| Final CSVs, selected outputs, scripts, notebooks, documentation | This GitHub repository |
| RF model (.joblib), prediction GeoPackages, derived UCDB+GHS-POP GPKG, large comparison outputs | Zenodo DOI: 10.5281/zenodo.20486977 |
| GHS Urban Centre Database (UCDB) 2019 V1.2 | JRC / GHSL official download |
| GHS-POP R2023A 2025 epoch, 100 m | JRC / GHSL official download |

Raw rasters, per-country prediction GeoPackages, model binaries, and other large files are excluded from the repository by .gitignore.

Third-party input datasets

The five external input datasets below must be obtained from their original providers. They are not included in this repository or the project Zenodo package.

| Dataset | Purpose | Access | Local path |
| --- | --- | --- | --- |
| City Segments v1 | Segment polygons and built-environment predictors for preprocessing, model application, and figures | Harvard Dataverse, DOI: 10.7910/DVN/XLRSF0 | data_external/city_segments/ |
| IDEABench | Reference deprived/non-deprived labels for the eight benchmark training cities | DANS DataStation, DOI: 10.17026/PT/X4NJII (access conditions apply) | data_external/ideabench/ |
| Slum Severity Index (SSI) | External comparison product for service-related deprivation in sub-Saharan Africa | Zenodo, DOI: 10.5281/zenodo.14998570 | data_external/ssi_raw/ |
| Million Neighborhoods (MN) | External comparison product for building-to-street access complexity in sub-Saharan Africa | millionneighborhoods.africa/download | data_external/mn_raw/ |
| WRI Urban Land Use dataset | External comparison product for intra-urban land-use classes; informal subdivision and atomistic classes used for WRI comparison | GEE asset: projects/wri-datalab/urban_land_use/V1 | data_external/wri_raw/ |

See docs/data_availability.md for full dataset citations, download links, and local path conventions.
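
Before running scripts that depend on the third-party inputs, it can help to confirm that each expected directory exists and is not empty. The sketch below checks the five local paths listed in the table above. It assumes those paths are relative to the repository root, and `missing_inputs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Dict

# Local paths listed in the third-party dataset table.
EXPECTED_DIRS = {
    "City Segments v1": "data_external/city_segments",
    "IDEABench": "data_external/ideabench",
    "Slum Severity Index (SSI)": "data_external/ssi_raw",
    "Million Neighborhoods (MN)": "data_external/mn_raw",
    "WRI Urban Land Use": "data_external/wri_raw",
}


def missing_inputs(repo_root: Path) -> Dict[str, Path]:
    """Return the datasets whose directory is absent or empty under repo_root."""
    missing = {}
    for name, rel in EXPECTED_DIRS.items():
        path = repo_root / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing[name] = path
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for name, path in missing_inputs(Path(".")).items():
        print(f"missing: {name} -> {path}")
```

The check says nothing about whether the files inside each directory are the right version. docs/data_availability.md remains the reference for that.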


Reproduction Guide

Full raw-to-final reproduction requires downloading external datasets from their original providers and running computationally intensive steps (rasterisation over the global 100 m GHS-POP grid, RF global application across 5 000+ cities). This is feasible but not a one-command process.

Paper-output reproduction is more accessible: download the Zenodo deposit, place processed files in the expected locations, and run the figure/table notebooks directly from committed intermediate CSVs.

Revision 2 coverage notebooks use a data_external/ convention for large external inputs. See notebooks/revision2_coverage/README.md for the directory layout and setup instructions.

Key documentation

| Document | Contents |
| --- | --- |
| docs/code_figure_map.md | Maps each manuscript figure/table to the notebook or script that generates it |
| docs/model_decisions.md | Rationale for modelling choices (threshold, predictors, validation) |
| docs/predictor_definitions.md | Definition and units of all RF predictors |
| docs/data_availability.md | Dataset citations, download URLs, and licence notes |
| docs/coverage_and_omissions.md | Coverage and omission methodology for the Revision 2 analysis |
| notebooks/revision2_coverage/README.md | Execution order and data requirements for coverage notebooks |

Citation

If you use or build upon this work, please cite the Zenodo deposit and, once published, the journal article.

Data deposit: Veeravalli, S. G. (2026). A Global, Standardized City Segment Morphological Deprivation (CSMD) Model: Preprocessing, Training, Predictions, and Cross-Dataset Comparisons (Version v4) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.20486977

Paper: The citation will be added after publication. The paper is currently accepted in principle at Nature Cities.


Acknowledgements

This work is supported by:

  • FORMAS (Swedish Research Council for Sustainable Development), project DEPRIMAP (2023-01210) — https://sola.kau.se/deprimap/
  • NAISS (National Academic Infrastructure for Supercomputing in Sweden), partially funded by the Swedish Research Council through grant agreement no. 2022-06725 — computation for model training
  • CIESIN for City Segments v1 (Harvard Dataverse, DOI: 10.7910/DVN/XLRSF0) and IDEAtlas for IDEABench (DOI: 10.17026/PT/X4NJII; paper DOI: 10.1016/j.rse.2026.115272)
