Commit eb8ba74

data: add India agriculture raster and vector layer configs
- Add india_worldcover (ESA WorldCover, resampled 10x) and india_soc_change (SOC prediction change 2018-2023) to raster_config.yaml
- Add WorldCover resampling preprocessing cell to 01_raster_layers.ipynb
- Add resample_factor option to WorldCereal workflow in 04_layers_step1.ipynb for large countries that exceed Mapbox 500KB tile limit
- Document data directory structure in LAYER_PROCESSING_GUIDELINES.md
1 parent 5292b51 commit eb8ba74

4 files changed

Lines changed: 97 additions & 21 deletions

File tree

data-processing/LAYER_PROCESSING_GUIDELINES.md

Lines changed: 14 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -49,6 +49,20 @@ uv sync
4949
source .venv/bin/activate
5050
```
5151

52+
### Data directory
53+
54+
Raw inputs and processed outputs live in `data-processing/data/`, which is gitignored. Create it manually on first clone:
55+
56+
```
57+
data-processing/data/
58+
├── raw/ # Original files from data providers, organised by theme/country
59+
│ └── {Theme}/{Country}/...
60+
└── processed/ # Pipeline outputs (COG, MBTiles, APNGs), organised the same way
61+
└── {Theme}/{Country}/{Rasters|Vectors|APNGs}/...
62+
```
63+
64+
All config paths (`base_path`, `input_file`, `output_file`, `vector_file`) are relative to `data-processing/src/`, so they use `../data/raw/...` and `../data/processed/...`.
65+
5266
### Environment variables
5367

5468
Copy `.env.example` to `.env` and fill in credentials:
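
For the data directory section added to LAYER_PROCESSING_GUIDELINES.md in the hunk above, here is a minimal sketch of the first-clone setup and of how a config path relative to `data-processing/src/` lands inside `data-processing/data/`. The helper names `create_data_dirs` and `resolve_from_src` are hypothetical and do not exist in the repository; the guideline itself only says to create the directory manually.

```python
import os
import tempfile
from pathlib import Path


def create_data_dirs(repo_root: Path) -> list[Path]:
    """Create data-processing/data/{raw,processed} if they are missing.

    Hypothetical helper: the guidelines only ask for the directory to be
    created by hand, this is one way to script that step.
    """
    data_root = repo_root / "data-processing" / "data"
    created = []
    for name in ("raw", "processed"):
        path = data_root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def resolve_from_src(repo_root: Path, config_path: str) -> Path:
    """Resolve a config path that is written relative to data-processing/src/."""
    src_dir = repo_root / "data-processing" / "src"
    # normpath collapses the ".." segments lexically, without touching the disk.
    return Path(os.path.normpath(src_dir / config_path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for d in create_data_dirs(root):
            print(d.relative_to(root))
        print(resolve_from_src(root, "../data/raw/Agriculture").relative_to(root))
```

The `../data/...` prefix works because `src/` and `data/` are siblings under `data-processing/`, so one `..` step is enough.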

data-processing/notebooks/01_raster_layers.ipynb

Lines changed: 17 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@
1818
"\n",
1919
"The source of truth for all raster layer definitions is `raster_config.yaml`. Most layers run straight from config with no manual steps required and are not documented here.\n",
2020
"\n",
21-
"This notebook documents only layers that needed **custom preprocessing** before running the standard processor \u2014 merging tiles, resampling, clipping to boundaries, assigning a missing CRS, etc. These steps are kept for reproducibility and reference.\n",
21+
"This notebook documents only layers that needed **custom preprocessing** before running the standard processor merging tiles, resampling, clipping to boundaries, assigning a missing CRS, etc. These steps are kept for reproducibility and reference.\n",
2222
"\n",
2323
"To process layers that require no preprocessing, use the **working cell** at the bottom of this notebook."
2424
],
@@ -298,7 +298,7 @@
298298
" output_file = input_file.parent / f\"0007MZ_GIL_L4_AGB_{suffix}.tiff\"\n",
299299
" with rio.open(output_file, \"w\", **meta) as dst:\n",
300300
" dst.write(src.read(i), 1)\n",
301-
" print(f\"{suffix} ({description}) \u2192 {output_file}\")"
301+
" print(f\"{suffix} ({description}) {output_file}\")"
302302
]
303303
},
304304
{
@@ -319,7 +319,7 @@
319319
" input_file = raw_dir / f\"0007MZ_GIL_L4_AGB_{suffix}.tiff\"\n",
320320
" output_file = raw_dir / f\"0007MZ_GIL_L4_AGB_{suffix}_resampled.tiff\"\n",
321321
" resample_raster(input_file, output_file, scale_factor=3, resampling_method=Resampling.average)\n",
322-
" print(f\"{suffix} resampled \u2192 {output_file}\")"
322+
" print(f\"{suffix} resampled {output_file}\")"
323323
]
324324
},
325325
{
@@ -337,6 +337,18 @@
337337
")"
338338
]
339339
},
340+
{
341+
"cell_type": "markdown",
342+
"source": "#### India - Agriculture (WorldCover)",
343+
"metadata": {}
344+
},
345+
{
346+
"cell_type": "code",
347+
"source": "# Resample WorldCover tile from 10m to ~100m (36000x36000 → 3600x3600).\n# 5x was still too dense — styled RGBA tiles exceed Mapbox 500KB limit at z8.\n# Categorical data — use nearest resampling to preserve land cover classes.\nfrom rasterio.enums import Resampling\n\ninput_file = Path(\n \"../data/raw/Agriculture/India/GDA-AGRI_UC1-India/ESA_WORLDCOVER/\"\n \"ESA_WorldCover_10m_2021_v200_N30E075_Map/ESA_WorldCover_10m_2021_v200_N30E075_Map.tif\"\n)\noutput_file = Path(\n \"../data/raw/Agriculture/India/GDA-AGRI_UC1-India/ESA_WORLDCOVER/\"\n \"ESA_WorldCover_10m_2021_v200_N30E075_Map/ESA_WorldCover_10m_2021_v200_N30E075_Map_resampled.tif\"\n)\n\nresample_raster(input_file, output_file, scale_factor=10, resampling_method=Resampling.nearest)\nprint(f\"Resampled → {output_file}\")",
348+
"metadata": {},
349+
"execution_count": null,
350+
"outputs": []
351+
},
340352
{
341353
"cell_type": "markdown",
342354
"id": "new-section-header",
@@ -350,7 +362,7 @@
350362
"1. Add the layer definition to `raster_config.yaml`\n",
351363
"2. Edit `layer_keys` below and run\n",
352364
"3. If preprocessing was needed, move those cells to the **Preprocessing records** section above with a markdown header explaining what was done\n",
353-
"4. Revert this cell before committing \u2014 do not accumulate processed layer keys here"
365+
"4. Revert this cell before committing do not accumulate processed layer keys here"
354366
],
355367
"outputs": [],
356368
"execution_count": null
@@ -361,7 +373,7 @@
361373
"id": "new-working-cell",
362374
"metadata": {},
363375
"outputs": [],
364-
"source": "# WORKING CELL \u2014 edit layer_keys, run, then revert. Do not commit with real values.\nprocess_raster_layers(\n config_path=config_path,\n layer_keys=[\"your_layer_key_here\"],\n upload_layer_keys=[],\n)"
376+
"source": "# WORKING CELL edit layer_keys, run, then revert. Do not commit with real values.\nprocess_raster_layers(\n config_path=config_path,\n layer_keys=[\"your_layer_key_here\"],\n upload_layer_keys=[],\n)"
365377
}
366378
],
367379
"metadata": {
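
The WorldCover cell added above resamples with `Resampling.nearest` because the raster is categorical. The sketch below illustrates that reasoning on a tiny grid in pure Python. It is a simplified stand-in, not the repository's `resample_raster` and not rasterio's exact sampling rule; `downsample_nearest` and `downsample_average` are hypothetical names. The values are ESA WorldCover class codes (10 tree cover, 40 cropland, 50 built-up, 70 snow and ice, 80 permanent water bodies).

```python
def downsample_nearest(grid, factor):
    """Keep one source cell per factor x factor block (the one near its centre)."""
    offset = factor // 2
    return [row[offset::factor] for row in grid[offset::factor]]


def downsample_average(grid, factor):
    """Average each factor x factor block; only meaningful for continuous data."""
    rows = len(grid) // factor
    cols = len(grid[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [
                grid[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# 4x4 toy tile of land cover class codes.
tile = [
    [10, 10, 40, 40],
    [10, 40, 40, 40],
    [50, 50, 80, 80],
    [50, 50, 80, 40],
]

nearest = downsample_nearest(tile, 2)
average = downsample_average(tile, 2)

if __name__ == "__main__":
    print(nearest)  # every value is a class code that exists in the source
    print(average)  # contains 17.5 (no such class) and 70.0 (a different class)
    # Same arithmetic as the notebook comment: 36000 px at factor 10.
    print(len(range(10 // 2, 36000, 10)))
```

Averaging the bottom-right block (three water cells and one cropland cell) yields 70, which is the code for snow and ice, so an averaged categorical raster can show classes that were never present in the block.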

data-processing/notebooks/04_layers_step1.ipynb

Lines changed: 48 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,25 @@
44
"cell_type": "markdown",
55
"id": "ec48b40a",
66
"metadata": {},
7-
"source": "# Extract layers for step 1\n\nEvery story in the Impact Sphere includes a Step 1 layer — a national or regional overview that provides geographic context for the story. This layer always covers the full country or region, even when the story itself focuses on a specific city or area within it (e.g., a story about Nairobi still uses a Kenya-wide Step 1 layer; a story about West Africa uses a regional mosaic).\n\nThis notebook extracts Step 1 layers from one of four global datasets:\n- **WorldCover** — ESA global land cover map (10 m)\n- **WSF Building Area** — DLR World Settlement Footprint 3D building area\n- **Global Surface Water (GSW)** — JRC surface water occurrence\n- **WorldCereal** — ESA global cropland classification\n\nMore sources may be added in the future.\n\n## Workflows\n\nEach data source section provides two workflows:\n- **Single country** — clip or download the dataset for one country, then style and process\n- **Multi-country region** — clip or download per country (each clipped to its GADM boundary), merge into a regional mosaic, then style and process"
7+
"source": [
8+
"# Extract layers for step 1\n",
9+
"\n",
10+
"Every story in the Impact Sphere includes a Step 1 layer — a national or regional overview that provides geographic context for the story. This layer always covers the full country or region, even when the story itself focuses on a specific city or area within it (e.g., a story about Nairobi still uses a Kenya-wide Step 1 layer; a story about West Africa uses a regional mosaic).\n",
11+
"\n",
12+
"This notebook extracts Step 1 layers from one of four global datasets:\n",
13+
"- **WorldCover** — ESA global land cover map (10 m)\n",
14+
"- **WSF Building Area** — DLR World Settlement Footprint 3D building area\n",
15+
"- **Global Surface Water (GSW)** — JRC surface water occurrence\n",
16+
"- **WorldCereal** — ESA global cropland classification\n",
17+
"\n",
18+
"More sources may be added in the future.\n",
19+
"\n",
20+
"## Workflows\n",
21+
"\n",
22+
"Each data source section provides two workflows:\n",
23+
"- **Single country** — clip or download the dataset for one country, then style and process\n",
24+
"- **Multi-country region** — clip or download per country (each clipped to its GADM boundary), merge into a regional mosaic, then style and process"
25+
]
826
},
927
{
1028
"cell_type": "markdown",
@@ -22,7 +40,22 @@
2240
"id": "a25ef8df",
2341
"metadata": {},
2442
"outputs": [],
25-
"source": "import os\nimport sys\nfrom glob import glob\nfrom pathlib import Path\n\nimport rasterio as rio\nfrom rasterio.merge import merge\n\nsys.path.append(\"../src\")\n\nfrom data_processing.download_step1 import download_worldcover_for_country, download_gsw_for_country\nfrom data_processing.raster_processor import RasterProcessor\nfrom data_processing.utils import clip_raster_to_country_and_create_cog"
43+
"source": [
44+
"import os\n",
45+
"import sys\n",
46+
"from glob import glob\n",
47+
"from pathlib import Path\n",
48+
"\n",
49+
"import rasterio as rio\n",
50+
"from rasterio.enums import Resampling\n",
51+
"from rasterio.merge import merge\n",
52+
"\n",
53+
"sys.path.append(\"../src\")\n",
54+
"\n",
55+
"from data_processing.download_step1 import download_worldcover_for_country, download_gsw_for_country\n",
56+
"from data_processing.raster_processor import RasterProcessor\n",
57+
"from data_processing.utils import clip_raster_to_country_and_create_cog, resample_raster"
58+
]
2659
},
2760
{
2861
"cell_type": "markdown",
@@ -525,17 +558,7 @@
525558
"id": "5d89ee8c",
526559
"metadata": {},
527560
"outputs": [],
528-
"source": [
529-
"# WORKING CELL — set country, run, then revert.\n",
530-
"country = \"your_country\"\n",
531-
"\n",
532-
"# Clip raster to country and create COG\n",
533-
"raster_file = \"../data/processed/WorldCereal/2021_tc-annual_temporarycrops_classification_0.004deg.tif\"\n",
534-
"output_dir = \"../data/processed/WorldCereal/Clipped/\"\n",
535-
"\n",
536-
"cog_file = clip_raster_to_country_and_create_cog(raster_file, country, output_dir=output_dir)\n",
537-
"print(f\"Created COG: {cog_file}\")"
538-
]
561+
"source": "# WORKING CELL — set country, run, then revert.\ncountry = \"your_country\"\n\n# Clip raster to country and create COG\nraster_file = \"../data/processed/WorldCereal/2021_tc-annual_temporarycrops_classification_0.004deg.tif\"\noutput_dir = \"../data/processed/WorldCereal/Clipped/\"\n\ncog_file = clip_raster_to_country_and_create_cog(raster_file, country, output_dir=output_dir)\nprint(f\"Created COG: {cog_file}\")"
539562
},
540563
{
541564
"cell_type": "code",
@@ -545,13 +568,23 @@
545568
"outputs": [],
546569
"source": [
547570
"# Style and process raster\n",
571+
"# For large countries (e.g. India), resample first to avoid Mapbox 500KB tile limit at low zoom.\n",
572+
"# Set resample_factor > 1 to downsample; 1 = no resampling.\n",
573+
"resample_factor = 1\n",
574+
"\n",
548575
"print(f\"Processing WorldCereal for {country}...\")\n",
549576
"BASE_PATH = Path(\"../data/processed/WorldCereal/\")\n",
550577
"input_file = BASE_PATH / f\"Clipped/2021_tc-annual_temporarycrops_classification_0.004deg_{country.replace(' ', '_')}.tif\"\n",
551578
"style_file = BASE_PATH / \"temporarycrops.qml\"\n",
552579
"output_file = BASE_PATH / f\"Rasters/WorldCereal_{country.replace(' ', '_')}.tif\"\n",
553580
"layer_name = f\"{country.replace(' ', '_')}_WorldCereal\"\n",
554581
"\n",
582+
"if resample_factor > 1:\n",
583+
" resampled_file = input_file.parent / f\"{input_file.stem}_resampled{input_file.suffix}\"\n",
584+
" resample_raster(input_file, resampled_file, scale_factor=resample_factor, resampling_method=Resampling.nearest)\n",
585+
" input_file = resampled_file\n",
586+
" print(f\"Resampled by {resample_factor}x → {resampled_file}\")\n",
587+
"\n",
555588
"RasterProcessor(\n",
556589
" input_file,\n",
557590
" output_file=output_file,\n",
@@ -673,7 +706,7 @@
673706
],
674707
"metadata": {
675708
"kernelspec": {
676-
"display_name": "esa_gda_env",
709+
"display_name": "esa-gda",
677710
"language": "python",
678711
"name": "python3"
679712
},
@@ -687,7 +720,7 @@
687720
"name": "python",
688721
"nbconvert_exporter": "python",
689722
"pygments_lexer": "ipython3",
690-
"version": "3.9.23"
723+
"version": "3.13.11"
691724
}
692725
},
693726
"nbformat": 4,
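
The `resample_factor` branch added to the WorldCereal cell above derives the resampled file name from the clipped input and then swaps it in as the processor input. A minimal sketch of just that path logic, with the actual resampling call left out; `resampled_path` and `choose_input` are hypothetical names introduced here, and rejecting factors below 1 is an added assumption that the notebook cell does not make.

```python
from pathlib import Path


def resampled_path(input_file: Path) -> Path:
    """Sibling path with '_resampled' inserted before the extension."""
    return input_file.parent / f"{input_file.stem}_resampled{input_file.suffix}"


def choose_input(input_file: Path, resample_factor: int) -> Path:
    """Return the file the processor should read for a given factor.

    A factor of 1 means no resampling, so the clipped file is used as-is.
    """
    if resample_factor < 1:
        # Assumption: the notebook only documents factors of 1 or greater.
        raise ValueError("resample_factor must be >= 1")
    if resample_factor > 1:
        return resampled_path(input_file)
    return input_file


country = "India"
clipped = Path("../data/processed/WorldCereal/Clipped") / (
    f"2021_tc-annual_temporarycrops_classification_0.004deg_{country.replace(' ', '_')}.tif"
)

if __name__ == "__main__":
    print(choose_input(clipped, 1).name)
    print(choose_input(clipped, 10).name)
```

`Path.stem` only strips the final suffix, so the dot inside `0.004deg` stays in the file name and the result still ends in `_resampled.tif`.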

data-processing/src/raster_config.yaml

Lines changed: 18 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -761,4 +761,21 @@ layers:
761761
style_file: "Pop_Accessibility_Classes.qml"
762762
output_file: "../data/processed/Health/Bhutan/Rasters/Bhutan_PopAccess_Hospitals.tif"
763763
layer_name: "Bhutan_PopAccess_Hospitals"
764-
max_zoom: 13
764+
max_zoom: 13
765+
766+
# India - Agriculture
767+
india_worldcover:
768+
base_path: "../data/raw/Agriculture/India/GDA-AGRI_UC1-India/ESA_WORLDCOVER/ESA_WorldCover_10m_2021_v200_N30E075_Map"
769+
input_file: "ESA_WorldCover_10m_2021_v200_N30E075_Map_resampled.tif"
770+
style_file: "../landcover.qml"
771+
output_file: "../data/processed/Agriculture/India/Rasters/India_Worldcover.tif"
772+
layer_name: "India_Worldcover"
773+
max_zoom: 11
774+
775+
india_soc_change:
776+
base_path: "../data/raw/Agriculture/India/GDA-AGRI_UC1-India/PREDICTIONS"
777+
input_file: "SOC_prediction_Himachal-Pradesh-croplands_change-2018-2023.tif"
778+
style_file: "SOC_change.qml"
779+
output_file: "../data/processed/Agriculture/India/Rasters/India_SOC_Change.tif"
780+
layer_name: "India_SOC_Change"
781+
max_zoom: 14
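
The `india_worldcover` entry above points at the `_resampled.tif` file that the preprocessing cell in 01_raster_layers.ipynb writes, so the two paths have to agree. The sketch below checks that for a dict mirroring the YAML entry. It assumes the processor joins `base_path` and `input_file`, which the diff does not show, and it treats the six keys seen in the new entries as required; `REQUIRED_KEYS`, `missing_keys`, and `input_path` are hypothetical names.

```python
from pathlib import PurePosixPath

# Assumption: these are the keys a layer entry needs, based only on the
# two entries added in this commit.
REQUIRED_KEYS = (
    "base_path",
    "input_file",
    "style_file",
    "output_file",
    "layer_name",
    "max_zoom",
)

# Mirror of the india_worldcover entry from raster_config.yaml.
india_worldcover = {
    "base_path": (
        "../data/raw/Agriculture/India/GDA-AGRI_UC1-India/ESA_WORLDCOVER/"
        "ESA_WorldCover_10m_2021_v200_N30E075_Map"
    ),
    "input_file": "ESA_WorldCover_10m_2021_v200_N30E075_Map_resampled.tif",
    "style_file": "../landcover.qml",
    "output_file": "../data/processed/Agriculture/India/Rasters/India_Worldcover.tif",
    "layer_name": "India_Worldcover",
    "max_zoom": 11,
}

# Output path written by the preprocessing cell in 01_raster_layers.ipynb.
notebook_output = (
    "../data/raw/Agriculture/India/GDA-AGRI_UC1-India/ESA_WORLDCOVER/"
    "ESA_WorldCover_10m_2021_v200_N30E075_Map/"
    "ESA_WorldCover_10m_2021_v200_N30E075_Map_resampled.tif"
)


def missing_keys(entry: dict) -> list[str]:
    """List the assumed-required keys that an entry does not define."""
    return [key for key in REQUIRED_KEYS if key not in entry]


def input_path(entry: dict) -> PurePosixPath:
    """Join base_path and input_file (assumed to be how the processor resolves it)."""
    return PurePosixPath(entry["base_path"]) / entry["input_file"]


if __name__ == "__main__":
    print(missing_keys(india_worldcover))
    print(input_path(india_worldcover) == PurePosixPath(notebook_output))
```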
