Implements municipio_id normalization in R scripts and adds coverage audit. Enhances data integrity by ensuring consistent formatting of municipio_id values across datasets and introduces an automated audit process for municipal forecast coverage.
# Municipal Forecast ID Normalization (Paused Task)

-_Last updated: 2025-11-19_
+_Last updated: 2025-11-20_

## Status
-- Nationwide municipal forecast run produced 65,737 rows for `2025-11-19`, but `municipio_id` values are stored without left padding and some contain stray whitespace/newlines.
-- Direct comparisons against `data/input/municipalities.csv.gz` therefore flag entire provinces (e.g., Barcelona, Badajoz, Burgos) as missing even though data exists.
+- `scripts/r/get_forecast_data_hybrid.R` pads and trims `municipio_id` values everywhere via `normalize_municipio_id()` (live since 2025-11-20).
+- `update_municipal_forecasts_only.sh` (job 27127, shards 1–5) re-ran with the fix at 11:32 CET; cumulative file now holds 664,489 rows with correctly padded IDs.
+- Coverage audit installed: shard 1 now runs `python3 scripts/python/audit_municipal_forecast_coverage.py` after each array completion. The audit exits non-zero if any non-excluded IDs are missing.
+- Latest audit (2025-11-20): 8,129 reference municipios, 8,037 collected; shortfall limited to the known excluded sets below.

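The trim-and-pad rule referenced in this hunk can be sketched as follows. This is a Python illustration only: the real `normalize_municipio_id()` lives in the R script and its body is not shown in this diff, so the behaviour below (strip surrounding whitespace, then left-pad with zeros to width 5) is inferred from the description rather than copied from the source.

```python
def normalize_municipio_id(raw: object) -> str:
    """Python analogue of the described fix; not the R implementation.

    Assumes IDs are five-character codes and that the only defects are
    surrounding whitespace/newlines and missing leading zeros.
    """
    text = str(raw).strip()      # drop stray spaces, tabs and newlines
    return text.rjust(5, "0")    # left-pad with zeros to five characters


examples = ["8001", " 8001\n", "28079", 8001]
normalized = [normalize_municipio_id(value) for value in examples]
print(normalized)
```

Values that are already five characters long pass through unchanged, so applying the function more than once is harmless.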
-## Outstanding Work
-1. Patch the municipal forecast collector(s) so `municipio_id` values are `str_trim` + `str_pad(width = 5, pad = "0")` prior to persistence.
-2. Regenerate today’s municipal forecasts after deploying the fix to validate that all ~8k municipalities collect successfully.
-3. Re-run the audit script to confirm the differential drops to the expected handful of communal territories (53xxx codes, North African islets, etc.).
+## Expected Gaps (excluded from coverage metrics)

-## Notes
-- Example bad value: `municipio_id = "8001"` (should be `08001`).
+### New municipios without AEMET forecasts (monitor if they appear)
+- `11903` — San Martín del Tesorillo
+- `14901` — Fuente Carreteros
+- `14902` — La Guijarrosa
+- `18077` — Fornes
+- `21902` — La Zarza-Perrunal
+- `41904` — El Palmar de Troya
+
+### Communal / parzonería / ledanía territories
+- `53000`–`53083`
+- `54001`–`54005`
+
+The audit script ignores the IDs above for coverage calculations but will print a warning if any of them begin to appear in the AEMET output so we can revisit downstream handling.
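The exclusion logic described above can be sketched roughly as below. This is not the contents of `audit_municipal_forecast_coverage.py`, which this diff does not show; the function names `audit_coverage` and `run_audit` are hypothetical, and only the excluded ID sets and the exit-code and warning behaviour are taken from the text.

```python
import sys

# Excluded IDs as listed in the document.
NEW_WITHOUT_FORECAST = {"11903", "14901", "14902", "18077", "21902", "41904"}
COMMUNAL = {f"{n:05d}" for n in range(53000, 53084)} | {
    f"{n:05d}" for n in range(54001, 54006)
}
EXCLUDED = NEW_WITHOUT_FORECAST | COMMUNAL


def audit_coverage(reference_ids, collected_ids):
    """Return (missing, unexpected) as sorted lists of IDs.

    missing:    reference IDs that were not collected and are not excluded.
    unexpected: excluded IDs that nevertheless appear in the collected data.
    """
    reference = set(reference_ids)
    collected = set(collected_ids)
    missing = sorted(reference - collected - EXCLUDED)
    unexpected = sorted(collected & EXCLUDED)
    return missing, unexpected


def run_audit(reference_ids, collected_ids):
    """Print findings and return a process exit code (hypothetical wrapper)."""
    missing, unexpected = audit_coverage(reference_ids, collected_ids)
    for municipio_id in unexpected:
        print(f"WARNING: excluded ID {municipio_id} now has forecasts", file=sys.stderr)
    if missing:
        print(f"Missing {len(missing)} municipios: {missing}", file=sys.stderr)
        return 1
    return 0
```

A caller would pass the IDs read from the reference file and the cumulative forecast file, then hand the result to `sys.exit()` so the SLURM task fails when coverage is incomplete.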
- Forecast cumulative file is plain CSV at `data/output/daily_municipal_forecast.csv.gz` despite the `.gz` suffix.
+- Municipal coverage audit runs automatically for SLURM array task 1; rerun manually with `python3 scripts/python/audit_municipal_forecast_coverage.py` if needed.
+- Manual rewrite script (2025-11-20) remains in shell history should another one-off normalization ever be required.
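Because the cumulative file is plain CSV despite its `.gz` suffix, a reader that trusts the extension will fail on it. One possible workaround, sketched here under assumptions, is to check for the gzip magic bytes before choosing how to open the file. The helper names `open_maybe_gzip` and `read_ids` are hypothetical, UTF-8 encoding is assumed, and only the `municipio_id` column name comes from the document.

```python
import csv
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzip(path):
    """Open a text file, decompressing only if it really is gzip."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8", newline="")
    return open(path, "r", encoding="utf-8", newline="")


def read_ids(path, column="municipio_id"):
    """Return the values of one column, whatever the file's real format."""
    with open_maybe_gzip(path) as handle:
        return [row[column] for row in csv.DictReader(handle)]
```

This keeps downstream scripts working whether or not the file is ever actually compressed.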