feat(postcodes/GB): 124 UK postcode areas (#1039) #1503
Adds the 124 UK Royal Mail postcode areas (1-2 letter prefixes: AB, B, BT, EH, M, SW, etc.) with area-centroid lat/lng aggregated from the dwyl/uk-postcodes-latitude-longitude-complete-csv mirror (Ordnance Survey Code-Point Open, October 2017).

Why
---
Closes the GB gap on issue #1039 at the most coarse-grained but correctly-sized level. The full Royal Mail PAF feed (~2.6M codes) is paywalled. The community dwyl mirror covers 1.7M full postcodes but produces a ~500 MB JSON when expanded, far over the in-band cities/*.json envelope (PT.json at 38 MB is current largest).

Postcode-area level is the UK equivalent of Canada's FSA: 124 prefixes covering all UK + Channel Islands + Isle of Man, each spanning thousands of full postcodes. Country-only state FK matches the SE / SI precedent for sources that don't map cleanly to CSC's state hierarchy.

Coverage
--------
- 124 area records / country-only state FK
- Each row carries area-centroid lat/lng (mean of underlying full-postcode coordinates) + canonical city/region label
- 1,738,243 source rows aggregated

Regex fix
---------
Old GB regex required full postcode form (e.g. M1 1AA). Updated to accept all granularities:

  ^GIR ?0AA$ | ^[A-Z]{1,2}([0-9][0-9A-Z]?( ?[0-9][A-Z]{2})?)?$

This validates area (M), district (M1, EC1A), sector (M1 1), and full unit (M1 1AA) plus the special GIR 0AA.

License
-------
Source: dwyl/uk-postcodes-latitude-longitude-complete-csv (no formal license file). Upstream: Ordnance Survey Code-Point Open (OS OpenData / OGL3, Crown Copyright). Tier 5 per #1039 license-tier policy. Each row: source: "ordnance-survey-via-dwyl".

Future work
-----------
The full 2.6M postcode list could ship via the gz-to-Releases pattern (#1374) once that infra is generalised. For now, area-level provides clean baseline coverage with strong locality labels.

Validation
----------
- python3 -m py_compile passes
- 100% regex match against updated GB regex
- No state_id (country-only ship pattern, like SE/SI)
- No auto-managed fields (id, created_at, updated_at, flag)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
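For reviewers who want to exercise the regex change locally, here is a minimal sketch using the pattern exactly as quoted in the commit message. It assumes the spaces around `|` are display formatting and drops them; the helper name `is_valid` and the sample list are illustrative and not taken from the importer.

```python
import re

# Pattern as quoted in the commit message. Assumption: the spaces around "|"
# are formatting only, so they are removed here.
GB_POSTCODE_RE = re.compile(
    r"^GIR ?0AA$|^[A-Z]{1,2}([0-9][0-9A-Z]?( ?[0-9][A-Z]{2})?)?$"
)


def is_valid(code: str) -> bool:
    """Return True if `code` matches the quoted GB pattern (hypothetical helper)."""
    return GB_POSTCODE_RE.match(code) is not None


# Illustrative inputs at each granularity named in the PR.
samples = ["M", "M1", "M1A", "EC1A", "M1 1AA", "SW1A 1AA", "GIR 0AA", "M1 1"]

for code in samples:
    print(f"{code!r}: {is_valid(code)}")
```

With the pattern as quoted, area, district, full-unit, and `GIR 0AA` inputs match. The sector form `M1 1` does not, because the optional inward group requires a digit followed by two letters, so sector-level acceptance may need a follow-up if it is intended.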
CSC Validation Report
PR Format
Labels applied:
Schema Validation (374 records)
Errors (blocking):
Warnings:
Cross-Reference Validation
✅ 124 reference(s) verified
Geo-Bounds Check
Duplicate Detection
Source URL Verification
✅ 2 source URL(s) accessible
❌ 1000 error(s), 1501 warning(s) | Status: Changes required
Please fix the errors above and push a new commit. Refer to our Contribution Guidelines for details.
Weekly data-quality review (2026-05-04)
Verdict: needs-fix
Checks
Note on CI "needs-changes" label
Same false-positive issue as the other postcode PRs: validator checks all 250 existing countries in
🤖 Automated weekly review - Claude (sonnet-4-6). Generated by Claude Code
…es (GY,IM,JE)

Guernsey, Isle of Man, and Jersey are absent from the dwyl/OS source dataset, so the importer emitted lat=99.999999 as a sentinel. That's outside the valid latitude range (-90..90) and trips the geo-bounds validator. Null the coordinates so the codes stay queryable; the country-level postal_code_format / regex still apply.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
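The sentinel fix described in this commit can be sketched roughly as follows. This is not the importer's actual code: the field names `code`, `latitude`, and `longitude`, both function names, and the sample longitude value are assumptions made for illustration. Only the latitude sentinel 99.999999 and the -90..90 bound come from the commit message; the -180..180 longitude bound is the standard WGS-84 range.

```python
from typing import Optional


def in_range(value: Optional[float], limit: float) -> bool:
    """True if value is present and within [-limit, limit]."""
    return value is not None and -limit <= value <= limit


def null_invalid_coords(record: dict) -> dict:
    """Return a copy of record with both coordinates nulled if either is out of bounds.

    Field names are hypothetical; the real schema may differ.
    """
    lat = record.get("latitude")
    lng = record.get("longitude")
    if in_range(lat, 90.0) and in_range(lng, 180.0):
        return dict(record)
    # A centroid with one bad component is unusable, so null both.
    return {**record, "latitude": None, "longitude": None}


# Jersey row with the latitude sentinel from the commit message
# (the longitude here is a made-up placeholder).
row = {"code": "JE", "latitude": 99.999999, "longitude": 0.0}
print(null_invalid_coords(row))
```

Nulling both components keeps the area code present in the dataset while making sure no consumer plots a half-valid point.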
Summary
Source
dwyl/uk-postcodes-latitude-longitude-complete-csv: 32 MB ZIP containing 1,738,243 full UK postcodes with WGS-84 centroids (Oct 2017)
Why area-level
The full 2.6M Royal Mail PAF feed is paywalled. The dwyl 1.7M-row mirror would produce a ~500 MB JSON when expanded — way over the in-band cities/*.json envelope (PT at 38 MB is current largest). Per memory, > 200k rows need the gz-to-Releases pattern (#1374), not yet deployed.
Postcode-area level is the UK equivalent of Canada's FSA: 124 prefixes covering all UK + Channel Islands + Isle of Man. Each row carries the centroid (mean lat/lng of underlying full postcodes) and canonical city/region label.
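The centroid aggregation described above (mean lat/lng of the underlying full postcodes, grouped by area prefix) could look roughly like this. It is a sketch under stated assumptions rather than the contents of the import script: the function name `area_centroids`, the `(postcode, lat, lng)` tuple layout, and the sample coordinates are all hypothetical.

```python
import re
from collections import defaultdict

# The postcode area is the leading run of 1-2 letters (e.g. "M", "EH").
AREA_RE = re.compile(r"^[A-Z]{1,2}")


def area_centroids(rows):
    """Group (postcode, lat, lng) rows by area prefix and average the coordinates.

    Returns {area: (mean_lat, mean_lng, row_count)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for postcode, lat, lng in rows:
        match = AREA_RE.match(postcode.strip().upper())
        if match is None:
            continue  # skip rows without a leading letter prefix
        acc = sums[match.group(0)]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {
        area: (lat_sum / n, lng_sum / n, n)
        for area, (lat_sum, lng_sum, n) in sums.items()
    }


# Made-up coordinates, chosen only so the means are easy to verify by hand.
rows = [
    ("M1 1AA", 53.0, -2.0),
    ("M2 3BB", 54.0, -3.0),
    ("EH1 1AA", 56.0, -3.0),
]
print(area_centroids(rows))
```

A plain arithmetic mean is adequate at this scale because each area spans a small range of latitude and longitude. It would not be appropriate for regions that cross the antimeridian, which does not apply to the UK.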
Why country-only state FK
CSC has 221 GB states across 9 types (unitary authority, metropolitan district, london borough, council area, etc.). Postcode areas often span multiple states (e.g. EN covers Enfield London Borough + Hertfordshire), so a 1:1 area→state map would be misleading. Future PRs can layer postcode-district-level FK (~3,000 districts) once we want finer granularity.
This matches the country-only pattern already used for SE (Sweden) and SI (Slovenia).
Regex fix
Old regex required full postcode form. Updated to accept all granularities:
^GIR ?0AA$ | ^[A-Z]{1,2}([0-9][0-9A-Z]?( ?[0-9][A-Z]{2})?)?$
Validates M (area), M1 (district), M1A (district), M1 1AA (full), SW1A 1AA (full), GIR 0AA (Girobank).
Sample rows
Test plan
- python3 -m py_compile bin/scripts/sync/import_uk_postcodes.py
- Country-only (no state_id); follows SE/SI pattern
- No auto-managed fields (id, created_at, updated_at, flag)
🤖 Generated with Claude Code