feat(postcodes): add postcodes table and infrastructure (#1039) #1398
Introduces a separate `postcodes` table (Tier 4 architecture) for
storing postal codes as their own entity, FK'd to country (required)
and state/city (both nullable). Captures multi-postcode-per-state and
multi-state-per-postcode cases that flat-column shapes cannot.
This is a foundation-only PR. Country data lands in follow-up PRs
sourced from OpenPLZ (DACH), Wikidata (long tail), and per-country
official sources (US Census, India Post, Japan Post, La Poste,
Australia Post). GeoNames is deliberately not used.
Includes
- Phinx migration creating `postcodes` table with FKs and indexes
- Manual mirror in bin/db/schema.sql for review readability
- contributions/postcodes/ directory with field-shape README
- import_postcodes() in bin/scripts/sync/import_json_to_mysql.py;
  gracefully no-ops when table or directory is absent
- Validator: postcodes entity recognised in .github/scripts/utils.js
- Cross-reference validator checks country_id, state_id, and matches
  postcode against countries.postal_code_regex when defined
- ADR in .github/fixes-docs/FIX_1039_POSTCODES_TABLE.md covering the
  Shape A/B/C/D decision, sourcing plan, and roll-out sequence
Out of scope (deferred to follow-up PRs):
- Country data files (no contributions/postcodes/{ISO2}.json yet)
- Export commands (Csv/Json/MongoDB/Plist/SqlServer/Xml/Yaml): must
  be updated to emit the postcodes table
- sync_mysql_to_json.py (reverse sync)
- Coordinate validator and duplicate detector for postcodes
Refs: #1039
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CSC Validation Report
- PR Format
- Schema Validation
- Source URL Verification: 1 source URL(s) accessible

All checks passed | Status: Ready for review
Pull request overview
Adds foundational infrastructure for treating postcodes as a first-class entity (Tier 4) in the CSC database, including schema, importer support, and contribution/validation scaffolding.
Changes:
- Introduces a new `postcodes` table via Phinx migration and a mirrored definition in `bin/db/schema.sql`.
- Adds `contributions/postcodes/` documentation and extends the MySQL importer to load per-country postcode JSON files when present.
- Extends GitHub validation scripts to recognize and cross-check `postcodes` contributions.
Reviewed changes
Copilot reviewed 4 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| contributions/postcodes/README.md | Documents the per-country postcode contribution JSON shape and sourcing plan. |
| bin/scripts/sync/import_json_to_mysql.py | Adds import_postcodes() and integrates it into the import sequence and summary output. |
| bin/db/schema.sql | Adds postcodes to schema drop order and defines the new postcodes table. |
| bin/db/migrations/20260425000000_create_postcodes_table.php | Adds Phinx migration to create postcodes table with FKs and indexes. |
| .github/scripts/validate-cross-reference.js | Adds cross-reference validation for postcodes (country/state + regex validation). |
| .github/scripts/utils.js | Adds postcodes entity schema/rules and entity-type detection. |
| .github/fixes-docs/FIX_1039_POSTCODES_TABLE.md | Adds an ADR-style doc describing the Tier 4 design and rollout plan. |

    @@ -0,0 +1,129 @@
    # FIX #1039 - `postcodes` Table (Tier 4 Architecture)

    **Issue:** [#1039 - Can we add a postcode for this?](https://github.com/dr5hn/the-countries-states-cities-database/issues/1039)

The issue link points to dr5hn/the-countries-states-cities-database, but this PR is in dr5hn/countries-states-cities-database. Update the URL so it points to the correct repository's issue #1039.

    - **Issue:** [#1039 - Can we add a postcode for this?](https://github.com/dr5hn/the-countries-states-cities-database/issues/1039)
    + **Issue:** [#1039 - Can we add a postcode for this?](https://github.com/dr5hn/countries-states-cities-database/issues/1039)


    if (entityType === 'postcodes') {
      // Validate country_id exists (required FK)
      if (record.country_id) {
        const country = countryById.get(Number(record.country_id));
        if (!country) {
          errors.push(`${prefix}: country_id ${record.country_id} does not exist`);
        } else {
          validCount++;
          if (record.country_code && country.iso2) {
            if (record.country_code.toUpperCase() !== country.iso2.toUpperCase()) {
              errors.push(
                `${prefix}: country_code "${record.country_code}" does not match country_id ${record.country_id} (expected "${country.iso2}")`
              );
            }
          }
        }
      }

      // Validate state_id exists if provided (optional FK)
      if (record.state_id != null && states) {
        const state = stateById.get(Number(record.state_id));
        if (!state) {
          errors.push(`${prefix}: state_id ${record.state_id} does not exist`);
        } else {
          validCount++;
          if (record.country_id && Number(state.country_id) !== Number(record.country_id)) {
            errors.push(
              `${prefix}: state_id ${record.state_id} ("${state.name}") belongs to country_id ${state.country_id}, not ${record.country_id}`
            );
          }
        }
      }

      // Validate postcode format against country regex if defined
      if (record.code && record.country_id) {
        const country = countryById.get(Number(record.country_id));
        if (country && country.postal_code_regex) {
          try {
            const re = new RegExp(country.postal_code_regex);
            if (!re.test(record.code)) {
              errors.push(
                `${prefix}: code "${record.code}" does not match postal_code_regex "${country.postal_code_regex}" of ${country.iso2}`
              );
            }
          } catch (e) {
            // Invalid regex on the country side; skip silently rather than blocking PR
          }
        }
      }

The postcodes cross-reference validation checks country_id and state_id, but it never validates city_id even though postcodes.city_id is a foreign key. This can allow contributions with a non-existent city_id to pass CI and only fail later at import time. Add a city_id existence check (and ideally verify the city belongs to the declared country_id/state_id when present).
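
A minimal sketch of the check this comment asks for, written in the style of the snippet above. The helper name `checkCityId` and the `cityById` map are hypothetical, and the `country_id` / `state_id` fields on city records are assumed; the real validator may load and index cities differently.

```javascript
// Hypothetical helper: validate the optional city_id foreign key on a postcode record.
// Assumes cityById is a Map from numeric city id to a city object that carries
// country_id and state_id (an assumption, not confirmed by the PR).
function checkCityId(record, cityById, prefix) {
  const errors = [];
  if (record.city_id == null) return errors; // city_id is nullable: nothing to check

  const city = cityById.get(Number(record.city_id));
  if (!city) {
    errors.push(`${prefix}: city_id ${record.city_id} does not exist`);
    return errors;
  }
  // The city must belong to the declared country, when one is given
  if (record.country_id != null && Number(city.country_id) !== Number(record.country_id)) {
    errors.push(
      `${prefix}: city_id ${record.city_id} belongs to country_id ${city.country_id}, not ${record.country_id}`
    );
  }
  // The city must belong to the declared state, when one is given
  if (record.state_id != null && Number(city.state_id) !== Number(record.state_id)) {
    errors.push(
      `${prefix}: city_id ${record.city_id} belongs to state_id ${city.state_id}, not ${record.state_id}`
    );
  }
  return errors;
}

// Usage with made-up ids
const cityById = new Map([[10, { id: 10, country_id: 1, state_id: 5 }]]);
console.log(checkCityId({ city_id: 10, country_id: 1, state_id: 5 }, cityById, 'rec[0]'));
console.log(checkCityId({ city_id: 99, country_id: 1 }, cityById, 'rec[1]'));
```

Returning an array of messages rather than pushing into a shared `errors` list keeps the sketch testable in isolation; in the validator itself the messages would presumably be appended to the existing `errors` array.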
Findings flagged by copilot-pull-request-reviewer (post-merge cleanup):

FIX_1039_SUMMARY.md (#1391):
- "three groups" -> "four groups" (sections A-D)
- Section B count "~5" -> "(4)"; Section C "~3" -> "(2)" (match enumeration)
- Source-of-updates: VC moved out of "British Overseas Territories" bucket. Saint Vincent and the Grenadines is a sovereign state, not a BOT; described separately as "national prefixed-code convention"
- Issue URL: dropped erroneous `the-` prefix

FIX_1039_POSTCODES_TABLE.md (#1398):
- Issue URL: dropped erroneous `the-` prefix

validate-cross-reference.js (#1398):
- Add city_id existence check for postcodes (real CI gap: bad city_id refs would have passed validation and only failed at MySQL import)
- Verify city.country_id matches declared country_id and city.state_id matches declared state_id when provided
- Cities lazy-loaded per country (cached) to keep CI memory bounded

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…1403)

Completes the deferred export-pipeline work from #1398. With the postcodes table now landed, every export command and the workflow itself learns to emit postcode data alongside the existing 5 tables.

PHP commands (Symfony Console)
- ExportJson: SELECT from postcodes (graceful skip if table missing), emit /json/postcodes.json
- ExportCsv: add 'postcodes' to FILES; guard empty arrays so empty source files no longer crash on $csc[0] access
- ExportXml / ExportYaml: add 'postcodes' to FILES; replace fragile ?: throw on empty arrays with explicit is_array() check
- ExportSqlServer: add 'postcodes' to TABLES with full CREATE TABLE schema (FKs to countries/states/cities, nullable state/city)
- ExportMongoDB: add 'postcodes' to COLLECTIONS plus processPostcodes() with country/state DBRef references and GeoJSON Point location

Python helpers
- export_plist.py: include postcodes.csv with missing-file guard so the script no-ops cleanly until first country PR lands
- sync_mysql_to_json.py: new sync_postcodes() per-country file writer mirroring sync_cities; export_schema includes postcodes when present

Workflow
- postcode_count env var (graceful 0 if table absent)
- mysqldump postcodes -> sql/postcodes.sql
- pg_dump postcodes -> psql/postcodes.sql
- mysql2sqlite postcodes -> sqlite/postcodes.sqlite3
- mongoimport gated on non-empty postcodes.json
- gzip postcodes.sql in sql/ and psql/ when present
- POSTCODE_COUNT exposed to Release body and PR body

Behaviour with empty postcodes table
- Importer/JSON/CSV/XML/YAML produce empty postcodes.json (or skip in CSV's case) without erroring
- mongoimport skipped via jq length check
- mysqldump still emits the (empty) DDL, so consumers can rely on the table existing in every export format

Refs: #1039
Builds on: #1398
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
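
The empty-array guard described for ExportCsv can be illustrated with a small sketch. It is written in JavaScript rather than the PHP used by the export commands, and `rowsToCsv` is a hypothetical helper, not code from the repository. The idea is to check for an empty source before reading column names from the first row.

```javascript
// Hypothetical helper: build CSV text from an array of flat row objects.
// Returns null for an empty (or non-array) input so the caller can skip
// writing the file instead of reading column names from a missing first row.
function rowsToCsv(rows) {
  if (!Array.isArray(rows) || rows.length === 0) return null;

  const columns = Object.keys(rows[0]);
  // Quote a field only when it contains a comma, quote, or newline
  const escapeField = (value) => {
    const text = value == null ? '' : String(value);
    return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };

  const lines = [columns.join(',')];
  for (const row of rows) {
    lines.push(columns.map((col) => escapeField(row[col])).join(','));
  }
  return lines.join('\n');
}

console.log(rowsToCsv([]));                                  // null: nothing to export yet
console.log(rowsToCsv([{ code: '9490', name: 'Vaduz' }]));
```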
Adds the 13 Liechtenstein postal codes (range 9485-9498) covering the country's 11 municipalities. First country populated against the postcodes table introduced in #1398. Proves the file shape and validator path end-to-end.

Mapping (state_code -> code -> locality):
- 01 Balzers -> 9496 Balzers
- 02 Eschen -> 9485 Nendeln, 9492 Eschen
- 03 Gamprin -> 9487 Bendern
- 04 Mauren -> 9486 Schaanwald, 9493 Mauren
- 05 Planken -> 9498 Planken
- 06 Ruggell -> 9491 Ruggell
- 07 Schaan -> 9494 Schaan
- 08 Schellenberg -> 9488 Schellenberg
- 09 Triesen -> 9495 Triesen
- 10 Triesenberg -> 9497 Triesenberg
- 11 Vaduz -> 9490 Vaduz

source: "manual" - composed from common knowledge (LI postcodes are universally documented and stable); next country PRs will switch to automated pipelines (OpenPLZ, Wikidata, Census, etc.) once their adapters land.

Validated:
- All 13 records pass schema rules in .github/scripts/utils.js
- All country_id/state_id FKs resolve
- All state_code values match the corresponding state.iso2
- All codes match countries.postal_code_regex for LI (^(\\d{4})$)

Refs: #1039
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
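
The counts and the regex claim in this commit message can be re-checked with a few lines of JavaScript. The tuples below are transcribed from the mapping in the message; the tuple layout and variable names are illustrative only and do not reflect the actual shape of `contributions/postcodes/LI.json`.

```javascript
// [state_code, code, locality] tuples transcribed from the commit message
const liPostcodes = [
  ['01', '9496', 'Balzers'],
  ['02', '9485', 'Nendeln'],
  ['02', '9492', 'Eschen'],
  ['03', '9487', 'Bendern'],
  ['04', '9486', 'Schaanwald'],
  ['04', '9493', 'Mauren'],
  ['05', '9498', 'Planken'],
  ['06', '9491', 'Ruggell'],
  ['07', '9494', 'Schaan'],
  ['08', '9488', 'Schellenberg'],
  ['09', '9495', 'Triesen'],
  ['10', '9497', 'Triesenberg'],
  ['11', '9490', 'Vaduz'],
];

// LI postal_code_regex as quoted in the message (JSON-unescaped)
const liRegex = /^(\d{4})$/;

const stateCodes = new Set(liPostcodes.map(([stateCode]) => stateCode));
const badCodes = liPostcodes.filter(([, code]) => !liRegex.test(code));

console.log(liPostcodes.length, stateCodes.size, badCodes.length); // 13 11 0
```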
… (#1414)

The Marshall Islands postal_code_regex required the +4 extension ('-####') to be present:

Before: ^969\d{2}(-\d{4})$
After: ^969\d{2}(?:-\d{4})?$

This was inconsistent with how the same #####-#### format is encoded for VI (Virgin Islands) and PR (Puerto Rico), where the extension is optional. It also blocked legitimate 5-digit MH ZIPs (96960 Majuro, 96970 Ebeye) from passing the cross-reference validator's regex check introduced in #1398.

Two changes in one regex:
1. Added '?' after the extension group: makes it optional
2. Changed '(...)' to '(?:...)': non-capturing group, matching the convention used in VI/PR

Validated:
- 96960 (Majuro), 96970 (Ebeye) now match
- 96960-1234 (full +4) still matches
- Invalid codes (96960-12, 123456, 969XX) correctly rejected

Refs: #1039
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
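
The before/after behaviour can be reproduced directly. The two patterns are copied from the commit message, and the sample codes are the ones it lists under "Validated"; only the variable names are invented here.

```javascript
// Patterns copied from the commit message
const before = /^969\d{2}(-\d{4})$/;    // +4 extension required
const after = /^969\d{2}(?:-\d{4})?$/;  // +4 extension optional, non-capturing group

const samples = ['96960', '96970', '96960-1234', '96960-12', '123456', '969XX'];
for (const code of samples) {
  console.log(code, 'before:', before.test(code), 'after:', after.test(code));
}
```

Only the bare 5-digit codes change outcome between the two patterns: they fail `before` and pass `after`, while the full +4 form passes both and the invalid samples fail both.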
) Adds the importer + first run for Italy. Uses the matteocontrini/comuni-json mirror of Istat's official Italian commune list with postcodes (CAP).

1. bin/scripts/sync/import_italy_postcodes.py - pipeline reading the community-maintained UTF-8 JSON. Each commune has a cap[] array (large cities like Rome have 80+ CAPs); pipeline expands one row per (cap, commune) and picks first commune alphabetically as canonical per code. State resolution is direct sigla -> state.iso2 match (RM=Rome, MI=Milan, etc.) with one alias bridge: Aosta uses sigla 'AO' but states.json has it as the 'Aosta Valley' autonomous region with iso2 '23'.
2. contributions/postcodes/IT.json - 4,678 unique CAPs covering all 7,904 comuni with 100% state_id resolution.

Multi-CAP cities
- Rome: 82 CAPs
- Venice: 56
- Messina: 48
- Genoa: 47
- Milan: 42
- Each CAP gets one record pointing to the canonical commune name; this matches the Tier-4 "one row per code" contract from #1398.

Validation (zero errors across 4,678 records)
- All codes match countries.postal_code_regex (^(\\d{5})\$)
- All FKs resolve, all state_codes agree with state.iso2
- No auto-managed fields present

License & attribution
- Upstream: Istat (CC-BY 3.0)
- Mirror: github.com/matteocontrini/comuni-json
- Each row: source: "istat"

Refs: #1039
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
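
The "expand, then pick the alphabetically first commune per code" step can be sketched as follows. This is an illustration in JavaScript, not the Python pipeline from the commit: the function name `onePerCode`, the `nome` field, the output field names, and all sample names and codes are made up, and the Aosta alias bridge is left out.

```javascript
// Made-up input shaped like the upstream list: each commune carries a cap[] array.
// The field name "nome" is an assumption; "sigla" and "cap" are named in the commit message.
const communes = [
  { nome: 'Borgo Esempio', sigla: 'XX', cap: ['00001', '00002'] },
  { nome: 'Alta Prova', sigla: 'XX', cap: ['00002'] },
  { nome: 'Castel Demo', sigla: 'YY', cap: ['00003'] },
];

// Hypothetical helper: one output row per code, keeping the commune whose name sorts first.
function onePerCode(list) {
  const byCode = new Map();
  for (const commune of list) {
    for (const code of commune.cap) {
      const current = byCode.get(code);
      // Plain string comparison keeps the ordering deterministic across locales
      if (!current || commune.nome < current.nome) {
        byCode.set(code, commune);
      }
    }
  }
  return [...byCode.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([code, commune]) => ({
      code,
      locality: commune.nome,    // hypothetical output field name
      state_code: commune.sigla, // alias handling (e.g. Aosta) omitted in this sketch
      source: 'istat',
    }));
}

const rows = onePerCode(communes);
console.log(rows);
```

In the sample, code `00002` is shared by two communes and resolves to "Alta Prova" because that name sorts first, so three codes yield exactly three rows.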
Summary
Introduces a separate `postcodes` table (Tier 4 architecture) per the discussion on #1039: postcodes as their own entity, FK'd to country (required) and state/city (both nullable). Captures multi-postcode-per-state and multi-state-per-postcode realities that no flat-column shape can express losslessly.

Foundation-only PR. Country data lands in follow-up PRs sourced from license-clean providers; GeoNames is deliberately excluded.
Companion to #1391
PR #1391 backfills country-level `postal_code_format` / `postal_code_regex` (Tier 1, 12 countries). This PR is the Tier 4 superset infrastructure. They are independent; either can land first.

Schema
What's in this PR
- Phinx migration (`bin/db/migrations/20260425000000_create_postcodes_table.php`)
- Manual mirror in `bin/db/schema.sql`
- `contributions/postcodes/` directory with field-shape README and sourcing plan
- `import_postcodes()` added to `bin/scripts/sync/import_json_to_mysql.py`; no-ops gracefully when table or files absent
- Validator entity (`postcodes`) in `.github/scripts/utils.js` with required + optional + rules
- Cross-reference validator: `country_id` and `country_code` agreement, `state_id` belongs to declared country, `code` validated against `countries.postal_code_regex`
- ADR in `.github/fixes-docs/FIX_1039_POSTCODES_TABLE.md` covering Shape A/B/C/D decision, sourcing plan (Combo B, GeoNames-free), and roll-out sequence

What's NOT in this PR (deferred)
- No `contributions/postcodes/{ISO2}.json` yet. Each country = follow-up PR.
- `bin/Commands/Export*.php` (Csv/Json/MongoDB/Plist/SqlServer/Xml/Yaml) still emit only the existing 5 tables. Adding postcodes to all 7 formats is a separate PR: mechanical but touches every format.
- `sync_mysql_to_json.py` reverse sync.
- `validate-coordinates.js` and `detect-duplicates.js` postcode-awareness: coordinates on postcodes are coarse centroids and duplicate semantics differ.

Sourcing Plan (Combo B)
`source` column on each row tracks attribution per record. Coverage projection: ~30-40% of world postcodes by row count (UK Royal Mail, Eircode, Deutsche Post are license-blocked, not effort-blocked).

Test plan
- … (`php -l`)
- … (`python3 -m py_compile`)
- `.github/scripts/utils.js` requires without error; `validate-cross-reference.js` parses
- `import_postcodes()` no-ops when table absent (tested logic path manually)
- … `world` database (requires reviewer with MySQL)

Roll-out sequence (suggested)
- … `.gz` distribution per recent "[Bug]: cities.csv and translations.csv.gz been removed from repository" #1374 pattern)

Rollback
Migration adds one new table with no modifications to existing tables. Rollback is `DROP TABLE postcodes;` plus deleting `contributions/postcodes/`.

Refs: #1039

🤖 Generated with Claude Code