feat(postcodes): add postcodes table and infrastructure (#1039) #1398

Merged: dr5hn merged 1 commit into master from feat/issue-1039-postcodes-table on Apr 25, 2026

@dr5hn dr5hn commented Apr 25, 2026

Summary

Introduces a separate postcodes table (Tier 4 architecture) per the discussion on #1039: postcodes as their own entity, FK'd to country (required) and state/city (both nullable). Captures multi-postcode-per-state and multi-state-per-postcode realities that no flat-column shape can express losslessly.

Foundation-only PR. Country data lands in follow-up PRs sourced from license-clean providers; GeoNames is deliberately excluded.

Companion to #1391

PR #1391 backfills country-level postal_code_format / postal_code_regex (Tier 1, 12 countries). This PR is the Tier 4 superset infrastructure. They are independent; either can land first.

Schema

CREATE TABLE `postcodes` (
  `id`            int unsigned NOT NULL AUTO_INCREMENT,
  `code`          varchar(20) NOT NULL,
  `country_id`    mediumint unsigned NOT NULL,         -- FK countries.id
  `country_code`  char(2) NOT NULL,
  `state_id`      mediumint unsigned NULL,             -- FK states.id (nullable)
  `state_code`    varchar(255) NULL,
  `city_id`       mediumint unsigned NULL,             -- FK cities.id (nullable)
  `locality_name` varchar(255) NULL,
  `type`          varchar(32) NULL,                    -- full | outward | sector | district | area
  `latitude`      decimal(10,8) NULL,
  `longitude`     decimal(11,8) NULL,
  `source`        varchar(64) NULL,                    -- openplz | wikidata | census | ...
  `wikiDataId`    varchar(255) NULL,
  `created_at`    timestamp DEFAULT '2014-01-01 12:01:01',
  `updated_at`    timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `flag`          tinyint(1) DEFAULT '1',
  /* + FKs + indexes */
);

What's in this PR

  • ✅ Phinx migration (bin/db/migrations/20260425000000_create_postcodes_table.php)
  • ✅ Manual mirror in bin/db/schema.sql
  • ✅ contributions/postcodes/ directory with field-shape README and sourcing plan
  • ✅ import_postcodes() added to bin/scripts/sync/import_json_to_mysql.py; no-ops gracefully when table or files are absent
  • ✅ Validator entity (postcodes) in .github/scripts/utils.js with required + optional + rules
  • ✅ Cross-reference validator: country_id and country_code agreement, state_id belongs to declared country, code validated against countries.postal_code_regex
  • ✅ ADR in .github/fixes-docs/FIX_1039_POSTCODES_TABLE.md covering Shape A/B/C/D decision, sourcing plan (Combo B, GeoNames-free), and roll-out sequence

What's NOT in this PR (deferred)

  • ❌ Country data: no contributions/postcodes/{ISO2}.json yet. Each country = follow-up PR.
  • ❌ Export commands: bin/Commands/Export*.php (Csv/Json/MongoDB/Plist/SqlServer/Xml/Yaml) still emit only the existing 5 tables. Adding postcodes to all 7 formats is a separate PR: mechanical, but touches every format.
  • ❌ sync_mysql_to_json.py reverse sync.
  • ❌ validate-coordinates.js and detect-duplicates.js postcode-awareness: coordinates on postcodes are coarse centroids and duplicate semantics differ.

Sourcing Plan (Combo B)

| Source | License | Countries |
|---|---|---|
| OpenPLZ API | ODbL-1.0 (← matches repo) | DE, AT, CH, LI |
| Wikidata P281 | CC-0 | Long-tail backfill |
| US Census ZCTA | Public domain | US |
| India Post | Open (gov.in) | IN |
| Japan Post KEN_ALL | Free | JP |
| La Poste (data.gouv.fr) | etalab-2.0 | FR |
| Australia Post Boundaries | CC-BY 4.0 | AU |
| Statistics Canada FSA | Open Government | CA |

The source column on each row tracks attribution per record. Coverage projection: ~30–40% of world postcodes by row count (UK Royal Mail, Eircode, Deutsche Post are license-blocked, not effort-blocked).

Test plan

  • PHP migration lints clean (php -l)
  • Python importer compiles (python3 -m py_compile)
  • .github/scripts/utils.js requires without error; validate-cross-reference.js parses
  • import_postcodes() no-ops when table absent (tested logic path manually)
  • No existing files mutated except additive changes (schema.sql appendix, importer new method, validator new entity type)
  • Phinx migration runs cleanly against world database (requires reviewer with MySQL)
  • First country PR (Liechtenstein from OpenPLZ) opens cleanly against this branch

Roll-out sequence (suggested)

  1. This PR: foundation
  2. Liechtenstein (OpenPLZ, ~9 rows): proves OpenPLZ adapter
  3. Switzerland, Austria, Germany (OpenPLZ wave)
  4. Iceland, Estonia, Luxembourg (small ones)
  5. Export commands PR (touches all 7 formats)
  6. India, France, Australia
  7. Japan, US (large; will likely need .gz distribution, following the pattern from #1374, "[Bug]: cities.csv and translations.csv.gz been removed from repository")

Rollback

Migration adds one new table with no modifications to existing tables. Rollback is DROP TABLE postcodes; plus deleting contributions/postcodes/.

Refs: #1039

🤖 Generated with Claude Code

Introduces a separate `postcodes` table (Tier 4 architecture) for
storing postal codes as their own entity, FK'd to country (required)
and state/city (both nullable). Captures multi-postcode-per-state and
multi-state-per-postcode cases that flat-column shapes cannot.

This is a foundation-only PR. Country data lands in follow-up PRs
sourced from OpenPLZ (DACH), Wikidata (long tail), and per-country
official sources (US Census, India Post, Japan Post, La Poste,
Australia Post). GeoNames is deliberately not used.

Includes
- Phinx migration creating `postcodes` table with FKs and indexes
- Manual mirror in bin/db/schema.sql for review readability
- contributions/postcodes/ directory with field-shape README
- import_postcodes() in bin/scripts/sync/import_json_to_mysql.py;
  gracefully no-ops when table or directory is absent
- Validator: postcodes entity recognised in .github/scripts/utils.js
- Cross-reference validator checks country_id, state_id, and matches
  postcode against countries.postal_code_regex when defined
- ADR in .github/fixes-docs/FIX_1039_POSTCODES_TABLE.md covering the
  Shape A/B/C/D decision, sourcing plan, and roll-out sequence

Out of scope (deferred to follow-up PRs):
- Country data files (no contributions/postcodes/{ISO2}.json yet)
- Export commands (Csv/Json/MongoDB/Plist/SqlServer/Xml/Yaml): must
  be updated to emit the postcodes table
- sync_mysql_to_json.py (reverse sync)
- Coordinate validator and duplicate detector for postcodes

Refs: #1039

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 25, 2026 14:19
@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 25, 2026
@github-actions
Contributor

CSC Validation Report

PR Format

  • ✅ Description provided
  • ✅ Data source linked
  • ✅ Issue linked (recommended for data changes)
  • ✅ Justification / context provided

Schema Validation

Source URL Verification

✅ 1 source URL(s) accessible

✅ All checks passed | Status: Ready for review

@dosubot dosubot Bot added the enhancement New feature or request label Apr 25, 2026

Copilot AI left a comment

Pull request overview

Adds foundational infrastructure for treating postcodes as a first-class entity (Tier 4) in the CSC database, including schema, importer support, and contribution/validation scaffolding.

Changes:

  • Introduces a new postcodes table via Phinx migration and a mirrored definition in bin/db/schema.sql.
  • Adds contributions/postcodes/ documentation and extends the MySQL importer to load per-country postcode JSON files when present.
  • Extends GitHub validation scripts to recognize and cross-check postcodes contributions.

Reviewed changes

Copilot reviewed 4 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| contributions/postcodes/README.md | Documents the per-country postcode contribution JSON shape and sourcing plan. |
| bin/scripts/sync/import_json_to_mysql.py | Adds import_postcodes() and integrates it into the import sequence and summary output. |
| bin/db/schema.sql | Adds postcodes to schema drop order and defines the new postcodes table. |
| bin/db/migrations/20260425000000_create_postcodes_table.php | Adds Phinx migration to create postcodes table with FKs and indexes. |
| .github/scripts/validate-cross-reference.js | Adds cross-reference validation for postcodes (country/state + regex validation). |
| .github/scripts/utils.js | Adds postcodes entity schema/rules and entity-type detection. |
| .github/fixes-docs/FIX_1039_POSTCODES_TABLE.md | Adds an ADR-style doc describing the Tier 4 design and rollout plan. |

@@ -0,0 +1,129 @@
# FIX #1039 - `postcodes` Table (Tier 4 Architecture)

**Issue:** [#1039 - Can we add a postcode for this?](https://github.com/dr5hn/the-countries-states-cities-database/issues/1039)

Copilot AI Apr 25, 2026

The issue link points to dr5hn/the-countries-states-cities-database, but this PR is in dr5hn/countries-states-cities-database. Update the URL so it points to the correct repository's issue #1039.

Suggested change
- **Issue:** [#1039 - Can we add a postcode for this?](https://github.com/dr5hn/the-countries-states-cities-database/issues/1039)
+ **Issue:** [#1039 - Can we add a postcode for this?](https://github.com/dr5hn/countries-states-cities-database/issues/1039)

Comment on lines +151 to +199
if (entityType === 'postcodes') {
  // Validate country_id exists (required FK)
  if (record.country_id) {
    const country = countryById.get(Number(record.country_id));
    if (!country) {
      errors.push(`${prefix}: country_id ${record.country_id} does not exist`);
    } else {
      validCount++;
      if (record.country_code && country.iso2) {
        if (record.country_code.toUpperCase() !== country.iso2.toUpperCase()) {
          errors.push(
            `${prefix}: country_code "${record.country_code}" does not match country_id ${record.country_id} (expected "${country.iso2}")`
          );
        }
      }
    }
  }

  // Validate state_id exists if provided (optional FK)
  if (record.state_id != null && states) {
    const state = stateById.get(Number(record.state_id));
    if (!state) {
      errors.push(`${prefix}: state_id ${record.state_id} does not exist`);
    } else {
      validCount++;
      if (record.country_id && Number(state.country_id) !== Number(record.country_id)) {
        errors.push(
          `${prefix}: state_id ${record.state_id} ("${state.name}") belongs to country_id ${state.country_id}, not ${record.country_id}`
        );
      }
    }
  }

  // Validate postcode format against country regex if defined
  if (record.code && record.country_id) {
    const country = countryById.get(Number(record.country_id));
    if (country && country.postal_code_regex) {
      try {
        const re = new RegExp(country.postal_code_regex);
        if (!re.test(record.code)) {
          errors.push(
            `${prefix}: code "${record.code}" does not match postal_code_regex "${country.postal_code_regex}" of ${country.iso2}`
          );
        }
      } catch (e) {
        // Invalid regex on the country side - skip silently rather than blocking PR
      }
    }
  }

Copilot AI Apr 25, 2026

The postcodes cross-reference validation checks country_id and state_id, but it never validates city_id even though postcodes.city_id is a foreign key. This can allow contributions with a non-existent city_id to pass CI and only fail later at import time. Add a city_id existence check (and ideally verify the city belongs to the declared country_id/state_id when present).
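
A check along the lines requested here might look like the sketch below. It is only an illustration: `checkCityRef` and `cityById` are hypothetical names (a Map from numeric city id to a city object carrying `country_id` and `state_id`, analogous to the `countryById` and `stateById` lookups in the excerpt), not code from the validator itself.

```javascript
// Hypothetical helper: validates the optional city_id FK on one postcode record.
// Returns a list of error strings; an empty list means the reference is fine.
function checkCityRef(record, cityById, prefix) {
  const errors = [];
  if (record.city_id == null) return errors; // nullable FK: nothing to check

  const city = cityById.get(Number(record.city_id));
  if (!city) {
    errors.push(`${prefix}: city_id ${record.city_id} does not exist`);
    return errors;
  }
  // The city must belong to the declared country, when one is declared.
  if (record.country_id != null && Number(city.country_id) !== Number(record.country_id)) {
    errors.push(
      `${prefix}: city_id ${record.city_id} belongs to country_id ${city.country_id}, not ${record.country_id}`
    );
  }
  // The city must belong to the declared state, when one is declared.
  if (record.state_id != null && Number(city.state_id) !== Number(record.state_id)) {
    errors.push(
      `${prefix}: city_id ${record.city_id} belongs to state_id ${city.state_id}, not ${record.state_id}`
    );
  }
  return errors;
}

// Made-up lookup data for illustration only.
const cityById = new Map([[10, { id: 10, country_id: 1, state_id: 5 }]]);
const okErrors = checkCityRef({ city_id: 10, country_id: 1, state_id: 5 }, cityById, 'rec[0]');
const missingErrors = checkCityRef({ city_id: 99, country_id: 1 }, cityById, 'rec[1]');
const wrongCountryErrors = checkCityRef({ city_id: 10, country_id: 2 }, cityById, 'rec[2]');
console.log(okErrors, missingErrors, wrongCountryErrors);
```

Returning errors rather than pushing into a shared array keeps the sketch self-contained; in the real validator the same checks would presumably append to the existing `errors` list.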
@dr5hn dr5hn merged commit 264cf9a into master Apr 25, 2026
5 checks passed
@dr5hn dr5hn deleted the feat/issue-1039-postcodes-table branch April 25, 2026 14:30
dr5hn added a commit that referenced this pull request Apr 25, 2026
Findings flagged by copilot-pull-request-reviewer (post-merge cleanup):

FIX_1039_SUMMARY.md (#1391):
- "three groups" → "four groups" (sections A–D)
- Section B count "~5" → "(4)"; Section C "~3" → "(2)" (match enumeration)
- Source-of-updates: VC moved out of "British Overseas Territories"
  bucket, since Saint Vincent and the Grenadines is a sovereign state, not
  a BOT; described separately as "national prefixed-code convention"
- Issue URL: dropped erroneous `the-` prefix

FIX_1039_POSTCODES_TABLE.md (#1398):
- Issue URL: dropped erroneous `the-` prefix

validate-cross-reference.js (#1398):
- Add city_id existence check for postcodes (real CI gap: bad city_id
  refs would have passed validation and only failed at MySQL import)
- Verify city.country_id matches declared country_id and city.state_id
  matches declared state_id when provided
- Cities lazy-loaded per country (cached) to keep CI memory bounded

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dr5hn added a commit that referenced this pull request Apr 25, 2026
…1403)

Completes the deferred export-pipeline work from #1398. With the
postcodes table now landed, every export command and the workflow
itself learns to emit postcode data alongside the existing 5 tables.

PHP commands (Symfony Console)
- ExportJson: SELECT from postcodes (graceful skip if table missing),
  emit /json/postcodes.json
- ExportCsv: add 'postcodes' to FILES; guard empty arrays so empty
  source files no longer crash on $csc[0] access
- ExportXml / ExportYaml: add 'postcodes' to FILES; replace fragile
  ?: throw on empty arrays with explicit is_array() check
- ExportSqlServer: add 'postcodes' to TABLES with full CREATE TABLE
  schema (FKs to countries/states/cities, nullable state/city)
- ExportMongoDB: add 'postcodes' to COLLECTIONS plus processPostcodes()
  with country/state DBRef references and GeoJSON Point location

Python helpers
- export_plist.py: include postcodes.csv with missing-file guard so
  the script no-ops cleanly until first country PR lands
- sync_mysql_to_json.py: new sync_postcodes() per-country file writer
  mirroring sync_cities; export_schema includes postcodes when present

Workflow
- postcode_count env var (graceful 0 if table absent)
- mysqldump postcodes -> sql/postcodes.sql
- pg_dump postcodes -> psql/postcodes.sql
- mysql2sqlite postcodes -> sqlite/postcodes.sqlite3
- mongoimport gated on non-empty postcodes.json
- gzip postcodes.sql in sql/ and psql/ when present
- POSTCODE_COUNT exposed to Release body and PR body

Behaviour with empty postcodes table
- Importer/JSON/CSV/XML/YAML produce empty postcodes.json (or skip in
  CSV's case) without erroring
- mongoimport skipped via jq length check
- mysqldump still emits the (empty) DDL, so consumers can rely on the
  table existing in every export format

Refs: #1039
Builds on: #1398

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dr5hn added a commit that referenced this pull request Apr 25, 2026
Adds the 13 Liechtenstein postal codes (range 9485-9498) covering the
country's 11 municipalities. First country populated against the
postcodes table introduced in #1398. Proves the file shape and
validator path end-to-end.

Mapping (state_code -> code -> locality):
  01 Balzers       -> 9496 Balzers
  02 Eschen        -> 9485 Nendeln, 9492 Eschen
  03 Gamprin       -> 9487 Bendern
  04 Mauren        -> 9486 Schaanwald, 9493 Mauren
  05 Planken       -> 9498 Planken
  06 Ruggell       -> 9491 Ruggell
  07 Schaan        -> 9494 Schaan
  08 Schellenberg  -> 9488 Schellenberg
  09 Triesen       -> 9495 Triesen
  10 Triesenberg   -> 9497 Triesenberg
  11 Vaduz         -> 9490 Vaduz

source: "manual", composed from common knowledge (LI postcodes are
universally documented and stable); next country PRs will switch to
automated pipelines (OpenPLZ, Wikidata, Census, etc.) once their
adapters land.

Validated:
- All 13 records pass schema rules in .github/scripts/utils.js
- All country_id/state_id FKs resolve
- All state_code values match the corresponding state.iso2
- All codes match countries.postal_code_regex for LI (^(\\d{4})$)
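
The regex check in the last item can be reproduced with a few lines of JavaScript. This is a sketch, assuming the stored pattern is `^(\d{4})$` once the JSON escaping is removed; the 13 codes are the ones listed in the mapping above, and the variable names are illustrative.

```javascript
// Pattern as it would look after JSON unescaping of "^(\\d{4})$".
const liRegex = new RegExp('^(\\d{4})$');

// The 13 codes from the municipality mapping in this commit message.
const liCodes = [
  '9485', '9486', '9487', '9488', '9490', '9491', '9492',
  '9493', '9494', '9495', '9496', '9497', '9498',
];

// Collect any code that fails the country-level pattern.
const liFailures = liCodes.filter((code) => !liRegex.test(code));
console.log(`${liCodes.length} codes checked, ${liFailures.length} failures`);
```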

Refs: #1039

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dr5hn added a commit that referenced this pull request Apr 25, 2026
… (#1414)

The Marshall Islands postal_code_regex required the +4 extension
('-####') to be present:

  Before:  ^969\d{2}(-\d{4})$
  After:   ^969\d{2}(?:-\d{4})?$

This was inconsistent with how the same #####-#### format is encoded
for VI (Virgin Islands) and PR (Puerto Rico), where the extension is
optional. It also blocked legitimate 5-digit MH ZIPs (96960 Majuro,
96970 Ebeye) from passing the cross-reference validator's regex check
introduced in #1398.

Two changes in one regex:
1. Added '?' after the extension group → makes it optional
2. Changed '(...)' → '(?:...)' → non-capturing group, matching the
   convention used in VI/PR

Validated:
- 96960 (Majuro), 96970 (Ebeye) now match
- 96960-1234 (full +4) still matches
- Invalid codes (96960-12, 123456, 969XX) correctly rejected
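
The before/after behaviour can be checked directly. A minimal sketch using the two patterns quoted in this commit message and the sample codes from the "Validated" list; the variable names are illustrative only.

```javascript
// Patterns copied from the commit message.
const mhBefore = /^969\d{2}(-\d{4})$/;      // +4 extension required
const mhAfter = /^969\d{2}(?:-\d{4})?$/;    // +4 extension optional, non-capturing

const mhSamples = ['96960', '96970', '96960-1234', '96960-12', '123456', '969XX'];

// Evaluate every sample against both patterns.
const mhResults = mhSamples.map((code) => ({
  code,
  before: mhBefore.test(code),
  after: mhAfter.test(code),
}));
console.table(mhResults);
```

With the old pattern only `96960-1234` matches; with the new one the two 5-digit ZIPs match as well, while the three invalid samples are rejected by both.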

Refs: #1039

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dr5hn added a commit that referenced this pull request Apr 27, 2026
)

Adds the importer + first run for Italy. Uses the matteocontrini/comuni-json
mirror of Istat's official Italian commune list with postcodes (CAP).

1. bin/scripts/sync/import_italy_postcodes.py β€” pipeline reading the
   community-maintained UTF-8 JSON. Each commune has a cap[] array (large
   cities like Rome have 80+ CAPs); pipeline expands one row per (cap,
   commune) and picks first commune alphabetically as canonical per code.
   State resolution is direct sigla -> state.iso2 match (RM=Rome, MI=Milan,
   etc.) with one alias bridge: Aosta uses sigla 'AO' but states.json has
   it as the 'Aosta Valley' autonomous region with iso2 '23'.

2. contributions/postcodes/IT.json β€” 4,678 unique CAPs covering all 7,904
   comuni with 100% state_id resolution.

Multi-CAP cities
- Rome: 82 CAPs
- Venice: 56
- Messina: 48
- Genoa: 47
- Milan: 42
- Each CAP gets one record pointing to the canonical commune name; this
  matches the Tier-4 "one row per code" contract from #1398.
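
The "one row per code, first commune alphabetically" rule can be sketched as follows. This is an illustration under stated assumptions, not the actual pipeline (which is a Python script): the input field names `name`, `sigla`, and `cap`, the function name `expandCaps`, and the sample data are all made up, and plain string comparison stands in for whatever ordering the real importer uses.

```javascript
// Hypothetical sketch: expand each commune's cap[] array into (cap, commune)
// pairs, then keep one canonical row per CAP.
function expandCaps(communes) {
  const byCode = new Map();
  for (const commune of communes) {
    for (const code of commune.cap) {
      const current = byCode.get(code);
      // Keep the alphabetically first commune name for each code.
      if (current === undefined || commune.name < current.locality_name) {
        byCode.set(code, {
          code,
          locality_name: commune.name,
          state_code: commune.sigla,
        });
      }
    }
  }
  // Sort by code so the output file is stable between runs.
  return [...byCode.values()].sort((a, b) => (a.code < b.code ? -1 : a.code > b.code ? 1 : 0));
}

// Synthetic input: two communes sharing code 10001.
const capRows = expandCaps([
  { name: 'Beta', sigla: 'YY', cap: ['10001', '10002'] },
  { name: 'Alpha', sigla: 'XX', cap: ['10001'] },
]);
console.log(capRows);
```

Here the shared code `10001` resolves to `Alpha`, and `10002` stays with `Beta`, so the output has exactly one record per code.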

Validation (zero errors across 4,678 records)
- All codes match countries.postal_code_regex (^(\\d{5})$)
- All FKs resolve, all state_codes agree with state.iso2
- No auto-managed fields present

License & attribution
- Upstream: Istat (CC-BY 3.0)
- Mirror: github.com/matteocontrini/comuni-json
- Each row: source: "istat"

Refs: #1039

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>