feat(postcodes/TW): 371 Chunghwa Post codes (#1039) #1501
Adds Taiwan's 3-digit postal-area codes from the eagle-tw-open-data mirror (Chunghwa Post / Ministry of Interior open data).

Why
---
Closes the TW gap on issue #1039. The 3-digit area codes are the historic Chunghwa Post format and remain the most-cited form across Taiwan government open data.

Coverage
--------
- 371 codes / 100% state FK resolution
- All 22 CSC TW states covered

State FK strategy
-----------------
The source has Chinese (city+district) and English (district + city/county) labels. The importer parses the trailing 'X City' / 'X County' from the English column and resolves it via the 22-entry ENGLISH_TO_ISO2.

Edge cases
----------
- Source typo 'Taoyuan City City' (1 row) -> TAO
- Disputed islands without a CSC iso2 entry are mapped via SPECIAL_LABEL_TO_ISO2:
  - 'Diaoyutai' (Senkaku/Diaoyu) -> ILA (ROC administers under Yilan)
  - 'Dongsha Islands, Nanhai Islands' -> KHH (Pratas, under Kaohsiung)
  - 'Nansha Islands, Nanhai Islands' -> KHH (Spratlys, under Kaohsiung)

Encoding
--------
The source ships as **CP950 / Big5**; a UTF-8 read produces mojibake. The importer explicitly decodes as cp950.

Regex fix
---------
Before this PR, countries.json had the TW regex `^\d{5}$` (5-digit), which never matched the dataset's 3-digit codes (or Chunghwa's modern 6-digit 3+3 codes). Updated to `^\d{3}(\d{2,3})?$` to accept all three generations: 3-digit (this dataset), 5-digit (intermediate), 6-digit (2020+ canonical).

License
-------
GPL-3.0 (unusual for data; redistribution permitted with attribution; flagged here per #1039 license-tier policy). Each row carries `source: "chunghwa-post-via-eagle-tw-open-data"`.

Validation
----------
- python3 -m py_compile passes
- 100% regex match (^\d{3}(\d{2,3})?$)
- 100% state_id valid + state.country_id == 216 + state_code agrees
- No auto-managed fields (id, created_at, updated_at, flag)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
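The encoding note above can be illustrated with a minimal sketch. It is not the importer's actual code: the helper names `decode_source` and `read_rows` and the sample row are hypothetical, and the real file's column layout is not assumed here.

```python
import csv
import io


def decode_source(raw: bytes) -> str:
    """Decode raw CSV bytes as CP950 (the Windows code page for Big5)."""
    return raw.decode("cp950")


def read_rows(raw: bytes) -> list[list[str]]:
    """Parse the decoded text into CSV rows (hypothetical helper)."""
    text = decode_source(raw)
    return list(csv.reader(io.StringIO(text, newline="")))


# Illustrative bytes only; this row is not taken from the real file.
sample = "100,臺北市中正區\r\n".encode("cp950")
rows = read_rows(sample)

# Reading the same bytes as UTF-8 fails outright in strict mode;
# with errors="replace" it would yield replacement characters instead.
try:
    sample.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

Decoding the bytes once, up front, keeps the rest of the parsing independent of the source encoding.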
CSC Validation Report
PR Format
Labels applied:
Weekly data-quality review (2026-05-04)
Verdict: needs-discussion
Checks
Discussion item
The regex itself (
Note on CI "needs-changes" label
Same false-positive issue as the other postcode PRs: validator checks all 250 existing countries in
Automated weekly review: Claude (sonnet-4-6). Generated by Claude Code
Merging as-is. The
Summary
`^\d{5}$` -> `^\d{3}(\d{2,3})?$` to accept the three Taiwan postcode generations
Source
- flying-itmen-eagle/eagle-tw-open-data: community mirror of Taiwan government open data feeds (GPL-3; flagged per the license-tier policy in "Can we add a postcode for this?" #1039)
- taiwan_postal_code_information.csv (CP950 / Big5; a UTF-8 read produces mojibake)
Regex fix
Taiwan has used three postcode generations:
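- 3-digit: 100
- 5-digit: 10001
- 6-digit: 100002
This dataset is the 3-digit form. The old regex matched only the 5-digit form, so it rejected both this dataset's 3-digit codes and the modern 6-digit codes.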
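The difference between the old and new patterns can be checked with a short sketch. The two patterns are the ones quoted in this PR; the variable names and the sample codes are illustrative only.

```python
import re

# Patterns quoted in this PR; the variable names are hypothetical.
OLD_TW_REGEX = re.compile(r"^\d{5}$")
NEW_TW_REGEX = re.compile(r"^\d{3}(\d{2,3})?$")

# One illustrative code per generation: 3-digit, 5-digit, 6-digit.
samples = ["100", "10001", "100002"]

old_results = [bool(OLD_TW_REGEX.match(code)) for code in samples]
new_results = [bool(NEW_TW_REGEX.match(code)) for code in samples]

# Lengths other than 3, 5 or 6 digits are still rejected by the new pattern.
rejected = [code for code in ["10", "1000", "1000000", "10A"]
            if not NEW_TW_REGEX.match(code)]
```

The optional group `(\d{2,3})?` is what admits the 5- and 6-digit forms without also admitting 4- or 7-digit strings.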
Edge cases handled
- 'Taoyuan City City' (source typo, 1 row) -> TAO
- 'Diaoyutai' -> ILA
- 'Dongsha Islands, Nanhai Islands' -> KHH
- 'Nansha Islands, Nanhai Islands' -> KHH
Distribution (top 5)
All 22 CSC TW states covered.
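The state resolution and edge-case handling described in this PR might look roughly like the sketch below. This is an assumption-laden illustration, not the importer's code: the dictionary names come from the PR text, but their key format, the three-entry excerpt of `ENGLISH_TO_ISO2`, the `resolve_state` helper, and the sample label `'Zhongli Dist., Taoyuan City'` are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical excerpt of the 22-entry table; only codes named in this PR appear.
ENGLISH_TO_ISO2 = {
    "Taoyuan City": "TAO",
    "Yilan County": "ILA",
    "Kaohsiung City": "KHH",
}

# Labels with no 'X City' / 'X County' suffix, mapped explicitly per the PR.
SPECIAL_LABEL_TO_ISO2 = {
    "Diaoyutai": "ILA",
    "Dongsha Islands, Nanhai Islands": "KHH",
    "Nansha Islands, Nanhai Islands": "KHH",
}

# Collapses a duplicated suffix such as 'City City' into a single 'City'.
DUPLICATED_SUFFIX = re.compile(r"\b(City|County)(?:\s+\1)+$")


def resolve_state(english_label: str) -> Optional[str]:
    """Return the state code for an English label, or None if unresolved."""
    label = english_label.strip()
    if label in SPECIAL_LABEL_TO_ISO2:
        return SPECIAL_LABEL_TO_ISO2[label]
    label = DUPLICATED_SUFFIX.sub(r"\1", label)
    # Prefer the longest matching suffix so a longer name wins over a shorter one.
    candidates = [name for name in ENGLISH_TO_ISO2 if label.endswith(name)]
    if not candidates:
        return None
    return ENGLISH_TO_ISO2[max(candidates, key=len)]
```

Returning `None` for unresolved labels, rather than guessing, makes a coverage figure such as "100% state FK resolution" straightforward to verify by counting the `None` results.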
Test plan
- `python3 -m py_compile bin/scripts/sync/import_taiwan_postcodes.py` passes
- 100% regex match (`^\d{3}(\d{2,3})?$`)
- No auto-managed fields (`id`, `created_at`, `updated_at`, `flag`)
Generated with Claude Code