feat(postcodes/CA): 1,645 Canada FSAs (#1039) #1502
Adds Canada Post Forward Sortation Areas (3-character postcode prefixes) sourced from Statistics Canada via the inkjet/pypostalcode MIT-licensed mirror.

Why
---
Closes the CA gap on issue #1039. Canada Post's bulk PAF feed (~870k full 6-char codes) is paywalled. inkjet's StatsCan-derived 1,645 FSA list is the cleanest publicly redistributable source.

Coverage
--------
- 1,645 FSAs / 100% state FK
- All 13 Canadian provinces and territories covered
- Each row carries FSA centroid lat/lng + StatsCan English locality description

State FK strategy
-----------------
Direct province-name match + 2-entry alias map:
  source 'Northwest Territory' -> CSC 'Northwest Territories'
  source 'Nunavut Territory' -> CSC 'Nunavut'

Regex fix
---------
Before this PR, countries.json had CA regex requiring full 6-char postcodes (`@#@ #@#`). FSAs (3-char) would have been rejected. Updated to make the LDU portion optional:
  ^([ABCE...]\d[ABCE...])(?: ?(\d[ABCE...]\d))?$
Both 'T0A' and 'M5V 3A8' now validate.

License
-------
inkjet/pypostalcode is MIT. Upstream is Statistics Canada (StatsCan Open Licence: free redistribution permitted with attribution). Each row carries `source: "statscan-fsa-via-inkjet"`.

Future
------
The full ~870k 6-character postcode list is available via ccnixon/postalcodes (no formal license, geocoder.ca-derived). It would generate a ~150 MB JSON which exceeds the in-band cities size envelope (PT.json at 38 MB is current largest). Deferred to a future #1039 PR using the gz-to-Releases pattern (#1374) if needed.

Validation
----------
- python3 -m py_compile passes
- 100% regex match against updated CA regex
- 100% state_id valid + state.country_id == 39 + state_code agrees
- No auto-managed fields (id, created_at, updated_at, flag)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
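The state FK strategy in the commit message above (direct name match plus a two-entry alias map) could look roughly like the sketch below. The function name, the `states_by_name` lookup, and the placeholder ids are assumptions made for illustration; they are not taken from the importer script.

```python
# Hypothetical sketch of the province-name -> state FK lookup.
# Only the two alias pairs come from the commit message; everything else
# (names, dict shape, ids) is assumed for illustration.
PROVINCE_ALIASES = {
    "Northwest Territory": "Northwest Territories",
    "Nunavut Territory": "Nunavut",
}


def resolve_state_id(source_name, states_by_name):
    """Return the state id for a source province name, or None if unmatched."""
    canonical = PROVINCE_ALIASES.get(source_name, source_name)
    return states_by_name.get(canonical)


# Placeholder ids; real ids would come from the CSC states data.
states_by_name = {"Ontario": 1, "Northwest Territories": 2, "Nunavut": 3}

print(resolve_state_id("Nunavut Territory", states_by_name))
print(resolve_state_id("Ontario", states_by_name))
```

Returning `None` for an unmatched name, rather than raising, makes it easy to count unmatched rows and confirm the 100% state FK claim.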
Weekly data-quality review (2026-05-04)
Verdict: needs-fix
Checks
Note on CI "needs-changes" label
Same false-positive issue as the other postcode PRs: validator checks all 250 existing countries in
Automated weekly review by Claude (sonnet-4-6). Generated by Claude Code
H0H is a real Canadian FSA reserved by Canada Post for letters to Santa Claus at the North Pole. The source dataset has no real lat/lng for it; the importer fell through to (90, 0), which trips the geo-bounds validator. Null the coordinates instead of dropping the row, so the FSA stays queryable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
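The fix described in this commit (null the coordinates, keep the row) might be sketched as follows. The field names `latitude` and `longitude`, the `code` key, and the function name are assumptions; the actual importer may structure rows differently.

```python
# Hypothetical sketch: replace the (90, 0) fallback with nulls, keep the row.
# Field names are assumed; coordinates are assumed to be numeric, not strings.
def null_placeholder_coords(row):
    """Return a copy of row with the (90, 0) fallback replaced by None."""
    cleaned = dict(row)
    if cleaned.get("latitude") == 90 and cleaned.get("longitude") == 0:
        cleaned["latitude"] = None
        cleaned["longitude"] = None
    return cleaned


santa = {"code": "H0H", "latitude": 90.0, "longitude": 0.0}
print(null_placeholder_coords(santa))
```

Copying the row instead of mutating it keeps the raw source value available if the fallback ever needs to be audited.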
Summary
Source
inkjet/pypostalcode: MIT-licensed ca_postalcodes.csv (~100 KB, 1,645 rows)

Why FSA-only (not full 6-char)
Canada Post's bulk PAF feed (~870k full postcodes) is paywalled. Available public mirrors are either GeoNames-derived (excluded by maintainer instruction) or unlicensed scrapes of geocoder.ca. inkjet's StatsCan-sourced FSA list is the cleanest publicly redistributable Canada postcode data.
The full 6-char list can be added in a future PR via ccnixon/postalcodes (~889k codes, no formal license, geocoder.ca-derived). That would produce a ~150 MB JSON which exceeds the in-band cities size envelope (PT at 38 MB is current largest), so it would need the gz-to-Releases pattern (#1374); deferred for now.

Regex fix
Old regex required full 6-char postcodes (`@#@ #@#`), so FSAs would have been rejected.
New regex makes the LDU portion optional:
  ^([ABCE...]\d[ABCE...])(?: ?(\d[ABCE...]\d))?$
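A minimal Python sketch of an optional-LDU pattern is shown here. The character classes are spelled out from Canada Post's letter rules (D, F, I, O, Q, U are unused, and W and Z do not appear in the first position); they are an assumption and may differ from the abbreviated classes actually stored in countries.json. The helper name `split_postcode` is also hypothetical.

```python
import re

# Assumed letter sets, based on Canada Post's postal code letter rules;
# the PR's real character classes are abbreviated in the source.
FIRST = "ABCEGHJKLMNPRSTVXY"      # allowed in the first position
OTHER = "ABCEGHJKLMNPRSTVWXYZ"    # allowed in the other letter positions

# FSA is required; the LDU (with an optional separating space) is optional.
CA_POSTCODE = re.compile(
    rf"^([{FIRST}]\d[{OTHER}])(?: ?(\d[{OTHER}]\d))?$"
)


def split_postcode(code):
    """Return (fsa, ldu_or_None) if code is valid, else None."""
    match = CA_POSTCODE.match(code)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_postcode("T0A"))
print(split_postcode("M5V 3A8"))
print(split_postcode("M5V 3A"))
```

Wrapping the space and the LDU in one optional non-capturing group means a trailing partial LDU such as `M5V 3A` is rejected rather than silently truncated to the FSA.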
Both T0A and M5V 3A8 now validate.

State FK strategy
Direct province-name match + 2-entry alias map for 'Northwest Territory' -> CSC 'Northwest Territories' and 'Nunavut Territory' -> CSC 'Nunavut'.

Distribution
Test plan
- python3 -m py_compile bin/scripts/sync/import_canada_postcodes.py
- No auto-managed fields (id, created_at, updated_at, flag)

Generated with Claude Code
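The row-level checks named in the test plan and commit message (no auto-managed fields, valid state FK, country id 39) could be sketched as below. The row and state dictionary shapes, the `state_code` key on both sides, and the function name are assumptions for illustration only.

```python
# Hypothetical validation sketch; dict shapes and key names are assumed.
AUTO_MANAGED = {"id", "created_at", "updated_at", "flag"}
CA_COUNTRY_ID = 39  # country id cited in the commit message


def find_problems(rows, states_by_id):
    """Return a list of (row_index, message) for rows that fail a check."""
    problems = []
    for index, row in enumerate(rows):
        leaked = AUTO_MANAGED.intersection(row)
        if leaked:
            problems.append((index, f"auto-managed fields: {sorted(leaked)}"))
        state = states_by_id.get(row.get("state_id"))
        if state is None:
            problems.append((index, "unknown state_id"))
        elif state["country_id"] != CA_COUNTRY_ID:
            problems.append((index, "state belongs to another country"))
        elif state["state_code"] != row.get("state_code"):
            problems.append((index, "state_code disagrees with state_id"))
    return problems


# Placeholder data for illustration only.
states_by_id = {1: {"country_id": 39, "state_code": "ON"}}
rows = [
    {"code": "M5V", "state_id": 1, "state_code": "ON"},
    {"code": "T0A", "state_id": 2, "state_code": "AB", "id": 7},
]
print(find_problems(rows, states_by_id))
```

Collecting every problem instead of stopping at the first makes a failed run easier to triage.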