feat(postcodes/CA): 1,645 Canada FSAs (#1039) #1502

Merged: dr5hn merged 2 commits into master from feat/postcodes-canada on May 5, 2026

dr5hn (Owner) commented May 2, 2026

Summary

  • Imports Canada Post 3-character Forward Sortation Areas (1,645 FSAs) for issue #1039 ("Can we add a postcode for this?")
  • 100% state FK resolution across all 13 provinces and territories
  • Each row carries FSA centroid lat/lng + StatsCan English locality description
  • Fixes CA regex to make the LDU portion optional (FSA-only also validates)

Source

  • inkjet/pypostalcode (MIT-licensed)
  • Upstream: Statistics Canada Forward Sortation Area Boundary File, 2011 Census (StatsCan Open Licence, free redistribution)
  • File: ca_postalcodes.csv (~100 KB, 1,645 rows)

Why FSA-only (not full 6-char)

Canada Post's bulk PAF feed (~870k full postcodes) is paywalled. Available public mirrors are either GeoNames-derived (excluded by maintainer instruction) or unlicensed scrapes of geocoder.ca. inkjet's StatsCan-sourced FSA list is the cleanest publicly redistributable Canada postcode data.

The full 6-char list can be added in a future PR via ccnixon/postalcodes (~889k codes, no formal license, geocoder.ca-derived). That would produce a ~150 MB JSON, which exceeds the in-band cities size envelope (PT at 38 MB is the current largest), so it would need the gz-to-Releases pattern (#1374). Deferred for now.

Regex fix

Old regex required full 6-char postcodes, so FSAs would have been rejected:

^([ABCEGHJKLMNPRSTVXY]\d[ABCEGHJKLMNPRSTVWXYZ]) ?(\d[ABCEGHJKLMNPRSTVWXYZ]\d)$

New regex makes the LDU portion optional:

^([ABCEGHJKLMNPRSTVXY]\d[ABCEGHJKLMNPRSTVWXYZ])(?: ?(\d[ABCEGHJKLMNPRSTVWXYZ]\d))?$

Both T0A and M5V 3A8 now validate.
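
The difference between the two patterns can be checked directly. The sketch below is only an illustration: it rebuilds both expressions from the character classes quoted above, and the names `OLD_CA_REGEX`, `NEW_CA_REGEX`, and `split_postcode` are hypothetical, not taken from the repository.

```python
import re

# Character classes copied from the two patterns quoted in this PR.
FIRST = "ABCEGHJKLMNPRSTVXY"      # allowed first letter of an FSA
OTHER = "ABCEGHJKLMNPRSTVWXYZ"    # allowed letters in the other positions

# Old pattern: the LDU (last three characters) is mandatory.
OLD_CA_REGEX = re.compile(rf"^([{FIRST}]\d[{OTHER}]) ?(\d[{OTHER}]\d)$")
# New pattern: the LDU, with its optional leading space, may be absent.
NEW_CA_REGEX = re.compile(rf"^([{FIRST}]\d[{OTHER}])(?: ?(\d[{OTHER}]\d))?$")


def split_postcode(code: str):
    """Return (fsa, ldu) for a valid code, or None if it does not match.

    ldu is None when only the 3-character FSA was supplied.
    """
    match = NEW_CA_REGEX.match(code)
    return (match.group(1), match.group(2)) if match else None
```

Under the old pattern `T0A` fails to match, while under the new one `split_postcode("T0A")` yields the FSA with no LDU and `split_postcode("M5V 3A8")` yields both parts.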

State FK strategy

Direct province-name match plus a 2-entry alias map: 'Northwest Territory' -> CSC 'Northwest Territories' and 'Nunavut Territory' -> CSC 'Nunavut'.
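
As a rough sketch of how such a lookup might work (the function name `resolve_state` and the shape of `states_by_name` are assumptions for illustration; the actual importer may be organised differently):

```python
# Alias map from the PR description: source spelling -> CSC state name.
PROVINCE_ALIASES = {
    "Northwest Territory": "Northwest Territories",
    "Nunavut Territory": "Nunavut",
}


def resolve_state(source_name, states_by_name):
    """Look up a CSC state record by province name.

    states_by_name is assumed to be a dict keyed by the CSC state name.
    Returns the matching record, or None when nothing resolves.
    """
    name = source_name.strip()
    # Direct match first; fall back to the alias only when one exists.
    name = PROVINCE_ALIASES.get(name, name)
    return states_by_name.get(name)
```

Returning `None` on a miss rather than raising lets the caller count unresolved rows, which is one way to arrive at a figure like the 100% FK resolution reported here.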

Distribution

iso2  province                    rows
ON    Ontario                      525
QC    Quebec                       421
BC    British Columbia             191
AB    Alberta                      154
NB    New Brunswick                111
NS    Nova Scotia                   77
MB    Manitoba                      66
SK    Saskatchewan                  49
NL    Newfoundland and Labrador     35
PE    Prince Edward Island           7
NU    Nunavut                        3
NT    Northwest Territories          3
YT    Yukon                          3

Test plan

  • python3 -m py_compile bin/scripts/sync/import_canada_postcodes.py
  • All 1,645 codes match updated CA regex
  • 100% state_id valid; state.country_id == 39; state_code == state.iso2
  • No auto-managed fields (id, created_at, updated_at, flag)
  • Idempotent merge (re-run produces no diff)
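
Two of the checks in this list, the auto-managed field check and the idempotent merge, could be expressed roughly as follows. This is a minimal sketch under stated assumptions: the key field name `code` and both function names are hypothetical, and the real importer's merge logic is not shown in this PR.

```python
# Fields the test plan says must not appear in contributed records.
AUTO_MANAGED = {"id", "created_at", "updated_at", "flag"}


def forbidden_fields(records):
    """Return (record_index, field) pairs for each auto-managed field found."""
    return [
        (index, field)
        for index, record in enumerate(records)
        for field in sorted(record.keys() & AUTO_MANAGED)
    ]


def merge_by_code(existing, incoming):
    """Upsert incoming rows into existing ones, keyed on the assumed 'code' field.

    Output is sorted by key, so merging the same input twice gives the
    same list and a re-run produces no diff.
    """
    merged = {record["code"]: record for record in existing}
    merged.update({record["code"]: record for record in incoming})
    return [merged[code] for code in sorted(merged)]
```

Sorting the output by key is what makes the re-run comparison meaningful: without a stable order, two equivalent merges could still differ textually.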

🤖 Generated with Claude Code

Adds Canada Post Forward Sortation Areas (3-character postcode
prefixes) sourced from Statistics Canada via the inkjet/pypostalcode
MIT-licensed mirror.

Why
---
Closes the CA gap on issue #1039. Canada Post's bulk PAF feed
(~870k full 6-char codes) is paywalled. inkjet's StatsCan-derived
1,645 FSA list is the cleanest publicly redistributable source.

Coverage
--------
- 1,645 FSAs / 100% state FK
- All 13 Canadian provinces and territories covered
- Each row carries FSA centroid lat/lng + StatsCan English locality
  description

State FK strategy
-----------------
Direct province-name match + 2-entry alias map:
  source 'Northwest Territory' -> CSC 'Northwest Territories'
  source 'Nunavut Territory'   -> CSC 'Nunavut'

Regex fix
---------
Before this PR, countries.json had CA regex requiring full 6-char
postcodes (`@#@ #@#`). FSAs (3-char) would have been rejected. Updated
to make the LDU portion optional:
  ^([ABCE...]\d[ABCE...])(?: ?(\d[ABCE...]\d))?$

Both 'T0A' and 'M5V 3A8' now validate.

License
-------
inkjet/pypostalcode is MIT. Upstream is Statistics Canada (StatsCan
Open Licence: free redistribution permitted with attribution).
Each row carries `source: "statscan-fsa-via-inkjet"`.

Future
------
The full ~870k 6-character postcode list is available via
ccnixon/postalcodes (no formal license, geocoder.ca-derived). It
would generate a ~150 MB JSON which exceeds the in-band cities
size envelope (PT.json at 38 MB is the current largest). Deferred to a
future #1039 PR using the gz-to-Releases pattern (#1374) if needed.

Validation
----------
- python3 -m py_compile passes
- 100% regex match against updated CA regex
- 100% state_id valid + state.country_id == 39 + state_code agrees
- No auto-managed fields (id, created_at, updated_at, flag)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dosubot (bot) added the size:XS (this PR changes 0-9 lines, ignoring generated files) and enhancement (new feature or request) labels May 2, 2026

github-actions Bot commented May 2, 2026

CSC Validation Report

PR Format

  • ✅ Description provided
  • ✅ Data source linked
  • ✅ Issue linked (recommended for data changes)
  • ✅ Justification / context provided

Labels applied: data:countries, data:postcodes, large-contribution

⚠️ Large Contribution

This PR contains 1895 records. Large contributions require manual review.

Schema Validation (1895 records)

Errors (blocking):

  • ❌ contributions/countries/countries.json: Record 1 ("Afghanistan"): "id" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 1 ("Afghanistan"): "created_at" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 1 ("Afghanistan"): "updated_at" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 1 ("Afghanistan"): "flag" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 2 ("Aland Islands"): "id" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 2 ("Aland Islands"): "created_at" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 2 ("Aland Islands"): "updated_at" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 2 ("Aland Islands"): "flag" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 3 ("Albania"): "id" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 3 ("Albania"): "created_at" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 3 ("Albania"): "updated_at" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 3 ("Albania"): "flag" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 4 ("Algeria"): "id" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 4 ("Algeria"): "created_at" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 4 ("Algeria"): "updated_at" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 4 ("Algeria"): "flag" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 5 ("American Samoa"): "id" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 5 ("American Samoa"): "created_at" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 5 ("American Samoa"): "updated_at" must not be included (auto-managed)
  • ❌ contributions/countries/countries.json: Record 5 ("American Samoa"): "flag" must not be included (auto-managed)
  • ...and 980 more errors

Warnings:

  • ⚠️ contributions/countries/countries.json: Record 1 ("Afghanistan"): unknown field "population"
  • ⚠️ contributions/countries/countries.json: Record 1 ("Afghanistan"): unknown field "gdp"
  • ⚠️ contributions/countries/countries.json: Record 1 ("Afghanistan"): unknown field "area_sq_km"
  • ⚠️ contributions/countries/countries.json: Record 1 ("Afghanistan"): unknown field "postal_code_format"
  • ⚠️ contributions/countries/countries.json: Record 1 ("Afghanistan"): unknown field "postal_code_regex"
  • ⚠️ contributions/countries/countries.json: Record 2 ("Aland Islands"): unknown field "population"
  • ⚠️ contributions/countries/countries.json: Record 2 ("Aland Islands"): unknown field "gdp"
  • ⚠️ contributions/countries/countries.json: Record 2 ("Aland Islands"): unknown field "area_sq_km"
  • ⚠️ contributions/countries/countries.json: Record 2 ("Aland Islands"): unknown field "postal_code_format"
  • ⚠️ contributions/countries/countries.json: Record 2 ("Aland Islands"): unknown field "postal_code_regex"
  • ...and 1240 more warnings

Cross-Reference Validation

✅ 3290 reference(s) verified

Geo-Bounds Check

✅ All 1644 coordinate(s) within expected country bounds

Duplicate Detection

  • ⚠️ contributions/countries/countries.json: Record 1 ("Afghanistan") appears to be a duplicate of existing "Afghanistan" (id: 1, distance: 0.0km)
  • ⚠️ contributions/countries/countries.json: Record 2 ("Aland Islands") appears to be a duplicate of existing "Aland Islands" (id: 2, distance: 0.0km)
  • ⚠️ contributions/countries/countries.json: Record 3 ("Albania") appears to be a duplicate of existing "Albania" (id: 3, distance: 0.0km)
  • ⚠️ contributions/countries/countries.json: Record 4 ("Algeria") appears to be a duplicate of existing "Algeria" (id: 4, distance: 0.0km)
  • ⚠️ contributions/countries/countries.json: Record 5 ("American Samoa") appears to be a duplicate of existing "American Samoa" (id: 5, distance: 0.0km)
  • ⚠️ contributions/countries/countries.json: Record 6 ("Andorra") appears to be a duplicate of existing "Andorra" (id: 6, distance: 0.0km)
  • ⚠️ contributions/countries/countries.json: Record 7 ("Angola") appears to be a duplicate of existing "Angola" (id: 7, distance: 0.0km)
  • ⚠️ contributions/countries/countries.json: Record 8 ("Anguilla") appears to be a duplicate of existing "Anguilla" (id: 8, distance: 0.0km)
  • ⚠️ contributions/countries/countries.json: Record 9 ("Antarctica") appears to be a duplicate of existing "Antarctica" (id: 9, distance: 0.0km)
  • ⚠️ contributions/countries/countries.json: Record 10 ("Antigua and Barbuda") appears to be a duplicate of existing "Antigua and Barbuda" (id: 10, distance: 0.0km)

Source URL Verification

✅ 2 source URL(s) accessible


❌ 1000 error(s), 1500 warning(s) | Status: Changes required

Please fix the errors above and push a new commit. Refer to our Contribution Guidelines for details.

dr5hn commented May 4, 2026

Weekly data-quality review (2026-05-04)

Verdict: needs-fix

Checks

  • Schema: ✅ Change to contributions/countries/countries.json makes the CA regex LDU portion optional (correct for FSA-only codes). No forbidden fields introduced. New contributions/postcodes/CA.json records correctly omit id, flag, created_at, updated_at. (CI errors are false positives; see note below.)
  • FK integrity: ✅ All 1,645 FSA records resolve to valid CA provinces/territories (per CI cross-reference pass).
  • Coordinates: ❌ One record has coordinates (90.0, 0.0), the geographic North Pole, which is not in Canada.
    • contributions/postcodes/CA.json: Record 371 (flagged by CI geo-bounds check)
    • This appears to be a sentinel/null value from the source data (ca_postalcodes.csv). The importer does not filter zero-lat/zero-lon or pole coordinates. This record should be dropped or its coordinates corrected before merge.
  • Wikidata: N/A.
  • Naming convention: N/A.

Note on CI "needs-changes" label

Same false-positive issue as the other postcode PRs: validator checks all 250 existing countries in countries.json for forbidden fields that legitimately exist on pre-existing records. This is unrelated to the actual CA regex change, which is correct.
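
To illustrate why these errors are false positives, a diff-aware check would only report auto-managed fields that a PR actually adds or changes. The sketch below is hypothetical and is not how the CSC validator is implemented; the function name and the choice of `iso2` as the record key are assumptions.

```python
AUTO_MANAGED = ("id", "created_at", "updated_at", "flag")
_MISSING = object()  # sentinel: field absent from the base record


def introduced_auto_fields(base, head, key="iso2"):
    """Return (record_key, field) pairs for auto-managed fields that are
    new or modified in head relative to base.

    Fields already present and unchanged on pre-existing records are ignored.
    """
    base_by_key = {record[key]: record for record in base}
    problems = []
    for record in head:
        old = base_by_key.get(record[key], {})
        for field in AUTO_MANAGED:
            if field in record and record[field] != old.get(field, _MISSING):
                problems.append((record[key], field))
    return problems
```

With a check of this shape, changing only `postal_code_regex` on the CA record would report nothing, while a brand-new record carrying an `id` would still be flagged.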

🤖 Automated weekly review: Claude (sonnet-4-6).


Generated by Claude Code

H0H is a real Canadian FSA reserved by Canada Post for letters to
Santa Claus at the North Pole. The source dataset has no real lat/lng
for it; the importer fell through to (90, 0) which trips the geo-
bounds validator. Null the coordinates instead of dropping the row,
so the FSA stays queryable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
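
One way the coordinate handling described in this commit could look is a small guard that nulls sentinel values instead of dropping the row. This is a sketch only: the function name `clean_coords` is hypothetical, and treating (0, 0) as a sentinel follows the reviewer's remark about zero-lat/zero-lon rather than anything confirmed in the committed code.

```python
def clean_coords(lat, lng):
    """Return (lat, lng), or (None, None) when the pair looks like a sentinel.

    Treated as sentinels (an assumption, not the committed logic):
      - either value missing
      - a latitude at or beyond a pole, e.g. the (90, 0) fallback
      - the (0, 0) "null island" pair
      - values outside the valid latitude/longitude ranges
    """
    if lat is None or lng is None:
        return (None, None)
    at_or_past_pole = abs(lat) >= 90.0
    null_island = lat == 0.0 and lng == 0.0
    lng_out_of_range = abs(lng) > 180.0
    if at_or_past_pole or null_island or lng_out_of_range:
        return (None, None)
    return (lat, lng)
```

Applied to the H0H row, `clean_coords(90.0, 0.0)` gives `(None, None)`, so the FSA stays in the dataset without a misleading location.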
@dr5hn dr5hn merged commit 8f718ac into master May 5, 2026
1 check passed
@dr5hn dr5hn deleted the feat/postcodes-canada branch May 5, 2026 11:13
Labels: data:countries, data:postcodes, enhancement, large-contribution, needs-changes, size:XS