feat(postcodes/TZ): 3,684 NAPA Tanzania codes (#1039) #1506

Merged
dr5hn merged 1 commit into master from feat/postcodes-tanzania
May 5, 2026

dr5hn (Owner) commented May 2, 2026

Summary

  • Imports 3,684 Tanzania National Postal Authority (NAPA) 5-digit postcodes for issue #1039 ("Can we add a postcode for this?")
  • 100% state FK resolution across 24 mainland regions
  • Supersedes the earlier-considered meshackjr/Tanzania-Postal-Codes-SQL (1/31 regions)

Source

  • Msuluzya/TanzaniaRegions — MIT-licensed
  • File: json/tanzania-regions.json (~545 KB)
  • Upstream: Tanzania National Postal Authority (NAPA) public lookup

Research-doc correction

The research doc's Tier B note for Tanzania read "Only Dar es Salaam (1/31 regions)". That was based on meshackjr/Tanzania-Postal-Codes-SQL. A direct search of the GitHub ecosystem found Msuluzya's MIT-licensed, comprehensive mirror covering 24+ regions.

Source structure

4-level nest: Region → District → Ward → places (5-digit codes). Each ward also carries streets / villages arrays for representative locality names.
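The nesting described above could be walked roughly as follows. This is a minimal sketch, not the import script itself: the key names ("name", "districts", "wards") are assumptions about the JSON layout, and the sample values are placeholders rather than real NAPA data. Only "places", "streets", and "villages" are named in the PR.

```python
def iter_places(regions):
    """Yield (region, district, ward, place) for every entry in a ward's places array.

    Key names other than "places" are hypothetical; adjust to the real JSON.
    """
    for region in regions:
        for district in region.get("districts", []):
            for ward in district.get("wards", []):
                for place in ward.get("places", []):
                    yield region["name"], district["name"], ward["name"], place


# Placeholder data shaped like the 4-level nest (not real postcodes).
SAMPLE = [
    {"name": "Region A", "districts": [
        {"name": "District A1", "wards": [
            {"name": "Ward X", "places": ["00001", "00002"],
             "streets": ["Street 1"], "villages": []},
            {"name": "Ward Y", "places": []},  # empty places -> no rows
        ]},
    ]},
]

rows = list(iter_places(SAMPLE))
```

A ward with an empty places array simply contributes no rows, which is consistent with how the regions listed under Coverage gaps end up absent.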

State FK strategy

Direct region-name match handles 18 of 24 covered regions. 10-entry SOURCE_TO_CSC_NAME alias map handles the Swahili island variants:

  • Kaskazini Pemba → Pemba North
  • Kusini Pemba → Pemba South
  • Kaskazini Unguja → Zanzibar North
  • Kusini Unguja → Zanzibar South
  • Mjini Magharibi → Zanzibar West

…plus English-variant aliases for 'Zanzibar Central/South' → 'Zanzibar South' and 'Zanzibar Urban/West' → 'Zanzibar West'.
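A rough sketch of the two-step lookup (direct match first, alias map second). Only the seven mappings listed above are included; the remaining entries of the 10-entry map are not given in this PR, so they are left out. The function name resolve_region and the csc_names argument are hypothetical.

```python
from typing import Optional

# Subset of the alias map: only the entries spelled out in the PR description.
SOURCE_TO_CSC_NAME = {
    "Kaskazini Pemba": "Pemba North",
    "Kusini Pemba": "Pemba South",
    "Kaskazini Unguja": "Zanzibar North",
    "Kusini Unguja": "Zanzibar South",
    "Mjini Magharibi": "Zanzibar West",
    "Zanzibar Central/South": "Zanzibar South",
    "Zanzibar Urban/West": "Zanzibar West",
}


def resolve_region(source_name: str, csc_names: set) -> Optional[str]:
    """Return the CSC state name for a source region name, or None if unresolved."""
    name = source_name.strip()
    if name in csc_names:          # direct region-name match
        return name
    alias = SOURCE_TO_CSC_NAME.get(name)
    return alias if alias in csc_names else None
```

Returning None instead of guessing keeps unresolved regions visible, so a row never gets attached to the wrong state.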

Coverage gaps

  • 7 CSC regions absent: Dar es Salaam and 5 Zanzibar/Pemba island regions (the source ships empty places arrays for all of them), plus Arusha.
  • Arusha skipped: the source's places array there contains village names instead of postcodes (1,432 entries). The strict regex check (^\d{5}$) filters them out, so no malformed data leaks through.
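The filtering behaviour described for Arusha can be illustrated like this. It is a sketch using the regex quoted above; the helper name split_places and the sample entries are made up for illustration.

```python
import re

POSTCODE_RE = re.compile(r"^\d{5}$")


def split_places(places):
    """Separate entries that look like 5-digit postcodes from everything else."""
    valid, skipped = [], []
    for entry in places:
        if isinstance(entry, str) and POSTCODE_RE.match(entry):
            valid.append(entry)
        else:
            skipped.append(entry)  # village names, wrong lengths, empty strings
    return valid, skipped


# Placeholder entries, not real source data.
valid, skipped = split_places(["12345", "Example Village", "1234", "123456", ""])
# valid   -> ["12345"]
# skipped -> ["Example Village", "1234", "123456", ""]
```

One caveat worth knowing: in Python, `$` also matches just before a trailing newline, so `re.fullmatch(r"\d{5}", entry)` is the stricter choice if the source strings might carry stray line breaks.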

Distribution (top 5)

| iso2 | region | rows |
|------|----------|------|
| 25 | Tanga | 245 |
| 16 | Morogoro | 213 |
| 03 | Dodoma | 209 |
| 24 | Tabora | 206 |
| 05 | Kagera | 198 |

Test plan

  • python3 -m py_compile bin/scripts/sync/import_tanzania_postcodes.py
  • All 3,684 codes match ^\d{5}$
  • 100% state_id valid; state.country_id == 218; state_code == state.iso2
  • No auto-managed fields (id, created_at, updated_at, flag)
  • Idempotent merge (re-run produces no diff)
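The checks in this test plan might look roughly like the sketch below. The row field names (code, state_id, state_code, country_code), the merge key, and the state id used in the sample are assumptions for illustration; only the regex, country_id 218, the iso2 comparison, and the auto-managed field list come from the PR.

```python
import re

POSTCODE_RE = re.compile(r"^\d{5}$")
AUTO_MANAGED = {"id", "created_at", "updated_at", "flag"}
TZ_COUNTRY_ID = 218


def validate_rows(rows, states_by_id):
    """Return a list of (row_index, problem) tuples; empty means all checks pass."""
    errors = []
    for i, row in enumerate(rows):
        if not POSTCODE_RE.match(str(row.get("code", ""))):
            errors.append((i, "bad code"))
        state = states_by_id.get(row.get("state_id"))
        if state is None:
            errors.append((i, "unknown state_id"))
        elif state["country_id"] != TZ_COUNTRY_ID:
            errors.append((i, "state not in TZ"))
        elif row.get("state_code") != state["iso2"]:
            errors.append((i, "state_code mismatch"))
        if AUTO_MANAGED.intersection(row):
            errors.append((i, "auto-managed field present"))
    return errors


def merge(existing, incoming):
    """Merge keyed on (country_code, code); an assumed key, so re-runs add nothing."""
    merged = {(r["country_code"], r["code"]): r for r in existing}
    for r in incoming:
        merged[(r["country_code"], r["code"])] = r
    return sorted(merged.values(), key=lambda r: (r["country_code"], r["code"]))


# Placeholder fixtures: state id 1 and code "00001" are made up.
states = {1: {"country_id": 218, "iso2": "25"}}
good = [{"code": "00001", "state_id": 1, "state_code": "25", "country_code": "TZ"}]
bad = [{"code": "Example Village", "state_id": 99, "state_code": "25",
        "country_code": "TZ", "id": 5}]

once = merge([], good)
twice = merge(once, good)  # idempotent: same result as the first run
```

Collecting errors rather than raising on the first one makes it easier to see every failing row in a single pass over a large import.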

🤖 Generated with Claude Code

Adds Tanzania's 5-digit National Postal Authority (NAPA) postcodes
from the Msuluzya/TanzaniaRegions MIT-licensed mirror.

Why
---
Closes the TZ gap on issue #1039. The previously-tracked
meshackjr/Tanzania-Postal-Codes-SQL covered only Dar es Salaam (1
of 31 regions). Msuluzya covers 24 mainland regions with full
ward/locality hierarchy.

Coverage
--------
- 3,684 codes / 100% state FK
- 24 of 31 CSC TZ regions covered

Source structure
----------------
4-level nest: Region -> District -> Ward -> places (5-digit codes).
Some wards also have streets/villages arrays for representative
locality names.

State FK strategy
-----------------
Direct region-name match + 10-entry SOURCE_TO_CSC_NAME alias map
handling Swahili region labels:
  Kaskazini Pemba -> Pemba North
  Kusini Pemba    -> Pemba South
  Kaskazini Unguja -> Zanzibar North
  Kusini Unguja    -> Zanzibar South
  Mjini Magharibi -> Zanzibar West
plus English-variant aliases for 'Zanzibar Central/South' ->
'Zanzibar South' and 'Zanzibar Urban/West' -> 'Zanzibar West'.

Coverage gaps
-------------
- 7 CSC regions absent (no rows): Dar es Salaam + 5 Zanzibar/Pemba
  island regions (source ships empty `places` arrays for all of
  them) + Arusha (source's `places` array there contains village
  names, not postcodes — 1,432 entries skipped at regex check).
- The Arusha source-data shape inconsistency is a known upstream
  quirk; the strict regex check correctly filters out the malformed
  data without polluting valid rows.

License
-------
Msuluzya/TanzaniaRegions: MIT.
Upstream: Tanzania National Postal Authority (NAPA).
Each row: source: "tanzania-napa-via-msuluzya"

Validation
----------
- python3 -m py_compile passes
- 100% regex match (^\d{5}$)
- 100% state_id valid + state.country_id == 218 + state_code agrees
- No auto-managed fields (id, created_at, updated_at, flag)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dosubot Bot added the size:XS label (This PR changes 0-9 lines, ignoring generated files.) May 2, 2026
dosubot Bot added the enhancement label (New feature or request) May 2, 2026

github-actions Bot (Contributor) commented May 2, 2026

CSC Validation Report

PR Format

  • ✅ Description provided
  • ✅ Data source linked
  • ✅ Issue linked (recommended for data changes)
  • ✅ Justification / context provided

Labels applied: data:postcodes, large-contribution

⚠️ Large Contribution

This PR contains 3684 records. Large contributions require manual review.

Schema Validation (3684 records)

✅ All records passed validation

Cross-Reference Validation

✅ 7368 reference(s) verified

Source URL Verification

✅ 2 source URL(s) accessible


All checks passed | Status: Ready for review

dr5hn merged commit 9efe843 into master May 5, 2026
1 check passed
dr5hn deleted the feat/postcodes-tanzania branch May 5, 2026 11:08

Labels

data:postcodes, enhancement, large-contribution, ready-for-review, size:XS
