feat(scripts): La Poste postcode importer for overseas territories (#1039) #1427

Closed
dr5hn wants to merge 1 commit into master from feat/postcodes-laposte-pipeline
@dr5hn (Owner) commented Apr 27, 2026

Summary

Adds bin/scripts/sync/import_laposte_postcodes.py — an etalab-2.0-compliant pipeline that ingests La Poste's official base-officielle-des-codes-postaux CSV (~39k rows from data.gouv.fr) and writes contributions/postcodes/{ISO2}.json files for the 11 French overseas territories.

This is the infrastructure PR. A follow-up PR runs the script with the live CSV and ships a single bulk import covering ~250 overseas postcodes — turning the manual 1–4 codes-per-PR grind into one mechanical PR.

How it works

  1. Streams laposte_hexasmal.csv with a prefix-based classifier:
    97133 → BL    972 → MQ    976 → YT    987 → PF
    97150 → MF    973 → GF    986 → WF    988 → NC
    97500 → PM    974 → RE
    971 (other) → GP
    
  2. Resolves country_id from countries.json by ISO2.
  3. Resolves state_id by exact case-insensitive name match against states.json. Conservative — leaves state_id null when no confident match exists, rather than guessing.
  4. Merges with existing curated files: existing codes preserved, new codes appended. Idempotent.
  5. Sets source: "laposte" on every new row for license attribution.
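
The prefix table in step 1 can be sketched as a longest-prefix lookup. This is a minimal illustration rather than the script's actual code: the names `PREFIX_TO_ISO2` and `classify_postcode` are hypothetical, and only the prefix-to-ISO2 pairs come from the description above.

```python
from typing import Optional

# Prefix -> ISO2 pairs copied from the table in step 1.
PREFIX_TO_ISO2 = {
    "97133": "BL",
    "97150": "MF",
    "97500": "PM",
    "971": "GP",
    "972": "MQ",
    "973": "GF",
    "974": "RE",
    "976": "YT",
    "986": "WF",
    "987": "PF",
    "988": "NC",
}

# Try the longest prefixes first so that 97133 (BL) and 97150 (MF)
# are matched before the generic 971 (GP) rule.
_ORDERED_PREFIXES = sorted(PREFIX_TO_ISO2, key=len, reverse=True)


def classify_postcode(code: str) -> Optional[str]:
    """Return the ISO2 territory for a postcode, or None for anything else."""
    code = code.strip()
    for prefix in _ORDERED_PREFIXES:
        if code.startswith(prefix):
            return PREFIX_TO_ISO2[prefix]
    return None  # metropolitan France and unknown prefixes are skipped
```

Ordering by prefix length is what makes "971 (other) → GP" work without a special case.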

What it does NOT do

  • ❌ Touch metropolitan France (~36k communes, separate scope).
  • ❌ Overwrite curated locality_name or state_id values on existing manual rows (BL/GP/MQ/RE/etc. PRs already merged stay intact).
  • ❌ Network fetch — the CSV must be downloaded out-of-band by the maintainer:
    curl -L -o /tmp/laposte_hexasmal.csv \
      https://www.data.gouv.fr/fr/datasets/r/d9faa17b-ee8b-414e-8a5f-95fde9ff0e80
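
Once the file is on disk, reading it row by row without any network access might look like the sketch below. The column names `Code_postal` and `Nom_de_la_commune` and the `;` delimiter are assumptions about the CSV layout, not details confirmed by this PR, and `iter_postcode_rows` is a hypothetical helper name.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def iter_postcode_rows(
    handle: TextIO,
    code_column: str = "Code_postal",        # assumed column name
    name_column: str = "Nom_de_la_commune",  # assumed column name
    delimiter: str = ";",                    # assumed delimiter
) -> Iterator[Tuple[str, str]]:
    """Yield (postcode, commune name) pairs one row at a time."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    for row in reader:
        code = (row.get(code_column) or "").strip()
        name = (row.get(name_column) or "").strip()
        if code:  # skip rows with no postcode
            yield code, name


# In-memory stand-in for the downloaded file, for illustration only.
sample = io.StringIO(
    "Code_postal;Nom_de_la_commune\n"
    "97133;SAINT BARTHELEMY\n"
    "75001;PARIS\n"
)
rows = list(iter_postcode_rows(sample))
```

With the real file, the handle would come from something like `open("/tmp/laposte_hexasmal.csv", newline="", encoding="utf-8")`; the encoding is also an assumption.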

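License

  • Source: La Poste / data.gouv.fr (etalab-2.0).
  • Each generated row records source: "laposte" so attribution can be programmatically assembled at export time.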

Test plan

  • python3 -m py_compile clean
  • --help documents flags
  • Dry-run against a 32-row fixture correctly classifies 30 overseas + 2 metro (excluded), with FK resolution working for 11/30 records (the rest correctly leave state_id null when no exact match exists)
  • Merge logic preserves existing curated entries by code; new codes are appended
  • Maintainer downloads the live CSV (~6 MB) and runs the importer
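
The merge rule checked in the test plan (existing entries preserved by code, new codes appended, idempotent) could be sketched as follows. The function name `merge_postcodes` and the `code` key are assumptions made for illustration; `locality_name`, `state_id` and `source` are the field names mentioned in this description.

```python
from typing import Dict, List


def merge_postcodes(existing: List[Dict], incoming: List[Dict]) -> List[Dict]:
    """Append incoming rows whose code is not already present.

    Existing rows are returned untouched, so curated locality_name and
    state_id values are never overwritten.
    """
    merged = list(existing)
    seen = {row["code"] for row in existing}
    for row in incoming:
        if row["code"] in seen:
            continue  # a curated (or previously imported) row wins
        merged.append({**row, "source": "laposte"})
        seen.add(row["code"])
    return merged


curated = [{"code": "97133", "locality_name": "Gustavia", "state_id": None}]
imported = [
    {"code": "97133", "locality_name": "SAINT BARTHELEMY", "state_id": None},
    {"code": "97150", "locality_name": "SAINT MARTIN", "state_id": None},
]
once = merge_postcodes(curated, imported)
twice = merge_postcodes(once, imported)  # a second run adds nothing new
```

Running the merge twice yields the same list as running it once, which is the idempotency property claimed above.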

Expected output (after running on live data)

ISO2   Approx. records
GP     ~32 communes
MQ     ~34
GF     ~22
RE     ~24
YT     ~17
WF     ~3
PF     ~98
NC     ~33
PM     1
BL     1
MF     1

Total: ~250 codes — a single follow-up PR delivers all of these.

Refs: #1039

feat(scripts): La Poste postcode importer for overseas territories (#1039)

Adds bin/scripts/sync/import_laposte_postcodes.py — an etalab-2.0
compliant pipeline that reads La Poste's official base-officielle-des-
codes-postaux CSV (~39,000 rows from data.gouv.fr) and writes
contributions/postcodes/{ISO2}.json files for the 11 French overseas
territories.

What it does
- Streams laposte_hexasmal.csv with a prefix-based classifier
  (97133->BL, 97150->MF, 97500->PM, 971->GP, 972->MQ, 973->GF, 974->RE,
  976->YT, 986->WF, 987->PF, 988->NC)
- Resolves country_id from countries.json by ISO2
- Resolves state_id by exact case-insensitive name match against
  states.json (conservative — leaves null when no confident match)
- Merges with existing curated files: existing codes preserved,
  new codes appended. Idempotent.
- Sets source="laposte" on every new row for license attribution
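
A conservative state_id lookup along the lines described above might look like this sketch. The `id`, `name` and `country_id` keys in states.json, and the choice of which name gets matched, are assumptions; `build_state_index` and `resolve_state_id` are hypothetical helper names.

```python
from typing import Dict, List, Optional


def build_state_index(states: List[Dict], country_id: int) -> Dict[str, int]:
    """Map case-folded state names to ids for a single country."""
    index: Dict[str, int] = {}
    for state in states:
        if state.get("country_id") == country_id:
            index[state["name"].strip().casefold()] = state["id"]
    return index


def resolve_state_id(index: Dict[str, int], name: str) -> Optional[int]:
    """Exact, case-insensitive match only; None when nothing matches."""
    return index.get(name.strip().casefold())


# Made-up records standing in for states.json.
states = [
    {"id": 1, "name": "Basse-Terre", "country_id": 88},
    {"id": 2, "name": "Basse-Terre", "country_id": 99},
]
index = build_state_index(states, country_id=88)
```

There is deliberately no fuzzy matching here: a near miss returns None, so state_id stays null instead of being guessed.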

What it does NOT do
- Touch metropolitan France (~36,000 communes, separate scope)
- Overwrite curated locality_name or state_id values on existing
  manual rows (BL/GP/MQ/RE/etc. PRs already merged stay intact)
- Network fetch — the CSV must be downloaded out-of-band:
    curl -L -o /tmp/laposte_hexasmal.csv \
      https://www.data.gouv.fr/fr/datasets/r/d9faa17b-ee8b-414e-8a5f-95fde9ff0e80

License
- Source: La Poste / data.gouv.fr (etalab-2.0)
- Each generated row records source="laposte" so attribution can be
  programmatically assembled at export time
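
As a rough idea of how the source field could feed attribution at export time, the sketch below groups rows by source and emits one notice line per source. The export step is not part of this PR, so `attribution_lines` and the `notices` mapping are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def attribution_lines(rows: List[Dict], notices: Dict[str, str]) -> List[str]:
    """Return one attribution line per distinct source found in rows."""
    counts = Counter(row["source"] for row in rows if row.get("source"))
    return [
        f"{notices.get(source, source)} ({count} records)"
        for source, count in sorted(counts.items())
    ]


rows = [
    {"code": "97150", "source": "laposte"},
    {"code": "97500", "source": "laposte"},
    {"code": "97133"},  # curated row with no source recorded
]
lines = attribution_lines(
    rows, {"laposte": "Source: La Poste / data.gouv.fr (etalab-2.0)"}
)
```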

Validated locally
- Compiles clean (python3 -m py_compile)
- Dry-run against a 32-row fixture correctly classifies 30 overseas +
  2 metro (excluded), with FK resolution working for 11/30 records
  (the rest correctly leave state_id null when no exact match exists)

After this PR lands, a follow-up PR runs the script with the live CSV
and ships a single bulk-import covering ~250 overseas postcodes —
turning the manual 4-codes-per-PR grind into one mechanical PR.

Refs: #1039

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 27, 2026 07:06
@dosubot (bot) added the size:XS label Apr 27, 2026

Copilot AI (Contributor) left a comment

Copilot wasn't able to review any files in this pull request.

Labels

enhancement (New feature or request), size:XS (This PR changes 0-9 lines, ignoring generated files.)
