feat(scripts): La Poste postcode importer for overseas territories (#1039) #1427
Closed
dr5hn wants to merge 1 commit into
Adds bin/scripts/sync/import_laposte_postcodes.py, an etalab-2.0 compliant pipeline that reads La Poste's official base-officielle-des-codes-postaux CSV (~39,000 rows from data.gouv.fr) and writes contributions/postcodes/{ISO2}.json files for the 11 French overseas territories.

What it does
- Streams laposte_hexasmal.csv with a prefix-based classifier (97133->BL, 97150->MF, 97500->PM, 971->GP, 972->MQ, 973->GF, 974->RE, 976->YT, 986->WF, 987->PF, 988->NC)
- Resolves country_id from countries.json by ISO2
- Resolves state_id by exact case-insensitive name match against states.json (conservative: leaves null when no confident match)
- Merges with existing curated files: existing codes preserved, new codes appended. Idempotent.
- Sets source="laposte" on every new row for license attribution

What it does NOT do
- Touch metropolitan France (~36,000 communes, separate scope)
- Overwrite curated locality_name or state_id values on existing manual rows (BL/GP/MQ/RE/etc. PRs already merged stay intact)
- Network fetch: the CSV must be downloaded out-of-band:

    curl -L -o /tmp/laposte_hexasmal.csv \
      https://www.data.gouv.fr/fr/datasets/r/d9faa17b-ee8b-414e-8a5f-95fde9ff0e80

License
- Source: La Poste / data.gouv.fr (etalab-2.0)
- Each generated row records source="laposte" so attribution can be programmatically assembled at export time

Validated locally
- Compiles clean (python3 -m py_compile)
- Dry-run against a 32-row fixture correctly classifies 30 overseas + 2 metro (excluded), with FK resolution working for 11/30 records (the rest correctly leave state_id null when no exact match exists)

After this PR lands, a follow-up PR runs the script with the live CSV and ships a single bulk-import covering ~250 overseas postcodes, turning the manual 4-codes-per-PR grind into one mechanical PR.

Refs: #1039

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
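The classification and state-resolution steps described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `classify_postcode` and `resolve_state_id`, the `{"id", "name"}` shape of a states.json record, and the choice to return null on an ambiguous match are all assumptions. Only the prefix table comes from the description above.

```python
from typing import Optional

# Prefix -> ISO2 table copied from the PR description.
# Order matters: the 5-digit prefixes must be tested before the 3-digit
# ones, otherwise 97133 (BL) and 97150 (MF) would fall under "971" (GP).
PREFIX_TO_ISO2 = [
    ("97133", "BL"),
    ("97150", "MF"),
    ("97500", "PM"),
    ("971", "GP"),
    ("972", "MQ"),
    ("973", "GF"),
    ("974", "RE"),
    ("976", "YT"),
    ("986", "WF"),
    ("987", "PF"),
    ("988", "NC"),
]


def classify_postcode(code: str) -> Optional[str]:
    """Return the ISO2 code for an overseas postcode, or None for anything else."""
    code = code.strip()
    for prefix, iso2 in PREFIX_TO_ISO2:
        if code.startswith(prefix):
            return iso2
    return None  # metropolitan France and unknown codes are excluded


def resolve_state_id(name: str, states: list) -> Optional[int]:
    """Exact, case-insensitive name match against state records.

    Assumes each record looks like {"id": ..., "name": ...} (hypothetical
    shape). Returns None unless exactly one state matches, so an ambiguous
    or missing match is left null instead of guessed.
    """
    wanted = name.strip().casefold()
    matches = [s["id"] for s in states if s["name"].strip().casefold() == wanted]
    return matches[0] if len(matches) == 1 else None
```

With this ordering, `classify_postcode("97133")` yields `"BL"` instead of `"GP"`, and a metropolitan code such as `"75001"` yields `None`.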
Summary
Adds bin/scripts/sync/import_laposte_postcodes.py, an etalab-2.0-compliant pipeline that ingests La Poste's official base-officielle-des-codes-postaux CSV (~39k rows from data.gouv.fr) and writes contributions/postcodes/{ISO2}.json files for the 11 French overseas territories.

This is the infrastructure PR. A follow-up PR runs the script with the live CSV and ships a single bulk import covering ~250 overseas postcodes, turning the manual 1–4 codes-per-PR grind into one mechanical PR.
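How it works
- Streams laposte_hexasmal.csv with a prefix-based classifier: 97133->BL, 97150->MF, 97500->PM, 971->GP, 972->MQ, 973->GF, 974->RE, 976->YT, 986->WF, 987->PF, 988->NC.
- Resolves country_id from countries.json by ISO2.
- Resolves state_id by exact case-insensitive name match against states.json. Conservative: leaves state_id null when no confident match exists, rather than guessing.
- Merges with existing curated files: existing codes are preserved, new codes are appended. Idempotent.
- Sets source: "laposte" on every new row for license attribution.

What it does NOT do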
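- Touch metropolitan France (~36,000 communes, separate scope).
- Overwrite curated locality_name or state_id values on existing manual rows (BL/GP/MQ/RE/etc. PRs already merged stay intact).
- Fetch anything over the network: the CSV must be downloaded out-of-band.

License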
- Source: La Poste / data.gouv.fr (etalab-2.0).
- Each generated row records source: "laposte" so attribution can be programmatically assembled at export time (the source column is already in the postcodes schema from feat(postcodes): add postcodes table and infrastructure (#1039) #1398).

Test plan
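- python3 -m py_compile clean
- --help documents flags
- Dry-run against a 32-row fixture correctly classifies 30 overseas + 2 metro (excluded), with FK resolution working for 11/30 records (the rest correctly leave state_id null when no exact match exists)

Expected output (after running on live data)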
Total: ~250 codes — a single follow-up PR delivers all of these.
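Because the importer merges instead of overwriting, the follow-up run should be safe to repeat. A minimal sketch of that merge step is shown here, assuming rows are dicts keyed by a `code` field; the field name, the function name `merge_postcodes`, and the row shape are hypothetical and not taken from the script.

```python
def merge_postcodes(existing: list, imported: list) -> list:
    """Append imported rows whose code is not already present.

    Existing (curated) rows are returned untouched and in their original
    order. New rows are tagged source="laposte" for license attribution.
    Assumes every row is a dict with a "code" key (hypothetical shape).
    """
    seen = {row["code"] for row in existing}
    merged = list(existing)
    for row in imported:
        if row["code"] in seen:
            continue  # never overwrite a curated or previously imported row
        merged.append({**row, "source": "laposte"})
        seen.add(row["code"])
    return merged


# Usage: a second run over the same input adds nothing, which is what
# makes the import idempotent.
curated = [{"code": "97133", "locality_name": "Gustavia", "source": "manual"}]
fresh = [
    {"code": "97133", "locality_name": "ST BARTHELEMY"},
    {"code": "97150", "locality_name": "ST MARTIN"},
]
first_run = merge_postcodes(curated, fresh)
second_run = merge_postcodes(first_run, fresh)
```

Keying on the code alone is the simplest choice for a sketch; a real importer may need a composite key if one postcode legitimately maps to several localities.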
Refs: #1039