fix(IT): drop 6 duplicate pairs flagged by remap (#1349 follow-up) #1399
Closed
dr5hn wants to merge 1 commit into
Dropped the legacy half of each pair, keeping the ISTAT-canonical (or
English-name, per repo convention) record:
id 58976 'Pozzaglio' -> kept 58977 'Pozzaglio ed Uniti'
id 61329 'Torino' -> kept 61575 'Turin'
id 61530 'Trinità d\'Agultu' -> kept 61531 'Trinità d\'Agultu e Vignola'
id 139215 'Inverno' -> kept 139216 'Inverno e Monteleone'
id 139523 'Limite' -> kept 136799 'Capraia e Limite'
id 140714 'Napoli' -> kept 140713 'Naples'
Two pairs are intentionally NOT touched and require maintainer review,
since neither record carries the ISTAT-canonical merged name:
- MN: 'Sermide' (id 60744) + 'Felonica' (id 138474)
canonical comune is 'Sermide e Felonica' (since 2017).
- PV: 'Corteolona' (id 138065) + 'Genzone' (id 138905)
canonical comune is 'Corteolona e Genzone' (since 2018).
Stacks on top of #1395 and #1397.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Weekly data-quality review (2026-04-27)
Verdict: clean
Checks
Advisory (non-blocking)
🤖 Automated weekly review by Claude (sonnet-4-6). Generated by Claude Code
dr5hn
added a commit
that referenced
this pull request
Apr 27, 2026
…#1397/#1399) The remap is a behavior change for downstream consumers — region-level state_code queries (e.g. Sicily=82, Lombardy=25) now return empty arrays because cities live under provinces/metropolitan cities, not regions. Documents the traversal pattern (states.parent_id) needed for region-aggregate queries so users know how to migrate.
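The traversal described in this changelog note can be sketched roughly as follows. This is only an illustration, assuming state records expose `id`, `iso2` and `parent_id` and city records expose `state_id`; the commit message names only `states.parent_id`, so the other field names, the sample rows and both helper functions are hypothetical.

```python
def cities_by_state_code(code, states, cities):
    """Direct lookup: cities attached to the state whose iso2 equals `code`."""
    ids = {s["id"] for s in states if s["iso2"] == code}
    return [c for c in cities if c["state_id"] in ids]


def cities_in_region(region_code, states, cities):
    """Region aggregate: follow parent_id one level down, then collect cities."""
    region_ids = {s["id"] for s in states if s["iso2"] == region_code}
    # Provinces / metropolitan cities whose parent is the requested region.
    child_ids = {s["id"] for s in states if s.get("parent_id") in region_ids}
    return [c for c in cities if c["state_id"] in child_ids]


# Hypothetical sample rows (ids and city names are made up for illustration).
states = [
    {"id": 1, "iso2": "25", "parent_id": None},  # region: Lombardy
    {"id": 2, "iso2": "MN", "parent_id": 1},     # province under Lombardy
    {"id": 3, "iso2": "PV", "parent_id": 1},     # province under Lombardy
    {"id": 4, "iso2": "82", "parent_id": None},  # region: Sicily
    {"id": 5, "iso2": "PA", "parent_id": 4},     # metropolitan city under Sicily
]
cities = [
    {"id": 101, "name": "City A", "state_id": 2},
    {"id": 102, "name": "City B", "state_id": 3},
    {"id": 103, "name": "City C", "state_id": 5},
]

direct = cities_by_state_code("25", states, cities)   # empty after the remap
aggregate = cities_in_region("25", states, cities)    # cities of MN and PV
```

The sketch walks a single level of `parent_id`; a deeper hierarchy would need the lookup repeated or made recursive.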
Pull request overview
Removes six known duplicate Italy city records (flagged during the #1349 remap follow-up) to keep contributions/cities/IT.json consistent and reduce duplicate comune entries.
Changes:
- Dropped 6 duplicate city records from `contributions/cities/IT.json` (keeping the canonical/English-name counterparts).
- Added a dedicated fix script (`italy_dedup_flagged_pairs.py`) that verifies preconditions and performs the drops (idempotent via "already gone" skips).
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| contributions/cities/IT.json | Deletes the 6 duplicate city records listed in the PR description. |
| bin/scripts/fixes/italy_dedup_flagged_pairs.py | Adds an idempotent fix script with safety checks to perform the same targeted deletions. |
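The "idempotent fix script with safety checks" pattern can be illustrated with a small sketch. This is not the code of `italy_dedup_flagged_pairs.py`; the function names, the `PreconditionError` class and the record shape (`id`, `name`) are assumptions made for the example, and only two of the six pairs from the PR are used.

```python
class PreconditionError(Exception):
    """Raised when the data no longer matches what the fix expects."""


# (drop_id, drop_name, keep_id, keep_name), taken from the PR's drop list.
PAIRS = [
    (61329, "Torino", 61575, "Turin"),
    (140714, "Napoli", 140713, "Naples"),
]


def plan_drops(records, pairs):
    """Return the ids to drop, validating every pair before anything is mutated."""
    by_id = {r["id"]: r for r in records}
    to_drop = []
    for drop_id, drop_name, keep_id, keep_name in pairs:
        keep = by_id.get(keep_id)
        if keep is None or keep["name"] != keep_name:
            raise PreconditionError(f"keep-target {keep_id} is not {keep_name!r}")
        drop = by_id.get(drop_id)
        if drop is None:
            continue  # already gone: skipping keeps re-runs harmless
        if drop["name"] != drop_name:
            raise PreconditionError(f"drop-target {drop_id} is not {drop_name!r}")
        to_drop.append(drop_id)
    return to_drop


def apply_drops(records, pairs):
    """Return a new list without the planned drops; raises before changing anything."""
    doomed = set(plan_drops(records, pairs))
    return [r for r in records if r["id"] not in doomed]


sample = [
    {"id": 61329, "name": "Torino"},
    {"id": 61575, "name": "Turin"},
    {"id": 140714, "name": "Napoli"},
    {"id": 140713, "name": "Naples"},
]
first_run = apply_drops(sample, PAIRS)
second_run = apply_drops(first_run, PAIRS)  # nothing left to drop
```

A command-line wrapper could catch `PreconditionError` and call `sys.exit(2)` to mirror the abort behaviour described in the PR.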
dr5hn
added a commit
that referenced
this pull request
Apr 27, 2026
…n) (#1352 PR-C) (#1392)

* feat(postcodes/DK): bulk-import 1,089 codes via DAWA (#1039)

Adds Danish postcodes via DAWA (Danmarks Adressers Web API), public sector data published under CC-0 by SDFI/Dataforsyningen.

1. bin/scripts/sync/import_denmark_postcodes.py: pipeline that fetches /kommuner to build a kommune-code -> region-name map, then resolves each /postnumre record's region via its first kommune. Maps the 5 Danish region names to states.json iso2 codes:
   Region Hovedstaden -> 84 (called "Denmark" in states.json)
   Region Sjælland -> 85 (Zealand)
   Region Syddanmark -> 83 (Southern Denmark)
   Region Midtjylland -> 82 (Central Denmark)
   Region Nordjylland -> 81 (North Denmark)
2. contributions/postcodes/DK.json: 1,089 codes covering all 5 regions with 100% state_id + 100% coordinate resolution.

Validation (zero errors)
- All codes match countries.postal_code_regex (^(\\d{4})\$)
- All FKs resolve, all state_codes agree with state.iso2

License & attribution
- Source: SDFI / Dataforsyningen DAWA (CC-0)
- Each row: source: "dawa"

Refs: #1039
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(postcodes/IS): bulk-import 195 codes via iceaddr (#1039)

Adds Icelandic postcodes via the sveinbjornt/iceaddr Python package which embeds the canonical postcode metadata under MIT licence.

1. bin/scripts/sync/import_iceland_postcodes.py: pipeline that dynamically imports the iceaddr POSTCODES dict and resolves each code's region via prefix range to states.json iso2 1-8 (Statistics Iceland's NUTS-3 boundaries: 1xx-2xx Capital, 3xx Western, 4xx Westfjords, 5xx Northwestern, 6xx Northeastern, 7xx Eastern, 8xx-9xx Southern).
2. contributions/postcodes/IS.json: 195 records with 100% state_id resolution. Locality names combine stadur_nf + lysing (e.g. "Reykjavík, Miðborg").

License & attribution
- Source: iceaddr (MIT) embedding Pósturinn data
- Each row: source: "iceaddr"

Refs: #1039
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(postcodes/SK+RO+SI): batch-import 15,585 codes via 3 community mirrors (#1039)

Bundles three small-to-medium European countries with confirmed redistributable postcode mirrors into a single batch importer.

1. bin/scripts/sync/import_eu_batch1_postcodes.py: pipeline that ingests three different shapes (SK JSON, RO CSV, SI CSV) and writes per-country JSON files. ASCII-folding + dash-to-space normalisation handles the Romanian Caraș-Severin / Bistrița-Năsăud cases where the CSV uses spaces and states.json uses hyphens.
2. contributions/postcodes/SK.json: 1,312 records (100% state via KRAJ -> states.iso2 direct match)
3. contributions/postcodes/RO.json: 13,751 records (100% state via ASCII-folded judet name match; all 6 Bucharest sectors mapped to 'B')
4. contributions/postcodes/SI.json: 522 records, country-only by design (source has no municipality info; SI postcodes don't map cleanly to administrative regions)

Validation (zero errors)
- All codes match countries.postal_code_regex
- All FKs resolve, all state_codes agree with state.iso2

License & attribution
- SK source: github.com/FeroVolar/PSC-JSON (community Slovenská pošta data)
- RO source: github.com/alexionegit/coduripostaleRomaniaPS
- SI source: github.com/dlabs/postcode_si (community Posta Slovenije data)

Refs: #1039
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): add notable callout for IT city→province remap (#1395/#1397/#1399)

The remap is a behavior change for downstream consumers: region-level state_code queries (e.g. Sicily=82, Lombardy=25) now return empty arrays because cities live under provinces/metropolitan cities, not regions. Documents the traversal pattern (states.parent_id) needed for region-aggregate queries so users know how to migrate.

* docs: multi-level territories policy (FR overseas, dual representation) (#1352 PR-C)

Adds MULTI_LEVEL_TERRITORIES.md documenting why 12 French overseas territories (and analogous US/CN/NO entities) appear simultaneously as ISO 3166-1 countries and as ISO 3166-2 subdivisions of their parent state. Captures the maintainer's Option C decision on #1352: keep both representations because (1) downstream API/SDK consumers filter on country_code, (2) ISO 3166-1 lists them as countries, and (3) the breaking change is unjustified for a labelling concern.

Cross-links the new policy doc from .claude/CLAUDE.md (Important Rules) and README.md (contributing section). No data changes.

Refs: #1352
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacks on #1395 → #1397. Targets `fix/issue-1349-italy-native-field` so the diff stays focused; rebase to master once the upstream PRs land.

Refs #1349: implements the maintainer cleanup from the duplicate list in the city remap.
Drops (6 records)
City count: 9,947 → 9,941.
NOT dropped — flagged for maintainer
Two pairs require manual handling because neither half carries the modern ISTAT-canonical name:
- MN: Sermide (id 60744) + Felonica (id 138474). Canonical since 2017: Sermide e Felonica. Recommended: rename id 60744 to "Sermide e Felonica", set its `native` accordingly, drop id 138474.
- PV: Corteolona (id 138065) + Genzone (id 138905). Canonical since 2018: Corteolona e Genzone. Recommended: rename id 138065 to "Corteolona e Genzone", set `native`, drop id 138905.

These weren't auto-handled because renaming is irreversible and shouldn't be done without explicit signoff on the chosen kept-id.
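For reviewers weighing the recommendation, the proposed manual merge could look roughly like the sketch below. It is a hypothetical helper, not part of this PR: `merge_pair` is an invented name, records are assumed to carry `id`, `name` and `native`, and setting `native` to the same canonical Italian name is an assumption about what "accordingly" means here.

```python
def merge_pair(records, keep_id, drop_id, canonical_name):
    """Rename the kept record to the merged comune name and drop the other half.

    Returns a new list and leaves the input untouched, so the result can be
    inspected before anything is written back to disk.
    """
    ids = {r["id"] for r in records}
    missing = {keep_id, drop_id} - ids
    if missing:
        raise KeyError(f"ids not found: {sorted(missing)}")

    merged = []
    for record in records:
        if record["id"] == drop_id:
            continue  # legacy half is removed
        if record["id"] == keep_id:
            # Assumption: native mirrors the canonical Italian name.
            record = {**record, "name": canonical_name, "native": canonical_name}
        merged.append(record)
    return merged


# Sample rows using the ids and names quoted in the PR; other fields omitted.
sample = [
    {"id": 60744, "name": "Sermide", "native": "Sermide"},
    {"id": 138474, "name": "Felonica", "native": "Felonica"},
]
result = merge_pair(sample, keep_id=60744, drop_id=138474,
                    canonical_name="Sermide e Felonica")
```

Which id survives is exactly the decision that needs maintainer signoff, so `keep_id` and `drop_id` are left as explicit arguments rather than hard-coded.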
Preconditions
The script `italy_dedup_flagged_pairs.py` verifies that every drop-target's current name matches the expected name and every keep-target exists with the expected name before mutating IT.json. If the IT data has shifted underneath, the script aborts with exit code 2 (no silent drops).

Test plan
- `python3 bin/scripts/fixes/italy_dedup_flagged_pairs.py --dry-run` reports `Dropped this run: 0` after this PR (idempotent).
- `jq 'length' contributions/cities/IT.json` → 9941.
- `jq '[.[] | select(.id == 58976 or .id == 61329 or .id == 61530 or .id == 139215 or .id == 139523 or .id == 140714)] | length' contributions/cities/IT.json` → 0.
- `validate-schema`, `validate-cross-reference`, `validate-coordinates`, `detect-duplicates` pass.

🤖 Generated with Claude Code
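The two `jq` checks in the test plan can also be expressed in Python, which may be convenient inside a test suite. This is a sketch only: the helper names are hypothetical, and it assumes IT.json is a top-level JSON array of objects with an integer `id`, as the `jq` filters imply.

```python
import json

# Ids removed by this PR, as listed in the drop list.
DROPPED_IDS = frozenset({58976, 61329, 61530, 139215, 139523, 140714})


def load_records(path):
    """Read a JSON array of city records from `path`."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def remaining_dropped(records, dropped_ids=DROPPED_IDS):
    """Return the dropped ids that are still present (empty list means clean)."""
    return sorted(r["id"] for r in records if r["id"] in dropped_ids)


# Usage against the real file (not executed here):
#   records = load_records("contributions/cities/IT.json")
#   assert len(records) == 9941
#   assert remaining_dropped(records) == []
```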