
Commit a30f8ea

dr5hn and claude committed
docs: multi-level territories policy (FR overseas, dual representation) (#1352 PR-C)
Adds MULTI_LEVEL_TERRITORIES.md documenting why 12 French overseas territories (and analogous US/CN/NO entities) appear simultaneously as ISO 3166-1 countries and as ISO 3166-2 subdivisions of their parent state.

Captures the maintainer's Option C decision on #1352: keep both representations because (1) downstream API/SDK consumers filter on country_code, (2) ISO 3166-1 lists them as countries, and (3) the breaking change is unjustified for a labelling concern.

Cross-links the new policy doc from .claude/CLAUDE.md (Important Rules) and README.md (contributing section). No data changes.

Refs: #1352
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent abc9b43 commit a30f8ea

3 files changed

Lines changed: 167 additions & 1 deletion

File tree

.claude/CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -191,6 +191,7 @@ python3 bin/scripts/sync/import_json_to_mysql.py --host $DB_HOST --user $DB_USER
191191
- Run `normalize_json.py` to pre-assign IDs (optional)
192192
- Document fixes in `.github/fixes-docs/FIX_<issue_number>_SUMMARY.md` (ONE file per issue)
193193
- When adding states + cities: run JSON→MySQL→JSON between tasks for ID assignment
194+
- For overseas / dual-ISO territories (e.g. FR overseas, US territories, CN SARs), see [MULTI_LEVEL_TERRITORIES.md](../MULTI_LEVEL_TERRITORIES.md) before changing country/state records
194195

195196
**DO NOT:**
196197
- Edit auto-generated dirs: `json/`, `csv/`, `xml/`, `yml/`, `sql/`, etc.

MULTI_LEVEL_TERRITORIES.md

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
1+
# Multi-Level Territories Policy
2+
3+
> **Status:** Active policy. Established as part of [#1352](https://github.com/dr5hn/countries-states-cities-database/issues/1352) (France data, PR-C).
4+
> **Scope:** Explains why some geographical entities appear simultaneously as ISO 3166-1 *countries* and as ISO 3166-2 *subdivisions* of another country in this database, and how downstream consumers should reason about it.
5+
6+
## Background
7+
8+
A handful of overseas / autonomous territories are listed by ISO 3166 at **two levels** at once:
9+
10+
- **ISO 3166-1** assigns them their own two-letter country code (e.g. `MQ` for Martinique).
11+
- **ISO 3166-2** also lists them as subdivisions of a parent state (e.g. `FR-MQ` as a subdivision of France).
12+
13+
This is not an accident or a bug in the standard — it reflects political reality. Martinique *is* an integral region of the French Republic (its residents vote in French national elections, use the euro, and are EU citizens), but it *also* has independent representation at certain international bodies, its own internet TLD, its own currency code in some historical contexts, and so on.
14+
15+
When ISO models a place at two levels, this database does too.
16+
17+
## Policy
18+
19+
**Both representations are kept in sync. Neither is canonical; both are first-class.**
20+
21+
For each multi-level territory:
22+
23+
1. There is a row in `contributions/countries/countries.json` (with its own `id`, `iso2`, `iso3`).
24+
2. There is a row in `contributions/states/states.json` whose `country_code` points at the **parent** state (e.g. `FR`, `US`, `CN`), and whose `iso2` / `state_code` matches the territory.
25+
3. Cities under the territory live in `contributions/cities/<TERRITORY_ISO2>.json` (e.g. `MQ.json`), and reference both their `country_id` (= the territory) and their `state_id` (= the territory-as-subdivision-of-parent).
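
The three rules above can be sketched as a small consistency check. This is a minimal illustration, not the project's validator: the records are trimmed to the fields the policy names, the city `id` is made up, the `state_code` of `972` follows the DROM note further down, and `check_dual_representation` is a hypothetical helper.

```python
# Hypothetical, trimmed-down records for Martinique. Field names follow the
# policy text; the city id and any omitted fields are illustrative only.
country = {"id": 138, "iso2": "MQ", "iso3": "MTQ"}
state = {"id": 4827, "country_code": "FR", "state_code": "972"}
city = {"id": 1, "name": "Fort-de-France", "country_id": 138, "state_id": 4827}


def check_dual_representation(country, state, city, parent_iso2):
    """Return a list of problems; an empty list means the three rows agree."""
    problems = []
    if state["country_code"] != parent_iso2:
        problems.append("state row does not point at the parent country")
    if city["country_id"] != country["id"]:
        problems.append("city.country_id does not reference the territory")
    if city["state_id"] != state["id"]:
        problems.append("city.state_id does not reference the subdivision row")
    return problems


print(check_dual_representation(country, state, city, parent_iso2="FR"))  # []
```

An empty list means the country row, the state row, and the city row agree with each other; any mismatch is reported by name.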
26+
27+
## The 12 French Overseas Territories
28+
29+
These are the territories covered by this policy under France. All 12 appear as both `FR` subdivisions and as standalone ISO 3166-1 countries.
30+
| ISO 3166-1 | ISO 3166-2 / INSEE | Name (English)                       | `countries.id` | `states.id` | State `type`                                |
| :--------- | :----------------- | :----------------------------------- | -------------: | ----------: | :------------------------------------------ |
| `GF`       | `FR-GF` / `973`    | French Guiana                        |             76 |        4822 | overseas region                             |
| `PF`       | `FR-PF`            | French Polynesia                     |             77 |        4824 | overseas collectivity                       |
| `TF`       | `FR-TF`            | French Southern and Antarctic Lands  |             78 |        5065 | overseas territory                          |
| `GP`       | `FR-GP` / `971`    | Guadeloupe                           |             88 |        4829 | overseas region                             |
| `MQ`       | `FR-MQ` / `972`    | Martinique                           |            138 |        4827 | overseas region                             |
| `YT`       | `FR-YT` / `976`    | Mayotte                              |            141 |        4797 | overseas region                             |
| `NC`       | `FR-NC`            | New Caledonia                        |            157 |        5538 | overseas collectivity with special status   |
| `RE`       | `FR-RE` / `974`    | Réunion                              |            180 |        4823 | overseas region                             |
| `PM`       | `FR-PM`            | Saint Pierre and Miquelon            |            187 |        4821 | overseas collectivity                       |
| `BL`       | `FR-BL`            | Saint-Barthélemy                     |            189 |        4794 | overseas collectivity                       |
| `MF`       | `FR-MF`            | Saint-Martin (French part)           |            190 |        4809 | overseas collectivity                       |
| `WF`       | `FR-WF`            | Wallis and Futuna                    |            243 |        4810 | overseas collectivity                       |
45+
46+
> The five **DROM** (Départements et régions d'outre-mer: `GF`, `GP`, `MQ`, `RE`, `YT`) currently use **INSEE numeric codes** (`971`, `972`, `973`, `974`, `976`) as their `state_code` in `states.json`, while the remaining seven territories use the ISO 3166-2 alphabetic codes. Aligning the DROM to ISO 3166-2 alphabetic codes is tracked separately and is **out of scope** for this policy doc.
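
Until that alignment happens, a consumer that needs ISO 3166-2 codes for the DROM can translate the INSEE values itself. The sketch below copies the five pairs from the table above; `to_iso_3166_2` is a hypothetical helper, and passing unknown codes through unchanged is an assumption of this sketch rather than project behaviour.

```python
# INSEE numeric code -> ISO 3166-2 code, copied from the table above.
INSEE_TO_ISO_3166_2 = {
    "971": "FR-GP",  # Guadeloupe
    "972": "FR-MQ",  # Martinique
    "973": "FR-GF",  # French Guiana
    "974": "FR-RE",  # Réunion
    "976": "FR-YT",  # Mayotte
}


def to_iso_3166_2(state_code):
    """Map a DROM INSEE code to ISO 3166-2; return any other code unchanged."""
    return INSEE_TO_ISO_3166_2.get(state_code, state_code)


print(to_iso_3166_2("972"))  # FR-MQ
print(to_iso_3166_2("PF"))   # PF (not an INSEE code, passed through)
```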
47+
48+
## Why we model both (rationale)
49+
50+
1. **ISO 3166 compliance.** Both representations are present in the standard. Removing either side would make the database fail a strict ISO conformance check that consumers commonly run.
51+
2. **Downstream-consumer compatibility.** A large portion of API and package consumers — including [`@countrystatecity/countries`](https://www.npmjs.com/package/@countrystatecity/countries), [`countrystatecity-countries` (PyPI)](https://pypi.org/project/countrystatecity-countries/), and the [REST API](https://countrystatecity.in/) — filter and key off `country_code`. Code in the wild does things like `country_code === 'MQ'` to fetch all Martinique cities. Deleting `MQ` as a country would break those queries silently.
52+
3. **Routing & locale data.** Phone codes, currencies, TLDs, and timezones often differ between an overseas territory and its metropolitan parent (e.g. `NC` uses `XPF`, not `EUR`; `PF` is `UTC-10`/`-9:30`/`-9`, not `UTC+1`). The country-level row carries that metadata.
53+
4. **Geographic reality at the city level.** A city such as Saint-Denis (Réunion) cannot fall inside the same bounding box as Paris. The country-level partition keeps coordinate validation (`.github/scripts/validate-coordinates.js`) honest.
54+
5. **Reversibility.** Keeping both is additive. A future maintainer who decides to collapse one side can do so cleanly. The reverse — restoring deleted records and back-filling foreign keys across 153k+ cities — is not cleanly reversible.
55+
56+
In short: removing the country-level record (Option A in #1352) is a **breaking change** to fix a **labelling concern**, and the cost/benefit doesn't justify it.
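
Point 4 can be illustrated with a toy bounding-box check. This is only a sketch: the real validator is the JavaScript file named above and its logic is not reproduced here. The `BOUNDS` values are rough, hand-picked assumptions rather than the project's data, and `in_bounds` is a hypothetical helper.

```python
# Rough, illustrative bounding boxes: (min_lat, max_lat, min_lon, max_lon).
# These numbers are assumptions for the sketch, not the project's real bounds.
BOUNDS = {
    "FR": (41.0, 51.5, -5.5, 10.0),    # metropolitan France
    "RE": (-21.5, -20.8, 55.2, 55.9),  # Réunion
}


def in_bounds(country_code, lat, lon):
    """True if (lat, lon) lies inside the box registered for country_code."""
    min_lat, max_lat, min_lon, max_lon = BOUNDS[country_code]
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


saint_denis = (-20.8789, 55.4481)  # Saint-Denis, Réunion
paris = (48.8566, 2.3522)

print(in_bounds("RE", *saint_denis))  # True
print(in_bounds("FR", *saint_denis))  # False
print(in_bounds("FR", *paris))        # True
```

With a single `FR` box, the Saint-Denis row would either fail validation or force a box so large that it stops catching real coordinate errors.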
57+
58+
## How downstream consumers should query
59+
60+
Pick the model that matches the question being asked.
61+
62+
### "Give me everything in the French Republic" (metropolitan + DROM + collectivities)
63+
64+
Use the `FR` country, then traverse via `state_id`:
65+
```sql
SELECT c.*
FROM cities c
JOIN states s ON c.state_id = s.id
WHERE s.country_code = 'FR'; -- includes all 12 overseas territories
```
72+
73+
This works because every overseas territory has a state row whose `country_code = 'FR'`.
74+
75+
### "Give me only Martinique" (the territory in isolation)
76+
77+
Filter by the territory's own ISO 3166-1 country code:
78+
```sql
SELECT * FROM cities WHERE country_code = 'MQ';
```
82+
83+
This is the form most API/SDK consumers already use, and it is the form this policy is designed to preserve.
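
To see the republic-wide and territory-only queries side by side, here is a minimal sketch against an in-memory SQLite database. The two-table schema is a cut-down assumption containing only the columns the queries touch, state `1` is a made-up metropolitan region, and `4827` is Martinique's `states.id` from the table earlier in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (id INTEGER PRIMARY KEY, country_code TEXT);
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT,
                     state_id INTEGER, country_code TEXT);
-- state 1: hypothetical metropolitan region; 4827: Martinique as an FR state
INSERT INTO states VALUES (1, 'FR'), (4827, 'FR');
-- the Martinique city keeps country_code 'MQ' but hangs off the FR state row
INSERT INTO cities VALUES (1, 'Paris', 1, 'FR'),
                          (2, 'Fort-de-France', 4827, 'MQ');
""")

# Republic-wide view: traverse via state_id to the FR-owned state rows.
republic = [row[0] for row in conn.execute(
    "SELECT c.name FROM cities c JOIN states s ON c.state_id = s.id "
    "WHERE s.country_code = 'FR' ORDER BY c.name")]

# Territory-only view: filter on the city's own ISO 3166-1 country code.
martinique = [row[0] for row in conn.execute(
    "SELECT name FROM cities WHERE country_code = 'MQ'")]

print(republic)    # ['Fort-de-France', 'Paris']
print(martinique)  # ['Fort-de-France']
```

Fort-de-France shows up in both result sets, which is the dual representation doing its job.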
84+
85+
### "Give me metropolitan France only" (exclude overseas)
86+
87+
Exclude the 12 overseas codes explicitly. The metropolitan vs. overseas split is a political/administrative distinction, not a data-model distinction:
88+
```sql
SELECT * FROM cities
WHERE country_code = 'FR'
  AND state_code NOT IN ('GF','PF','TF','GP','MQ','YT','NC','RE','PM','BL','MF','WF',
                         '971','972','973','974','976'); -- INSEE for DROM
```
95+
96+
> A future cleanup may add a `metropolitan` boolean or `is_overseas` flag on `states` to make this query simpler. Not in scope here.
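
In application code, the exclusion list is easier to maintain as one constant bound through query parameters than as a literal repeated in every query. A minimal sketch using SQLite follows: the codes are copied from the query above, while `metropolitan_cities`, the single-table schema, and the `IDF` state code for Paris are assumptions made for illustration.

```python
import sqlite3

# Codes copied from the query above: ISO alphabetic plus INSEE numeric (DROM).
OVERSEAS_STATE_CODES = ("GF", "PF", "TF", "GP", "MQ", "YT", "NC", "RE", "PM",
                        "BL", "MF", "WF", "971", "972", "973", "974", "976")


def metropolitan_cities(conn):
    """Return names of FR cities whose state_code is not an overseas code."""
    placeholders = ", ".join("?" for _ in OVERSEAS_STATE_CODES)
    sql = ("SELECT name FROM cities WHERE country_code = 'FR' "
           f"AND state_code NOT IN ({placeholders}) ORDER BY name")
    return [row[0] for row in conn.execute(sql, OVERSEAS_STATE_CODES)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country_code TEXT, state_code TEXT)")
conn.executemany("INSERT INTO cities VALUES (?, ?, ?)", [
    ("Paris", "FR", "IDF"),           # hypothetical metropolitan state_code
    ("Fort-de-France", "FR", "972"),  # toy row filed under FR with an INSEE code
])

print(metropolitan_cities(conn))  # ['Paris']
```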
97+
98+
## Precedent: this is not new
99+
100+
The same dual-representation already applies to several other countries in this database. The `FR` work in #1352 brings France in line with the existing pattern.
101+
| Parent | Territory                                  | `countries.iso2` | `states.id` | Notes                                         |
| :----- | :----------------------------------------- | :--------------- | ----------: | :-------------------------------------------- |
| `CN`   | Hong Kong SAR                              | `HK`             |        2267 | special administrative region                 |
| `CN`   | Macau SAR                                  | `MO`             |        2266 | special administrative region                 |
| `US`   | Puerto Rico                                | `PR`             |        1449 | outlying area                                 |
| `US`   | Guam                                       | `GU`             |        1412 | outlying area                                 |
| `US`   | American Samoa                             | `AS`             |        1424 | outlying area                                 |
| `US`   | Northern Mariana Islands                   | `MP`             |        1431 | outlying area                                 |
| `US`   | U.S. Virgin Islands                        | `VI`             |        1413 | outlying area                                 |
| `US`   | U.S. Minor Outlying Islands                | `UM`             |        1432 | outlying area                                 |
| `NO`   | Svalbard                                   | `SJ` (shared)    |        1013 | arctic region; Jan Mayen is state `1026`      |
113+
114+
### Known gaps (territories *only* modeled as countries)
115+
116+
For honest scoping, the following ISO 3166-1 entries are *not* currently dual-modeled as states of their administering country in this database. They may be candidates for similar treatment in the future, but are **out of scope** for this policy:
117+
| Code   | Name                                  | Administered by      |
| :----- | :------------------------------------ | :------------------- |
| `GL`   | Greenland                             | Denmark (`DK`)       |
| `FO`   | Faroe Islands                         | Denmark (`DK`)       |
| `AW`   | Aruba                                 | Netherlands (`NL`)   |
| `CW`   | Curaçao                               | Netherlands (`NL`)   |
| `SX`   | Sint Maarten (Dutch part)             | Netherlands (`NL`)   |
| `BQ`   | Bonaire, Sint Eustatius and Saba      | Netherlands (`NL`)   |
| `AX`   | Åland Islands                         | Finland (`FI`)       |
| `BV`   | Bouvet Island                         | Norway (`NO`)        |
| `CC`   | Cocos (Keeling) Islands               | Australia (`AU`)     |
| `CX`   | Christmas Island                      | Australia (`AU`)     |
| `NF`   | Norfolk Island                        | Australia (`AU`)     |
| `HM`   | Heard Island and McDonald Islands     | Australia (`AU`)     |
| `GG`   | Guernsey                              | British Crown        |
| `JE`   | Jersey                                | British Crown        |
| `IM`   | Isle of Man                           | British Crown        |
135+
136+
These cases differ from the FR/US/CN ones because each carries its own political nuances (degree of autonomy, EU status, currency union, treaty arrangements). A blanket dual-modeling rule is **not** being adopted here. Each future case should be decided on its own merits, ideally tracked against an explicit issue.
137+
138+
## Future considerations
139+
140+
### Option A (full reclassify) — out of scope, documented for posterity
141+
142+
The original reporter on [#1352](https://github.com/dr5hn/countries-states-cities-database/issues/1352) suggested that Martinique (and by extension the other overseas territories) should be **only** a French region — i.e., delete the country-level row.
143+
144+
We considered this. It was rejected for the rationale-section reasons above. If a future maintainer revisits this decision, the migration would need:
145+
146+
1. A deprecation notice across all consumers (NPM, PyPI, REST API, Export Tool) with a 6+ month lead time.
147+
2. A back-fill migration that rewrites every `cities.country_id` / `cities.country_code` reference for the 12 territories.
148+
3. A versioning strategy on the database (e.g. `world.v2.sql`) so that consumers pinned to the old shape don't break.
149+
4. Updates to `country-bounds.json` and the coordinate validator to handle the new "FR includes both metropolitan and tropical" bounding box (otherwise every Martinique city would fail validation).
150+
5. Coordinated releases with [csc-export-tool](https://github.com/dr5hn/csc-export-tool), [countrystatecity-countries (NPM)](https://github.com/dr5hn/countrystatecity-countries), and [countrystatecity-pypi](https://github.com/dr5hn/countrystatecity-pypi).
151+
152+
That work is large and risky, and the **current dual-representation is internally consistent and ISO-compliant**. There is no functional defect today; only a labelling preference.
153+
154+
### Smaller follow-ups that *are* in scope (separate issues / PRs)
155+
156+
- Align the DROM `state_code` from INSEE numeric (`971`, `972`, `973`, `974`, `976`) to ISO 3166-2 alphabetic (`FR-GF`, `FR-GP`, …).
157+
- Consider an `is_overseas` or `is_dependency` boolean on `states` so the metropolitan/overseas split is a simple filter rather than an explicit `IN (…)` list.
158+
- Decide, case-by-case, whether to dual-model any of the "known gaps" above (Greenland, Aruba, etc.).
159+
160+
## Cross-references
161+
162+
- Issue: [#1352 — France data: missing cities and regions misclassified](https://github.com/dr5hn/countries-states-cities-database/issues/1352)
163+
- Maintainer docs: [`.claude/CLAUDE.md`](.claude/CLAUDE.md) — see the *Important Rules* section.
164+
- Contributor docs: [`contributions/README.md`](contributions/README.md)
165+
- Related ISO standards: [ISO 3166-1](https://www.iso.org/iso-3166-country-codes.html) (countries), [ISO 3166-2](https://www.iso.org/standard/72483.html) (subdivisions).

README.md

Lines changed: 1 addition & 1 deletion
@@ -319,7 +319,7 @@ Use our web tool to browse, search, and submit data change requests with a strea
319319
}
320320
```
321321

322-
📖 **Full guide**: [contributions/README.md](contributions/README.md) | [Contribution Guidelines](.github/CONTRIBUTING.md) | [Maintainer Docs](.claude/CLAUDE.md)
322+
📖 **Full guide**: [contributions/README.md](contributions/README.md) | [Contribution Guidelines](.github/CONTRIBUTING.md) | [Maintainer Docs](.claude/CLAUDE.md) | [Multi-Level Territories Policy](MULTI_LEVEL_TERRITORIES.md) (overseas / dual-ISO entities)
323323

324324
**Note:** Only edit JSON in `contributions/` - GitHub Actions auto-generates all export formats!
325325
