| 1 | +# Multi-Level Territories Policy |
| 2 | + |
| 3 | +> **Status:** Active policy. Established as part of [#1352](https://github.com/dr5hn/countries-states-cities-database/issues/1352) (France data, PR-C). |
| 4 | +> **Scope:** Explains why some geographical entities appear simultaneously as ISO 3166-1 *countries* and as ISO 3166-2 *subdivisions* of another country in this database, and how downstream consumers should reason about it. |
| 5 | + |
| 6 | +## Background |
| 7 | + |
| 8 | +A handful of overseas / autonomous territories are listed by ISO 3166 at **two levels** at once: |
| 9 | + |
| 10 | +- **ISO 3166-1** assigns them their own two-letter country code (e.g. `MQ` for Martinique). |
| 11 | +- **ISO 3166-2** also lists them as subdivisions of a parent state (e.g. `FR-MQ` as a subdivision of France). |
| 12 | + |
| 13 | +This is not an accident or a bug in the standard — it reflects political reality. Martinique *is* an integral region of the French Republic (its residents vote in French national elections, use the euro, and are EU citizens), but it *also* has independent representation at certain international bodies, its own internet TLD, its own currency code in some historical contexts, and so on. |
| 14 | + |
| 15 | +When ISO models a place at two levels, this database does too. |
| 16 | + |
| 17 | +## Policy |
| 18 | + |
| 19 | +**Both representations are kept in sync. Neither is canonical; both are first-class.** |
| 20 | + |
| 21 | +For each multi-level territory: |
| 22 | + |
| 23 | +1. There is a row in `contributions/countries/countries.json` (with its own `id`, `iso2`, `iso3`). |
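| 24 | +2. There is a row in `contributions/states/states.json` whose `country_code` points at the **parent** state (e.g. `FR`, `US`, `CN`), and whose `iso2` / `state_code` identifies the territory (normally its two-letter code; the five French DROM currently carry INSEE numeric `state_code` values instead, as noted under the table below). |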
| 25 | +3. Cities under the territory live in `contributions/cities/<TERRITORY_ISO2>.json` (e.g. `MQ.json`), and reference both their `country_id` (= the territory) and their `state_id` (= the territory-as-subdivision-of-parent). |
| 26 | + |
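The three records above can be pictured for Martinique roughly as follows. This is a minimal sketch, not the repository's actual tooling: the `id` values 138 and 4827 are taken from the table in the next section, while the city row (its `id` and `name`), the state row's `iso2` value, and the `check_links` helper are assumptions made for illustration. Real rows carry many more fields.

```python
# Trimmed, illustrative records; real rows carry many more fields.
country = {"id": 138, "iso2": "MQ", "iso3": "MTQ"}

# The state row points at the parent (FR), not at the territory itself.
# Assumption: iso2 holds the two-letter code while state_code holds the INSEE code.
state = {"id": 4827, "country_code": "FR", "iso2": "MQ", "state_code": "972"}

# Hypothetical city row: the id is a placeholder.
city = {"id": 1, "name": "Fort-de-France", "country_id": 138, "state_id": 4827}


def check_links(city, country, state, parent_code):
    """Return a list of problems; an empty list means the links are consistent."""
    problems = []
    if city["country_id"] != country["id"]:
        problems.append("city.country_id does not point at the territory")
    if city["state_id"] != state["id"]:
        problems.append("city.state_id does not point at the subdivision row")
    if state["country_code"] != parent_code:
        problems.append("state.country_code does not point at the parent")
    if state["iso2"] != country["iso2"]:
        problems.append("state.iso2 and country.iso2 disagree")
    return problems


print(check_links(city, country, state, parent_code="FR"))  # []
```

A check of this shape could run per territory to confirm that both representations stay in sync.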
| 27 | +## The 12 French Overseas Territories |
| 28 | + |
| 29 | +These are the territories covered by this policy under France. All 12 appear as both `FR` subdivisions and as standalone ISO 3166-1 countries. |
| 30 | + |
| 31 | +| ISO 3166-1 | ISO 3166-2 / INSEE | Name (English) | `countries.id` | `states.id` | State `type` | |
| 32 | +| :--------- | :----------------- | :----------------------------------- | -------------: | ----------: | :------------------------------------------ | |
| 33 | +| `GF` | `FR-GF` / `973` | French Guiana | 76 | 4822 | overseas region | |
| 34 | +| `PF` | `FR-PF` | French Polynesia | 77 | 4824 | overseas collectivity | |
| 35 | +| `TF` | `FR-TF` | French Southern and Antarctic Lands | 78 | 5065 | overseas territory | |
| 36 | +| `GP` | `FR-GP` / `971` | Guadeloupe | 88 | 4829 | overseas region | |
| 37 | +| `MQ` | `FR-MQ` / `972` | Martinique | 138 | 4827 | overseas region | |
| 38 | +| `YT` | `FR-YT` / `976` | Mayotte | 141 | 4797 | overseas region | |
| 39 | +| `NC` | `FR-NC` | New Caledonia | 157 | 5538 | overseas collectivity with special status | |
| 40 | +| `RE` | `FR-RE` / `974` | Réunion | 180 | 4823 | overseas region | |
| 41 | +| `PM` | `FR-PM` | Saint Pierre and Miquelon | 187 | 4821 | overseas collectivity | |
| 42 | +| `BL` | `FR-BL` | Saint-Barthélemy | 189 | 4794 | overseas collectivity | |
| 43 | +| `MF` | `FR-MF` | Saint-Martin (French part) | 190 | 4809 | overseas collectivity | |
| 44 | +| `WF` | `FR-WF` | Wallis and Futuna | 243 | 4810 | overseas collectivity | |
| 45 | + |
| 46 | +> The five **DROM** (Départements et régions d'outre-mer: `GF`, `GP`, `MQ`, `RE`, `YT`) currently use **INSEE numeric codes** (`971`–`974` and `976`; `975` belongs to Saint Pierre and Miquelon) as their `state_code` in `states.json`, while the other seven territories use the ISO 3166-2 alphabetic codes. Aligning the DROM to ISO 3166-2 alphabetic codes is tracked separately and is **out of scope** for this policy doc. |
| 47 | + |
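Until the DROM codes are aligned, a consumer that wants one key per territory may need to translate the numeric codes. The sketch below uses only the INSEE values listed in the table above; the `territory_iso2` helper is a hypothetical name, not part of any published package.

```python
# INSEE department codes for the five DROM, taken from the table above.
DROM_INSEE_TO_ISO2 = {
    "971": "GP",
    "972": "MQ",
    "973": "GF",
    "974": "RE",
    "976": "YT",
}


def territory_iso2(state_code):
    """Map a state_code to the territory's ISO 3166-1 alpha-2 code.

    Numeric DROM codes are translated; anything else is returned unchanged.
    """
    return DROM_INSEE_TO_ISO2.get(state_code, state_code)


print(territory_iso2("972"))  # MQ
print(territory_iso2("NC"))   # NC
```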
| 48 | +## Why we model both (rationale) |
| 49 | + |
| 50 | +1. **ISO 3166 compliance.** Both representations are present in the standard. Removing either side would make the database fail a strict ISO conformance check of the kind some consumers run. |
| 51 | +2. **Downstream-consumer compatibility.** A large portion of API and package consumers — including [`@countrystatecity/countries`](https://www.npmjs.com/package/@countrystatecity/countries), [`countrystatecity-countries` (PyPI)](https://pypi.org/project/countrystatecity-countries/), and the [REST API](https://countrystatecity.in/) — filter and key off `country_code`. Code in the wild does things like `country_code === 'MQ'` to fetch all Martinique cities. Deleting `MQ` as a country would break those queries silently. |
| 52 | +3. **Routing & locale data.** Phone codes, currencies, TLDs, and timezones often differ between an overseas territory and its metropolitan parent (e.g. `NC` uses `XPF`, not `EUR`; `PF` is `UTC-10`/`-9:30`/`-9`, not `UTC+1`). The country-level row carries that metadata. |
| 53 | +4. **Geographic reality at the city level.** A city such as Saint-Denis (Réunion) cannot fall inside the same bounding box as Paris. The country-level partition keeps coordinate validation (`.github/scripts/validate-coordinates.js`) honest. |
| 54 | +5. **Reversibility.** Keeping both is additive: a future maintainer who decides to collapse one side can still do so cleanly. Deleting records now and restoring them later, which would mean back-filling foreign keys across 153k+ cities, is far harder to undo. |
| 55 | + |
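The bounding-box argument in point 4 can be illustrated with a small sketch. It is written in Python for illustration and does not reproduce `.github/scripts/validate-coordinates.js`; the `Bounds` type, the box values, and the coordinates are rough approximations chosen for the example.

```python
from typing import NamedTuple


class Bounds(NamedTuple):
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        """True when the point lies inside the box (edges included)."""
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Rough, illustrative boxes; not the values the repository's validator uses.
BOUNDS = {
    "FR": Bounds(41.0, 51.5, -5.5, 10.0),    # metropolitan France, approximate
    "RE": Bounds(-21.5, -20.8, 55.1, 55.9),  # Reunion, approximate
}

saint_denis = (-20.88, 55.45)  # approximate (lat, lon) of Saint-Denis, Reunion

print(BOUNDS["RE"].contains(*saint_denis))  # True
print(BOUNDS["FR"].contains(*saint_denis))  # False
```

Keyed by the territory's own country code, the check stays tight. Keyed by `FR` alone, the box would have to span several oceans to pass.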
| 56 | +In short: removing the country-level record (Option A in #1352) is a **breaking change** to fix a **labelling concern**, and the cost/benefit doesn't justify it. |
| 57 | + |
| 58 | +## How downstream consumers should query |
| 59 | + |
| 60 | +Pick the model that matches the question being asked. |
| 61 | + |
| 62 | +### "Give me everything in the French Republic" (metropolitan + DROM + collectivities) |
| 63 | + |
| 64 | +Use the `FR` country, then traverse via `state_id`: |
| 65 | + |
| 66 | +```sql |
| 67 | +SELECT c.* |
| 68 | +FROM cities c |
| 69 | +JOIN states s ON c.state_id = s.id |
| 70 | +WHERE s.country_code = 'FR'; -- includes all 12 overseas territories |
| 71 | +``` |
| 72 | + |
| 73 | +This works because every overseas territory has a state row whose `country_code = 'FR'`. |
| 74 | + |
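To see the traversal at work, the query can be run against a toy in-memory SQLite database. This is a sketch under assumptions: the two-table schema is trimmed to the columns the query touches, state id `9001` and both city rows are placeholders, and territory cities are assumed to carry the territory's own `country_code`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (id INTEGER PRIMARY KEY, name TEXT, country_code TEXT);
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT,
                     state_id INTEGER REFERENCES states(id), country_code TEXT);
-- 4827 is the Martinique state id from the table above; 9001 is a placeholder.
INSERT INTO states VALUES (4827, 'Martinique', 'FR'), (9001, 'Paris', 'FR');
INSERT INTO cities VALUES (1, 'Fort-de-France', 4827, 'MQ'), (2, 'Paris', 9001, 'FR');
""")

# Traversing via states picks up the overseas city as well.
rows = conn.execute("""
    SELECT c.name, c.country_code
    FROM cities c
    JOIN states s ON c.state_id = s.id
    WHERE s.country_code = 'FR'
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Fort-de-France', 'MQ'), ('Paris', 'FR')]

# Filtering cities directly on country_code misses it.
direct = conn.execute(
    "SELECT name FROM cities WHERE country_code = 'FR'"
).fetchall()
print(direct)  # [('Paris',)]
```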
| 75 | +### "Give me only Martinique" (the territory in isolation) |
| 76 | + |
| 77 | +Filter by the territory's own ISO 3166-1 country code: |
| 78 | + |
| 79 | +```sql |
| 80 | +SELECT * FROM cities WHERE country_code = 'MQ'; |
| 81 | +``` |
| 82 | + |
| 83 | +This is the form most API/SDK consumers already use, and it is the form this policy is designed to preserve. |
| 84 | + |
| 85 | +### "Give me metropolitan France only" (exclude overseas) |
| 86 | + |
| 87 | +Exclude the overseas codes explicitly: the 12 two-letter codes plus the five INSEE numeric codes the DROM currently use. The metropolitan vs. overseas split is a political/administrative distinction, not a data-model distinction: |
| 88 | + |
| 89 | +```sql |
| 90 | +SELECT * FROM cities |
| 91 | +WHERE country_code = 'FR' |
| 92 | + AND state_code NOT IN ('GF','PF','TF','GP','MQ','YT','NC','RE','PM','BL','MF','WF', |
| 93 | + '971','972','973','974','976'); -- INSEE for DROM |
| 94 | +``` |
| 95 | + |
| 96 | +> A future cleanup may add a `metropolitan` boolean or `is_overseas` flag on `states` to make this query simpler. Not in scope here. |
| 97 | + |
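Until such a flag exists, the exclusion list can live in one place and be bound as query parameters instead of being pasted into every query. The sketch below assumes a `cities` table with `name`, `country_code`, and `state_code` columns; the `metropolitan_cities` helper and all sample rows (including their state codes) are hypothetical.

```python
import sqlite3

# The 12 two-letter codes plus the five INSEE codes, copied from the query above.
OVERSEAS_STATE_CODES = (
    "GF", "PF", "TF", "GP", "MQ", "YT", "NC", "RE", "PM", "BL", "MF", "WF",
    "971", "972", "973", "974", "976",
)


def metropolitan_cities(conn):
    """Return names of FR cities whose state_code is not an overseas code."""
    placeholders = ", ".join("?" for _ in OVERSEAS_STATE_CODES)
    sql = (
        "SELECT name FROM cities "
        "WHERE country_code = 'FR' "
        f"AND state_code NOT IN ({placeholders}) "
        "ORDER BY name"
    )
    return [name for (name,) in conn.execute(sql, OVERSEAS_STATE_CODES)]


# Toy data: every row is a placeholder, not a real record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country_code TEXT, state_code TEXT)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("Paris", "FR", "75"),
        ("Lyon", "FR", "69"),
        ("Fort-de-France", "FR", "972"),  # hypothetical overseas city keyed under FR
        ("Le Lamentin", "MQ", "972"),     # overseas city keyed under its own code
    ],
)

print(metropolitan_cities(conn))  # ['Lyon', 'Paris']
```

If overseas cities always carry their own `country_code`, the `country_code = 'FR'` filter already excludes them and the `NOT IN` list is only a safeguard.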
| 98 | +## Precedent: this is not new |
| 99 | + |
| 100 | +The same dual-representation already applies to several other countries in this database. The `FR` work in #1352 brings France in line with the existing pattern. |
| 101 | + |
| 102 | +| Parent | Territory | `countries.iso2` | `states.id` | Notes | |
| 103 | +| :----- | :----------------------------------------- | :--------------- | ----------: | :-------------------------------------------- | |
| 104 | +| `CN` | Hong Kong SAR | `HK` | 2267 | special administrative region | |
| 105 | +| `CN` | Macau SAR | `MO` | 2266 | special administrative region | |
| 106 | +| `US` | Puerto Rico | `PR` | 1449 | outlying area | |
| 107 | +| `US` | Guam | `GU` | 1412 | outlying area | |
| 108 | +| `US` | American Samoa | `AS` | 1424 | outlying area | |
| 109 | +| `US` | Northern Mariana Islands | `MP` | 1431 | outlying area | |
| 110 | +| `US` | U.S. Virgin Islands | `VI` | 1413 | outlying area | |
| 111 | +| `US` | U.S. Minor Outlying Islands | `UM` | 1432 | outlying area | |
| 112 | +| `NO` | Svalbard | `SJ` (shared) | 1013 | arctic region; Jan Mayen is state `1026` | |
| 113 | + |
| 114 | +### Known gaps (territories *only* modeled as countries) |
| 115 | + |
| 116 | +For honest scoping, the following ISO 3166-1 entries are *not* currently dual-modeled as states of their administering country in this database. They may be candidates for similar treatment in the future, but are **out of scope** for this policy: |
| 117 | + |
| 118 | +| Code | Name | Administered by | |
| 119 | +| :----- | :------------------------------------ | :------------------- | |
| 120 | +| `GL` | Greenland | Denmark (`DK`) | |
| 121 | +| `FO` | Faroe Islands | Denmark (`DK`) | |
| 122 | +| `AW` | Aruba | Netherlands (`NL`) | |
| 123 | +| `CW` | Curaçao | Netherlands (`NL`) | |
| 124 | +| `SX` | Sint Maarten (Dutch part) | Netherlands (`NL`) | |
| 125 | +| `BQ` | Bonaire, Sint Eustatius and Saba | Netherlands (`NL`) | |
| 126 | +| `AX` | Åland Islands | Finland (`FI`) | |
| 127 | +| `BV` | Bouvet Island | Norway (`NO`) | |
| 128 | +| `CC` | Cocos (Keeling) Islands | Australia (`AU`) | |
| 129 | +| `CX` | Christmas Island | Australia (`AU`) | |
| 130 | +| `NF` | Norfolk Island | Australia (`AU`) | |
| 131 | +| `HM` | Heard Island and McDonald Islands | Australia (`AU`) | |
| 132 | +| `GG` | Guernsey | British Crown | |
| 133 | +| `JE` | Jersey | British Crown | |
| 134 | +| `IM` | Isle of Man | British Crown | |
| 135 | + |
| 136 | +These cases differ from the dual-modeled ones above because each carries its own political nuances (degree of autonomy, EU status, currency union, treaty arrangements). A blanket dual-modeling rule is **not** being adopted here. Each future case should be decided on its own merits, ideally tracked against an explicit issue. |
| 137 | + |
| 138 | +## Future considerations |
| 139 | + |
| 140 | +### Option A (full reclassify) — out of scope, documented for posterity |
| 141 | + |
| 142 | +The original reporter on [#1352](https://github.com/dr5hn/countries-states-cities-database/issues/1352) suggested that Martinique (and by extension the other overseas territories) should be **only** a French region — i.e., delete the country-level row. |
| 143 | + |
| 144 | +We considered this. It was rejected for the rationale-section reasons above. If a future maintainer revisits this decision, the migration would need: |
| 145 | + |
| 146 | +1. A deprecation notice across all consumers (NPM, PyPI, REST API, Export Tool) with a 6+ month lead time. |
| 147 | +2. A back-fill migration that rewrites every `cities.country_id` / `cities.country_code` reference for the 12 territories. |
| 148 | +3. A versioning strategy on the database (e.g. `world.v2.sql`) so that consumers pinned to the old shape don't break. |
| 149 | +4. Updates to `country-bounds.json` and the coordinate validator to handle the new "FR includes both metropolitan and tropical" bounding box (otherwise every Martinique city would fail validation). |
| 150 | +5. Coordinated releases with [csc-export-tool](https://github.com/dr5hn/csc-export-tool), [countrystatecity-countries (NPM)](https://github.com/dr5hn/countrystatecity-countries), and [countrystatecity-pypi](https://github.com/dr5hn/countrystatecity-pypi). |
| 151 | + |
| 152 | +That work is large and risky, and the **current dual-representation is internally consistent and ISO-compliant**. There is no functional defect today; only a labelling preference. |
| 153 | + |
| 154 | +### Smaller follow-ups that *are* feasible (tracked in separate issues / PRs) |
| 155 | + |
| 156 | +- Align the DROM `state_code` from INSEE numeric (`971`–`974`, `976`) to ISO 3166-2 alphabetic (`FR-GF`, `FR-GP`, …). |
| 157 | +- Consider an `is_overseas` or `is_dependency` boolean on `states` so the metropolitan/overseas split is a simple filter rather than an explicit `IN (…)` list. |
| 158 | +- Decide, case-by-case, whether to dual-model any of the "known gaps" above (Greenland, Aruba, etc.). |
| 159 | + |
| 160 | +## Cross-references |
| 161 | + |
| 162 | +- Issue: [#1352 — France data: missing cities and regions misclassified](https://github.com/dr5hn/countries-states-cities-database/issues/1352) |
| 163 | +- Maintainer docs: [`.claude/CLAUDE.md`](.claude/CLAUDE.md) — see the *Important Rules* section. |
| 164 | +- Contributor docs: [`contributions/README.md`](contributions/README.md) |
| 165 | +- Related ISO standards: [ISO 3166-1](https://www.iso.org/iso-3166-country-codes.html) (countries), [ISO 3166-2](https://www.iso.org/standard/72483.html) (subdivisions). |