Contributing to Country State City Database

πŸ‘πŸŽ‰ First off, thanks for taking the time to contribute! πŸŽ‰πŸ‘

The following is a set of guidelines for contributing to the Country State City Database. These are mostly guidelines, not rules. Use your best judgment, and feel free to propose changes to this document in a pull request.

How Can I Contribute?

🎯 Recommended: Using JSON Contribution Files (Easy for Everyone!)

We've made contributing easier! You can now edit simple JSON files organized by country:

Adding or Editing Cities

  • Navigate to the contributions/cities/ directory
  • Find your country file (e.g., US.json for United States, IN.json for India)
  • Add or edit cities in the easy-to-read JSON format shown below

Adding a NEW city (omit the id field):

{
    "name": "New City",
    "state_id": 1416,
    "state_code": "CA",
    "country_id": 233,
    "country_code": "US",
    "latitude": "37.77490000",
    "longitude": "-122.41940000",
    "timezone": "America/Los_Angeles"
}

Editing an EXISTING city (keep the id field):

{
    "id": 1234,
    "name": "Updated City Name",
    "state_id": 1416,
    ...
}
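
The id convention above can be checked mechanically. The following is a minimal sketch, not part of the repository's tooling: `split_new_and_existing` is a hypothetical helper name, and the code assumes a country file holds a JSON array of city objects.

```python
import json


def split_new_and_existing(entries):
    """Partition city entries by whether they carry an "id" field.

    Entries without "id" are treated as new cities; entries with "id"
    are treated as edits to existing cities.
    """
    new_entries = [entry for entry in entries if "id" not in entry]
    existing_entries = [entry for entry in entries if "id" in entry]
    return new_entries, existing_entries


# Illustrative data only; the values mirror the two examples above.
sample = json.loads("""
[
    {"name": "New City", "state_id": 1416, "country_id": 233},
    {"id": 1234, "name": "Updated City Name", "state_id": 1416}
]
""")

new_entries, existing_entries = split_new_and_existing(sample)
print(len(new_entries), "new,", len(existing_entries), "existing")
```

A quick pass like this can catch an id copied by accident into an entry that was meant to be new.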

Adding or Editing Countries

  • Edit contributions/countries/countries.json
  • Omit the id field for new countries

Adding or Editing States

  • Edit contributions/states/states.json
  • Omit the id field for new states

📖 For detailed instructions, see contributions/README.md

After Making Changes

Simply create a pull request! You don't need to run any build scripts locally.

What happens next:

  1. Maintainers review your JSON changes
  2. GitHub Actions automatically imports your changes into MySQL (IDs are auto-assigned)
  3. All export formats are regenerated from the MySQL database
  4. Your PR is updated with the regenerated export files

Important for Contributors:

  • ✅ DO: Edit JSON files in contributions/ directory
  • ❌ DON'T: Edit SQL, CSV, XML, YAML, or other export files (auto-generated)
  • ❌ DON'T: Edit GeoJSON or TOON format files (auto-generated from database)
  • ❌ DON'T: Run build scripts or exports locally (GitHub Actions handles this)
  • 🔒 MySQL workflow: Reserved for repository maintainers only

Understanding Export Formats

All data you contribute via JSON is automatically exported to 11+ formats:

  • Core Formats: JSON, MySQL, PostgreSQL, SQLite, SQL Server, MongoDB, XML, YAML, CSV
  • Geographic Format: GeoJSON (RFC 7946 standard for mapping applications)
  • AI-Optimized Format: TOON (Token-Oriented Object Notation - reduces LLM token usage by ~40%)
  • Optional Formats: DuckDB (via manual conversion from SQLite)

You don't need to worry about these formats - they're automatically generated from the MySQL database!

Glance at Table Structure

regions.sql

| Column | Data type | Explanation | Required |
| --- | --- | --- | --- |
| id | integer | Unique ID - omit for new regions (auto-assigned) | Auto |
| name | string | The official name of the region. Use WikiData or Wikipedia or some other legitimate source. | required |
| translations | text | JSON object with region name translations | |
| created_at | timestamp | Optional - Creation timestamp (ISO 8601 format). If omitted, database uses default value. | |
| updated_at | timestamp | Optional - Last update timestamp (ISO 8601 format). If omitted, database auto-updates. | |
| flag | boolean | Optional - Auto-managed by system, defaults to 1. Contributors can omit this field. | |
| wikiDataId | string | The unique ID from wikiData.org | |

subregions.sql

| Column | Data type | Explanation | Required |
| --- | --- | --- | --- |
| id | integer | Unique ID - omit for new subregions (auto-assigned) | Auto |
| name | string | The official name of the subregion. Use WikiData or Wikipedia or some other legitimate source. | required |
| translations | text | JSON object with subregion name translations | |
| region_id | integer | Unique id of region from regions.sql | required |
| created_at | timestamp | Optional - Creation timestamp (ISO 8601 format). If omitted, database uses default value. | |
| updated_at | timestamp | Optional - Last update timestamp (ISO 8601 format). If omitted, database auto-updates. | |
| flag | boolean | Optional - Auto-managed by system, defaults to 1. Contributors can omit this field. | |
| wikiDataId | string | The unique ID from wikiData.org | |

countries.sql

| Column | Data type | Explanation | Required |
| --- | --- | --- | --- |
| id | integer | Unique ID - omit for new countries (auto-assigned) | Auto |
| name | string | The official name of the country. Use WikiData or Wikipedia or some other legitimate source. | required |
| iso3 | string | ISO 3166-1 alpha-3 code of the country | |
| numeric_code | string | Numeric ISO code - Source ISO Official Site | |
| iso2 | string | ISO 3166-1 alpha-2 code of the country | |
| phonecode | string | Phone code of the country | |
| capital | string | Capital city of the country | |
| currency | string | Currency code of the country | |
| currency_name | string | Currency name of the country | |
| currency_symbol | string | Currency symbol of the country | |
| tld | string | Domain code of the country | |
| native | string | Native name of the country | |
| population | integer | Population of the country - Wikipedia | |
| gdp | integer | GDP of the country - Wikipedia | |
| region | string | The region the country belongs to | |
| region_id | integer | Unique id of region from regions.sql | |
| subregion | string | The subregion the country belongs to | |
| subregion_id | integer | Unique id of subregion from subregions.sql | |
| nationality | string | Nationality/demonym of the country | |
| timezones | text | JSON array of timezones | |
| translations | text | JSON object with country name translations | |
| latitude | decimal | Latitude coordinates | |
| longitude | decimal | Longitude coordinates | |
| emoji | string | A flag emoji icon | |
| emojiU | string | A flag emoji unicode characters | |
| created_at | timestamp | Optional - Creation timestamp (ISO 8601 format). If omitted, database uses default value. | |
| updated_at | timestamp | Optional - Last update timestamp (ISO 8601 format). If omitted, database auto-updates. | |
| flag | boolean | Optional - Auto-managed by system, defaults to 1. Contributors can omit this field. | |
| wikiDataId | string | The unique ID from wikiData.org | |

states.sql

| Column | Data type | Explanation | Required |
| --- | --- | --- | --- |
| id | integer | Unique ID - omit for new states (auto-assigned) | Auto |
| name | string | The official name of the state. Use WikiData or Wikipedia or some other legitimate source. | required |
| state_code | string | State/province code (e.g., "CA" for California) | required |
| country_id | integer | Unique id of parent country from countries.sql | required |
| country_code | string | ISO2 code of the parent country | required |
| fips_code | string | FIPS code of the state (distinct from the ISO 3166-2 code in iso3166_2) | |
| iso2 | string | ISO2 code of the state | |
| iso3166_2 | string | ISO 3166-2 subdivision code | |
| type | string | Type of state (province, state, region, etc.) | |
| level | integer | Administrative level of the subdivision | |
| parent_id | integer | ID of parent administrative division | |
| native | string | Native name of the state | |
| population | integer | Population of the state - Wikipedia | |
| latitude | decimal | Latitude coordinates | |
| longitude | decimal | Longitude coordinates | |
| timezone | string | IANA timezone identifier (e.g., America/New_York) | |
| translations | text | JSON object with name translations | |
| created_at | timestamp | Optional - Creation timestamp (ISO 8601 format). If omitted, database uses default value. | |
| updated_at | timestamp | Optional - Last update timestamp (ISO 8601 format). If omitted, database auto-updates. | |
| flag | boolean | Optional - Auto-managed by system, defaults to 1. Contributors can omit this field. | |
| wikiDataId | string | The unique ID from wikiData.org | |

cities.sql

| Column | Data type | Explanation | Required |
| --- | --- | --- | --- |
| id | integer | Unique ID - omit for new cities (auto-assigned) | Auto |
| name | string | The official name of the city. Use WikiData or Wikipedia or some other legitimate source. | required |
| state_id | integer | Unique id of parent state from states.sql | required |
| state_code | string | ISO code of the parent state | required |
| country_id | integer | Unique id of parent country from countries.sql | required |
| country_code | string | ISO2 code of the parent country | required |
| latitude | decimal | Latitude coordinates | required |
| longitude | decimal | Longitude coordinates | required |
| native | string | Native name of the city | |
| population | integer | Population of the city - Wikipedia | |
| type | string | Type of settlement (city, town, village, etc.) | |
| level | integer | Administrative level | |
| parent_id | integer | ID of parent administrative division | |
| timezone | string | IANA timezone identifier (e.g., America/New_York) - REQUIRED for all cities | required |
| translations | text | JSON object with name translations | |
| created_at | timestamp | Optional - Creation timestamp (ISO 8601 format). If omitted, database uses default value. | |
| updated_at | timestamp | Optional - Last update timestamp (ISO 8601 format). If omitted, database auto-updates. | |
| flag | boolean | Optional - Auto-managed by system, defaults to 1. Contributors can omit this field. | |
| wikiDataId | string | The unique ID from wikiData.org | |
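
The required columns for a city can be checked before opening a pull request. This is a minimal sketch, not the validation that GitHub Actions runs: `REQUIRED_CITY_FIELDS` and `missing_city_fields` are hypothetical names, and the field list is read off the cities table above.

```python
# Fields marked "required" in the cities table, plus timezone, which the
# table describes as required for all cities.
REQUIRED_CITY_FIELDS = (
    "name",
    "state_id",
    "state_code",
    "country_id",
    "country_code",
    "latitude",
    "longitude",
    "timezone",
)


def missing_city_fields(city):
    """Return the required fields that are absent or empty in a city entry."""
    return [
        field
        for field in REQUIRED_CITY_FIELDS
        if city.get(field) in (None, "")
    ]


# Illustrative entries only.
complete = {
    "name": "New City",
    "state_id": 1416,
    "state_code": "CA",
    "country_id": 233,
    "country_code": "US",
    "latitude": "37.77490000",
    "longitude": "-122.41940000",
    "timezone": "America/Los_Angeles",
}
incomplete = {"name": "New City", "state_id": 1416}

print(missing_city_fields(complete))
print(missing_city_fields(incomplete))
```

The id field is deliberately left out of the check, since new cities omit it.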

Data Quality Guidelines

Required Data Standards

Timezone Information (Critical!)

  • 100% of cities MUST have valid IANA timezone identifiers
  • Use tools like TimeZoneDB or GeoNames to find correct timezones
  • Format: Continent/City (e.g., America/New_York, Europe/London, Asia/Tokyo)
  • Why it matters: This database maintains 100% timezone coverage - don't break it!
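
A timezone value can be sanity-checked with Python's standard `zoneinfo` module. This is a sketch under assumptions: `is_valid_city_timezone` is a hypothetical helper, the "/" test simply mirrors the Continent/City format above, and the result depends on the tz database available to Python (on Windows this may require installing the `tzdata` package).

```python
from zoneinfo import ZoneInfo


def is_valid_city_timezone(name):
    """Check that name is a region-style IANA identifier known to this system.

    The "/" test rejects abbreviations such as "PST"; the ZoneInfo lookup
    rejects names that are not in the installed tz database.
    """
    if not isinstance(name, str) or "/" not in name:
        return False
    try:
        ZoneInfo(name)
    except (KeyError, ValueError, OSError):
        # ZoneInfoNotFoundError is a KeyError subclass; malformed keys
        # raise ValueError.
        return False
    return True


for candidate in ("America/New_York", "PST", "Mars/Olympus_Mons"):
    print(candidate, is_valid_city_timezone(candidate))
```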

Coordinates Accuracy

  • Use precise decimal coordinates (minimum 5 decimal places recommended)
  • Verify coordinates using Google Maps, OpenStreetMap, or official sources
  • Format:
    • Latitude: -90 to +90 (negative = South, positive = North)
    • Longitude: -180 to +180 (negative = West, positive = East)
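
The range and precision rules above can be expressed as a small check. This is a minimal sketch: `check_coordinate` is a hypothetical helper, and it assumes coordinates are written as decimal strings, as in the city example earlier in this document.

```python
from decimal import Decimal, InvalidOperation


def check_coordinate(value, limit, min_places=5):
    """Return a list of problems found in one coordinate string.

    value: coordinate as a decimal string, e.g. "37.77490000"
    limit: 90 for latitude, 180 for longitude
    min_places: recommended minimum number of decimal places
    """
    try:
        number = Decimal(value)
    except (InvalidOperation, TypeError, ValueError):
        return ["not a decimal number"]
    if not number.is_finite():
        return ["not a finite number"]

    problems = []
    if abs(number) > limit:
        problems.append(f"outside -{limit} to +{limit}")
    # A negative exponent counts the digits after the decimal point.
    places = max(0, -number.as_tuple().exponent)
    if places < min_places:
        problems.append(f"only {places} decimal places")
    return problems


print(check_coordinate("37.77490000", 90))     # latitude
print(check_coordinate("-122.41940000", 180))  # longitude
print(check_coordinate("95.00000", 90))        # latitude out of range
```

Using `Decimal` rather than `float` keeps trailing zeros, so the number of decimal places a contributor actually typed can be counted.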

Naming Conventions

  • Use official, commonly recognized names in English
  • Add native names in the native field
  • Use proper capitalization (e.g., "New York" not "new york")
  • Avoid abbreviations unless officially used (e.g., "St." in "St. Louis" is acceptable)

Data Sources

Recommended Sources (in priority order):

  1. Official Government Websites - Most authoritative
  2. WikiData (wikidata.org) - Structured, multilingual data
  3. Wikipedia - Well-sourced, community-verified
  4. GeoNames (geonames.org) - Comprehensive geographic database
  5. OpenStreetMap - Community-maintained geographic data

Always include your sources in the PR description!

Common Mistakes to Avoid

❌ Don't Do This:

  • Adding cities without timezone information
  • Using approximate coordinates (e.g., country center for city location)
  • Copying data without verification
  • Adding duplicate entries (check first!)
  • Using non-standard timezone names (e.g., "PST" instead of "America/Los_Angeles")

✅ Do This Instead:

  • Research proper IANA timezone for each city
  • Use precise coordinates for the city center or main landmark
  • Verify data from multiple reliable sources
  • Search existing data before adding new entries
  • Use official IANA timezone database format
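
Searching for duplicates before adding an entry can be partly automated. The sketch below is one possible approach, not the project's own check: `find_duplicate_cities` is a hypothetical helper, and treating the same name within the same `state_id` as a duplicate is an assumption.

```python
from collections import Counter


def find_duplicate_cities(cities):
    """Return (name, state_id) keys that occur more than once.

    Names are compared case-insensitively with surrounding spaces removed.
    """
    counts = Counter(
        (city["name"].strip().casefold(), city["state_id"]) for city in cities
    )
    return sorted(key for key, count in counts.items() if count > 1)


# Illustrative data only; the state_id values are placeholders.
cities = [
    {"name": "Springfield", "state_id": 1},
    {"name": " springfield ", "state_id": 1},
    {"name": "Springfield", "state_id": 2},
]
print(find_duplicate_cities(cities))
```

Cities that share a name but sit in different states are not flagged, since those are usually legitimate.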

Population Data (Optional but Recommended)

When adding population data:

  • Use recent census data or official estimates
  • Include source year if possible in PR description
  • Round to reasonable precision (avoid false precision)
  • Format: Integer (e.g., 1000000 not "1,000,000")
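
If a source lists a population with thousands separators, it can be converted to the plain integer format described above. This is a small sketch with a hypothetical helper name, `normalize_population`; it only handles comma separators.

```python
def normalize_population(value):
    """Convert a population figure to a plain non-negative integer.

    Accepts an int, or a string with comma separators such as "1,000,000".
    Raises ValueError for anything else, including negative numbers.
    """
    if isinstance(value, bool):
        raise ValueError("population must be a number")
    if isinstance(value, int):
        number = value
    elif isinstance(value, str):
        number = int(value.replace(",", "").strip())
    else:
        raise ValueError("population must be an integer or a digit string")
    if number < 0:
        raise ValueError("population cannot be negative")
    return number


print(normalize_population("1,000,000"))
```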

How to Find Foreign Keys

Finding State IDs:

# Search in contributions/states/states.json
# -B 2 also prints the lines before the match, where the id field usually sits
grep -B 2 -A 5 '"name": "California"' contributions/states/states.json

Finding Country IDs:

# Search in contributions/countries/countries.json
grep -B 2 -A 5 '"name": "United States"' contributions/countries/countries.json

Or use the CSC Update Tool, which looks up IDs for you automatically!
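
The same lookup can be done in Python when grep output is hard to read. This is a sketch under assumptions: `find_ids_by_name` and `load_records` are hypothetical helpers, and the files are assumed to contain a JSON array of objects with "id" and "name" keys.

```python
import json


def load_records(path):
    """Load a contributions JSON file (assumed to be an array of objects)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_ids_by_name(records, name):
    """Return the ids of all records whose "name" matches exactly."""
    return [record["id"] for record in records if record.get("name") == name]


# Illustrative records only; real ids come from the repository files.
sample_states = [
    {"id": 1416, "name": "California", "country_code": "US"},
    {"id": 1, "name": "Example State", "country_code": "XX"},
]
print(find_ids_by_name(sample_states, "California"))

# Against the real file, the call would look like:
# find_ids_by_name(load_records("contributions/states/states.json"), "California")
```

A list is returned because a name may match more than one record; check the country code before picking an id.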

Pull Request Guidelines

Before Submitting

  • Data verified from authoritative sources
  • Timezone validated using IANA timezone database
  • Coordinates checked on a map
  • No duplicate entries
  • Source included in PR description
  • Only JSON files in contributions/ edited

PR Description Template

## Summary
[Brief description of changes]

## Type of Change
- [ ] New city/state/country
- [ ] Update existing data
- [ ] Fix incorrect data
- [ ] Add missing fields

## Data Sources
- Source 1: [URL]
- Source 2: [URL]

## Checklist
- [ ] Timezones verified
- [ ] Coordinates verified
- [ ] Data sources cited
- [ ] No duplicate entries

Review Process

  1. Automated Checks: GitHub Actions validates JSON format
  2. Data Import: Your changes are imported to MySQL
  3. Export Generation: All export formats are regenerated
  4. Maintainer Review: Human review of data quality and sources
  5. Merge: Changes go live in the next release!
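
To catch syntax errors before the automated checks do, a file's contents can be parsed locally. The sketch below only illustrates the idea and is not the workflow's actual validation: `check_contribution_text` is a hypothetical helper, and the "array of objects" shape is an assumption about the contribution files.

```python
import json


def check_contribution_text(text):
    """Return an error message for invalid input, or None if it looks fine."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as error:
        return (
            f"invalid JSON at line {error.lineno}, "
            f"column {error.colno}: {error.msg}"
        )
    if not isinstance(data, list) or not all(
        isinstance(item, dict) for item in data
    ):
        return "expected a JSON array of objects"
    return None


print(check_contribution_text('[{"name": "New City"}]'))
print(check_contribution_text('[{"name": "New City",}]'))  # trailing comma
```

Trailing commas and missing quotes are the most common slips when editing JSON by hand, and both surface as a parse error with a line and column.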

Need Help?

Tools & Resources

Questions?

Recognition

All contributors are recognized in our README and commit history. Thank you for helping maintain the most comprehensive open geographical database! 🌍