A comprehensive dataset documenting why things in Madrid are named the way they are — streets, metro stations, Cercanias stations, Metro Ligero stops, neighbourhoods, districts, plazas, and more.
- Phase 1, Research (complete): identified 140+ sources worldwide, including datasets, books, academic papers, blogs, and institutional programs.
- Phase 2, Dataset Design (complete): schema designed, all CSV files structured and validated.
- Phase 3, Data Collection (in progress): 453 hand-curated entries with etymology (all transport lines covered), plus 1,427 programmatic entries from OSM and 1,559 from Wikidata.
├── README.md # This file
├── SOURCES.md # Master catalogue of all 140+ sources
├── data/
│   ├── schema.md # Dataset schema documentation
│   ├── madrid_metro_stations.csv # 243 Metro stations with etymology (ALL lines 1-12 + Ramal)
│   ├── madrid_cercanias_stations.csv # 77 Cercanias stations with etymology (all lines)
│   ├── madrid_metro_ligero_stations.csv # 48 Metro Ligero + Tranvia stations (ALL lines)
│   ├── madrid_districts.csv # All 21 distritos with etymology
│   ├── madrid_neighbourhoods.csv # 20 key barrios with etymology
│   ├── madrid_plazas_parks.csv # 14 plazas, parks, monuments
│   ├── madrid_streets.csv # 30 seed streets with etymology
│   ├── osm/
│   │   ├── madrid_streets_etymology.json # Raw OSM Overpass data (6.5 MB)
│   │   └── madrid_osm_etymology_processed.csv # 1,427 unique streets with Wikidata IDs
│   └── wikidata/
│       └── madrid_streets_named_after.csv # 1,559 streets with biographical data
├── docs/ # Interactive web visualization (GitHub Pages)
│   ├── index.html
│   ├── app.js
│   ├── style.css
│   └── data/ # Prebuilt JSON (generated by scripts/build_site.py)
├── research/
│   ├── street-name-origins.md # Street name datasets worldwide (73 sources)
│   ├── transport-stations.md # Metro/Cercanias/ML deep research (67 KB)
│   ├── correspondencia-con-la-historia.md # 55 RAH biographical metro stations
│   ├── toponymy.md # Neighbourhood/district toponymy
│   ├── international.md # International projects & models
│   ├── physical-books-guide.md # Short book lookup guide for non-verified stations
│   ├── probable-reference-queue.md # Batch queue for probable station references
│   ├── station-etymology-status.md # Coverage tracking per station
│   ├── uncertain-stations-deep-research.md # Deep research on contested etymologies
│   ├── wikidata-audit-report.md # Wikidata QID verification results
│   └── gmaps-place-id-audit.md # Google Maps place ID audit
└── scripts/
    ├── build_site.py # Build docs/data JSON from CSV sources
    ├── fetch_gmaps_place_ids.py # Fetch Google Maps place IDs for stations
    ├── fetch_osm_etymology.sh # Query OSM Overpass for etymology data
    ├── query_wikidata.sh # Query Wikidata SPARQL for "named after"
    └── process_osm_data.py # Deduplicate OSM ways into unique streets
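
The tree describes `scripts/process_osm_data.py` as deduplicating OSM ways into unique streets. The following is a minimal sketch of that idea, not the actual script. It assumes the raw file uses the standard Overpass JSON layout (an `elements` list whose items carry `type`, `id`, and `tags`) and that etymology links live in the OSM `name:etymology:wikidata` tag. The function name `dedupe_ways` and the choice to group by street name are hypothetical.

```python
def dedupe_ways(overpass_json):
    """Collapse OSM ways that share a street name into one record per name.

    Assumption: grouping by the `name` tag is good enough to identify a street;
    the real script may use a different key.
    """
    streets = {}
    for element in overpass_json.get("elements", []):
        if element.get("type") != "way":
            continue  # ignore nodes and relations
        tags = element.get("tags", {})
        name = tags.get("name")
        qids = tags.get("name:etymology:wikidata")
        if not name or not qids:
            continue  # keep only named ways that carry an etymology link
        record = streets.setdefault(
            name, {"name": name, "etymology_qids": set(), "way_ids": []}
        )
        # The tag may hold several QIDs separated by semicolons.
        record["etymology_qids"].update(
            q.strip() for q in qids.split(";") if q.strip()
        )
        record["way_ids"].append(element["id"])
    # Sort by name so the output order is stable between runs.
    return sorted(streets.values(), key=lambda r: r["name"])
```

A long street is usually split into many ways in OSM, so grouping before counting is what turns a multi-megabyte raw export into a list of unique streets.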
| File | Entries | Coverage |
|---|---|---|
| `madrid_metro_stations.csv` | 243 | ALL Metro stations (lines 1-12 + Ramal), all with etymology |
| `madrid_cercanias_stations.csv` | 77 | Cercanias Madrid stations (all lines), all with etymology |
| `madrid_metro_ligero_stations.csv` | 48 | ALL ML1/ML2/ML3 + Tranvia de Parla stations |
| `madrid_districts.csv` | 21 | All 21 distritos |
| `madrid_neighbourhoods.csv` | 20 | Key barrios |
| `madrid_plazas_parks.csv` | 14 | Major plazas, parks, monuments |
| `madrid_streets.csv` | 30 | Seed streets (hand-curated) |
| `osm/madrid_osm_etymology_processed.csv` | 1,427 | Unique streets with Wikidata etymology IDs (from OSM) |
| `wikidata/madrid_streets_named_after.csv` | 1,559 | Streets with full biographical "named after" data |
| Total hand-curated | 453 | Transport stations + districts + neighbourhoods + plazas/parks + streets |
| Total with any etymology | ~2,000 | (OSM and Wikidata datasets overlap significantly) |
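
Because the OSM and Wikidata files overlap, their row counts cannot simply be added. The sketch below shows one way to measure that overlap by comparing a shared key column after light normalisation. The column name passed as `key` is an assumption (check `data/schema.md` and the CSV headers for the real one), and `load_rows`, `normalise`, and `overlap_summary` are hypothetical helper names.

```python
import csv


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by header name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def normalise(value):
    """Case-fold and collapse whitespace so near-identical names compare equal."""
    return " ".join(value.casefold().split())


def overlap_summary(rows_a, rows_b, key):
    """Count distinct key values found in only one dataset, in both, and overall."""
    keys_a = {normalise(row[key]) for row in rows_a if row.get(key)}
    keys_b = {normalise(row[key]) for row in rows_b if row.get(key)}
    return {
        "only_a": len(keys_a - keys_b),
        "only_b": len(keys_b - keys_a),
        "shared": len(keys_a & keys_b),
        "union": len(keys_a | keys_b),
    }
```

The `union` figure is the deduplicated total; it equals `only_a + only_b + shared`. Matching on names is approximate, so a stable identifier such as a Wikidata QID would be a better key where both files provide one.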
No single open dataset exists for Madrid name origins. Paris, Barcelona, and Donostia have structured municipal etymology datasets. Madrid does not. This project fills that gap.
| Book | Author | Year | What |
|---|---|---|---|
| Metro de Madrid: Por que sus estaciones se llaman asi? | Jose Felipe Alonso Fernandez-Checa | 2023 | Focused Metro station-name cross-check; physical/bookshop item |
| El alma del suburbano madrileno (565pp) | Olivares & Molina / Comunidad de Madrid | 2025 | First 42 stations 1919-1944 (FREE PDF at COAM) |
| 100 Anos de la Linea Norte-Sur: De Cuatro Caminos a Sol | Antonio Martinez Moreno | 2019 | Line 1 first 8 stations (Kindle available) |
| De Sol al Puente de Vallecas | Antonio Martinez Moreno | 2021 | Line 1 southern extension |
| Metro de Madrid 1919-2009: Noventa anos de historia (544pp) | Aurora Moya Rodriguez | 2009 | Definitive network history |
| Toponimia madrilena: proceso evolutivo (2 vols + CD) | Luis Miguel Aparisi Laporta | 2001 | 20,000+ Madrid toponyms (streets + places) |
| Los nombres de las calles de Madrid | Maria Isabel Gea Ortigas | 1993-2020 | 1,000+ street name origins |
| Historias de las calles de Madrid (456pp) | Jose Luis Rodriguez-Checa | 2021 | 959 streets with origins |
Metro de Madrid + Real Academia de la Historia collaboration (2015-2016). Biographical panels installed in 55 confirmed stations (of 58 announced) named after historical figures, with QR codes to RAH's Diccionario Biografico Espanol. Full documentation in research/correspondencia-con-la-historia.md.
See SOURCES.md for the complete catalogue of 140+ sources.
| Type | Count | Examples |
|---|---|---|
| place | 91 | Bilbao, Colombia, Ibiza, Oporto, Alcorcon Central, Ciudad de los Angeles |
| person | 66 | Goya, Tirso de Molina, Paco de Lucia, Miguel Hernandez, La Elipa |
| descriptive | 44 | Cuatro Caminos, Almendrales, Arroyofresno, Arroyo Culebro |
| historical | 21 | Sol, Embajadores, Tribunal, Casa del Reloj, La Peseta |
| religious | 17 | Noviciado, Iglesia, Santo Domingo, San Nicasio, Pan Bendito |
| event | 3 | Callao, Tetuan, Aviacion Espanola |
Corrections and additions are welcome. If you find a mistake in an etymology, a missing station, or an unsourced claim:
- Report an error: Open a GitHub issue with the entry name and what needs fixing. Please include a source if you have one.
- Suggest a new entry: Open an issue or pull request. New entries should follow the schema and cite at least one source in the `source` field.
- Improve an existing entry: Pull requests to the CSV files are welcome. Use the `confidence` field to reflect certainty: `verified` (primary source consulted), `probable` (multiple secondary sources), `uncertain` (contested or speculative).
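
Before opening a pull request, the two rules above can be checked mechanically. This is a minimal sketch, assuming the CSV headers are literally `source` and `confidence` as named in the guidelines; `data/schema.md` remains the authoritative reference, and `check_rows` and `check_file` are hypothetical helper names that are not part of the repository.

```python
import csv

ALLOWED_CONFIDENCE = {"verified", "probable", "uncertain"}


def check_rows(rows):
    """Return a list of (line_number, message) tuples for rows that break the rules."""
    problems = []
    # Data rows start on line 2 because line 1 of the CSV is the header.
    for number, row in enumerate(rows, start=2):
        source = (row.get("source") or "").strip()
        confidence = (row.get("confidence") or "").strip()
        if not source:
            problems.append((number, "missing source"))
        if confidence not in ALLOWED_CONFIDENCE:
            problems.append(
                (number, f"unexpected confidence value: {confidence!r}")
            )
    return problems


def check_file(path):
    """Run check_rows over one CSV file on disk."""
    with open(path, newline="", encoding="utf-8") as handle:
        return check_rows(csv.DictReader(handle))
```

Calling `check_file("data/madrid_streets.csv")` from the repository root would return an empty list if every row cites a source and uses one of the three confidence values.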
All contributions are released under the same CC-BY-SA 4.0 license.
Dataset: CC-BY-SA 4.0 (compatible with OSM/Wikidata sources)