make sure output has articles first, then redirects for each article#90

Merged
mtmail merged 1 commit into osm-search:master from mtmail:sort-output-by-type-too
Nov 14, 2025

@mtmail mtmail commented Nov 14, 2025

Until now the output was sorted by title only, so for the city of Athens, Greece (Q1524) it printed:

zgrep ^en wikimedia_importance.tsv.gz | grep Q1524$
en    r    Ahtens    0.7868465180438662    Q1524
en    r    Aktiki    0.7868465180438662    Q1524
en    r    Atenás    0.7868465180438662    Q1524
en    r    Athenae    0.7868465180438662    Q1524
en    r    Athenai    0.7868465180438662    Q1524
en    r    Athēnai    0.7868465180438662    Q1524
en    r    Athence    0.7868465180438662    Q1524
en    r    Athenes    0.7868465180438662    Q1524
en    r    Athénes    0.7868465180438662    Q1524
en    r    Athènes    0.7868465180438662    Q1524
en    r    Athenian    0.7868465180438662    Q1524
en    r    Athenians    0.7868465180438662    Q1524
en    a    Athens    0.7868465180438662    Q1524
en    r    Athens,_Attica    0.7868465180438662    Q1524
en    r    Athens_Basin    0.7868465180438662    Q1524
en    r    Athens_city_center    0.7868465180438662    Q1524

Nominatim's tools/refresh.py imports only one (the first) title per wikidata_id.

In this case that would be Ahtens, a common typo; there was never a Wikipedia article with that title. Earlier this year it was Aktiki, a historical name from hundreds of years ago.
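
To make the effect concrete, here is a minimal sketch of "keep the first title per wikidata_id" applied to title-sorted rows. It is an illustration only, not Nominatim's actual import code; the function name `first_title_per_id` and the in-memory tuples are assumptions made for the example.

```python
# Illustrative only: this is NOT Nominatim's tools/refresh.py.
# Rows use the column order of the output above:
# (language, type, title, importance, wikidata_id)
rows_sorted_by_title = [
    ("en", "r", "Ahtens", 0.7868465180438662, "Q1524"),
    ("en", "r", "Aktiki", 0.7868465180438662, "Q1524"),
    ("en", "a", "Athens", 0.7868465180438662, "Q1524"),
]


def first_title_per_id(rows):
    """Keep only the first title seen for each wikidata_id (hypothetical helper)."""
    chosen = {}
    for _language, _row_type, title, _importance, wikidata_id in rows:
        # setdefault stores the title only if the id has not been seen yet
        chosen.setdefault(wikidata_id, title)
    return chosen


print(first_title_per_id(rows_sorted_by_title))
```

With title-only ordering the redirect `Ahtens` is picked for Q1524, because it is the first row for that id.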

We want to make sure the article comes first:

en	a	Athens	0.7868465180438662	Q1524
en	r	Ahtens	0.7868465180438662	Q1524
en	r	Aktiki	0.7868465180438662	Q1524
en	r	Atenás	0.7868465180438662	Q1524
en	r	Athenae	0.7868465180438662	Q1524
en	r	Athenai	0.7868465180438662	Q1524
en	r	Athēnai	0.7868465180438662	Q1524
en	r	Athence	0.7868465180438662	Q1524
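
One way to get this ordering is a sort key that puts the type before the title within each entry. The sketch below is an assumption about how such a key could look, not the code changed in this PR; `article_first_key` is a hypothetical name, and grouping by `(language, wikidata_id)` is a choice made for the example.

```python
# Sketch under assumptions: not the actual change in this PR.
# Rows: (language, type, title, importance, wikidata_id)
rows = [
    ("en", "r", "Ahtens", 0.7868465180438662, "Q1524"),
    ("en", "r", "Aktiki", 0.7868465180438662, "Q1524"),
    ("en", "a", "Athens", 0.7868465180438662, "Q1524"),
    ("en", "r", "Athens_Basin", 0.7868465180438662, "Q1524"),
]


def article_first_key(row):
    """Sort key: group by language and id, articles ('a') before redirects ('r')."""
    language, row_type, title, _importance, wikidata_id = row
    # False sorts before True, so rows with type 'a' come first in each group
    return (language, wikidata_id, row_type != "a", title)


for row in sorted(rows, key=article_first_key):
    print("\t".join(str(field) for field in row))
```

Any consumer that keeps only the first row per wikidata_id then gets the article title, without needing to read the type column itself.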

I would have added more logic to Nominatim's import (`tools/refresh.py`), but Nominatim doesn't require the type column:

        The file must be a gzipped CSV and have the following columns:
        language, title, importance, wikidata_id

        Other columns may be present but will be ignored.

@mtmail mtmail merged commit 9613b03 into osm-search:master Nov 14, 2025
1 check passed