[Bug]: Duplicate WikidataId for different cities #1318

@LodiAleardo

Description

Issue Type

Incorrect Data (wrong information)

Location (if applicable)

World

What's wrong?

I've found many duplicate wikiDataId values. I've also found a few entries with a wrong wikiDataId even though a correct one exists (so the value should not be None).

duplicates.txt

As an example, Q3116994 matches 48 entries in LC.json (or in cities.csv). I used this code to generate the dump file:

import json
import os

data_folder = "countries-states-cities-database/contributions/cities"
all_cities = []

# Load every JSON file in the folder into one list of city records
for filename in os.listdir(data_folder):
    if filename.endswith(".json"):  # load only JSON files
        with open(os.path.join(data_folder, filename), "r", encoding="utf-8") as f:
            content = json.load(f)
            all_cities.extend(content)

wikidata_viewed = set()

# Print a wikiDataId each time it reappears after its first occurrence
for city in all_cities:
    wikidata_id = city["wikiDataId"]
    if wikidata_id is None:
        continue
    if wikidata_id in wikidata_viewed:
        print("Duplicate wikidata id:", wikidata_id)
        continue
    wikidata_viewed.add(wikidata_id)
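
The script above prints an ID once per repeat, so a per-ID count has to be tallied by hand. A minimal sketch of a grouped variant follows; the function name `find_duplicate_wikidata_ids` is hypothetical, and the sketch assumes each city record carries a `name` key alongside `wikiDataId`.

```python
from collections import defaultdict


def find_duplicate_wikidata_ids(cities):
    """Map each wikiDataId shared by 2+ records to the names of those records.

    Assumption: each record is a dict with "wikiDataId" and "name" keys.
    """
    groups = defaultdict(list)
    for city in cities:
        wikidata_id = city.get("wikiDataId")
        if wikidata_id is None:
            continue  # records without an ID cannot be duplicates
        groups[wikidata_id].append(city.get("name"))
    # Keep only the IDs that appear on more than one record
    return {wid: names for wid, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Made-up records, for illustration only
    sample = [
        {"name": "A", "wikiDataId": "Q1"},
        {"name": "B", "wikiDataId": "Q1"},
        {"name": "C", "wikiDataId": None},
        {"name": "D", "wikiDataId": "Q2"},
    ]
    for wid, names in find_duplicate_wikidata_ids(sample).items():
        print(wid, len(names), names)
```

Passing `all_cities` from the script above instead of `sample` would list every shared ID together with the number of entries that use it.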

What should it be?

In my opinion, if the correct data cannot be obtained, None/null would be a better wikiDataId value than a duplicate.
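
One way this could look in practice is sketched below. It is only an illustration, not a proposed patch: the function name `clear_duplicate_wikidata_ids` is hypothetical, and it assumes that every record sharing an ID gets reset to None, including the one record (if any) for which the ID is actually correct.

```python
from collections import Counter


def clear_duplicate_wikidata_ids(cities):
    """Return a copy of the records with shared wikiDataId values set to None.

    Assumption: all records sharing an ID are cleared, since the sketch has
    no way of knowing which one (if any) the ID really belongs to.
    """
    counts = Counter(
        city.get("wikiDataId")
        for city in cities
        if city.get("wikiDataId") is not None
    )
    cleaned = []
    for city in cities:
        wikidata_id = city.get("wikiDataId")
        if wikidata_id is not None and counts[wikidata_id] > 1:
            # Copy the record so the input list is left unmodified
            city = {**city, "wikiDataId": None}
        cleaned.append(city)
    return cleaned
```

Keeping the ID on the one correct record would need a lookup against Wikidata itself, which this sketch does not attempt.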

Source (optional)

No response
