Open
Labels
bug: Something isn't working
Description
Issue Type
Incorrect Data (wrong information)
Location (if applicable)
World
What's wrong?
I've found a lot of duplicate wikiDataId values. I've also found a few entries with a wrong wikiDataId even though a real one exists (so it should not be None).
As an example, Q3116994 matches 48 entries in LC.json (or in cities.csv). I used this code to generate the dump:
import json
import os

data_folder = "countries-states-cities-database/contributions/cities"

# Load every city entry from each JSON file in the folder
all_cities = []
for filename in os.listdir(data_folder):
    if filename.endswith(".json"):  # load only JSON files
        with open(os.path.join(data_folder, filename), "r", encoding="utf-8") as f:
            content = json.load(f)
            all_cities.extend(content)

# Print every wikiDataId that was already seen on an earlier entry
wikidata_viewed = set()
for city in all_cities:
    if city["wikiDataId"] is None:
        continue
    if city["wikiDataId"] in wikidata_viewed:
        print("Duplicate wikidata id:", city["wikiDataId"])
        continue
    wikidata_viewed.add(city["wikiDataId"])

What should it be?
In my opinion, if the correct data cannot be determined, None/null is a better wikiDataId value than a duplicate.
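To make the proposal concrete, here is a minimal sketch of how the duplicate check could also report how many entries share each ID, and how shared IDs could be reset to None. The helper names `find_duplicate_ids` and `clear_duplicate_ids` and the sample entries are hypothetical, not part of the repository. The sketch assumes nothing in the data tells us which entry really owns an ID, so it clears all of them. A real fix would probably look up the correct ID for each entry instead.

```python
from collections import Counter


def find_duplicate_ids(cities):
    """Return {wikiDataId: count} for every ID used by more than one entry."""
    counts = Counter(
        city["wikiDataId"] for city in cities if city.get("wikiDataId") is not None
    )
    return {wiki_id: n for wiki_id, n in counts.items() if n > 1}


def clear_duplicate_ids(cities):
    """Return copies of the entries with every shared wikiDataId set to None.

    Assumption: we cannot tell which entry legitimately owns the ID,
    so every entry that uses a shared ID is cleared.
    """
    duplicates = find_duplicate_ids(cities)
    return [
        {**city, "wikiDataId": None} if city.get("wikiDataId") in duplicates else city
        for city in cities
    ]


# Made-up sample entries, for illustration only (not real data)
sample = [
    {"name": "A", "wikiDataId": "Q-shared"},
    {"name": "B", "wikiDataId": "Q-shared"},
    {"name": "C", "wikiDataId": "Q-unique"},
    {"name": "D", "wikiDataId": None},
]

print(find_duplicate_ids(sample))
for city in clear_duplicate_ids(sample):
    print(city["name"], city["wikiDataId"])
```

Running `find_duplicate_ids` on the `all_cities` list from the script above would give a per-ID count, which makes cases like Q3116994 easier to spot than a long list of printed lines.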
Source (optional)
No response