This repository was archived by the owner on Nov 12, 2024. It is now read-only.
docs: understanding locations #554
Open
Description
Good day,
I'm trying to understand how place_id is used across the various files. I know that place_id is just an identifier, but I have run into some puzzling things. Before I get to my questions, I will start by stating my beliefs about the data and how it is joined together. If any of these beliefs are incorrect, please correct them:
- google-research/open-covid-19-data was started before this repo
$ curl -sS https://api.github.com/repos/GoogleCloudPlatform/covid-19-open-data | grep created_at
"created_at": "2020-07-23T23:43:51Z",
$ curl -sS https://api.github.com/repos/google-research/open-covid-19-data | grep created_at
"created_at": "2020-05-21T03:35:01Z",
- The use of place_id was initially driven by the search_trends_symptoms dataset https://github.com/google-research/open-covid-19-data/search?q=place_id
- Not all place_ids in mobility.csv are expected to be found in aggregated.csv
- Not all place_ids in aggregated.csv are expected to be found in mobility.csv
How does mobility.csv relate to Global_Mobility_Report.csv?
They seem to be talking about exactly the same thing...
- https://github.com/GoogleCloudPlatform/covid-19-open-data/blob/main/docs/table-mobility.md
- https://www.google.com/covid19/mobility/data_documentation.html
But it seems like they are different data products entirely:
sqlite-utils memory Global_Mobility_Report.csv "select count(distinct place_id) from t1"
[{"count(distinct place_id)": 13249}]
sqlite-utils memory mobility.csv "select count(distinct location_key) from t1"
[{"count(distinct location_key)": 7351}]
The place_ids in Global_Mobility_Report.csv also differ from those in aggregated.csv:
xsv select place_id aggregated.csv | sort --unique > aggregated_place_ids.csv
xsv select place_id Global_Mobility_Report.csv | sort --unique > Global_Mobility_Report_place_ids.csv
combine aggregated_place_ids.csv not Global_Mobility_Report_place_ids.csv | count
14283
combine Global_Mobility_Report_place_ids.csv not aggregated_place_ids.csv | count
5913
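For anyone who wants to reproduce these comparisons without sqlite-utils, xsv, or combine, here is a minimal Python sketch using only the standard library. The helper names (`column_values`, `count_distinct`, `compare_ids`) are made up for illustration and are not part of either repository. It assumes the CSV files are UTF-8 encoded and have a header row. Unlike the shell pipelines above, it skips empty identifiers, so the counts may differ slightly.

```python
import csv


def column_values(path, column):
    """Return the set of non-empty values found in `column` of a CSV file.

    Raises KeyError if the header does not contain `column`.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Rows are streamed one at a time; only the distinct values are kept.
        return {row[column] for row in reader if row[column]}


def count_distinct(path, column):
    """Count distinct non-empty values in one column of a CSV file."""
    return len(column_values(path, column))


def compare_ids(path_a, path_b, column_a="place_id", column_b="place_id"):
    """Compare the identifier sets of two CSV files.

    The column name can be given per file, because the files in question
    do not all use the same identifier column.
    """
    ids_a = column_values(path_a, column_a)
    ids_b = column_values(path_b, column_b)
    return {
        "only_in_a": len(ids_a - ids_b),  # present in A, missing from B
        "only_in_b": len(ids_b - ids_a),  # present in B, missing from A
        "in_both": len(ids_a & ids_b),    # shared identifiers
    }
```

For example, `compare_ids("aggregated.csv", "Global_Mobility_Report.csv")` would report both one-sided differences and the overlap in one pass over each file. Note that the mobility.csv query above counts location_key rather than place_id, so `count_distinct("mobility.csv", "location_key")` and `count_distinct("Global_Mobility_Report.csv", "place_id")` are counts over two different identifier columns.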