Imports a cleaned dataset and associated data sources, variables, and data points into the MySQL database.
Example usage:
from standard_importer import import_dataset
dataset_dir = "worldbank_wdi"
dataset_namespace = "worldbank_wdi@2021.05.25"
import_dataset.main(dataset_dir, dataset_namespace)
import_dataset.main(...) expects a set of CSV files to exist in {DATASET_DIR}/output/ (e.g. worldbank_wdi/output):
distinct_countries_standardized.csv
datasets.csv
sources.csv
variables.csv
datapoints/data_points_{VARIABLE_ID}.csv (one data_points_{VARIABLE_ID}.csv file for each variable in variables.csv)
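Before running the import, it can help to confirm that the expected files are in place. The following is a minimal sketch, not part of standard_importer: the helper name missing_output_files is hypothetical, and it only checks that the four top-level files and the datapoints directory exist, not their contents.

```python
from pathlib import Path

# Top-level files listed above; per-variable datapoints files are not checked here.
REQUIRED_FILES = [
    "distinct_countries_standardized.csv",
    "datasets.csv",
    "sources.csv",
    "variables.csv",
]


def missing_output_files(dataset_dir):
    """Return the expected entries that are absent from {dataset_dir}/output/.

    Hypothetical helper for illustration; not provided by standard_importer.
    """
    output_dir = Path(dataset_dir) / "output"
    missing = [name for name in REQUIRED_FILES if not (output_dir / name).is_file()]
    if not (output_dir / "datapoints").is_dir():
        missing.append("datapoints/")
    return missing
```

Calling missing_output_files("worldbank_wdi") returns an empty list when everything is present, and otherwise the names that still need to be generated.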
Inside the dataset directory (e.g. vdem), data must be located in an output directory, with the following structure:
(see worldbank_wdi/output for an example)
This file lists all entities present in the data, so that new entities can be created if necessary. Located in output/distinct_countries_standardized.csv:
name: name of the entity.
Located in output/datasets.csv:
id: temporary dataset ID for loading process
name: name of the Grapher dataset
Located in output/sources.csv:
id: temporary source ID for loading process
name: name of the source
description: JSON object with dataPublishedBy (string), dataPublisherSource (string), link (string), retrievedDate (string), additionalInfo (string)
dataset_id: foreign key matching each source with a dataset ID
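Because description holds a JSON object inside a CSV cell, the value needs to be serialized and quoted correctly. The sketch below shows one way to build such a row with the standard library; all field values are placeholders, and the column order is an assumption rather than something the importer is known to require.

```python
import csv
import io
import json

# Placeholder metadata; the five keys are the ones listed for description above.
description = {
    "dataPublishedBy": "Example publisher",
    "dataPublisherSource": "Example underlying source",
    "link": "https://example.org",
    "retrievedDate": "2021-05-25",
    "additionalInfo": "Free-text notes about the source.",
}

row = {
    "id": 0,  # temporary source ID
    "name": "Example source",
    "description": json.dumps(description),  # JSON object stored as a string
    "dataset_id": 0,  # must match an id in datasets.csv
}

# Written to an in-memory buffer here; a real script would open output/sources.csv.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "description", "dataset_id"])
writer.writeheader()
writer.writerow(row)
sources_csv = buffer.getvalue()
```

The csv module takes care of quoting the commas and double quotes that the JSON string contains, so the cell can be read back with csv.DictReader and json.loads.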
Located in output/variables.csv:
dataset_id: foreign key matching each variable with a dataset ID
source_id: foreign key matching each variable with a source ID
id: temporary variable ID for loading process
name: name of the variable
description: long description of the variable
code: original variable code used by the data source
unit: unit of measurement
short_unit: short unit of measurement, for chart axis display
timespan: timespan covered by the variable
coverage: type of geographical coverage
display: JSON object that defines how the variable should be displayed
original_metadata: JSON object representing original uncleaned metadata from the data source
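Since dataset_id and source_id are foreign keys into datasets.csv and sources.csv, a quick consistency check before importing can catch mismatched temporary IDs. This is an illustrative sketch only: dangling_references is a hypothetical helper, and it assumes the rows have already been read as dictionaries of strings (for example with csv.DictReader).

```python
def dangling_references(variables, dataset_ids, source_ids):
    """Return (variable id, column) pairs whose foreign key has no match.

    Hypothetical helper for illustration; not provided by standard_importer.
    """
    problems = []
    for row in variables:
        if row["dataset_id"] not in dataset_ids:
            problems.append((row["id"], "dataset_id"))
        if row["source_id"] not in source_ids:
            problems.append((row["id"], "source_id"))
    return problems


# Toy rows: variable 1 points at a source ID that does not exist.
variables = [
    {"id": "0", "dataset_id": "0", "source_id": "0"},
    {"id": "1", "dataset_id": "0", "source_id": "7"},
]
problems = dangling_references(variables, dataset_ids={"0"}, source_ids={"0"})
```

Here problems contains a single entry flagging the source_id of variable 1.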
Located in output/datapoints/datapoints_{VARIABLE_ID}.csv:
{VARIABLE_ID} in the file name is a foreign key matching values with a temporary variable ID in variables.csv
country: location of the observation
year: year of the observation
value: value of the observation
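As a rough illustration of the layout described here, the sketch below writes one datapoints file for a single variable. The helper write_datapoints is hypothetical, and the file name follows the datapoints_{VARIABLE_ID}.csv pattern given in this section; the file list earlier in this document spells it data_points_{VARIABLE_ID}.csv, so check which form the importer actually expects before relying on it.

```python
import csv
from pathlib import Path


def write_datapoints(output_dir, variable_id, observations):
    """Write (country, year, value) tuples for one variable and return the path.

    Hypothetical helper; the file name pattern is an assumption taken from the
    "Located in output/datapoints/..." line above.
    """
    datapoints_dir = Path(output_dir) / "datapoints"
    datapoints_dir.mkdir(parents=True, exist_ok=True)
    path = datapoints_dir / f"datapoints_{variable_id}.csv"
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["country", "year", "value"])  # column header
        writer.writerows(observations)
    return path
```

For example, write_datapoints("worldbank_wdi/output", 0, [("France", 2019, 1.5)]) would create worldbank_wdi/output/datapoints/datapoints_0.csv, where 0 is the temporary variable ID from variables.csv and the observation values are placeholders.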