API structure
There are several different IMF agencies from which you can request resources:

unique(unlist(purrr::map(all_agencies_codelists$data$codelists, function(x) x$agency)))
 [1] "SDMX"       "IMF.STA"    "IMF"        "ISORA"      "IMF.RES"
 [6] "IMF.SPR"    "IMF_STA"    "IMF.STA.DS" "IAEG-SDGs"  "IMF.FAD"
[11] "IMF.EUR"    "IMF.MCD"    "SURVEY"     "IMF.APD"    "IMF.WHD"
[16] "IMF.MCM"    "IMF.AFR"    "SYSTEM"     "IMF.HRD"

The two main ones seem to be "IMF.STA" and "IMF". In general, resources shared by many datasets seem to live in "IMF", while resources specific to a particular dataset live in "IMF.STA" or one of the other subagencies.
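The same kind of agency tally can be done in base R, without purrr. The sketch below runs on a small hand-made list standing in for a parsed API response: the three records and the `agency_ids()` helper are invented for illustration, and real records carry many more fields. It uses `vapply()` so that a record without a length-one `agencyID` raises an error instead of silently producing a list.

```r
# Mock stand-in for a parsed structure response (hypothetical, not real API output).
mock_codelists <- list(
  list(id = "CL_COUNTRY", agencyID = "IMF"),
  list(id = "CL_GFS_STO", agencyID = "IMF"),
  list(id = "CL_FA_INDICATOR", agencyID = "IMF.STA")
)

# Hypothetical helper: extract the owning agency of every record.
# vapply() errors if any record lacks a length-one character agencyID.
agency_ids <- function(records) {
  vapply(records, function(x) x$agencyID, character(1))
}

agencies <- agency_ids(mock_codelists)
unique(agencies)  # distinct agencies, in order of first appearance
table(agencies)   # number of records per agency
```

Swapping `mock_codelists` for a real list of codelists or dataflows should work the same way, assuming each record exposes `agencyID` as the listings in this section suggest.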
Most of the "dataflows" (which are basically aggregations of datasets, with their definitions) live in "IMF.STA", but there are a few under other agencies:

all_dataflows <- imf_perform_request("structure/dataflow/all/*/+")[["data"]][["dataflows"]]
table(sapply(all_dataflows, function(x) x$agencyID))
IMF.AFR IMF.APD IMF.FAD IMF.MCD IMF.MCM IMF.RES IMF.STA IMF.WHD   ISORA
      1       1       5       1       1       9      48       1       3

Data structure definitions, or DSDs, live in the same agency as their corresponding dataflows. There is one DSD per dataflow, and its identifier consists of the prefix "DSD_" followed by the dataflow id. So, for example, we can get the DSD for GFS (Government Finance Statistics) like this:
gfs_dsd <- imf_perform_request("structure/datastructure/all/DSD_GFS/+")[["data"]][["dataStructures"]]

This gives us a list of named lists, where each named list has the same ten keys:

names(gfs_dsd[[1]])
 [1] "annotations"             "id"
 [3] "name"                    "names"
 [5] "description"             "descriptions"
 [7] "version"                 "agencyID"
 [9] "dataStructureComponents" "metadata"

The annotations key has the date when the dataset was last updated, and metadata has a foreign key reference to a metadata identifier we may want to look at:
gfs_dsd[[1]]$metadata
[1] "urn:sdmx:org.sdmx.infomodel.metadatastructure.MetadataStructure=IMF:MSD_REF_IMF_DATASET(2.0+.0)"

Otherwise, we're mostly interested in dataStructureComponents:

names(gfs_dsd[[1]]$dataStructureComponents)
[1] "attributeList" "dimensionList" "groups"        "measureList"

- The attributeList lists foreign key references to "concept schemes" relevant to the dataset, like "IMF.STA:CS_GFS(12.0+.0).VALUATION".
- groups seems to associate the "INDICATOR" dimension with the "GFS_GRP", though I'm not sure how this is meant to be used.
- measureList gives us the id of the measured outcome variable (usually "OBS_VALUE") and a foreign key reference to the corresponding conceptIdentity ("IMF:CS_MASTER_SYSTEM(1.0+.0).OBS_VALUE").
- dimensionList is the primary thing we're interested in. Here's where we'll find the dimensions and timeDimensions (that is, the column names) of the dataset:
jsonlite::toJSON(gfs_dsd[[1]]$dataStructureComponents$dimensionList$timeDimensions[[1]], pretty = TRUE)
{
"annotations": [],
"id": ["TIME_PERIOD"],
"conceptIdentity": ["urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=IMF:CS_MASTER_SYSTEM(1.0+.0).TIME_PERIOD"],
"localRepresentation": {
"format": {
"dataType": ["ObservationalTimePeriod"]
}
},
"position": [6],
"type": ["TimeDimension"]
}

jsonlite::toJSON(gfs_dsd[[1]]$dataStructureComponents$dimensionList$dimensions[[1]], pretty = TRUE)
{
"annotations": [],
"id": ["COUNTRY"],
"conceptIdentity": ["urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=IMF:CS_MASTER_DATA(1.0+.0).COUNTRY"],
"position": [0],
"type": ["Dimension"],
"conceptRoles": []
}

Note the concept scheme foreign keys, which will probably be necessary to get the codelist corresponding to each dimension. You'd think that the dimension list would expose a foreign key directly to the corresponding codelist, but that would be too easy. 🙄 Unfortunately, we can't get the codelist by using the references query parameter, either. The following query works, but it returns the same data as if we had omitted the query parameter:

imf_perform_request("structure/datastructure/IMF.STA/DSD_GFS/+/COUNTRY", query_params = c("references" = "codelist"))

Likely, we can only fetch references that are directly linked in the returned data (mostly concept schemes), not references that are two or three steps removed (such as any codelists that may be linked within the parent dataflow or linked concept schemes).
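Since the conceptIdentity values are plain URN strings, we can at least split them into agency, scheme, version, and concept id as a first hop towards the codelist. The following is only a sketch on mock data: `parse_concept_urn()` and `mock_dimensions` are made up for illustration (the two URNs are copied from the output above), and it assumes every concept URN ends in `=AGENCY:SCHEME(VERSION).CONCEPT`. It does not resolve any codelist.

```r
# Hypothetical helper: split the tail of a concept URN into its parts.
# Assumes the form "...=AGENCY:SCHEME(VERSION).CONCEPT".
parse_concept_urn <- function(urn) {
  pattern <- "=([^:]+):([^(]+)\\(([^)]+)\\)\\.(.+)$"
  m <- regmatches(urn, regexec(pattern, urn))[[1]]
  if (length(m) == 0) stop("Not a recognised concept URN: ", urn)
  list(agency = m[2], scheme = m[3], version = m[4], concept = m[5])
}

# Mock dimension records mirroring the two JSON objects printed above.
mock_dimensions <- list(
  list(
    id = "COUNTRY",
    conceptIdentity = "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=IMF:CS_MASTER_DATA(1.0+.0).COUNTRY"
  ),
  list(
    id = "TIME_PERIOD",
    conceptIdentity = "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=IMF:CS_MASTER_SYSTEM(1.0+.0).TIME_PERIOD"
  )
)

# One row per dimension, with the parsed concept scheme reference.
concept_refs <- do.call(rbind, lapply(mock_dimensions, function(d) {
  ref <- parse_concept_urn(d$conceptIdentity)
  data.frame(
    dimension = d$id,
    agency = ref$agency,
    scheme = ref$scheme,
    version = ref$version,
    concept = ref$concept,
    stringsAsFactors = FALSE
  )
}))
concept_refs
```

With the parts separated, the agency and scheme id could be dropped into a follow-up structure request for the concept scheme itself, which is presumably where any codelist link would have to come from.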
There are codelists under various agencies, with the majority under "IMF.STA", and "IMF" being the next largest grouping:
all_codelists <- imf_perform_request("structure/codelist/all/")[["data"]][["codelists"]]
table(sapply(all_codelists, function(x) x$agencyID))
 IAEG-SDGs        IMF    IMF_STA    IMF.AFR    IMF.APD    IMF.EUR    IMF.FAD
        17         57          5          3          2          3         17
   IMF.HRD    IMF.MCD    IMF.MCM    IMF.RES    IMF.SPR    IMF.STA IMF.STA.DS
         1          3          2         35          9        224         29
   IMF.WHD      ISORA       SDMX     SURVEY     SYSTEM
         2          7         10          5          1

The codelists under "IMF" seem to be shared by multiple datasets. Their id field is the dimension name prefixed by "CL_":

imf_codelists <- imf_perform_request("structure/codelist/IMF/")[["data"]][["codelists"]]
sapply(imf_codelists, function(x) x$id)
[1] "CL_SEX" "CL_OVERLAP"
[3] "CL_FREQ" "CL_HS_2022"
[5] "CL_INDEX_TYPE" "CL_CURRENCY"
[7] "CL_METHODOLOGY" "CL_COFOG"
[9] "CL_PUB_LABELS" "CL_PRICES"
[11] "CL_CIVIL_STATUS" "CL_TRANSACTION_TYPE"
[13] "CL_TOPIC" "CL_EXRATE"
[15] "CL_UNIT_MULT" "CL_ACCOUNTING_ENTRY"
[17] "CL_GFS_STO" "CL_S_ADJUSTMENT"
[19] "CL_FUNCTIONAL_CAT" "CL_COICOP_1999"
[21] "CL_INSTR_ASSET" "CL_SEC_CLASSIFICATION"
[23] "CL_TRANSFORMATION" "CL_FI_MATURITY"
[25] "CL_CLASSIFICATION_TYPE" "CL_COUNTRY"
[27] "CL_DEPARTMENT" "CL_OBS_STATUS"
[29] "CL_ACCOUNTS" "CL_MFS_INSTR"
[31] "CL_ACTIVITY_ISIC4" "CL_CONTENT_TYPE"
[33] "CL_INT_TTC" "CL_UNIT_VINTAGE"
[35] "CL_DECIMALS" "CL_REPORTING_PERIOD_TYPE"
[37] "CL_SECTOR" "CL_FSENTRY"
[39] "CL_COMMODITY" "CL_DERIVATION_TYPE"
[41] "CL_STATISTICAL_MEASURES" "CL_VALUATION"
[43] "CL_ACCESS_SHARING_LEVEL" "CL_NA_STO"
[45] "CL_TIME_OF_RECORDING" "CL_FA_INDICATORS"
[47] "CL_INT_ACC_ITEM" "CL_PUBLISHER_TYPE"
[49] "CL_CONF_STATUS" "CL_ORGANIZATION"
[51] "CL_COICOP_2018" "CL_TRADE_FLOW"
[53] "CL_LANGUAGE" "CL_SOC_CONCEPTS"
[55] "CL_UNIT" "CL_GENDER"
[57] "CL_REVISION_TYPE"

In contrast, the codelists owned by "IMF.STA" use the "CL_" prefix followed by the dataset id, then an underscore and the dimension name:

imf_sta_codelists <- imf_perform_request("structure/codelist/IMF.STA/")[["data"]][["codelists"]]
sapply(imf_sta_codelists, function(x) x$id)[1:20]
[1] "CL_NSDP_NSDP_INDICATOR" "CL_ITG_INDICATOR"
[3] "CL_GS_ATF" "CL_PPI_INDICATOR"
[5] "CL_CCI_INDICATOR" "CL_FSI_CONSOLIDATION_BASIS"
[7] "CL_NDGAIN_COUNTRY" "CL_UNFCCC_TYPE_OF_TRANSFORMATION"
[9] "CL_GS_CGI" "CL_WPGTA_NIPO_COUNTRY"
[11] "CL_EQSITC_DISSEMINATION_UNIT_CODES" "CL_ED_DISSEMINATION_INDICATORS"
[13] "CL_GS_HEALTH" "CL_IMTS_COUNTRY"
[15] "CL_ITG_TYPE_OF_TRANSFORMATION" "CL_WPGTA_NIPO_INITIAL_ASSESSMENT"
[17] "CL_FA_INDICATOR" "CL_MFS_FMP_INDICATOR"
[19] "CL_CO2E_INDICATOR" "CL_FA_COUNTRY"

Annoyingly, not all agencies honor ge:/le: filtering by TIME_PERIOD. It appears that IMF.STA supports this, while other agencies don't.
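For agencies that seem to ignore the server-side filter, one possible fallback is to fetch everything and trim the time range locally. The sketch below is an assumption-heavy illustration, not part of the API: `filter_time_period()` and `mock_obs` are invented, the data is assumed to arrive as a data frame with a TIME_PERIOD column, and every period is assumed to start with a four-digit year (for example "2020", "2020-Q1", or "2021-M06").

```r
# Mock observations (hypothetical values, for illustration only).
mock_obs <- data.frame(
  TIME_PERIOD = c("2018", "2019", "2020-Q1", "2021-M06", "2022"),
  OBS_VALUE = c(1.0, 1.5, 2.0, 2.5, 3.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose year lies in [start_year, end_year].
# Only the leading four characters of TIME_PERIOD are used, so sub-annual
# periods are kept or dropped by year, not by quarter or month.
filter_time_period <- function(df, start_year = -Inf, end_year = Inf) {
  years <- suppressWarnings(as.integer(substr(df$TIME_PERIOD, 1, 4)))
  keep <- !is.na(years) & years >= start_year & years <= end_year
  df[keep, , drop = FALSE]
}

filtered <- filter_time_period(mock_obs, start_year = 2019, end_year = 2021)
filtered
```

This costs more bandwidth than a working ge:/le: filter, but it behaves the same regardless of which agency owns the dataflow.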