Merged
1 change: 1 addition & 0 deletions R/codelist.R
@@ -1,6 +1,6 @@
#' Country Code Translation Data Frame (Cross-Sectional)
#'
#' A data frame used internally by the [countrycode()] function. `countrycode` can use any valid code as destination, but only some codes can be used as origin.

Check warning on line 3 in R/codelist.R (GitHub Actions / lint): [line_length_linter] Lines should not be more than 80 characters. This line is 160 characters.

#'
#' ## Origin and Destination
#'
@@ -9,12 +9,13 @@
#' - `country.name.de`: country name (German)
#' - `country.name.fr`: country name (French)
#' - `country.name.it`: country name (Italian)
#' - `country.name.es`: country name (Spanish)
#' - `cowc`: Correlates of War character
#' - `cown`: Correlates of War numeric
#' - `dhs`: Demographic and Health Surveys Program
#' - `ecb`: European Central Bank
#' - `eurostat`: Eurostat
#' - `fao`: Food and Agriculture Organization of the United Nations numerical code

Check warning on line 18 in R/codelist.R (GitHub Actions / lint): [line_length_linter] Lines should not be more than 80 characters. This line is 82 characters.

#' - `fips`: FIPS 10-4 (Federal Information Processing Standard)
#' - `gaul`: Global Administrative Unit Layers
#' - `genc2c`: GENC 2-letter code
@@ -22,7 +23,7 @@
#' - `genc3n`: GENC numeric code
#' - `gwc`: Gleditsch & Ward character
#' - `gwn`: Gleditsch & Ward numeric
#' - `imf`: International Monetary Fund (Warning: The IMF generally uses ISO codes. These codes are WEO-related, but may be inconsistently used in the wild.)

Check warning on line 26 in R/codelist.R (GitHub Actions / lint): [line_length_linter] Lines should not be more than 80 characters. This line is 157 characters.

#' - `ioc`: International Olympic Committee
#' - `iso2c`: ISO-2 character
#' - `iso3c`: ISO-3 character
@@ -41,8 +42,8 @@
#'
#' ## Destination only
#'
#' - `cldr.*`: 600+ country name variants from the UNICODE CLDR project (e.g., "cldr.short.en").

Check warning on line 45 in R/codelist.R (GitHub Actions / lint): [line_length_linter] Lines should not be more than 80 characters. This line is 96 characters.

#' Inspect the [`cldr_examples`][countrycode::cldr_examples] data.frame for a full list of

Check warning on line 46 in R/codelist.R (GitHub Actions / lint): [line_length_linter] Lines should not be more than 80 characters. This line is 92 characters.

#' available country names and examples.
#' - `ar5`: IPCC's regional mapping used both in the Fifth Assessment Report
#' (AR5) and for the Reference Concentration Pathways (RCP)
@@ -50,7 +51,7 @@
#' - `cow.name`: Correlates of War country name
#' - `currency`: ISO 4217 currency name
#' - `eurocontrol_pru`: European Organisation for the Safety of Air Navigation
#' - `eurocontrol_statfor`: European Organisation for the Safety of Air Navigation

Check warning on line 54 in R/codelist.R (GitHub Actions / lint): [line_length_linter] Lines should not be more than 80 characters. This line is 83 characters.

#' - `eu28`: Member states of the European Union (as of December 2015),
#' without special territories
#' - `icao.region`: International Civil Aviation Organization region
@@ -60,7 +61,7 @@
#' - `iso4217n`: ISO 4217 currency numeric code
#' - `p4.name`: Polity IV country name
#' - `region`: 7 Regions as defined in the World Bank Development Indicators
#' - `region23`: 23 Regions as used to be in the World Bank Development Indicators (legacy)

Check warning on line 64 in R/codelist.R (GitHub Actions / lint): [line_length_linter] Lines should not be more than 80 characters. This line is 91 characters.

#' - `telephone`: ITU-T E.164 country codes for telecommunication
#' - `un.name.ar`: United Nations Arabic country name
#' - `un.name.en`: United Nations English country name
@@ -83,7 +84,7 @@
#' conversion dictionary, this forces us to make arbitrary choices with respect
#' to some entities (e.g., Western Germany, Vietnam, Serbia). `countrycode`
#' includes a reconciled dataset in panel format,
#' [`codelist_panel`][countrycode::codelist_panel]. Instead of converting code, we recommend

Check warning on line 87 in R/codelist.R (GitHub Actions / lint): [line_length_linter] Lines should not be more than 80 characters. This line is 92 characters.

#' that users dealing with panel data "left-merge" their data into this panel
#' dictionary.
#'
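The "left-merge" workflow recommended above can be sketched with base R's `merge()`, using `all.x = TRUE` so that every row of the panel dictionary is kept. The two data frames are hypothetical toy stand-ins (not rows from `codelist_panel`), and the key columns `iso3c` and `year` are assumed for illustration.

```r
# Toy stand-in for a reconciled country-year panel dictionary
# (hypothetical rows, not taken from codelist_panel).
panel <- data.frame(
  iso3c = c("FRA", "FRA", "ESP", "ESP"),
  year  = c(2000, 2001, 2000, 2001),
  cown  = c(220, 220, 230, 230)
)

# User data keyed by the same code and year (hypothetical values).
mydata <- data.frame(
  iso3c = c("FRA", "ESP", "ESP"),
  year  = c(2000, 2000, 2002),
  gdp   = c(1.1, 0.6, 0.7)
)

# Left merge with the panel as the base: all panel rows are kept,
# panel rows without user data get NA in gdp, and user rows with no
# panel match (ESP 2002 here) are dropped.
merged <- merge(panel, mydata, by = c("iso3c", "year"), all.x = TRUE)
print(merged)
```

Keeping the panel on the left means the reconciled country-year rows define the output, which is the point of merging into the dictionary rather than converting codes row by row.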
8 changes: 5 additions & 3 deletions R/countrycode.R
@@ -19,12 +19,12 @@
#' [`codelist_panel`][codelist_panel] data.frame as a base into which they can
#' merge their other data. This data.frame includes most relevant code, and is
#' already "reconciled" to ensure that each political unit is only represented
#' by one row in any given year. From there, it is just a matter of using [merge()]

Check warning on line 22 in R/countrycode.R (GitHub Actions / lint): [line_length_linter] Lines should not be more than 80 characters. This line is 83 characters.

#' to combine different datasets which use different codes.
#'
#' @param sourcevar Vector which contains the codes or country names to be
#' converted (character or factor)
#' @param origin A string which identifies the coding scheme of origin (e.g., `"iso3c"`). See

Check warning on line 27 in R/countrycode.R (GitHub Actions / lint): [line_length_linter] Lines should not be more than 80 characters. This line is 93 characters.

#' [`codelist`][codelist] for a list of available codes.
#' @param destination A string or vector of strings which identify the coding
#' scheme of destination (e.g., `"iso3c"` or `c("cowc", "iso3c")`). See
@@ -101,11 +101,13 @@
"genc2c", "genc3c", "genc3n", "gwc", "gwn", "imf", "ioc", "iso2c",
"iso3c", "iso3n", "p5c", "p5n", "p4c", "p4n", "un", "un_m49", "unicode.symbol",
"unhcr", "unpd", "vdem", "wb", "wb_api2c", "wb_api3c", "wvs",
-"country.name.en.regex", "country.name.de.regex", "country.name.fr.regex", "country.name.it.regex")
+"country.name.en.regex", "country.name.de.regex", "country.name.fr.regex",
+"country.name.it.regex", "country.name.es.regex")
attr(dictionary, "origin_regex") <- c("country.name.de.regex",
"country.name.en.regex",
"country.name.fr.regex",
-"country.name.it.regex")
+"country.name.it.regex",
+"country.name.es.regex")
} else {
dictionary <- custom_dict
}
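The `origin_regex` attribute appears to list the dictionary columns that hold regular expressions rather than literal codes; this PR adds `"country.name.es.regex"` to it alongside the new column. A minimal sketch of how such an attribute can switch a lookup between pattern matching and exact matching, assuming a toy dictionary. `lookup_code()` is a hypothetical helper, not countrycode's internal code, and the patterns are illustrative only.

```r
# Toy dictionary: one regex column and two literal code columns.
# The patterns are illustrative, not countrycode's real regexes.
dict <- data.frame(
  country.name.es.regex = c("^espa(\u00f1|n)a", "^francia"),
  iso3c = c("ESP", "FRA"),
  iso2c = c("ES", "FR"),
  stringsAsFactors = FALSE
)
attr(dict, "origin_regex") <- "country.name.es.regex"

# Hypothetical helper: case-insensitive regex matching for columns named
# in the "origin_regex" attribute, exact matching for everything else.
lookup_code <- function(x, origin, destination, dictionary) {
  if (origin %in% attr(dictionary, "origin_regex")) {
    vapply(x, function(value) {
      # Test every pattern against this value; accept only a unique hit.
      hits <- which(vapply(dictionary[[origin]], grepl, logical(1),
                           x = value, ignore.case = TRUE, perl = TRUE))
      if (length(hits) == 1) dictionary[[destination]][hits] else NA_character_
    }, character(1), USE.NAMES = FALSE)
  } else {
    # Literal column: plain exact lookup.
    dictionary[[destination]][match(x, dictionary[[origin]])]
  }
}

lookup_code(c("Espa\u00f1a", "Francia", "Narnia"),
            "country.name.es.regex", "iso3c", dict)
lookup_code(c("FR", "XX"), "iso2c", "iso3c", dict)
```

A regex column that is missing from such an attribute would silently fall through to exact matching, which is one reason the new Spanish column needs to be registered there.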
@@ -115,7 +117,7 @@
if (origin == 'country.name') {
origin <- 'country.name.en'
}
-if (origin %in% c('country.name.en', 'country.name.de', 'country.name.it', 'country.name.fr')) {
+if (origin %in% c('country.name.en', 'country.name.de', 'country.name.it', 'country.name.fr', 'country.name.es')) {
origin <- paste0(origin, '.regex')
}
destination[destination == "country.name"] <- 'country.name.en'
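The origin handling in this hunk boils down to two rewrites: bare `"country.name"` becomes `"country.name.en"`, and any language-specific name origin (now including `"country.name.es"`) is redirected to its `.regex` column. A hypothetical standalone version (`normalize_origin()` is not a countrycode function) makes the resulting mapping easy to check:

```r
# Hypothetical standalone mirror of the origin normalization above;
# not an exported or internal countrycode function.
normalize_origin <- function(origin) {
  # Bare "country.name" defaults to English.
  if (origin == "country.name") {
    origin <- "country.name.en"
  }
  # Language-specific name origins are matched via their regex columns;
  # "country.name.es" is the entry added by this change.
  regex_origins <- c("country.name.en", "country.name.de",
                     "country.name.it", "country.name.fr",
                     "country.name.es")
  if (origin %in% regex_origins) {
    origin <- paste0(origin, ".regex")
  }
  origin
}

normalize_origin("country.name")
normalize_origin("country.name.es")
normalize_origin("iso3c")
```

Code origins such as `"iso3c"` pass through unchanged and are looked up literally.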
Binary file modified data/codelist.rda
Binary file modified data/codelist_panel.rda
59,150 changes: 29,575 additions & 29,575 deletions dictionary/codelist_panel_without_cldr.csv

586 changes: 293 additions & 293 deletions dictionary/codelist_without_cldr.csv

586 changes: 293 additions & 293 deletions dictionary/data_regex.csv

1 change: 1 addition & 0 deletions man/codelist.Rd

3 changes: 3 additions & 0 deletions tests/testthat/test-corner-cases.R
@@ -112,6 +112,9 @@ test_that("netherlands", {
expect_equal(
countrycode("Caraibi olandesi", "country.name.it", "country.name.en"),
"Caribbean Netherlands")
expect_equal(
countrycode("Caribe holandes", "country.name.es", "country.name.en"),
"Netherlands Antilles")
})


22 changes: 22 additions & 0 deletions tests/testthat/test-regex-internal.R
@@ -23,6 +23,11 @@ test_that("Italian regex vs. CLDR", {
expect_equal(x, codelist$cldr.name.it)
})

test_that("Spanish regex vs. CLDR", {
x <- countrycode(codelist$cldr.name.es, "country.name.es", "cldr.name.es")
expect_equal(x, codelist$cldr.name.es)
})

test_that("German regex vs. CLDR", {
x <- countrycode(codelist$cldr.name.de, "country.name.de", "cldr.name.de")
expect_equal(x, codelist$cldr.name.de)
@@ -33,4 +38,21 @@ test_that("French regex vs. CLDR", {
expect_equal(x, codelist$cldr.name.fr)
})

test_that("Spanish regex vs. CLDR 419", {
x <- countrycode(codelist$cldr.name.es_419, "country.name.es", "cldr.name.es_419")
expect_equal(x, codelist$cldr.name.es_419)
})

test_that("Spanish regex vs. CLDR short", {
x <- countrycode(codelist$cldr.short.es, "country.name.es", "cldr.short.es")
expect_equal(x, codelist$cldr.short.es)
})

test_that("Spanish regex vs. CLDR variant", {
x <- countrycode(codelist$cldr.variant.es, "country.name.es", "cldr.variant.es")
expect_equal(x, codelist$cldr.variant.es)
})
test_that("Spanish regex vs. UN", {
x <- countrycode(codelist$un.name.es, "country.name.es", "un.name.es")
expect_equal(x, codelist$un.name.es)
})
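Each new Spanish test is a round trip: names from a CLDR or UN column are converted with the `country.name.es` regexes and compared with the same column, so a name that matches no pattern, or the wrong one, shows up as a mismatch. A minimal sketch of that property on hypothetical toy names and patterns (not package data), checked as a match matrix:

```r
# Toy Spanish names with one illustrative pattern per name
# (hypothetical data, not taken from codelist or data_regex.csv).
names_es <- c("Alemania", "B\u00e9lgica", "M\u00e9xico", "Nueva Zelanda")
patterns <- c("^alemania", "^b(\u00e9|e)lgica", "^m(\u00e9|e)xico", "zelanda")

# match_matrix[i, j] is TRUE when name i matches pattern j.
match_matrix <- sapply(patterns, function(p) {
  grepl(p, names_es, ignore.case = TRUE, perl = TRUE)
})

# The round trip holds only if every name matches its own pattern
# (the diagonal) and no other pattern (exactly one TRUE per row).
round_trip_ok <- all(diag(match_matrix)) && all(rowSums(match_matrix) == 1)
print(round_trip_ok)
```

Replacing the last pattern with `"a$"` would break the check, since it also matches "Alemania" and "Bélgica".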
5 changes: 5 additions & 0 deletions tests/testthat/test-regex-special.R
@@ -141,6 +141,7 @@ test_that('Micronesia is not Federated States of Micronesia', {
expect_equal(countrycode('Mikronesien', 'country.name.de', 'iso3c', warn = FALSE), NA_character_)
expect_equal(countrycode('Micron\u00e9sie', 'country.name.fr', 'iso3c', warn = FALSE), NA_character_)
expect_equal(countrycode('Micronesia', 'country.name.it', 'iso3c', warn = FALSE), NA_character_)
expect_equal(countrycode('Micronesia', 'country.name.es', 'iso3c', warn = FALSE), NA_character_)
# unambiguous full English names → FSM
expect_equal(iso3c_of('Federated States of Micronesia'), 'FSM')
expect_equal(iso3c_of('Micronesia, Federated States of'), 'FSM')
@@ -162,6 +163,10 @@ test_that('Micronesia is not Federated States of Micronesia', {
expect_equal(countrycode('Stati Federati di Micronesia', 'country.name.it', 'iso3c'), 'FSM')
expect_equal(countrycode('Micronesia (Stati Federati di)', 'country.name.it', 'iso3c'), 'FSM')
expect_equal(countrycode('FS Micronesia', 'country.name.it', 'iso3c'), 'FSM')
# Spanish: qualified forms → FSM, FS abbreviation → FSM
expect_equal(countrycode('Estados Federados de Micronesia', 'country.name.es', 'iso3c'), 'FSM')
expect_equal(countrycode('Micronesia (Estados Federados de)', 'country.name.es', 'iso3c'), 'FSM')
expect_equal(countrycode('FS Micronesia', 'country.name.es', 'iso3c'), 'FSM')
})
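These tests require bare "Micronesia" to stay unmatched in Spanish while qualified forms resolve to `FSM`. One way a pattern can enforce that is to demand the "Estados Federados" qualifier (on either side of the name) or the "FS" abbreviation. The pattern below is a toy assumption for illustration, not the regex shipped in `dictionary/data_regex.csv`.

```r
# Toy pattern, NOT the package's actual regex: require a
# "Estados Federados" qualifier before or after the name, or an
# "FS" abbreviation, so that bare "Micronesia" is left unmatched.
fsm_es <- paste0(
  "estados\\s+federados.*micronesia|",
  "micronesia.*estados\\s+federados|",
  "\\bfs\\s+micronesia"
)

candidates <- c("Estados Federados de Micronesia",
                "Micronesia (Estados Federados de)",
                "FS Micronesia",
                "Micronesia")
matched <- grepl(fsm_es, candidates, ignore.case = TRUE, perl = TRUE)
print(setNames(matched, candidates))
```

Only the three qualified spellings match; the bare name is left for the caller to report as unmatched.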

