GitHub - AIandGlobalDevelopmentLab/DHS-harmonisation: Wrapper to Hans Ekbrand's globallivingconditions R package for harmonising DHS survey data

This code collects and aggregates data from all DHS surveys (that the user has access to via https://dhsprogram.com/data/). It compiles all DHS surveys into a single dataset at the person-level. We call this dataset the harmonised dataset. This dataset contains the international wealth index (IWI) for each DHS household, along with survey design information and some other key DHS variables. This person-level dataset can be aggregated to produce cluster-level IWI estimates, which are used as the training/test labels for our satellite image prediction algorithms.
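The aggregation from person-level rows to cluster-level IWI estimates can be sketched roughly as follows. This is a minimal illustration on toy data, not the repository's own code: the column names `cluster_id`, `household_id` and `iwi` are assumptions, and the mean is unweighted, whereas a real analysis may need the survey design information.

```r
# Toy person-level data; all column names here are hypothetical.
persons <- data.frame(
  cluster_id   = c("c1", "c1", "c1", "c2", "c2"),
  household_id = c("h1", "h1", "h2", "h3", "h4"),
  iwi          = c(40, 40, 60, 20, 30)
)

# IWI is a household-level index, so keep one row per household first.
households <- unique(persons[, c("cluster_id", "household_id", "iwi")])

# Unweighted mean IWI per cluster.
cluster_iwi <- aggregate(iwi ~ cluster_id, data = households, FUN = mean)
names(cluster_iwi)[names(cluster_iwi) == "iwi"] <- "mean_iwi"
print(cluster_iwi)
```

Averaging over households rather than persons is a design choice made for this sketch; averaging over persons would weight large households more heavily.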

This code is essentially a wrapper around Hans Ekbrand's package globallivingconditions, which downloads and harmonises data from the DHS. There is also some draft code for bootstrapping to account for survey uncertainty.
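To give a feel for what bootstrapping over clusters means, here is a minimal sketch on toy data. It is not the draft code in this repository: the data, the column names and the helper `resample_clusters` are all hypothetical, and it ignores stratification and survey weights, which a fuller treatment of the DHS design would need.

```r
set.seed(1)

# Toy household-level data; column names are hypothetical.
households <- data.frame(
  cluster_id = rep(c("c1", "c2", "c3", "c4"), each = 3),
  iwi        = c(40, 45, 50, 20, 25, 30, 60, 65, 70, 35, 40, 45)
)

# One bootstrap replicate: resample whole clusters with replacement,
# keeping every household of each sampled cluster.
resample_clusters <- function(data) {
  ids <- unique(data$cluster_id)
  drawn <- sample(ids, size = length(ids), replace = TRUE)
  do.call(rbind, lapply(drawn, function(id) data[data$cluster_id == id, ]))
}

# Bootstrap distribution of the overall mean IWI.
boot_means <- replicate(200, mean(resample_clusters(households)$iwi))

# Percentile interval for the mean.
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))
print(boot_ci)
```

Resampling clusters rather than individual households keeps the within-cluster correlation intact in every replicate, which is the point of bootstrapping at the cluster level.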

The data which this package produces is documented in this repo: https://github.com/AIandGlobalDevelopmentLab/DHS-Data

TODOs:

See the repo's issues.

Set up:

Step 1:

Create a new file, auth.R, in the same folder as this README. It must declare five variables:

username - the username of your DHS account
pass - your DHS password
downloaded_DHS_data_file_path - where you want your raw DHS data to be stored. It is passed as the argument directory to the download.and.harmonise function.
harmonised_DHS_data_file_path - the default location for your harmonised data. If the argument cacheFolderPath to getDHSData is NULL, a new folder is created automatically in harmonised_DHS_data_file_path. That folder is the working directory in which download.and.harmonise is started, and it holds the cached and harmonised data that download.and.harmonise produces.
repo_file_path - the top-level directory of your Git repository (i.e. where all of the R code is stored). The example auth.R also sets the working directory to this folder. (Update: relying on a working directory is now deprecated in this project, since it is brittle. Some internal handling of working directories remains, because download.and.harmonise needs it to run, but all of my code uses absolute file paths.)

Example 'auth.R' file:

```r
username <- "my_username"
pass <- "a_password"

downloaded_DHS_data_file_path <- "~/DHS_Data"
harmonised_DHS_data_file_path <- "~/git/poverty-prediction/harmonised"
repo_file_path <- "~/git/poverty-prediction"
setwd(repo_file_path)
```
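Before running anything else, it can be worth checking that auth.R really declares all five variables. The sketch below is one way to do that; the helper `missing_auth_vars` is hypothetical and not part of this repository. It is demonstrated on a throwaway file that deliberately omits repo_file_path.

```r
required_vars <- c(
  "username", "pass",
  "downloaded_DHS_data_file_path",
  "harmonised_DHS_data_file_path",
  "repo_file_path"
)

# Hypothetical helper: source an auth file into a fresh environment and
# return the names of any required variables it fails to define.
missing_auth_vars <- function(auth_file) {
  env <- new.env()
  sys.source(auth_file, envir = env)
  setdiff(required_vars, ls(env))
}

# Demonstration on a throwaway file that forgets repo_file_path.
demo_file <- tempfile(fileext = ".R")
writeLines(c(
  'username <- "my_username"',
  'pass <- "a_password"',
  'downloaded_DHS_data_file_path <- "~/DHS_Data"',
  'harmonised_DHS_data_file_path <- "~/git/poverty-prediction/harmonised"'
), demo_file)

missing_vars <- missing_auth_vars(demo_file)
print(missing_vars)
```

Note that sourcing a real auth.R this way still executes any setwd call it contains, so the working directory of the session may change.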

Step 2:

Run the R script auth.R

Run the R script set-up.R. (You only need to do this once -- although it doesn't hurt to run it multiple times.) This downloads the QoG dataset and saves it in the appropriate folder.

Step 3:

Follow the instructions in main.R. Alternatively, for a completely automated run inside an Apptainer container on Alvis, run sbatch makeAndRunApptainer/minimalAlvisRun.sh. For more details see this readme.

Notes:

Because of how the parallelisation is implemented, I think this code runs much faster in a non-interactive R session (i.e. from a terminal, not in RStudio) and on a Unix-like system (including macOS).
You will need more than 16 GB of RAM (Hans estimates 40 GB) to run this code, assuming you use the default settings (countries = NULL and waves = NULL), which download all the DHS data.
Some files are largely irrelevant:
- There are some debugging/validation files (debugging.R, debugging_helpers.R and finding bugs for Hans.R). These are very rough code, written to debug Hans's globallivingconditions package and to validate the data that the package outputs.
- The bootstrap files (modified_boot_function.R, bootstrap_helpers.R, bootstrap_functions.R and bootstrap.R) are draft code for implementing the idea of bootstrapping over clusters to account for uncertainty due to the DHS survey procedure.
- Pragya data.R contains some code to answer a question Pragya had when we were discussing a collaboration with her (around 2023, from memory).
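The note above about non-interactive sessions and Unix-like systems can be turned into a quick check before starting a long run. This is only a sketch: the function `check_run_environment` is hypothetical, it uses base R's `interactive()` and `.Platform$OS.type`, and it does not check available RAM because base R has no portable way to do so.

```r
# Hypothetical pre-flight check; it only prints messages and never stops the run.
check_run_environment <- function() {
  ok <- TRUE
  if (interactive()) {
    message("Interactive session detected; a non-interactive R session may be faster.")
    ok <- FALSE
  }
  if (.Platform$OS.type != "unix") {
    message("Non-unix system detected; the parallel steps may run slowly.")
    ok <- FALSE
  }
  invisible(ok)
}

environment_ok <- check_run_environment()
```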

Dataset variables:

country.code.ISO.3166.alpha.3 is the country code given by [this standard](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes). (This is different to the country codes used by the DHS.)
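As an illustration of how the two coding schemes differ, the sketch below maps a few two-letter DHS country codes to ISO 3166 alpha-3 codes. The lookup is hand-written for this example and covers only four countries; the helper `to_iso3` is hypothetical and is not how the harmonised dataset derives the variable.

```r
# Small hand-written lookup for illustration only; it covers four countries.
# DHS code -> ISO 3166 alpha-3 (Kenya, Nigeria, Burundi, Madagascar).
dhs_to_iso3 <- c(KE = "KEN", NG = "NGA", BU = "BDI", MD = "MDG")

# Hypothetical helper: translate DHS codes, returning NA for unknown codes.
to_iso3 <- function(dhs_codes) {
  unname(dhs_to_iso3[dhs_codes])
}

print(to_iso3(c("KE", "BU", "ZZ")))
```

Burundi and Madagascar show why a lookup is needed: their DHS codes (BU, MD) are not simply the first two letters of the ISO alpha-3 codes (BDI, MDG).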
