This is code for investigating the applicability of reinforcement learning (RL) to environmental health, specifically the issuance of heat alerts in the United States. The associated paper can be found here. Additional information on the observational dataset we use can be found at the end of this document.
The RL simulator we develop here is available in the Python package weather2alert.
conda env create -f envs/rl/env-linux.yaml
conda activate heatrl
Versions of primary software used: Python 3.10.9; CUDA 12.0.1; R 4.2.2
***Start here if you have access to the health data***
In the directory heat_alerts/data_processing:
- Merging mortality data and heat alerts data: Explore_merged_data.R
- Processing county-level covariates: Extract_land_area.R, Get_county_Census_data.R, Prep_DoE_zones.R, Process_election_data.R
- Merge together: Merge-finalize_county_data.R
- Include hospitalizations: Get_hosp_data.R
- Add more covariates to address confounding: More_confounders.R
- Finalize and select only counties with population > 65,000: Single_county_prep_data.R
Then run the script heat_alerts/scripts/prepare_bayesian_model_data.py to get everything in the right format for the Bayesian model and gym environment. This will create a bunch of files under the folder data/processed.
The files will look like:
data/processed/
├── states.parquet # time-varying features, num rows = num fips x num days
├── actions.parquet # alert or not
├── spatial_feats.parquet # spatial features, num rows = num fips
├── fips2idx.json # mapping from fips to row index in spatial_feats.parquet
├── location_indices.json # location (fips index) of each row of states.parquet
├── offset.parquet # offset for the Poisson regression; it is the mean number of hospitalizations for that summer
These files are formatted to be ready for ML/RL; parquet files can be opened from both Python and R efficiently.
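To illustrate how these files relate to one another, here is a minimal in-memory sketch. The values and the column names (`median_income`, `heat_qi`) are purely illustrative; in practice the parquet files would be read with, e.g., `pandas.read_parquet` and the JSON files with the standard `json` module.

```python
import pandas as pd

# Toy stand-in for spatial_feats.parquet: one row per county (fips)
spatial_feats = pd.DataFrame(
    {"median_income": [52_000, 61_000]},
    index=pd.Index(["06037", "17031"], name="fips"),
)
# fips2idx.json: mapping from fips to row index in spatial_feats
fips2idx = {fips: i for i, fips in enumerate(spatial_feats.index)}

# Toy stand-in for states.parquet: num rows = num fips x num days
# (here, 2 counties x 3 days)
states = pd.DataFrame({"heat_qi": [0.1, 0.5, 0.9, 0.2, 0.6, 0.8]})
# location_indices.json: fips index for each row of states
location_indices = [0, 0, 0, 1, 1, 1]

# Attach the spatial features to each daily row via the index mapping
states["loc_idx"] = location_indices
merged = states.join(spatial_feats.reset_index(), on="loc_idx")
```

Each daily row in `merged` now carries both its time-varying features and the spatial features of its county.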
Details of this model can be found in heat_alerts/bayesian_model/bayesian_model_updated.md
The bulk of the code for this model is in heat_alerts/bayesian_model/pyro_heat_alert.py
- Run train_bayesian_model.py using Hydra arguments. Configurations are in the conf directory. For example, to get the model used in our paper, which we determined to be the most robust without imposing too many constraints:
python train_bayesian_model.py training=full_fast constrain=mixed model.name="FF_mixed"
Note: whatever name the Bayesian model is saved under should be pasted into the corresponding file in the conf/online_rl/sb3/r_model/ directory. For instance, in the case above, in the mixed_constraints.yaml file we would write "guide_ckpt: ckpts/FF_mixed_guide.pt"
See here for an introduction to Hydra.
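For readers unfamiliar with Hydra, the key idea is that dotted command-line arguments like `model.name="FF_mixed"` override entries in a nested config. The sketch below mimics that mechanism with plain dictionaries; the keys shown are illustrative, not the repo's actual config schema.

```python
def apply_override(cfg: dict, dotted: str) -> None:
    """Apply one 'a.b.c=value' style override in place (Hydra-like, simplified)."""
    path, _, value = dotted.partition("=")
    keys = path.split(".")
    node = cfg
    for k in keys[:-1]:
        node = node.setdefault(k, {})  # descend, creating nested dicts as needed
    node[keys[-1]] = value

# Illustrative base config plus the overrides from the example command above
cfg = {"training": "debug", "constrain": "none", "model": {"name": "default"}}
for ov in ["training=full_fast", "constrain=mixed", "model.name=FF_mixed"]:
    apply_override(cfg, ov)
```

Real Hydra additionally composes whole config groups from the conf directory (e.g. `training=full_fast` selects a YAML file), which this sketch does not attempt to reproduce.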
- Evaluate the accuracy of these predictions (i.e. $R^2$) and identify counties with (a) a high number of alerts and (b) high estimated (variance of) effectiveness of heat alerts with the script heat_alerts/bayesian_model/Evaluate_pyro_R.R
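The evaluation itself is done in R, but for reference, the $R^2$ reported is the standard coefficient of determination, $1 - SS_{res}/SS_{tot}$. A minimal Python sketch of the quantity:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot
```

Perfect predictions give 1.0; predicting the mean of the observed outcomes gives 0.0.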
***Start here if you just have access to the simulator, not the health data***
We can validate the Bayesian rewards model using the following scripts in the heat_alerts/bayesian_model/ directory:
- Run Validate_model.py with --type="initial" and --model_name="FF_mixed"
- Train a model on these sampled outcomes using the flag sample_Y=true (again using the script train_bayesian_model.py)
- Run Validate_model.py again with --type="validation" and the name of the new model trained on the fake data
- Calculate coverage using Validate_model_intervals.R
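The coverage calculation in the last step is done in R, but the underlying idea can be sketched as follows: for each parameter, form a credible interval from the posterior draws of the model fit to the simulated outcomes, then check how often the interval contains the value that generated the data. All numbers below are synthetic; this is not the repo's actual calculation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "true" parameter values and posterior draws centered on them
true_vals = rng.normal(size=200)                              # 200 parameters
draws = true_vals + rng.normal(scale=0.5, size=(1000, 200))   # 1000 draws each

# 95% credible interval per parameter, then empirical coverage of the truth
lo, hi = np.quantile(draws, [0.025, 0.975], axis=0)
coverage = np.mean((true_vals >= lo) & (true_vals <= hi))
```

A well-calibrated model should yield empirical coverage close to the nominal 95%.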
The gym environment is detailed in several scripts within the directory heat_alerts/online_rl:
- env.py contains the overall mechanics of stepping through and resetting the gym environment.
- datautils.py contains several functions for data formatting, which is performed before calling the environment instantiation.
- callbacks.py calculates custom metrics that we wish to save from each episode through the environment and view using TensorBoard -- used primarily during RL training.
Note: in the following sections, if you're using the provided shell (.sh) scripts, you will need to adjust the size of the job array depending on how many jobs you run at once.
- Run heat_alerts/online_rl/Online_print_evals.R, then run_jobs/Online_evals.sh -- if you're using Slurm, the command to run the latter is "sbatch"
- Process these results using heat_alerts/scripts/Benchmark_evals.R
To train an RL model, use the script train_online_rl_sb3.py -- note that there are many possible arguments, passed using Hydra / config files.
To reproduce the analyses in the paper, adjust the algorithms and naming prefixes manually in the following steps:
- Tune hyperparameters for RL for each county by running heat_alerts/online_rl/Online_print_RLs.R followed by run_jobs/Online_RL_short.sh.
- Process these results using heat_alerts/scripts/Final_tuning_evals.R to obtain the final evaluations of the best models.
- Table of descriptive statistics: heat_alerts/scripts/Summary_stats_table.R
- Map of counties indicating which were used in the simulator and which were used in the RL analysis: heat_alerts/scripts/Map_counties.R
- Plot of coefficients sampled from the Bayesian rewards model posterior: heat_alerts/bayesian_model/Make_plots.py
- Obtain the plot illustrating convergence of the loss from tensorboard (from terminal run "tensorboard --logdir logs/[cfg.model.name]")
- Supplementary "time series" plot of observed quantile of heat index, modeled baseline NOHR hospitalizations, and modeled alert effectiveness (assuming no past alerts) across days of summer: (i) get data from heat_alerts/scripts/Time_series_plots.py, then (ii) make plots with heat_alerts/scripts/Make_TS_plots.R
- Main RL results table, supplementary table of county characteristics, and supplementary table of RL results using the per-alert (compare_to_zero) metric: heat_alerts/scripts/Make_tables_for_paper.R
- To obtain an approximate confidence interval for the absolute number of NOHR hospitalizations saved, use heat_alerts/scripts/Approx_CI.R
- Visualizations of different heat alert policies for individual counties: heat_alerts/scripts/Mini-case-studies.R
- Boxplot and histograms of day-of-summer and alert streak lengths: heat_alerts/scripts/Make_plots_for_paper_FINAL.R
- CART analysis / plots: heat_alerts/scripts/Investigate_systematic_diffs.R -- manually change the set of CART covariates on lines 113-114 depending on whether you're looking at only alert-related variables or the set of more conventionally interpretable covariates
We start with a US county-level dataset spanning 2006-2016 (warm months) which has been used in past studies of heat alert effectiveness. The main variables in this dataset are daily values of maximum ambient heat index, heat alerts issued by the National Weather Service, and the number of in-patient fee-for-service Medicare hospitalizations for causes associated with extreme heat in past studies. We additionally compile other datasets to help characterize variability in the health impacts of extreme heat and heat alerts, such as sociodemographics and regional climate zone classifications. References for these datasets and past studies are in our paper.
The main analysis in this paper uses a gym environment (simulator of environmental variables and health outcomes) that we created based on the observed data. This simulator is publicly available. However, the health data that were used to create the model of the health outcomes are highly sensitive, and are only available to researchers with qualifying private servers. Access can be requested via application to the Centers for Medicare and Medicaid Services (see https://www.resdac.org/research-identifiable-files-rif-requests).