This repository contains the replication files for the paper "Deep Learning with DAG".
- Folder: blau_duncan_1967: Contains scripts for results in Section 5.1.
- Folder: zhou_2019: Contains scripts for results in Section 5.2.
- Folder: MCEs: Contains scripts for results in Appendix A.
Contains scripts for results in Section 5.1.
- DS1_1962_ssv_txt: Original data from Blau and Duncan (1967) used in our reanalysis. Download here.
- blau_duncan_1967_clean.py: Cleans and converts the data to CSV format.
- blau_duncan_1967.py: Processes data, trains models, and computes point estimates.
Scripts for bootstrapping to generate confidence intervals.
- blau_bootstrap.py: Generates and processes bootstrap samples, trains models, and computes estimates in parallel.
- init_blau_bootstrap.py: Initiates blau_bootstrap.py.
- blau_bootstrap.sbatch: Slurm script for running init_blau_bootstrap.py on a high-performance computing (HPC) cluster.
- standardization.py: Standardizes all computed results.
- sum_blau_bootstrap_<ATE_XY/ATE_UY/NDENIE>.py: Computes 90% confidence intervals for the average total effect of X on Y, the average total effect of U on Y, or the natural direct and indirect effects of X on Y mediated by U.
- plot_blau_bootstrap_<ATE_XY/ATE_UY/NDENIE>.py: Plots point estimates and 90% confidence intervals for the corresponding effects.
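For readers unfamiliar with the bootstrap step, a percentile-based 90% confidence interval can be sketched as follows. This is a minimal illustration, not the actual code in the sum_blau_bootstrap_* scripts; the function name and inputs are hypothetical.

```python
# Illustrative sketch of a percentile bootstrap 90% confidence interval.
# The function name and inputs are hypothetical, not taken from the scripts.

def percentile_ci(bootstrap_estimates, level=0.90):
    """Return the (lower, upper) percentile CI from bootstrap estimates."""
    sorted_est = sorted(bootstrap_estimates)
    n = len(sorted_est)
    alpha = (1 - level) / 2
    lo = sorted_est[int(alpha * n)]
    hi = sorted_est[min(int((1 - alpha) * n), n - 1)]
    return lo, hi

# Example: 1,000 simulated bootstrap replicates of an effect estimate.
import random
random.seed(0)
boots = [0.5 + random.gauss(0, 0.1) for _ in range(1000)]
lo, hi = percentile_ci(boots)
```

In the replication workflow, the bootstrap replicates themselves come from refitting the models on resampled data in parallel; only the interval construction is shown here.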
Scripts for sensitivity analysis of the average total effect of U on Y.
- blau_duncan_1967_sensitivity.py: Processes data, trains models, and computes point estimates of the average total effect of U on Y under sensitivity analysis.
- plot_blau_duncan_ATE_UY_sens.py: Plots point estimates under sensitivity analysis.
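The looping pattern behind a sensitivity analysis, re-computing an estimate under a grid of assumed values for a sensitivity parameter, can be sketched as below. The adjustment formula and all names here are purely hypothetical placeholders, not the method implemented in blau_duncan_1967_sensitivity.py.

```python
# Purely illustrative looping pattern for a sensitivity analysis:
# re-compute an effect estimate under a grid of assumed values of a
# sensitivity parameter rho. The adjustment formula is hypothetical.

def adjusted_estimate(point_estimate, rho, confounding_scale=1.0):
    """Hypothetical bias adjustment: shift the estimate by rho * scale."""
    return point_estimate - rho * confounding_scale

point_est = 0.42  # hypothetical point estimate of the ATE of U on Y
grid = [i / 10 for i in range(-5, 6)]  # rho in [-0.5, 0.5]
results = {rho: adjusted_estimate(point_est, rho) for rho in grid}
```

The plotting script can then display how the estimate varies across the grid of assumed parameter values.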
Contains scripts for results in Section 5.2.
- nlsy79_samples.RData: Original data from Zhou (2019) used in our reanalysis. Download here.
- zhou_2019_clean.R: Cleans and converts the data to CSV format.
- zhou_2019.py: Processes data, trains models, and computes point estimates.
Scripts for bootstrapping to generate confidence intervals.
- zhou_bootstrap.py: Generates and processes bootstrap samples, trains models, and computes estimates in parallel.
- init_zhou_bootstrap.py: Initiates zhou_bootstrap.py.
- zhou_bootstrap.sbatch: Slurm script for running init_zhou_bootstrap.py on an HPC cluster.
- sum_zhou_bootstrap.py: Computes 90% confidence intervals.
Contains scripts for generating results in Appendix A.
MCE.sbatch: Slurm script for initiating relevant Python scripts on HPC.
Contains scripts to test robustness of cGNF estimates to errors in the assumed DAG, replicating results from Appendix A.2.
Each folder, Exp1 through Exp5, contains the replication files for the corresponding experiment.
- MCE.py: Generates and processes Monte Carlo samples, trains models, and computes estimates in parallel.
- loop.py: Initiates the execution of MCE.py.
- sum.py: Computes summary statistics.
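As background on the summary step, Monte Carlo experiments are typically summarized by the bias, standard deviation, and RMSE of the estimates relative to the true effect. A minimal sketch of such a computation, with illustrative names and numbers rather than the actual contents of sum.py:

```python
# Sketch of summary statistics commonly reported for Monte Carlo
# experiments: bias, standard deviation, and RMSE of the estimates
# relative to the true effect. Names and numbers are illustrative.

import math

def mc_summary(estimates, truth):
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - truth
    var = sum((e - mean_est) ** 2 for e in estimates) / n
    rmse = math.sqrt(bias ** 2 + var)
    return {"bias": bias, "sd": math.sqrt(var), "rmse": rmse}

# Five hypothetical Monte Carlo estimates of an effect whose true value is 0.50.
stats = mc_summary([0.48, 0.52, 0.50, 0.55, 0.45], truth=0.50)
```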
Scripts for Monte Carlo experiments testing cGNF performance, replicating results from Appendix A.3.
- MCE_<1/2/3>.py: Generates and processes Monte Carlo samples, trains models, and computes estimates in parallel.
- init_MCE_<1/2/3>.py: Initiates the respective MCE_<1/2/3>.py script.
- sum_MCE_<1/2/3>.py: Computes summary statistics.
- plot_MCE_<1/2/3>.py: Plots summary statistics.
- est_<ATE/NDENIE/PSE>.py: Calculates the true values of average total effects, natural direct and indirect effects, and path-specific effects.
Scripts for testing the confidence interval coverage rate of cGNF, replicating results from Appendix A.4.
- MCE_bootstrap.py: Generates and processes bootstrap samples based on Monte Carlo samples, trains models, and computes estimates in parallel.
- bootstrap_for_MCE.py: Initiates MCE_bootstrap.py.
- MCE_for_bootstrap.py: Generates Monte Carlo samples.
- template_sbatch.sh: Slurm script for running bootstrap_for_MCE.py on an HPC cluster.
- submit_jobs.sh: Initiates template_sbatch.sh on the HPC cluster.
- sum_MCE_bootstrap.py: Computes 90% confidence intervals.
- plot_MCE_bootstrap.py: Plots 90% confidence intervals.
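The coverage rate being tested here is the fraction of Monte Carlo replications whose confidence interval contains the true effect (nominally 90% for a 90% interval). A minimal sketch of that check, with illustrative inputs rather than the actual code:

```python
# Sketch of a confidence-interval coverage check: the fraction of
# Monte Carlo replications whose interval contains the true value.
# The intervals and true value below are illustrative.

def coverage_rate(intervals, truth):
    hits = sum(1 for lo, hi in intervals if lo <= truth <= hi)
    return hits / len(intervals)

cis = [(0.40, 0.60), (0.35, 0.55), (0.52, 0.70), (0.45, 0.65)]
rate = coverage_rate(cis, truth=0.50)  # 3 of 4 intervals cover 0.50
```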
Scripts for testing the robustness of cGNF to variations in architecture and hyper-parameter settings, replicating results from Appendix A.5.
- Hyperparameters_<1/2>_<a, ..., e>.py: Produces Monte Carlo samples.
- init_hyperparameters_<1/2>.py: Initiates the respective hyperparameter scripts.
- sum_hyperparameters_<1/2>.py: Computes summary statistics.
- plot_hyperparameters_<1/2>.py: Plots summary statistics.
Scripts comparing the performance of cGNFs to debiased machine learning (DML) methods, specifically an augmented inverse probability weighting estimator with nuisance functions estimated via random forests (AIPW-RF). These scripts replicate results from Appendix A.6.
The folders Compare_AIPW_10cat and Compare_AIPW_binary replicate results comparing cGNF and AIPW-RF estimators.
In each folder:
- MCE.py: Generates and processes Monte Carlo samples, trains models, and computes estimates in parallel.
- loop.py: Initiates the execution of MCE.py.
- sum.py: Computes summary statistics.
- plot.py: Plots summary statistics.
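For orientation, the AIPW estimator of an average treatment effect combines an outcome model with inverse propensity weighting. The sketch below uses simple placeholder nuisance functions; in the replication scripts these are fit with random forests, and the data layout shown is hypothetical.

```python
# Sketch of the augmented inverse probability weighting (AIPW) estimator
# of an average treatment effect. In the replication scripts the nuisance
# functions are fit with random forests; here they are simple placeholders.

def aipw_ate(data, propensity, outcome_model):
    """data: iterable of (a, x, y) with binary treatment a;
    propensity(x) -> P(A=1 | X=x);
    outcome_model(a, x) -> E[Y | A=a, X=x]."""
    n = len(data)
    total = 0.0
    for a, x, y in data:
        e = propensity(x)
        m1, m0 = outcome_model(1, x), outcome_model(0, x)
        total += (m1 - m0
                  + a * (y - m1) / e
                  - (1 - a) * (y - m0) / (1 - e))
    return total / n

# Toy check with correctly specified nuisances for Y = 2*A + X,
# so the estimator should recover an ATE of 2.
data = [(1, 0.2, 2.2), (0, 0.4, 0.4), (1, 0.6, 2.6), (0, 0.8, 0.8)]
ate = aipw_ate(data,
               propensity=lambda x: 0.5,
               outcome_model=lambda a, x: 2 * a + x)
```

When both nuisance functions are correctly specified, the augmentation terms vanish and the estimator reduces to the average of the modeled contrasts, which is why DML methods pair it with flexible learners such as random forests.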
The folder cGNF_AIPW contains scripts for estimating cGNF with AIPW:
- MCE_point.py: Generates and processes Monte Carlo samples, trains models, and computes estimates in parallel for each MCE dataset.
- loop_point.py: Initiates MCE_point.py.
- MCE_dml.py: Generates and processes Monte Carlo samples, trains models, and computes estimates in parallel to construct the AIPW-cGNF estimator.
- loop_dml.py: Initiates MCE_dml.py.
- sim_parallel_dml.py: A beta version of the simulation function that estimates nuisance functions in the AIPW-cGNF estimator.
- sum.py: Computes the AIPW-cGNF estimator.
- template_sbatch.sh: Slurm script for running MCE_dml.py on an HPC cluster.
- submit_jobs.sh: Initiates template_sbatch.sh on the HPC cluster.