Hi all,
Thank you for creating and maintaining such a wonderful package!
I would like to use the grf package to study treatment effect heterogeneity in an observational dataset on telework uptake and mental health. My target audience is non-statisticians, so it would be valuable to make the package's methods accessible to applied scientists. We have ~5,000 observations and ~50 pre-treatment covariates (many of them categorical variables expanded to dummies). Missingness is ~10%, so we generated >20 multiply imputed datasets (via mice).
I’d appreciate guidance on best practice for combining results across imputations:
ATE/CATE/RATE across imputations
- Is the recommended approach to fit a separate causal_forest() on each imputed dataset, compute the ATE (average_treatment_effect), CATE predictions (predict(..., estimate.variance=TRUE)), and heterogeneity summaries (e.g., RATE / TOC), and then pool the resulting estimates across imputations (e.g. Rubin-style)?
- Or is there a preferred alternative within grf (e.g., stacking imputations with weights, or combining forests through merge_forests)?
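For concreteness, here is a minimal sketch of what "Rubin-style" pooling means for a scalar estimand such as the ATE, assuming each per-imputation forest yields a point estimate and a standard error. It is written in Python only to show the arithmetic; the function name `pool_rubin` and the input numbers are illustrative placeholders, not results from any data. Whether this pooling is appropriate for forest-based estimates is exactly the question.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errs):
    """Pool M per-imputation estimates of one scalar estimand with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errs):
        raise ValueError("need at least two imputations with matching standard errors")
    q_bar = mean(estimates)                         # pooled point estimate
    within = mean([se ** 2 for se in std_errs])     # average within-imputation variance
    between = variance(estimates)                   # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between          # total variance
    return q_bar, math.sqrt(total)


# Hypothetical ATE estimates and standard errors from three imputed datasets.
ate_hat = [0.10, 0.12, 0.14]
ate_se = [0.05, 0.05, 0.05]
pooled_ate, pooled_se = pool_rubin(ate_hat, ate_se)
```

The pooled standard error is never smaller than the root of the average within-imputation variance, because the between-imputation term only adds uncertainty.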
Randomness / sample splitting consistency across imputations
- Since causal_forest() involves randomness (subsampling, honesty, etc.), would you recommend fixing the same train/test split across imputations when generating CATE predictions for evaluation/plots? Or should the split be drawn separately for each imputation?
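To make the "same split across imputations" option concrete, here is a minimal sketch. It assumes every imputed dataset keeps the original row order, so one set of row indices can be reused for all of them. The helper name `fixed_split`, the 30% test fraction, and the seed are illustrative choices, not recommendations.

```python
import random


def fixed_split(n_rows, test_fraction=0.3, seed=42):
    """Draw one train/test partition of row indices, reproducible via the seed."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_test = int(round(n_rows * test_fraction))
    test = sorted(idx[:n_test])
    train = sorted(idx[n_test:])
    return train, test


# Draw the indices once, then reuse them to subset every imputed dataset.
train_idx, test_idx = fixed_split(n_rows=5000)
```

The alternative ("per imputation") would call the helper with a different seed for each imputed dataset, so the evaluation rows would differ across imputations.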
Variable importance across imputations
- I noted that variable_importance() can differ across imputations (although certain variables appear consistently across the forests). Do you have any recommendations for summarizing this across imputed datasets? For example, should we normalize importances within each forest and report the mean/median plus stability metrics (e.g. top-k frequency) across imputations?
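As an illustration of the summary described above, here is a minimal sketch. It assumes one vector of raw importances per imputation, with variables in the same order each time. The function name `summarize_importance`, the variable names, and the numbers are hypothetical.

```python
from statistics import mean, median


def summarize_importance(importances, names, k=1):
    """Summarize per-imputation importances: normalized mean, median, top-k frequency."""
    # Normalize within each forest so the importances sum to 1.
    normalized = [[v / sum(row) for v in row] for row in importances]
    # For each imputation, record which variable indices rank in the top k.
    top_sets = [
        set(sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k])
        for row in normalized
    ]
    summary = {}
    for j, name in enumerate(names):
        col = [row[j] for row in normalized]
        summary[name] = {
            "mean": mean(col),
            "median": median(col),
            "top_k_freq": sum(j in top for top in top_sets) / len(top_sets),
        }
    return summary


# Hypothetical raw importances for three variables across three imputations.
names = ["age", "income", "commute_time"]
raw = [[5, 3, 2], [2, 5, 3], [6, 1, 3]]
vi_summary = summarize_importance(raw, names, k=1)
```

With these made-up inputs, `age` is the top-ranked variable in two of the three imputations, so its top-1 frequency is 2/3.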
Lastly, in some code examples (e.g. the ijmpr code) I see that you use a train/test split. If I understand correctly, this is to evaluate the fit of the causal forest? Is there some intuition for when this is necessary, given that honesty already involves a split? The examples take different approaches, but I fail to see the reasoning behind them.
I have already implemented the “one forest per imputation + pooling” workflow for ATE/CATE/RATE, but I’d be very grateful for any suggestions!
Kind regards,
Eduardo