Avg_comparisons/CATE, bootstrapping, and MI #1405

alockhar · 2025-03-03T22:04:59Z

alockhar
Mar 3, 2025

Dear group,

I am working on estimating the conditional average treatment effect (cATE) for a set of analyses. In other words, I am estimating the exposure effect for each level of a variable of interest in my model using G-computation.

One of my sub-pieces is going beyond complete case and integrating multiple imputation. Basically, the challenge I am having is I am doing bootstrapping and MI together under the context of estimating the cATE. I have tried this two broad ways as per the literature:

Multiple imputation first and then bootstrapping each multiple imputation before pooling basically to get the variance.
The other way I have tried is bootstrapping and then generating several multiple imputations per bootstrap sample before analysis and pooling.

Approach 1: I have tried everything and will share my code but it has been a struggle:

#Estimation of model across each iteration of the multiple imputation dataset (about 500 iterations for now)

#Going across each imputed dataset and doing avg_comparisons to generate my cATE while bootstrapping and storing bootstrapped results for the cATE for each MI iteration.

storeit=list()

#just trying a little test (also used lapply, etc but just being transparent)
for(i in 1: 5){
check=avg_comparisons(lm(out.formula,data = dat[[i]]), variables="exposure", by=c("variable of interest to estimate exposure effect across"),newdata = datagrid("Vector of values for by variable above" = ValVec)) %>%inference(method="boot")
storeit=rbind(storeit, check)
}

**My first question:

I am trying to pool my cATE or basically pooling across each level of another variable once bootstrapped so I have pooled cATE across all imputations and values of my second variable. I looked into MICE to do so and maybe I could pool by level of the second variable? Any advice would be appreciated.**

Approach 2: I created a bootstrapped dataset from my full dataset (which included missings) and then impute twice per bootstrap iteration before trying. In the middle of this actually but may have follow-up code later this week on it to compare as I think the confidence interval coverage may be a little better ultimately.

Any experience on either of these again would be appreciated so I don't have to do everything in nuts and bolts (would love to do in context of marginaleffects).

Thank you!

vincentarelbundock · 2025-03-04T00:39:41Z

vincentarelbundock
Mar 4, 2025
Maintainer

I don't have experience with this, but you should probably call get_draws() after inferences() to extract the draws. Don't try to bind just the data frame if you want to combine draws.

1 reply

alockhar Mar 4, 2025
Author

Thank you, Vincent, for the suggestion.

I actually think that can help my approach 1. Basically, one methodological way is to generate M imputation datasets, and for each of these, B bootstrap datasets. If enough M imputations are run and it is treated like a pooled sample, then besides the mean, it can suffice to take the ordered mean of each MI/B dataset and consider percentiles (no Rubin rules necessary). I think this could be done for the pooled cATE per value of interest (and per your get_draws) suggestion. Another way shown in the paper (more complicated is also bootstrapping and then MI, but point estimates are generated in each MI prior to bootstrapping and the standard errors are calculated per MI/Bootstrap, etc.)

I am going to give the first part above a shot (percentiles) and if anything comes from approach 2 (bootstrapping first and then MI) I will share code.

Otherwise, I can share a very interesting MI/bootstrapping simulation paper here: https://pmc.ncbi.nlm.nih.gov/articles/PMC5986623/

alockhar · 2025-04-22T16:56:08Z

alockhar
Apr 22, 2025
Author

Following up: so I think my interpretation of thr inferences function for bootstrapping has been wrong.

For instance, let’s say I run one imputed dataset but call inferences with 2 draws (R=2) to get the cate….. I would have assumed that for two draws I would get two separate estimates per bootstrap sample per value of each subgroup value I am predicting on…..but they are the same estimates per bootstrap within the same imputation. I assumed each bootstrapped estimate would be different per bootstrapped dataset.. Maybe there’s something I’m missing.

Any advice would be great. Thank you again!

6 replies

alockhar Apr 22, 2025
Author

Sorry. No problem.

It's basically the same code as above but I use mtcars in a generic way:

library(marginaleffects)
library(datasets)

#outcome model
out.formula="wt~am*rcs(log(disp),4)+mpg"

#Predict on percentiles of disp
ValVec=seq(80.6,396,length=10)

#predicting on new vector for each level of subgroup
test=avg_comparisons(lm(out.formula,
                        data = mtcars),
  variables="am",by="disp",newdata=datagrid(disp=ValVec))%>%inferences(method="boot",R=2)%>%get_draws()

#If you check the output basically for the first value in the vector 80.6 for each bootstrap iteration your estimate is the same
#Or basically 2.47 for me for each bootstrap but I would have thought the estimate would vary given you are bootstrapping mtcars twice

#Basically this is the basic example but in mine I am doing this for each imputed dataset and grabbing percentiles per subgroup value of the estimate to get my variance (which is accurate and fine in itself)

vincentarelbundock Apr 22, 2025
Maintainer

Here is a fully reproducible version of your example, with a fixed random seed and the two missing packages that you forgot to load.

AFAICT, the first row estimate is different for the two different bootstrap draws. What am I missing?

library(marginaleffects)
library(datasets)
library(rms)
library(magrittr)
set.seed(1024)

out.formula = "wt~am*rcs(log(disp),4)+mpg"
ValVec = seq(80.6, 396, length = 10)
test = avg_comparisons(
    lm(out.formula, data = mtcars),
    variables = "am",
    by = "disp",
    newdata = datagrid(disp = ValVec)
) %>%
    inferences(method = "boot", R = 2) %>%
    get_draws() |>
    suppressWarnings()

do.call(rbind, by(test, test$drawid, head, n = 1))$draw
#> [1] 2.071407 2.892694

vincentarelbundock Apr 22, 2025
Maintainer

Note that the estimate column includes the estimate without bootstrap, not the bootstrap draws. For those, you need to use the draw column.

alockhar Apr 22, 2025
Author

Note that the estimate column includes the estimate without bootstrap, not the bootstrap draws. For those, you need to use the draw column.

This was the basic issue. Thanks.

vincentarelbundock Apr 22, 2025
Maintainer

Glad we found the issue! And relieved it wasn't a bug ;)

Avg_comparisons/CATE, bootstrapping, and MI #1405

Uh oh!

alockhar Mar 3, 2025

Replies: 2 comments · 7 replies

Uh oh!

vincentarelbundock Mar 4, 2025 Maintainer

Uh oh!

alockhar Mar 4, 2025 Author

Uh oh!

Uh oh!

alockhar Apr 22, 2025 Author

Uh oh!

Uh oh!

alockhar Apr 22, 2025 Author

Uh oh!

vincentarelbundock Apr 22, 2025 Maintainer

Uh oh!

Uh oh!

vincentarelbundock Apr 22, 2025 Maintainer

Uh oh!

alockhar Apr 22, 2025 Author

Uh oh!

Uh oh!

vincentarelbundock Apr 22, 2025 Maintainer

alockhar
Mar 3, 2025

Replies: 2 comments 7 replies

vincentarelbundock
Mar 4, 2025
Maintainer

alockhar Mar 4, 2025
Author

alockhar
Apr 22, 2025
Author

alockhar Apr 22, 2025
Author

vincentarelbundock Apr 22, 2025
Maintainer

vincentarelbundock Apr 22, 2025
Maintainer

alockhar Apr 22, 2025
Author

vincentarelbundock Apr 22, 2025
Maintainer