CV-TMLE vs TMLE #91

@olivierlabayle

Hello,

I am following the tutorial and trying to look at the difference between CV-TMLE and TMLE with the perinatal dataset.

perinatal.csv

To keep things simple, I use only a glm as the model for both the propensity score and the outcome mean. I am surprised to see that the output is exactly the same for both procedures. CV-TMLE complains that glm is not "CV-aware", which might be the reason, but I don't understand why that should be the case. My understanding of CV-TMLE is that:

  • The dataset should be split into V folds
  • The glm models (for both A and Y) should be fitted on each training split, so we should have V instances of each glm, each trained on a different split.
  • The targeting step is pooled over the predictions of the V glm model pairs on their respective validation sets
  • The final estimate is the average of the estimates across validation folds
  • The variance is estimated from the influence curve (I am not entirely sure whether it is pooled across validation samples or whether multiple variance estimates are made and averaged)
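
The steps above can be sketched in base R as follows. This is a minimal illustration on simulated data, not tmle3's implementation: the data-generating model and all variable names (fold, g1, Q1, Q0, H, eps, psi) are assumptions made for the example, and the sketch simply picks the "pooled across validation samples" option for the influence curve without claiming that this is what tmle3 does.

```r
# Hypothetical simulated data: W covariate, A binary treatment, Y binary outcome
set.seed(1)
n <- 500
V <- 5
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * W))
Y <- rbinom(n, 1, plogis(-0.5 + A + W))
dat <- data.frame(W = W, A = A, Y = Y)

# 1. Split the data into V folds of equal size
fold <- sample(rep(seq_len(V), length.out = n))

# 2. Fit both glms on each training split, predict on the held-out fold
g1 <- Q1 <- Q0 <- numeric(n)
for (v in seq_len(V)) {
  idx <- fold == v
  train <- dat[!idx, ]
  valid <- dat[idx, ]
  g_fit <- glm(A ~ W, family = binomial(), data = train)
  Q_fit <- glm(Y ~ A + W, family = binomial(), data = train)
  g1[idx] <- predict(g_fit, newdata = valid, type = "response")
  Q1[idx] <- predict(Q_fit, newdata = transform(valid, A = 1), type = "response")
  Q0[idx] <- predict(Q_fit, newdata = transform(valid, A = 0), type = "response")
}
QA <- ifelse(dat$A == 1, Q1, Q0)

# 3. Pooled targeting: one fluctuation parameter fitted on all
#    validation-set predictions at once (logistic fluctuation with offset)
H <- dat$A / g1 - (1 - dat$A) / (1 - g1)
eps_fit <- glm(dat$Y ~ -1 + H + offset(qlogis(QA)), family = binomial())
eps <- coef(eps_fit)[["H"]]
Q1_star <- plogis(qlogis(Q1) + eps / g1)
Q0_star <- plogis(qlogis(Q0) - eps / (1 - g1))
QA_star <- ifelse(dat$A == 1, Q1_star, Q0_star)

# 4. Estimate: average of the fold-specific plug-in estimates
fold_est <- tapply(Q1_star - Q0_star, fold, mean)
psi <- mean(fold_est)

# 5. Influence curve pooled over all validation samples (an assumption here)
ic <- H * (dat$Y - QA_star) + (Q1_star - Q0_star) - psi
se <- sqrt(var(ic) / n)
```

With equal fold sizes, the average of the fold-specific estimates coincides with the plain mean over all validation-set predictions. The only difference from a non-cross-validated TMLE in this sketch is step 2: without it, g1, Q1 and Q0 would come from a single fit on the full data.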

As I understand it, we could have used a Super Learner instead of a GLM, which would have added another, nested cross-validation procedure, but Super Learning is not a requirement of CV-TMLE. The code to reproduce is below: you can tweak learner_list to switch to a super learner, in which case the two procedures return different outputs and no "CV-aware" complaint is raised.

I would appreciate some clarification on the procedure and why this is happening! Thanks!

library(data.table)
library(tmle3)
library(sl3)

data = read.csv("perinatal.csv")

node_list <- list(
  W = c(
    "apgar1", "apgar5", "gagebrth", "mage", "meducyrs", "sexn"
  ),
  A = "parity01",
  Y = "haz01"
)

glm = Lrnr_glm$new()
lrn_mean = Lrnr_mean$new()
sl <- Lrnr_sl$new(learners = Stack$new(glm, lrn_mean), metalearner = Lrnr_nnls$new())

learner_list <- list(A = glm, Y = glm)
# learner_list = list(A=sl, Y = sl)

ate_spec <- tmle_ATE(
  treatment_level = 1,
  control_level = 0
)

tmle_task <- ate_spec$make_tmle_task(data, node_list)
initial_likelihood <- ate_spec$make_initial_likelihood(
  tmle_task,
  learner_list
)


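# Default updater settings (expected to run CV-TMLE)
targeted_likelihood_cv <- Targeted_Likelihood$new(initial_likelihood)

# Same initial likelihood, with cross-validated targeting switched off
targeted_likelihood_no_cv <-
  Targeted_Likelihood$new(initial_likelihood,
    updater = list(cvtmle = FALSE)
  )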

tmle_params_cv <- ate_spec$make_params(tmle_task, targeted_likelihood_cv)
tmle_params_no_cv <- ate_spec$make_params(tmle_task, targeted_likelihood_no_cv)

tmle_no_cv <- fit_tmle3(
  tmle_task, targeted_likelihood_no_cv, tmle_params_no_cv,
  targeted_likelihood_no_cv$updater
)
tmle_no_cv
# -0.1855909

tmle_cv <- fit_tmle3(
  tmle_task, targeted_likelihood_cv, tmle_params_cv,
  targeted_likelihood_cv$updater
)
tmle_cv
# -0.1855909
