Skip to content

Conversation

@joshua-slaughter
Copy link
Member

This is a draft pull request, allowing for the use of weighted nuisance estimators as detailed in Rose & van der Laan (2008). Here, they provide the following steps for the implementation of the case-control weighted TMLE:

  1. The assignment of weights $q_0$ to cases and $(1-q_0)\frac{1}{J}$ to the corresponding J controls. Here $q_0$ is a prevalence estimate and J is the ratio of controls to cases.
  2. Estimate the conditional probability of Y given A and W using assigned weights. (weighted Q fit)
  3. Estimate the conditional distribution of the exposure given confounders using the assigned weights. (weighted g fit)
  4. Calculate the clever covariate for each subject based on the g function.
  5. Update the initial fit from step 2 using the clever covariate using weighted MLE.
  6. Use the assigned weights and the updated Q fit to estimate causal parameters. This is done by averaging over the case-control weighted distribution of W
  7. Inference (ICs, errors, conf intervals)

I am curious as to how to implement Steps 5 and 6 as 5 requests the use of a weighted MLE and 6 requests using the weights to estimate the causal parameters.

@joshua-slaughter joshua-slaughter linked an issue Jun 24, 2025 that may be closed by this pull request
@olivierlabayle
Copy link
Member

Hi Josh,

I have had a look at https://www.degruyterbrill.com/document/doi/10.2202/1557-4679.1114/html and https://biostats.bepress.com/cgi/viewcontent.cgi?article=1238&context=ucbbiostat.

My understanding is that all fits need to be weighted using the case-control weights q_0 and (1-q_0)/J), including the fluctuation. I don't have extensive time to check this though so maybe @SjoerdVBeentjes could confirm?

Then, both the estimate and influence curve also need to be weighted with the same weights.

  • For the fluctuation, you will need to change the behaviour of the MLJBase.fit method to accept weights. Since the fluctuation is just another MLJ compliant model, the bottom part of this documentation will probably be useful.

  • For the parameter and gradient, they are cached from the fit of the fluctuation. So this needs to be updated as well.

@olivierlabayle
Copy link
Member

If you could revert the undesired changes to Project.toml and .gitignore that would be great !

@SjoerdVBeentjes
Copy link

Hi both,

I agree with the above description of the case-control weighted TMLE as enumerate by Joshua and confirmed by Olivier. In particular, refining the steps from Rose and Van der Laan (2008):

  1. Use case-control weights q_0 and (1-q_0)/J in weighted logistic regression to obtain an estimate of P*_0(Y=1|W,A).
  2. Use case-control weights q_0 and (1-q_0)/J in weighted logistic regression to obtain an estimate of P*_0(A=1|W).
  3. The clever covariate H(W,A) contains the estimate of g*_0 from step 2, but is otherwise unweighted.
  4. The updating step is performed via weighted (logistic) regression with case-control weights q_0 and (1-q_0)/J.
  5. The updated fit of Q*_0(W,A) is obtained by plugging in the estimated parameter \hat{\epsilon} from step 4.
  6. Care needs to be taking to obtain the final parameter estimate, e.g., see section 3.2.5 of Rose and Van der Laan (2008) for the example of the Risk Ratio (RR).
  7. Similarly, evaluating the asymptotic variance requires using the influence function D_{q_0}, which is again a case-control weighted version of the influence function D* using the case-control weights q_0 and (1-q_0)/J.

Finally, the choice of q_0 is of interest. It would be helpful if this could be supplied by the user. In this case, interesting choices are (i) the prevalence of ME/CFS in the UKB (or other database for comparison, e.g., All of Us), (ii) estimated prevalence of ME/CFS in the whole UK, (iii) a range of values of q_0 around either of the two estimates to perform a sensitivity analysis.

@olivierlabayle olivierlabayle self-assigned this Jul 31, 2025
Copy link
Member

@olivierlabayle olivierlabayle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Josh, we are on good tracks. There are three main issues that need to be tackled before I take a deeper look at correctness ant the test:

  • In get_prevalence_weights, I think the forumla you use does not match my original expectation and need to be clarifier
  • You defined new structures for marginal distributions etc... In principle this is more mathematically acurate. Unfortunately it makes the code less readable so unless really needed it is better to remove these.
  • There is a problem with passing weights to MLJ machine and we need to throw whenever the model does not support it. We can follow up on JuliaAI/MLJBase.jl#1005 to see how we want to move forward.

…e (no more ones() vectors); Error is thrown when a model is specifed that does not accept weights; Removed unnessesary dependency
@joshua-slaughter
Copy link
Member Author

For completeness, the latest commit addresses every previous issue/concern except for the issue with ProbablisticPipeline in MLJ.jl. Most importantly, this commit solves a major issue in implementation in which the weights used to approximate expectations were not normalized, resulting in incorrect influence curves and consequently standard errors. This did not influence the point estimate as it is still asymptotically unbiased.

For the former, if we could access the components of a pipeline and then assess if they can accept weights this should solve the issue but the documentation on this isn't great so maybe you'll have some pointers later.

@olivierlabayle
Copy link
Member

olivierlabayle commented Aug 4, 2025

Thanks Josh, this is starting to look really good. You can address my comments when you have time but the main things left I believe are:

  • implement the check for MLJ pipelines as suggested
  • I left a question about the weigthing of the counterfactual_aggregate, can you provide some reference for why we need to weigh this part of the gradient again and only this part? I don't think the other bits are further reweighted. I also don't udnerstand what is the w .*= length(w) / sum(w) for.
  • C-TMLE does not have support for it at the moment (if you don't have time, we'll need to make this explicit in the docs)
  • Documentation (update TMLE doctrings + add a paragraph in the docs as suggested earlier + optionally turn the test file into an example script)

@joshua-slaughter
Copy link
Member Author

  1. Apologies, somehow missed the suggestion. Will implement now.
  2. This weighting step is necessary as the case-control weighted empirical distribution must be used in all steps. If it is not used in the plugin then we wont recover the true population estimand and it will be a plugin for the biased population. (Rose's Thesis 3.2.4). So these weights must be propagated to all parts of the gradient. This is also done in the gradient Y|X as well. Furthermore, the multiplication is just to normalize the weights so that they sum to 1 since this is a requirement for approximating expectations.
  3. I don't believe I will have the time at this current moment to work on a CTMLE implementation
  4. Will add to this as well. I already have an example ready.

@olivierlabayle
Copy link
Member

Regarding weigthing then it is probably better to do it only once at this line isn't it?

gradient = ∇YX(observed_cache[:H], observed_cache[:y], Ey, observed_cache[:w]) .+ ∇W(ct_aggregate, Ψ̂)

gradient .= gradient .* prevalence_weights

@joshua-slaughter
Copy link
Member Author

I don't think its that simple as the weights for some parts of the gradient are implicitly passed and others require the explicit weighted that was added. Multiplying the normalized weights across the entire gradient doesn't return the desired behavior. I tested it recently and this was the outcome:
Screenshot 2025-08-05 at 00 28 16
Where we would want something like this (the current implementation)
Screenshot 2025-08-04 at 14 31 47

@olivierlabayle
Copy link
Member

I see, what you display is a bias (so unrelated to the gradient). This is because the TMLE estimate is computed from the counterfactual aggregate as well so indeed it is not as simple as just weighting the gradient. Since the weights are not counterfactual however, I think we only need to do something like: ct_aggregate .*= observed_cache[:w] when prevalence_weights are given or am I missing something again?

@joshua-slaughter
Copy link
Member Author

I've done some more reading and now understand it as such:

The only place in which weighting is necessary is in the initialize_observed_cache. This is because this is the cache that influences the targeting step which must account of the biased sampling distribution of the population. If not the estimand will be misspecified causing the bias shown previously. By adding the weights in the initialize_observed_cache the empirical mean represents that of the true population given the assumption that the prevalence is correct.

As you stated the weights are not counterfactual and do not apply to intialize_counterfactual_cache so I have removed them. However, the weights are implicitly passed to the compute_counterfactual_aggregate and the plugin estimator as both functions receive the unbiased/updated Q fit. After testing I can confirm this to be true. So here, the only place that requires weights is the observed_cache and they are propagted from there.

@olivierlabayle
Copy link
Member

Hi Josh,

I'm not sure I understand why the counterfactual_aggregate needs not be explicitly weighted as well. I thought the weighting of the gradient was to weigh the distribution of W and thus needs to be propagated across all subparts of the gradient? For instance the delta_X part also has an implicit weighting mechanism through the unnormalised prevalence_weights in the g fit but is also explicitly weighted in the gradient.

Also I think Github tends to hide review comments which is not ideal (you need to click on load more or you can also use vscode extension pull request to have it in your right in vscode), but there are quite a few to be addressed still.

@joshua-slaughter
Copy link
Member Author

I have addressed some of the easier comments so far and have held off on the tests and comments concerning the weights artifacts of thectf_aggregate mainly because this can be repurposed for the distribution of prevalence weights.

In my tests of where to apply the weights and how that effected the bias, I showed the following:

  • If the observed cache is not weighted the fluctuation parameter is misspecified causing bias in the updated estimate.
  • Removing the weights from the computation of the counterfactual aggregate does not impact bias because it is passed the correctly updated Q-fit.

However, the paper states (3.2.4) that in the estimation of the target parameter the marginal W distribution must be applied even thought the updated Q-fit is correctly specified. So although the bias is not impacted by this, we should still include the weights in this step to arrive at the correct gradient.

@olivierlabayle
Copy link
Member

olivierlabayle commented Aug 6, 2025

To make clear my reasoning, we have:

gradient = gradient_YX + gradient_W

It is my understanding, which I think you confirm, that the gradient needs to be weighted to provide valid inference (confidence intervals). That is we need:

weighted_gradient = weighted_gradient_YX + weighted_gradient_W

At the moment:

  • The gradient_YX is weighted through the update of the w component in the observed_cache, so this bit is fine.
  • The gradient_W is not weighted yet and this needs to be addressed.

Similarly, it is also my understanding that the TMLE estimate needs to be weighted as well. Fortunately, because both are computed from the ct_aggregate, weighting the ct_aggregate is enough to propagate through these and provide both valid point estimation and confidence intervals.

My suggestion is to define a function to be called in compute_gradient_and_estimate_from_caches! after ct_aggregate = compute_counterfactual_aggregate!(counterfactual_cache, Q):

weigh_counterfactual_aggregate!(ct_aggregate, prevalence_weights::Nothing) = ct_aggregate
weigh_counterfactual_aggregate!(ct_aggregate, prevalence_weights) = ct_aggregate .*= prevalence_weights

I am not sure what your tests are showing but we need to be clear on the methodology, and then test empirical results, not the other way around. Empirical results can be close to expectation but still wrong, or originating from a compensation of errors.

Let me know if something is wrong here.

@olivierlabayle
Copy link
Member

@joshua-slaughter I just improved the readability of the supervised_learner_supports_weights function which I had missed in my review.

Copy link
Member Author

@joshua-slaughter joshua-slaughter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this looks good to me!

@joshua-slaughter
Copy link
Member Author

I have updated the docs/src/examples/case_control_experiment.md to be more clear. Made small changes including:

  • The discrimination between the experimental unit of the full data distribution and that of the case-control weighted dsitribution
  • Clarification of the treatment variable (in some places it was T, in others A)
  • Minor changes to interpretation subsection at the end

Copy link
Member

@olivierlabayle olivierlabayle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx @joshua-slaughter, please revert the gitignore changes and we can merge.

*.jl.cov
*.jl.mem
/docs/build/
/docs/src/examples/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was intended, the examples are copied within this folder at docs build time (see make.jl). So probably revert this.

@olivierlabayle olivierlabayle merged commit daafd81 into main Sep 29, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Case-Control Weighted TMLE

4 participants