-
Notifications
You must be signed in to change notification settings - Fork 5
Prevalence weighted nuisance functions #138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi Josh, I have had a look at https://www.degruyterbrill.com/document/doi/10.2202/1557-4679.1114/html and https://biostats.bepress.com/cgi/viewcontent.cgi?article=1238&context=ucbbiostat. My understanding is that all fits need to be weighted using the case-control weights Then, both the estimate and influence curve also need to be weighted with the same weights.
|
|
If you could revert the undesired changes to |
|
Hi both, I agree with the above description of the case-control weighted TMLE as enumerate by Joshua and confirmed by Olivier. In particular, refining the steps from Rose and Van der Laan (2008):
Finally, the choice of |
olivierlabayle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Josh, we are on good tracks. There are three main issues that need to be tackled before I take a deeper look at correctness ant the test:
- In get_prevalence_weights, I think the forumla you use does not match my original expectation and need to be clarifier
- You defined new structures for marginal distributions etc... In principle this is more mathematically acurate. Unfortunately it makes the code less readable so unless really needed it is better to remove these.
- There is a problem with passing weights to MLJ machine and we need to throw whenever the model does not support it. We can follow up on JuliaAI/MLJBase.jl#1005 to see how we want to move forward.
…e (no more ones() vectors); Error is thrown when a model is specifed that does not accept weights; Removed unnessesary dependency
… weight subsetting
|
For completeness, the latest commit addresses every previous issue/concern except for the issue with ProbablisticPipeline in MLJ.jl. Most importantly, this commit solves a major issue in implementation in which the weights used to approximate expectations were not normalized, resulting in incorrect influence curves and consequently standard errors. This did not influence the point estimate as it is still asymptotically unbiased. For the former, if we could access the components of a pipeline and then assess if they can accept weights this should solve the issue but the documentation on this isn't great so maybe you'll have some pointers later. |
|
Thanks Josh, this is starting to look really good. You can address my comments when you have time but the main things left I believe are:
|
|
|
Regarding weigthing then it is probably better to do it only once at this line isn't it?
gradient .= gradient .* prevalence_weights |
|
I see, what you display is a bias (so unrelated to the gradient). This is because the TMLE estimate is computed from the counterfactual aggregate as well so indeed it is not as simple as just weighting the gradient. Since the weights are not counterfactual however, I think we only need to do something like: |
|
I've done some more reading and now understand it as such: The only place in which weighting is necessary is in the As you stated the weights are not counterfactual and do not apply to |
|
Hi Josh, I'm not sure I understand why the counterfactual_aggregate needs not be explicitly weighted as well. I thought the weighting of the gradient was to weigh the distribution of Also I think Github tends to hide review comments which is not ideal (you need to click on load more or you can also use vscode extension pull request to have it in your right in vscode), but there are quite a few to be addressed still. |
|
I have addressed some of the easier comments so far and have held off on the tests and comments concerning the weights artifacts of the In my tests of where to apply the weights and how that effected the bias, I showed the following:
However, the paper states (3.2.4) that in the estimation of the target parameter the marginal W distribution must be applied even thought the updated Q-fit is correctly specified. So although the bias is not impacted by this, we should still include the weights in this step to arrive at the correct gradient. |
|
To make clear my reasoning, we have:
It is my understanding, which I think you confirm, that the gradient needs to be weighted to provide valid inference (confidence intervals). That is we need:
At the moment:
Similarly, it is also my understanding that the TMLE My suggestion is to define a function to be called in weigh_counterfactual_aggregate!(ct_aggregate, prevalence_weights::Nothing) = ct_aggregate
weigh_counterfactual_aggregate!(ct_aggregate, prevalence_weights) = ct_aggregate .*= prevalence_weightsI am not sure what your tests are showing but we need to be clear on the methodology, and then test empirical results, not the other way around. Empirical results can be close to expectation but still wrong, or originating from a compensation of errors. Let me know if something is wrong here. |
…methods for clarity
…lows the papers. Both implementations should be concordant.
Ol ccwtmle
Update weighting procedure to handle nested model objects
|
@joshua-slaughter I just improved the readability of the |
joshua-slaughter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this looks good to me!
|
I have updated the
|
olivierlabayle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx @joshua-slaughter, please revert the gitignore changes and we can merge.
| *.jl.cov | ||
| *.jl.mem | ||
| /docs/build/ | ||
| /docs/src/examples/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was intended, the examples are copied within this folder at docs build time (see make.jl). So probably revert this.


This is a draft pull request, allowing for the use of weighted nuisance estimators as detailed in Rose & van der Laan (2008). Here, they provide the following steps for the implementation of the case-control weighted TMLE:
I am curious as to how to implement Steps 5 and 6 as 5 requests the use of a weighted MLE and 6 requests using the weights to estimate the causal parameters.