Add Clustered Standard Errors Support #996
base: main
Conversation
@kbattocchi I was wondering if you had a chance to look into this PR and have any feedback I can work on. Thank you!
Apologies for the slow response; I was traveling and missed this. First of all, thanks for the contribution: this is definitely a valuable feature that we'd love to have. In terms of high-level feedback, I think we'd want to support federated estimation even with clustered standard errors, with the restriction that groups are assumed to be fully partitioned across the individual estimators (in other words, group "1" for the first estimator is assumed to be different from group "1" for the second estimator). I believe this should then be straightforward, because again you can just compute the moments locally and combine them, except that you may need to add a count to the state to keep track of the number of groups when aggregating later (I'm not sure). The only other thing that comes to mind immediately is a question: is the bias correction for clustered errors completely standard, or would it make sense to provide options for it akin to HC0 vs. HC1 for non-clustered errors, where we can optionally correct for the degrees of freedom?
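The aggregation described above (compute clustered moments locally, combine them, and carry a group count in the state) can be sketched as follows. This is a minimal numpy illustration under the stated assumption that group labels are fully partitioned across estimators; the helper names `local_state` and `combine` are hypothetical, not econml API.

```python
import numpy as np

def local_state(X, resid, groups):
    """Per-estimator state for a clustered sandwich: bread moments,
    within-cluster summed-score "meat", and observation/group counts."""
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        s = X[groups == g].T @ resid[groups == g]  # summed score for cluster g
        meat += np.outer(s, s)
    return {"XtX": X.T @ X, "meat": meat,
            "n_obs": X.shape[0], "n_groups": len(np.unique(groups))}

def combine(states):
    """Aggregate local states, assuming clusters are disjoint across
    estimators, so meats and group counts simply add."""
    XtX = sum(s["XtX"] for s in states)
    meat = sum(s["meat"] for s in states)
    N = sum(s["n_obs"] for s in states)
    G = sum(s["n_groups"] for s in states)
    bread = np.linalg.inv(XtX)
    return bread @ meat @ bread, N, G  # uncorrected sandwich + counts
```

Because everything in the state is additive under the disjoint-clusters assumption, combining two partitions reproduces the covariance computed on the pooled data, and `N` and `G` are available for a small-sample correction at aggregation time.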
@kbattocchi thanks for the feedback! On federated estimation: I think that makes sense; I'll try to add it soon. On bias correction: libraries differ a bit on the defaults and on whether bias correction can be toggled. Across statsmodels, linearmodels, and Stata, there is only a single "cluster" string option. statsmodels and linearmodels both allow toggling bias correction via a separate argument but, from what I can tell, Stata just sticks with the (G/(G-1)) * ((N - 1)/(N - k)) correction (manual, page 51). I think it makes sense to have that same correction as the default, but I'm unsure whether we want to a) expose a separate use_correction argument in the high-level API, b) use modified strings like "cluster_HC0", or c) allow no toggling at all. Any thoughts on this?
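For concreteness, the Stata-style factor under discussion is a scalar that multiplies the uncorrected clustered sandwich covariance, a minimal sketch (the function name is hypothetical):

```python
def cluster_correction(n_obs, n_params, n_groups):
    """Stata-style small-sample correction for clustered covariance:
    (G / (G - 1)) * ((N - 1) / (N - k)).
    Multiplies the uncorrected sandwich estimate, analogous to the
    HC1-style N / (N - k) adjustment for non-clustered errors."""
    G, N, k = n_groups, n_obs, n_params
    return (G / (G - 1)) * ((N - 1) / (N - k))
```

With many clusters and observations the factor approaches 1, so the HC0-style (uncorrected) and corrected variants converge; the factor matters most with few clusters.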
Commits:
- Implement clustered variance calculation in `StatsModelsLinearRegression` and `StatsModels2SLS`; add `cov_type='clustered'` parameter to `OrthoIV` estimator; add tests validating against the `statsmodels` implementation. Signed-off-by: Mikayel Sukiasyan <[email protected]>
- … to handle `groups=None`
- …d corresponding test; 2. fix clustered SE computation issue with summarized data; add corresponding test
- …rection on/off for clustered covariance; 2. set both corrections on by default; 3. add new test for corrections and modify others with the new correction defaults
Updates:
Will mark this ready for review for now.
Adding support for cluster-robust standard errors.

Implementation:
- `_compute_clustered_variance()` methods in the `StatsModelsLinearRegression` and `StatsModels2SLS` classes, using the existing `groups` parameter plumbing. Matches the `linearmodels` package implementation.
- `cov_type` parameter added to the `OrthoIV` constructor to expose the option, since `model_final` isn't an argument.

File modifications:
- `econml/sklearn_extensions/linear_model.py` - added `_compute_clustered_variance()` methods
- `econml/iv/dml/_dml.py` - added `cov_type` parameter to the `OrthoIV` class; pass the `groups` parameter to `model_final`
- `econml/dml/_rlearner.py` - pass the `groups` parameter to `model_final`
- `econml/tests/test_clustered_se.py` - tests validating against the `statsmodels` implementation (up to small-sample correction)

First contribution, so please let me know if I've missed anything!