

msukiasyan

Adding support for cluster-robust standard errors.

Implementation:

  • Add clustered variance computation methods to the StatsModelsLinearRegression and StatsModels2SLS classes, using the existing groups parameter plumbing. This matches the linearmodels package implementation.
  • Add a cov_type parameter to the OrthoIV constructor to expose the option, since model_final isn't a constructor argument.
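For reference, here is a minimal NumPy sketch of the standard cluster-robust sandwich for OLS (CR0, i.e. no small-sample correction). The helper name `clustered_ols_cov` is hypothetical; it is not the `_compute_clustered_variance()` method added in this PR, which may organize the computation differently.

```python
import numpy as np


def clustered_ols_cov(X, y, groups):
    """Return OLS coefficients and their CR0 cluster-robust covariance.

    Hypothetical illustration; not econml's _compute_clustered_variance().
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)

    # Bread: (X'X)^{-1}, also used for the OLS point estimate
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta

    # Meat: sum over clusters of the outer product of each cluster's summed score
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        mask = groups == g
        score_g = X[mask].T @ resid[mask]  # X_g' e_g, shape (k,)
        meat += np.outer(score_g, score_g)

    return beta, XtX_inv @ meat @ XtX_inv


# Example: 10 clusters of 5 observations each
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(10), 5)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=50)
beta, cov = clustered_ols_cov(X, y, groups)
print(beta, np.sqrt(np.diag(cov)))
```

A quick sanity check: when every observation is its own cluster, the meat reduces to the HC0 form.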

File modifications:

  • econml/sklearn_extensions/linear_model.py - Added _compute_clustered_variance() methods
  • econml/iv/dml/_dml.py - Added a cov_type parameter to the OrthoIV class and passed the groups parameter to model_final.
  • econml/dml/_rlearner.py - Passed the groups parameter to model_final.
  • econml/tests/test_clustered_se.py - Tests validating against statsmodels implementation (up to small sample correction).
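The same sandwich carries over to the two-stage least squares case. The sketch below assumes the usual construction: the first-stage fitted regressors X_hat replace X in both the bread and the cluster scores, while the structural residuals use the original X. `clustered_2sls_cov` is a hypothetical name, and this is not necessarily how `StatsModels2SLS` structures the computation.

```python
import numpy as np


def clustered_2sls_cov(X, Z, y, groups):
    """Return 2SLS coefficients and their CR0 cluster-robust covariance.

    Hypothetical illustration; not the StatsModels2SLS implementation.
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)

    # First stage: fitted regressors X_hat = Z (Z'Z)^{-1} Z'X
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

    # Second stage: bread (X_hat'X_hat)^{-1} equals (X' P_Z X)^{-1}
    bread = np.linalg.inv(X_hat.T @ X_hat)
    beta = bread @ (X_hat.T @ y)

    # Structural residuals use the original regressors, not X_hat
    resid = y - X @ beta

    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        mask = groups == g
        score_g = X_hat[mask].T @ resid[mask]  # X_hat_g' e_g
        meat += np.outer(score_g, score_g)

    return beta, bread @ meat @ bread
```

With Z = X the first stage is the identity and the result collapses to the OLS sandwich, which is a convenient cross-check alongside comparisons against statsmodels.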

First contribution, so please let me know if I've missed anything!

@msukiasyan msukiasyan marked this pull request as ready for review August 13, 2025 21:52
@msukiasyan
Author

@kbattocchi was wondering if you had a chance to look into this PR and have any feedback I can work on. Thank you!

@kbattocchi
Collaborator

> @kbattocchi was wondering if you had a chance to look into this PR and have any feedback I can work on. Thank you!

Apologies for the slow response, I was traveling and missed this.

First of all, thanks for the contribution, this is definitely a valuable feature that we'd love to have.

In terms of high-level feedback, I think we'd want to support federated learning even with clustered standard errors, with the restriction that groups are assumed to be fully partitioned across the individual estimators (in other words, group "1" for the first estimator is assumed to be different from group "1" in the second estimator). I believe this should then be straightforward, because again you can just compute the moments locally and combine them, except that you may need to add a count to the state to keep track of the number of groups when aggregating later (I'm not sure).
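One way to make the "compute moments locally and combine them" idea concrete, assuming (as suggested) that no cluster spans two estimators: each cluster's score is s_g = X_g'y_g - X_g'X_g·beta, so Σ_g s_g s_g' expands into fixed-size sums that each shard computes without knowing the final beta, and that simply add across shards. The helpers `shard_moments`, `combine`, and `clustered_cov_from_moments` are hypothetical and only illustrate the algebra; they are not the moments this PR actually stores.

```python
import numpy as np


def shard_moments(X, y, groups):
    """Fixed-size sufficient statistics one shard contributes (hypothetical)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    k = X.shape[1]
    m = {
        "XtX": X.T @ X,
        "Xty": X.T @ y,
        "M_aa": np.zeros((k, k)),        # sum_g a_g a_g'
        "M_aB": np.zeros((k, k, k)),     # sum_g a_g (x) B_g
        "M_BB": np.zeros((k, k, k, k)),  # sum_g B_g (x) B_g
        "n_groups": 0,
    }
    for g in np.unique(groups):
        mask = groups == g
        a = X[mask].T @ y[mask]  # a_g = X_g' y_g
        B = X[mask].T @ X[mask]  # B_g = X_g' X_g
        m["M_aa"] += np.outer(a, a)
        m["M_aB"] += np.einsum("i,jl->ijl", a, B)
        m["M_BB"] += np.einsum("ij,kl->ijkl", B, B)
        m["n_groups"] += 1
    return m


def combine(shards):
    """Add moments across shards; valid only if no cluster spans two shards."""
    return {key: sum(s[key] for s in shards) for key in shards[0]}


def clustered_cov_from_moments(m):
    """OLS coefficients and CR0 clustered covariance from combined moments."""
    XtX_inv = np.linalg.inv(m["XtX"])
    beta = XtX_inv @ m["Xty"]
    # s_g = a_g - B_g @ beta, so sum_g s_g s_g' expands into the stored sums
    cross = np.einsum("ijl,l->ij", m["M_aB"], beta)          # sum_g a_g (B_g beta)'
    quad = np.einsum("iljm,l,m->ij", m["M_BB"], beta, beta)  # sum_g (B_g beta)(B_g beta)'
    meat = m["M_aa"] - cross - cross.T + quad
    return beta, XtX_inv @ meat @ XtX_inv


rng = np.random.default_rng(2)
X = np.column_stack([np.ones(80), rng.normal(size=(80, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=80)
groups = np.repeat(np.arange(16), 5)
first = groups < 8
# The second shard relabels its clusters 0..7; they are still distinct clusters
moments = combine([
    shard_moments(X[first], y[first], groups[first]),
    shard_moments(X[~first], y[~first], groups[~first] - 8),
])
beta, cov = clustered_cov_from_moments(moments)
```

Reusing a label in both shards is harmless here because each shard aggregates its own clusters before combining, and the summed `n_groups` is the count needed for a G/(G-1) correction. The fourth-order moment costs O(k^4) memory, which is modest for a few final-stage coefficients but would argue for a different representation when k is large.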

The only other thing that comes to mind immediately is a question: is the bias-correction for clustered errors completely standard, or would it make sense to provide options for this akin to HC0 vs HC1 for non-clustered errors, where we can optionally correct for the degrees of freedom?

@msukiasyan
Author

@kbattocchi thanks for the feedback!

On federated learning: I think that makes sense; I'll try to add that soon.

On bias correction: libraries differ a bit in their defaults and in whether they allow switching the correction on or off. Across statsmodels, linearmodels, and Stata, there is only a single "cluster" string option. statsmodels and linearmodels both allow toggling the bias correction via a separate argument but, from what I can tell, Stata just uses the (G/(G-1))*((N-1)/(N-k)) correction (manual, page 51), where G is the number of clusters, N the number of observations, and k the number of parameters. I think it makes sense to use that same correction as the default, but I'm unsure whether we want to (a) expose a separate use_correction argument in the high-level API, (b) use modified strings like "cluster_HC0", or (c) not allow toggling at all. Any thoughts on this?
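The correction under discussion is a scalar multiplier on the uncorrected (CR0) clustered covariance. The sketch below exposes its two factors as separate switches, roughly in the spirit of option (a); the function and argument names are hypothetical and are not the statsmodels or linearmodels keyword names.

```python
def cluster_correction(n_obs, n_params, n_groups,
                       group_correction=True, dof_correction=True):
    """Scalar multiplier applied to the CR0 clustered covariance (hypothetical)."""
    if group_correction and n_groups < 2:
        raise ValueError("need at least two clusters")
    factor = 1.0
    if group_correction:
        factor *= n_groups / (n_groups - 1)         # G / (G - 1)
    if dof_correction:
        factor *= (n_obs - 1) / (n_obs - n_params)  # (N - 1) / (N - k)
    return factor


# Both factors on, as in Stata: 10 clusters, 100 observations, 3 parameters
print(cluster_correction(100, 3, 10))
```

Turning both switches off gives the plain CR0 estimate; the (N-1)/(N-k) factor plays a role similar to the n/(n-k) adjustment that separates HC1 from HC0.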

@msukiasyan msukiasyan force-pushed the clustered-std-errors branch from a147e95 to b08b338 Compare October 2, 2025 07:37
@msukiasyan msukiasyan marked this pull request as draft October 2, 2025 17:28
msukiasyan and others added 5 commits October 2, 2025 14:13
- Implement clustered variance calculation in StatsModelsLinearRegression and StatsModels2SLS
- Add cov_type='clustered' parameter to OrthoIV estimator
- Add tests validating against statsmodels implementation

Signed-off-by: Mikayel Sukiasyan <[email protected]>
… to handle groups=None

Signed-off-by: Mikayel Sukiasyan <[email protected]>
…d corresponding test

2. Fix clustered SE computation issue with summarized data; add corresponding test

Signed-off-by: Mikayel Sukiasyan <[email protected]>
…rection on/off for clustered covariance

2. Set both corrections on by default
3. Add new test for corrections and modify others with the new corrections defaults

Signed-off-by: Mikayel Sukiasyan <[email protected]>
@msukiasyan msukiasyan force-pushed the clustered-std-errors branch from 7576ff7 to 79cf45c Compare October 2, 2025 21:13
@msukiasyan
Author

Updates:

  1. Federated learning
    a. Added federated support for clustered covariance. This requires storing a few additional moments and n_groups for each learner. Added tests.
    b. Fixed an issue with summarized data (frequencies > 1) and clustered covariance. Added a test to catch this.
  2. Small sample corrections
    a. Set the small sample correction to match the Stata/statsmodels defaults and edited tests to reflect this
    b. Added cov_options to allow toggling the corrections, as in statsmodels
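On the summarized-data fix in item 1b: if each summarized row stands for freq[i] observations that share the same covariates and the same cluster, the cluster score only needs freq[i] * x_i * (mean y_i - x_i'beta), because the within-row deviations from the mean sum to zero. The sketch below assumes that layout (no row straddles two clusters) and uses a hypothetical helper name; it is not the econml code.

```python
import numpy as np


def clustered_cov_freq(X, y, groups, freq):
    """CR0 cluster covariance for summarized rows (hypothetical helper).

    Row i stands for freq[i] observations that share X[i] and groups[i];
    y[i] is the mean outcome over those observations.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    freq = np.asarray(freq, dtype=float)
    groups = np.asarray(groups)

    # Frequency-weighted normal equations
    XtWX_inv = np.linalg.inv(X.T @ (freq[:, None] * X))
    beta = XtWX_inv @ (X.T @ (freq * y))
    resid = y - X @ beta

    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        mask = groups == g
        # Summing x_i * (y_ij - x_i @ beta) over the freq[i] underlying
        # observations gives freq[i] * x_i * (mean y_i - x_i @ beta)
        score_g = X[mask].T @ (freq[mask] * resid[mask])
        meat += np.outer(score_g, score_g)

    return beta, XtWX_inv @ meat @ XtWX_inv
```

Expanding the rows back out and passing unit weights should reproduce the same coefficients and covariance, which is one way to test this behavior; forgetting the freq factor in the score is the kind of mistake that check would catch.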

Will mark this ready for review for now.

@msukiasyan msukiasyan marked this pull request as ready for review October 2, 2025 21:27
Signed-off-by: Mikayel Sukiasyan <[email protected]>