JOSS Paper for PyFixest #885
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Maybe we can use a Quarto document? (Not a strong opinion, but I think it compiles to LaTeX, right?)
| ## Summary | ||
| PyFixest is an open-source Python library that implements efficient routines for regression analysis with multiple potentially high-dimensional fixed effects by applying the Frisch-Waugh-Lovell theorem. It is a faithful port of the R package fixest [@berge2018], aiming to replicate fixest’s core design principles and functionality within the Python ecosystem. Users familiar with fixest in R can seamlessly transition their analysis to Python using the same syntax and obtaining identical results. Likewise, users of PyFixest should find it easy to port their analysis to R and use the fixest package. |
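To make the Frisch-Waugh-Lovell idea in this paragraph concrete, here is a minimal pure-Python sketch for the simplest case: one regressor and one set of group fixed effects. It is not PyFixest's implementation; the helper names `demean_by_group` and `within_slope` and the toy data are hypothetical and only illustrate that regressing demeaned `y` on demeaned `x` recovers the slope without estimating any group dummies.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def within_slope(y, x, groups):
    """OLS slope of y on x after partialling out one set of group fixed effects."""
    y_t = demean_by_group(y, groups)
    x_t = demean_by_group(x, groups)
    sxx = sum(xi * xi for xi in x_t)
    sxy = sum(xi * yi for xi, yi in zip(x_t, y_t))
    return sxy / sxx


# Hypothetical toy data: y = 2 * x + group effect, with no noise.
groups = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 4.0, 0.0, 3.0, 5.0]
effects = {"a": 10.0, "b": -3.0}
y = [2.0 * xi + effects[g] for xi, g in zip(x, groups)]

# The group effects are removed by demeaning, so the slope is recovered
# up to floating-point error.
beta = within_slope(y, x, groups)
```

With several fixed effects the demeaning step has to be iterated until convergence, which is where specialized algorithms matter for performance.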
Is there anything else PyFixest does that fixest does not? Or could we mention the scale we support (say, running on GPU)? I think it is important to emphasize that it's a port, but also that it exists for a reason: the scale, and how these models can integrate better into production-like environments (I know, I know, you can deploy R, but all the data pipelines are in Python, so it's definitely an advantage).
| By contrast, the R community has benefited from specialized tools like *lfe* [@gaure2013lfe] and *fixest* [@berge2018] for high-dimensional fixed effects regression. In particular, *fixest* has introduced extremely performant and user-friendly regression software to handle multiple (potentially high-dimensional) fixed effects. Relative to *pyhdfe*'s algorithms, which are all implemented via *numpy*, the demeaning algorithm of *fixest* is orders of magnitude faster. Beyond computational efficiency, *fixest* has introduced a rich set of features for post-estimation analysis, including methods to easily summarize and plot regression results. | ||
| *PyFixest* aims to faithfully implement *fixest*'s core functionality - efficient routines for OLS, IV, and Poisson regression with fixed effects - syntax, and post-estimation functionality. Identical input arguments to the main estimation functions that both packages share - *feols*, *fepois* and *feglm* - should produce identical results in R and Python. All of *fixest*'s core defaults, including the choice of variance-covariance matrices, small sample corrections, handling of singleton fixed effects, and the treatment of multicollinear variables, are preserved by *PyFixest*. To ensure identical behavior, both libraries are thoroughly tested against each other using rpy2 [@gautier2008rpy2]. |
Do we expect readers to know what IV is? We could write "instrumental variables (IV)".
| In addition to supporting fixed-effects and instrumental variable estimation, *PyFixest* provides support for regression weights, fast Poisson regression with fixed effect demeaning (@correia2020fast), quantile regression (@koenker2001quantile) and a range of modern event study estimators, including the local projections approach (@dube2023local, @busch2023lpdid), the two-stage difference-in-differences imputation estimator (@gardner2022two, @butts2021did2s), and the fully-saturated event study estimator proposed by @sun2021estimating. The package provides comprehensive options for calculating non-standard inference, including cluster robust variance estimators (CRV1 and CRV3, see @mackinnon2023fast), wild cluster bootstrap (@roodman2019fast, @fischer2022fwildclusterboot), randomization inference (@hess2017randomization), and the causal cluster variance estimator (@abadie2023should). PyFixest also implements methods to control the family-wise error rate by implementing the Romano-Wolf correction (@clarke2020romano and @romano2005exact), and enables users to compute simultaneous confidence bands through a multiplier bootstrap approach (@montiel2019simultaneous). Additionally, *PyFixest* provides support for Gelbach's regression decomposition (@gelbach2016covariates), Lal's event study specification test (@lal2025can), and estimation strategies based on compression & sufficient statistics as described in |
I think we are missing key points:
- The scale (JAX on GPU for example, maybe we can even show some benchmarks)
- We also work on documentation to make the content accessible to the Python community.
- Support for multiple plotting backends (matplotlib, lets-plot)
- It's a community project with ca. 40 contributors. We work to have clear "first-time issues" to ensure a welcoming environment for development.
We could maybe also add a teaser about the plans for the future?
Yep, makes sense, I'll add all of these!
| @@ -0,0 +1,64 @@ | |||
| --- | |||
| title: 'PyFixest: A Python Port of fixest for High-Dimensional Fixed-Effects Regression' | |||
I would add something more to the title, something along the lines of:
PyFixest: A Scalable Python Port of fixest for High-Dimensional Fixed-Effects Regression
Or get rid of the "this is a port" billing entirely; it isn't a port in any meaningful sense since it doesn't really use fixest's source code or port over its design principles. Suggest changing it to "pyfixest: A library for High-dimensional Fixed-Effects Regression in Python".
> Suggest changing it to "pyfixest: A library for High-dimensional Fixed-Effects Regression in Python".

Yep, I think this is a good suggestion. I'd keep the reference to fixest, as I still think that we borrow a lot from its API and defaults, but there is no need to make it the core point of the paper, as by now we support quite a few extra things not native to fixest (GPU, qr, compression, some DiD, multiple testing corrections).
| --- | ||
| ## Summary |
I know it's hard, but we should somehow demonstrate the package's already substantial adoption, through download numbers and/or testimonials from companies that use it (I can testify that we used it for price experimentation at Wolt).
Yes, I think testimonials would be awesome! But they are very hard to get: even if we find users who confirm that pf is used in their companies, we might have to clear this with their respective legal teams. Or don't you think so?
First draft of a JOSS paper for pf.