JOSS Paper for PyFixest #885

s3alfisc · 2025-04-28T19:35:54Z

First draft of a JOSS paper for pf.

codecov · 2025-04-28T19:43:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

❗ There is a different number of reports uploaded between BASE (d3ed85c) and HEAD (fe442ac).

HEAD has 1 upload less than BASE

| Flag | BASE (d3ed85c) | HEAD (fe442ac) |
|------|------|------|
| tests-extended | 1 | 0 |

| Flag | Coverage Δ | |
|------|------|------|
| core-tests | `76.82% <ø> (-3.53%)` | ⬇️ |
| tests-extended | `?` | |
| tests-vs-r | `15.80% <ø> (-31.30%)` | ⬇️ |

Flags with carried forward coverage won't be shown.
See 47 files with indirect coverage changes.


juanitorduz · 2025-04-29T18:42:33Z

Maybe we can use a Quarto document? (Not a strong opinion, but I think it compiles to LaTeX, right?)

juanitorduz · 2025-08-24T20:03:24Z

joss_paper/pyfixest_joss.md

+
+## Summary
+
+PyFixest is an open-source Python library that implements efficient routines for regression analysis with multiple potentially high-dimensional fixed effects by applying the Frisch-Waugh-Lovell theorem. It is a faithful port of the R package fixest [@berge2018], aiming to replicate fixest’s core design principles and functionality within the Python ecosystem. Users familiar with fixest in R can seamlessly transition their analysis to Python using the same syntax and obtaining identical results. Likewise, users of PyFixest should find it easy to port their analysis to R and use the fixest package.
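
The Frisch-Waugh-Lovell idea that the Summary refers to can be illustrated with a minimal sketch. It is not PyFixest's implementation: it handles a single fixed effect and a single regressor in plain Python, and the function names and toy data are hypothetical. It checks that regressing group-demeaned `y` on group-demeaned `x` gives the same slope as OLS on `x` plus one dummy per group.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the 'within' transformation)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def within_slope(y, x, groups):
    """Slope of y on x after partialling out one set of group fixed effects."""
    y_t = demean_by_group(y, groups)
    x_t = demean_by_group(x, groups)
    return sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)


def dummy_slope(y, x, groups):
    """Same slope from OLS on x plus one dummy per group, via the normal equations."""
    levels = sorted(set(groups))
    rows = [[xi] + [1.0 if g == lv else 0.0 for lv in levels]
            for xi, g in zip(x, groups)]
    k = len(levels) + 1
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    # The first column of the design matrix is x, so row 0 holds its coefficient.
    return a[0][k] / a[0][0]


# Hypothetical toy data: three groups with different intercepts.
groups = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
x = [1.0, 2.0, 4.0, 2.0, 3.0, 7.0, 0.0, 1.0, 5.0]
y = [3.1, 4.9, 9.2, 9.0, 10.8, 19.1, -1.2, 1.1, 8.9]

print(within_slope(y, x, groups), dummy_slope(y, x, groups))
```

The demeaned regression never builds the dummy columns, which is what makes the approach attractive when a fixed effect has many levels. With several fixed effects the demeaning step has to be iterated, which this sketch does not attempt.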


Is there anything PyFixest does that fixest does not? Or could we mention the scale we support (say, running on GPU)? I think it is important to emphasize that it is a port, but also that there is a reason for it: the scale, and how these models can integrate better into production-like environments (I know, I know, you can deploy R, but all the data pipelines are in Python, so it's definitely an advantage).

juanitorduz · 2025-08-24T20:06:01Z

joss_paper/pyfixest_joss.md

+
+By contrast, the R community has benefited from specialized tools like *lfe* [@gaure2013lfe] and *fixest* [@berge2018] for high-dimensional fixed effects regression. In particular, *fixest* has introduced extremely performant and user-friendly regression software to handle multiple (potentially high-dimensional) fixed effects. Relative to *pyhdfe*'s algorithms, which are all implemented via *numpy*, the demeaning algorithm of *fixest* is orders of magnitudes faster. Beyond computational efficiency, *fixest* has introduced a rich set of features for post-estimation analysis, including methods to easily summarize and plot regression results.
+
+*PyFixest* aims to faithfully implement *fixest*'s core functionality - efficient routines for OLS, IV, and Poisson regression with fixed effects - syntax, and post-estimation functionality. Identical input arguments to the main estimation functions that both packages share - *feols*, *fepois* and *feglm* - should produce identical results in R and Python. All of *fixest*'s core defaults, including the choice of variance covariance matrices, small sample corrections, handling of singleton fixed effects, and the treatment of multicollinear variables are preserved by *PyFixest*. To ensure identical behavior, both libraries are thoroughly tested against each other using rpy2 (gautier2008rpy2).


Do we expect readers to know what IV is? We could write "instrumental variables (IV)".

juanitorduz · 2025-08-24T20:10:08Z

joss_paper/pyfixest_joss.md

+
+*PyFixest* aims to faithfully implement *fixest*'s core functionality - efficient routines for OLS, IV, and Poisson regression with fixed effects - syntax, and post-estimation functionality. Identical input arguments to the main estimation functions that both packages share - *feols*, *fepois* and *feglm* - should produce identical results in R and Python. All of *fixest*'s core defaults, including the choice of variance covariance matrices, small sample corrections, handling of singleton fixed effects, and the treatment of multicollinear variables are preserved by *PyFixest*. To ensure identical behavior, both libraries are thoroughly tested against each other using rpy2 (gautier2008rpy2).
+
+In addition to supporting fixed-effects and instrumental variable estimation, *PyFixest* provides support for regression weights, fast poisson regression with fixed effect demeaning (@correia2020fast), quantile regression (@koenker2001quantile) and a range of modern event study estimators, including the linear projections approach (@dube2023local, @busch2023lpdid), the two-stage difference-in-differences imputation estimator (@gardner2022two, @butts2021did2s), and the fully-saturated event study estimator proposed by @sun2021estimating. The package provides comprehensive options for calculating non-standard inference, including cluster robust variance estimators (CRV1 and CRV3, see @mackinnon2023fast), wild cluster bootstrap (@roodman2019fast, @fischer2022fwildclusterboot), randomization inference (@hess2017randomization), and the causal cluster variance estimator (@abadie2023should). PyFixest also implements methods to control the family-wise error rate by implementing the Romano-Wolf correction (@clarke2020romano and @romano2005exact), and enables users to compute simultaneous confidence bands through a multiplier bootstrap approach (@montiel2019simultaneous). Additionally, *PyFixest* provides support for Gelbach's regression decomposition (@gelbach2016covariates), Lal's event study specification test (@lal2025can), and estimation strategies based on compression & sufficient statistics as described in


I think we are missing key points:

The scale (JAX on GPU for example, maybe we can even show some benchmarks)

We also work on documentation to make the content accessible to the Python community.

Support for multiple plotting backends (matplotlib, lets-plot)

It's a community project with ca. 40 contributors. We work to have clear "first-time issues" to ensure a welcoming environment for development.

We could maybe also add a teaser about the plans for the future?

Yep, makes sense, I'll add all of these!

juanitorduz · 2025-08-24T20:11:49Z

joss_paper/pyfixest_joss.md

@@ -0,0 +1,64 @@
+---
+title: 'PyFixest: A Python Port of fixest for High-Dimensional Fixed-Effects Regression'


I would add something more to the title, say something along the lines of:

PyFixest: A Scalable Python Port of fixest for High-Dimensional Fixed-Effects Regression

Or get rid of the "this is a port" billing entirely; it isn't a port in any meaningful sense since it doesn't really use fixest's source code or port over its design principles. Suggest changing it to "pyfixest: A library for High-dimensional Fixed-Effects Regression in Python".

> Suggest changing it to "pyfixest: A library for High-dimensional Fixed-Effects Regression in Python".

Yep, I think this is a good suggestion. I'd keep the reference to fixest, as I still think that we borrow a lot from its API and defaults, but there is no need to make it the core point of the paper: by now we support quite a few extra things not native to fixest (GPU, quantile regression, compression, some DiD estimators, multiple testing corrections).

juanitorduz · 2025-08-24T20:13:10Z

joss_paper/pyfixest_joss.md

+---
+
+
+## Summary


I know it's hard, but we should somehow demonstrate the package's already significant adoption, either through the number of downloads and/or through testimonials from companies that use it (I can testify that we used it for price experimentation at Wolt).

Yes, I think testimonials would be awesome! But they are very hard to get: even if we find users who confirm that pf is used in their companies, we might have to clear this with their respective legal teams. Or don't you think so?

joss paper first attempt

cef7d9e

s3alfisc added 2 commits August 23, 2025 12:34

minor updates

23bf8c4

minor updates

fe442ac

s3alfisc merged commit b2765c8 into master Aug 23, 2025
8 checks passed

s3alfisc deleted the joss branch August 23, 2025 10:46

juanitorduz reviewed Aug 24, 2025


4 participants

