@s3alfisc
Member

First draft of a JOSS paper for pf.

@codecov
codecov bot commented Apr 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

❗ There is a different number of reports uploaded between BASE (d3ed85c) and HEAD (fe442ac).

HEAD has 1 upload less than BASE

| Flag | BASE (d3ed85c) | HEAD (fe442ac) |
|---|---|---|
| tests-extended | 1 | 0 |

| Flag | Coverage Δ | |
|---|---|---|
| core-tests | 76.82% <ø> (-3.53%) | ⬇️ |
| tests-extended | ? | |
| tests-vs-r | 15.80% <ø> (-31.30%) | ⬇️ |

Flags with carried forward coverage won't be shown.
See 47 files with indirect coverage changes.

@juanitorduz
Contributor

Maybe we can use a Quarto document? (Not a strong opinion, but I think it compiles to LaTeX, right?)

@s3alfisc s3alfisc merged commit b2765c8 into master Aug 23, 2025
8 checks passed
@s3alfisc s3alfisc deleted the joss branch August 23, 2025 10:46

## Summary

PyFixest is an open-source Python library that implements efficient routines for regression analysis with multiple potentially high-dimensional fixed effects by applying the Frisch-Waugh-Lovell theorem. It is a faithful port of the R package fixest [@berge2018], aiming to replicate fixest’s core design principles and functionality within the Python ecosystem. Users familiar with fixest in R can seamlessly transition their analysis to Python using the same syntax and obtaining identical results. Likewise, users of PyFixest should find it easy to port their analysis to R and use the fixest package.
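
The summary's appeal to the Frisch-Waugh-Lovell theorem can be illustrated with a small sketch. This is plain Python written for this discussion, not code from PyFixest or fixest; the helper names (`demean_by_group`, `solve`, `ols`) and the toy data are hypothetical. It checks that the slope from a regression with one dummy per fixed-effect level matches the slope obtained after demeaning `y` and `x` within each level, which is why a fixed-effects estimator can avoid building the dummy columns at all.

```python
# Illustrative only: plain-Python check of the Frisch-Waugh-Lovell result.
# None of these helper names come from PyFixest or fixest.

def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the 'within' transform)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(design, y):
    """OLS coefficients via the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Toy data (made up): one regressor x, one fixed effect with three levels.
x = [1.0, 2.0, 4.0, 3.0, 5.0, 9.0, 2.0, 6.0, 7.0]
y = [2.1, 3.9, 8.2, 9.0, 13.1, 20.8, 1.0, 9.2, 10.9]
groups = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

# Route 1: regress y on x plus one dummy column per group (no intercept).
levels = sorted(set(groups))
dummy_design = [[xi] + [1.0 if g == lv else 0.0 for lv in levels]
                for xi, g in zip(x, groups)]
slope_dummies = ols(dummy_design, y)[0]

# Route 2: demean y and x within groups, then regress without any dummies.
x_within = demean_by_group(x, groups)
y_within = demean_by_group(y, groups)
slope_within = ols([[xi] for xi in x_within], y_within)[0]

# The two slopes agree up to floating-point error.
print(slope_dummies, slope_within)
```

With several high-dimensional fixed effects, the demeaning step is typically carried out iteratively rather than in a single pass; the one-way case above is only the simplest instance.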
Contributor

Is there anything PyFixest does that fixest does not? Or could we mention the scale we support (say, running on GPU)? I think it is important to emphasize that it's a port, but a port with a purpose: the scale, and how these models can integrate better into production-like environments. (I know, I know, you can deploy R, but all the data pipelines are in Python, so it's definitely an advantage.)


By contrast, the R community has benefited from specialized tools like *lfe* [@gaure2013lfe] and *fixest* [@berge2018] for high-dimensional fixed-effects regression. In particular, *fixest* has introduced extremely performant and user-friendly regression software to handle multiple (potentially high-dimensional) fixed effects. Relative to *pyhdfe*'s algorithms, which are all implemented via *numpy*, the demeaning algorithm of *fixest* is orders of magnitude faster. Beyond computational efficiency, *fixest* has introduced a rich set of features for post-estimation analysis, including methods to easily summarize and plot regression results.

*PyFixest* aims to faithfully implement *fixest*'s core functionality - efficient routines for OLS, IV, and Poisson regression with fixed effects - syntax, and post-estimation functionality. Identical input arguments to the main estimation functions that both packages share - *feols*, *fepois* and *feglm* - should produce identical results in R and Python. All of *fixest*'s core defaults, including the choice of variance-covariance matrices, small-sample corrections, handling of singleton fixed effects, and the treatment of multicollinear variables, are preserved by *PyFixest*. To ensure identical behavior, both libraries are thoroughly tested against each other using rpy2 [@gautier2008rpy2].
Contributor

Do we expect readers to know what IV is? We could write instrumental variables (IV).


In addition to supporting fixed-effects and instrumental variable estimation, *PyFixest* provides support for regression weights, fast Poisson regression with fixed-effect demeaning (@correia2020fast), quantile regression (@koenker2001quantile) and a range of modern event study estimators, including the linear projections approach (@dube2023local, @busch2023lpdid), the two-stage difference-in-differences imputation estimator (@gardner2022two, @butts2021did2s), and the fully-saturated event study estimator proposed by @sun2021estimating. The package provides comprehensive options for calculating non-standard inference, including cluster-robust variance estimators (CRV1 and CRV3, see @mackinnon2023fast), wild cluster bootstrap (@roodman2019fast, @fischer2022fwildclusterboot), randomization inference (@hess2017randomization), and the causal cluster variance estimator (@abadie2023should). *PyFixest* also implements methods to control the family-wise error rate via the Romano-Wolf correction (@clarke2020romano and @romano2005exact), and enables users to compute simultaneous confidence bands through a multiplier bootstrap approach (@montiel2019simultaneous). Additionally, *PyFixest* provides support for Gelbach's regression decomposition (@gelbach2016covariates), Lal's event study specification test (@lal2025can), and estimation strategies based on compression & sufficient statistics as described in
Contributor

I think we are missing key points:

  • The scale (JAX on GPU for example, maybe we can even show some benchmarks)
  • We also work on documentation to make the content accessible to the Python community.
  • Support for multiple plotting backends (matplotlib, lets-plot)
  • It's a community project with ca. 40 contributors. We work to have clear "first-time issues" to ensure a welcoming environment for development.

We could maybe also add a teaser about the plans for the future?

Member Author

Yep, makes sense, I'll add all of these!

@@ -0,0 +1,64 @@
---
title: 'PyFixest: A Python Port of fixest for High-Dimensional Fixed-Effects Regression'
Contributor

I would add something more to the title, along the lines of:

PyFixest: A Scalable Python Port of fixest for High-Dimensional Fixed-Effects Regression

Member

Or get rid of the "this is a port" billing entirely; it isn't a port in any meaningful sense since it doesn't really use fixest's source code or port over its design principles. Suggest changing it to "pyfixest: A library for High-dimensional Fixed-Effects Regression in Python".

Member Author

> Suggest changing it to "pyfixest: A library for High-dimensional Fixed-Effects Regression in Python".

Yep, I think this is a good suggestion. I'd keep the reference to fixest, as I still think that we borrow a lot from its API and defaults, but there is no need to make it the core point of the paper, as by now we support quite a few extra things not native to fixest (GPU, quantile regression, compression, some DiD, multiple testing corrections).

---


## Summary
Contributor

I know it's hard, but we should somehow demonstrate the package's already significant adoption, either through the number of downloads and/or through testimonials from companies that use it (I can testify that we used it for price experimentation at Wolt).

Member Author

Yes, I think testimonials would be awesome! But they are very hard to get: even if we find users who confirm that pf is used in their companies, we might have to clear this with their respective legal teams. Or don't you think so?
