Quantile Regression #908

Merged
merged 33 commits into from
Jun 15, 2025
s3alfisc
Member

@s3alfisc s3alfisc commented May 20, 2025

What this PR does:

  • Implements quantile regression in a new quantreg function, using the Frisch-Newton interior point solver as described in Koenker and Ng or Portnoy and Koenker.
  • Implements a qplot method for visualization.
  • Implements nid and cluster robust errors.
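
For readers new to the method: quantile regression minimizes the check (pinball) loss instead of squared error. The following is a minimal pure-Python sketch, not code from this PR; `check_loss`, `total_loss` and `best_constant` are illustrative names. It shows that, for an intercept-only model, the minimizer is a sample quantile.

```python
def check_loss(residual: float, tau: float) -> float:
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(y: list[float], constant: float, tau: float) -> float:
    """Sum of check losses when every observation is predicted by `constant`."""
    return sum(check_loss(yi - constant, tau) for yi in y)


def best_constant(y: list[float], tau: float) -> float:
    # The loss is piecewise linear in the constant, so a minimizer is always
    # attained at one of the data points; brute force is fine for a toy case.
    return min(sorted(y), key=lambda c: total_loss(y, c, tau))


y = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(y, tau=0.5)  # the outlier 100.0 does not move this
low_fit = best_constant(y, tau=0.1)
print(median_fit, low_fit)
```

With regressors, the constant is replaced by a linear predictor and the same loss is minimized over the coefficients, which is the problem the solver in this PR addresses.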

Example:

%load_ext autoreload

import pyfixest as pf
data = pf.get_data(N = 1000)

fit_1_pf = pf.quantreg("Y ~ X1 + X2 + f1", data = data, quantile = 0.1)
fit_1_pf.tidy()

#              Estimate  Std. Error        t value  Pr(>|t|)       2.5%      97.5%
# Coefficient
# Intercept   -2.030300    0.000023  -90149.457722       0.0  -2.030344  -2.030256
# X1          -0.938896    0.000097   -9704.853077       0.0  -0.939086  -0.938706
# X2          -0.190710    0.000004  -51162.093928       0.0  -0.190718  -0.190703
# f1           0.011888    0.000480      24.749725       0.0   0.010946   0.012831

The same model in R:

quantreg::rq(formula = Y ~ X1 + X2 + f1, tau = 0.1, data = data, method = "fn")
# Coefficients:
# (Intercept)          X1          X2          f1 
# -2.03029974 -0.93889613 -0.19071031  0.01188816 
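
As a quick sanity check on the two outputs above, the coefficients can be compared numerically. This sketch only uses the numbers printed in this comment; the dictionary names are illustrative, and the tolerance reflects that the pyfixest table is printed to six decimals.

```python
import math

# Coefficients copied from the pyfixest and R outputs above.
pyfixest_coefs = {"Intercept": -2.030300, "X1": -0.938896, "X2": -0.190710, "f1": 0.011888}
r_coefs = {"Intercept": -2.03029974, "X1": -0.93889613, "X2": -0.19071031, "f1": 0.01188816}

# pyfixest prints six decimals, so agreement can only be checked to about 1e-6.
max_abs_diff = max(abs(pyfixest_coefs[k] - r_coefs[k]) for k in pyfixest_coefs)
all_match = all(
    math.isclose(pyfixest_coefs[k], r_coefs[k], abs_tol=1e-6) for k in pyfixest_coefs
)
print(f"max |diff| = {max_abs_diff:.1e}, match to 1e-6: {all_match}")
```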

@s3alfisc s3alfisc marked this pull request as draft May 20, 2025 19:38
@s3alfisc s3alfisc linked an issue May 25, 2025 that may be closed by this pull request
@apoorvalal
Member

I'm confused about why this PR contains re-renders of the entire doc website; is that expected?

@s3alfisc
Member Author

Ah no that is just a mess I produced because I did not check out the new branch properly. Noticed this a few days ago but then was busy developing locally. I'll clean this up, thanks!

@s3alfisc s3alfisc force-pushed the quantreg branch 2 times, most recently from 3ec25bc to 971feb8 Compare May 26, 2025 21:29

codecov bot commented Jun 3, 2025

Codecov Report

Attention: Patch coverage is 75.66138% with 92 lines in your changes missing coverage. Please review.

Files with missing lines                           Patch %   Lines
pyfixest/report/visualize.py                       7.84%     47 Missing ⚠️
pyfixest/estimation/quantreg/quantreg_.py          73.52%    36 Missing ⚠️
pyfixest/estimation/estimation.py                  83.33%    5 Missing ⚠️
pyfixest/estimation/feols_compressed_.py           50.00%    2 Missing ⚠️
pyfixest/estimation/feols_.py                      90.00%    1 Missing ⚠️
pyfixest/estimation/quantreg/frisch_newton_ip.py   99.15%    1 Missing ⚠️

Flag             Coverage Δ
core-tests       78.41% <75.66%> (-0.21%) ⬇️
tests-extended   ?
tests-vs-r       16.48% <16.13%> (+0.57%) ⬆️

Files with missing lines                   Coverage Δ
pyfixest/__init__.py                       81.81% <100.00%> (ø)
pyfixest/estimation/FixestMulti_.py        80.71% <100.00%> (+0.81%) ⬆️
pyfixest/estimation/__init__.py            100.00% <100.00%> (ø)
pyfixest/estimation/demean_.py             52.54% <ø> (ø)
pyfixest/estimation/felogit_.py            88.23% <ø> (ø)
pyfixest/estimation/fepois_.py             88.73% <100.00%> (ø)
pyfixest/estimation/literals.py            86.66% <100.00%> (+0.95%) ⬆️
pyfixest/estimation/quantreg/__init__.py   100.00% <100.00%> (ø)
pyfixest/estimation/quantreg/utils.py      100.00% <100.00%> (ø)
pyfixest/estimation/solvers.py             100.00% <100.00%> (ø)
... and 8 more

... and 5 files with indirect coverage changes

@s3alfisc s3alfisc marked this pull request as ready for review June 13, 2025 19:35
@apoorvalal
Member

Prototyped a couple of other solvers [based on reading the QR chapter of Pouliot's book. His recommended solution, Barrodale-Roberts, was finicky in my experiments so I got rid of it; it is unlikely to beat interior-point methods anyway].

Don't want to touch your branch since there's a lot going on, but new solvers can likely go in estimation/quantreg/

https://gist.github.com/apoorvalal/3e18eea79c6e9e8e8ee380e0fc0bab1f

Upshot: FN is very fast, and Google's GLOP is next in speed. These simulations can go into the vignette.

[Image not preserved; presumably the solver benchmark plot]

@s3alfisc
Member Author

Oh that's very cool to see! I initially started out with scipy's solver (that's when I still thought adding qr support would be a one-weekend task) but soon realized it was orders of magnitude slower than the R implementation. Then I started looking at the Interior Point solver and there we go ...

Would it be a lot of work to add benchmarks against statsmodels? They implement an IWLS solver, and in my experience, it's quite fast for small problems.

Re this branch - I'll have to do one last round of reviewing (hopefully will get there tomorrow) and then I'll merge it. Then we can open a new PR with alternative solvers? By the design of the Quantreg class, it should be fairly easy.
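
To make the IWLS/IRLS idea mentioned above concrete, here is a rough NumPy sketch of iteratively reweighted least squares for quantile regression. It is an illustration under simple assumptions, not the statsmodels implementation and not code from this PR; `quantreg_irls`, `check_loss` and the simulated data are all hypothetical.

```python
import numpy as np


def check_loss(resid: np.ndarray, tau: float) -> float:
    """Sum of check losses rho_tau(u) = u * (tau - 1[u < 0])."""
    return float(np.sum(resid * (tau - (resid < 0))))


def quantreg_irls(X, y, tau, n_iter=100, eps=1e-6, tol=1e-10):
    """Approximate quantile regression by repeated weighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the OLS fit
    for _ in range(n_iter):
        resid = y - X @ beta
        # Asymmetric weight (tau above the fit, 1 - tau below), divided by
        # |resid| so that weight * resid**2 mimics the check loss.
        # eps guards against division by zero for points on the fit.
        w = np.where(resid >= 0, tau, 1.0 - tau) / np.maximum(np.abs(resid), eps)
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # weighted normal equations
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, tau = 200, 0.1
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_irls = quantreg_irls(X, y, tau)
loss_ols = check_loss(y - X @ beta_ols, tau)
loss_irls = check_loss(y - X @ beta_irls, tau)
print(beta_irls, loss_ols, loss_irls)
```

Each iteration is a single weighted least-squares solve, which is cheap per step; how many steps are needed, and how that compares to LP-based solvers, is what the benchmarks would have to show.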

@s3alfisc
Member Author

Btw, if scikit-learn really relies on scipy.optimize.linprog, maybe we should open a PR over there eventually?

@apoorvalal
Member

apoorvalal commented Jun 15, 2025

[Image not preserved; presumably the updated benchmark plot]

turns out SM's IRLS is slower than LP for small problems but scales better with problem size. gist is updated.

Agree that this can be a separate followup PR; lmk when you merge [I implicitly ended up testing your interior point solver against existing solvers anyway - assume you'll test against quantreg in R in the unit tests?]

and yea scikit uses linprog and basically implements the textbook solution almost exactly; assume the small speedup relative to my numpy implementation is due to the use of sparse matrices. Maybe worth contributing but i feel like scikit is meant to have reference implementations rather than fast production-ready ones anyway so idk how receptive they'll be.
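
For reference, the "textbook solution" referred to above can be sketched as a linear program solved with `scipy.optimize.linprog`. This is a generic dense formulation written for illustration; it is not the code from the gist, not scikit-learn's implementation, and not the Frisch-Newton solver in this PR. The function name `quantreg_lp` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def quantreg_lp(X: np.ndarray, y: np.ndarray, tau: float) -> np.ndarray:
    """Quantile regression as an LP: min tau*sum(u) + (1-tau)*sum(v)
    subject to X @ beta + u - v = y, u >= 0, v >= 0, beta free."""
    n, p = X.shape
    # Decision vector: [beta (p, free), u (n, >= 0), v (n, >= 0)].
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])  # dense: O(n^2) memory
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


# Intercept-only toy example: the fit should be a sample quantile.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
X = np.ones((5, 1))
beta_median = quantreg_lp(X, y, tau=0.5)
beta_low = quantreg_lp(X, y, tau=0.1)
print(beta_median, beta_low)
```

The two identity blocks in `A_eq` are almost entirely zeros, so storing the constraint matrix in a sparse format avoids the quadratic memory cost of this dense version.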

@s3alfisc
Copy link
Member Author

> assume you'll test against quantreg in R in the unit tests?

Yep, wherever possible =)

> assume the small speedup relative to my numpy implementation is due to the use of sparse matrices

The FN implementation in pyfixest can easily be translated to accommodate sparse matrices - would also be a follow up PR (the reference paper in fact has "sparse" in its title).

> Maybe worth contributing but i feel like scikit is meant to have reference implementations rather than fast production-ready ones anyway so idk how receptive they'll be.

You might be right, but worth asking nevertheless I think!

Btw will merge after the current CI run passes.

@s3alfisc s3alfisc merged commit b3b82a0 into master Jun 15, 2025
6 of 9 checks passed
@s3alfisc s3alfisc deleted the quantreg branch June 15, 2025 10:30
damandhaliwal pushed a commit to damandhaliwal/pyfixest that referenced this pull request Jun 17, 2025
Successfully merging this pull request may close these issues.

Quantile Regression Support
2 participants