
Conversation

Member

@s3alfisc s3alfisc commented Dec 20, 2025

Problem: import pyfixest took ~2.5 seconds because all submodules were eagerly loaded at package import, even when users only needed a subset of functionality.
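A figure like this can be reproduced by timing the import in a fresh interpreter, so that nothing is already cached in sys.modules. The helper below is a rough sketch, not part of the PR: the name time_cold_import is hypothetical, and the measurement includes interpreter startup.

```python
import subprocess
import sys
import time


def time_cold_import(module_name: str) -> float:
    """Return the wall-clock seconds a fresh interpreter needs to import a module."""
    start = time.perf_counter()
    # A new process guarantees the module is not already loaded.
    subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # "json" is a stand-in; substitute the package under investigation.
    print(f"cold import took {time_cold_import('json'):.3f}s")
```

For a per-module breakdown, CPython's `-X importtime` option (for example `python -X importtime -c "import pyfixest"`) prints the time spent in each imported module.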

Solution: Replace eager imports with lazy loading.

Modules are now loaded on-demand when accessed (e.g., pf.feols loads estimation, pf.did2s loads did).

To enable this, the estimation APIs (feols, fepois, feglm, quantreg) are moved to standalone modules in pyfixest.estimation.api.

Closes #1095.

@s3alfisc
Member Author

We could make it even lazier, but maybe that would be overkill, @apoorvalal?


Copilot AI left a comment


Pull request overview

This PR implements lazy loading to improve package import performance by replacing eager imports with on-demand module loading. The change aims to cut the initial import time (previously ~2.5 seconds) by deferring the loading of each submodule until it is actually accessed.

Key Changes:

  • Replaced direct imports with a __getattr__ hook that loads modules and functions on first access
  • Created mapping dictionaries (_submodules and _lazy_imports) to define which modules contain which functions
  • Moved version handling to the top while keeping it eager since it's inexpensive
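The pattern in the list above relies on the module-level __getattr__ hook from PEP 562. The following is a minimal sketch of that pattern, not the PR's actual code: the module name lazydemo and the contents of both mappings are hypothetical, and standard-library modules stand in for the heavy submodules.

```python
import importlib
import sys
import types

# Hypothetical mappings; stdlib modules stand in for expensive submodules.
_submodules = {"json", "csv"}                           # exposed as whole modules
_lazy_imports = {"sqrt": "math", "mean": "statistics"}  # name -> defining module


def _lazy_getattr(name: str):
    """Resolve `name` on first access, as a module-level __getattr__ would."""
    if name in _submodules:
        return importlib.import_module(name)
    if name in _lazy_imports:
        module = importlib.import_module(_lazy_imports[name])
        return getattr(module, name)
    raise AttributeError(f"module 'lazydemo' has no attribute {name!r}")


# Build a throwaway module object so the sketch runs without files on disk.
# In a real package these definitions would live directly in __init__.py.
lazydemo = types.ModuleType("lazydemo")
lazydemo.__all__ = sorted(_submodules | set(_lazy_imports))
lazydemo.__getattr__ = _lazy_getattr
lazydemo.__dir__ = lambda: lazydemo.__all__

print(lazydemo.sqrt(9.0))                     # 3.0
print(lazydemo.json is sys.modules["json"])   # True
```

The hook is only consulted when normal attribute lookup fails, so names defined eagerly in the module (such as a version string) are unaffected.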


Comment on lines 43 to 48

def __getattr__(name: str):
    if name in _submodules:
        return _importlib.import_module(f"pyfixest.{name}")
    if name in _lazy_imports:
        module = _importlib.import_module(_lazy_imports[name])

Copilot AI Dec 20, 2025


The __getattr__ function will repeatedly import the same module for each attribute access. Consider caching the imported modules in a dictionary (e.g., sys.modules already does this, but you could add explicit memoization) to avoid redundant import_module calls when accessing the same attribute multiple times.

Suggested change
- def __getattr__(name: str):
-     if name in _submodules:
-         return _importlib.import_module(f"pyfixest.{name}")
-     if name in _lazy_imports:
-         module = _importlib.import_module(_lazy_imports[name])
+ # Cache for lazily imported submodules to avoid repeated import_module calls
+ _import_cache = {}
+
+
+ def _load_module(module_name: str):
+     """Load a module once and reuse it for subsequent accesses."""
+     module = _import_cache.get(module_name)
+     if module is None:
+         module = _importlib.import_module(module_name)
+         _import_cache[module_name] = module
+     return module
+
+
+ def __getattr__(name: str):
+     if name in _submodules:
+         return _load_module(f"pyfixest.{name}")
+     if name in _lazy_imports:
+         module = _load_module(_lazy_imports[name])

Collaborator


I don't think this is required, because Python's sys.modules already caches imported modules.
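The caching behaviour referred to here can be checked directly. This short sketch uses the standard-library json module as a stand-in for a pyfixest submodule:

```python
import importlib
import sys

first = importlib.import_module("json")
second = importlib.import_module("json")

# The second call does not re-execute the module; it returns the cached
# object that the import system stored in sys.modules.
print(first is second)                # True
print(sys.modules["json"] is first)   # True
```

Each call still pays for a lookup in sys.modules, but the module body runs only once.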

Comment on lines +53 to +54
def __dir__():
    return __all__

Copilot AI Dec 20, 2025


The __dir__ function returns __all__ but __all__ is defined after the function definition. While Python's module-level code executes sequentially, this could fail if __dir__() is called during module initialization before line 57 executes. Consider moving the __all__ definition before the __dir__ function, or make __dir__ return a combination of _submodules and _lazy_imports.keys() to be independent of __all__.

Collaborator


It probably is cleaner to move __all__ to the top?
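The ordering question can be illustrated with a throwaway module object. This is a sketch with hypothetical names (demo and the two entries placed in __all__); it shows that the body of __dir__ looks up __all__ when it is called, not when it is defined:

```python
import types

# Define __dir__ inside a fresh module namespace before __all__ exists there.
demo = types.ModuleType("demo")
exec("def __dir__():\n    return __all__\n", demo.__dict__)

try:
    dir(demo)                 # __all__ is not defined yet
    failed_early = False
except NameError:
    failed_early = True

# Once __all__ is assigned, the same function works unchanged.
demo.__all__ = ["feols", "fepois"]
names = dir(demo)

print(failed_early)   # True
print(names)          # ['feols', 'fepois']
```

Defining __all__ first removes the window in which the lookup can fail, which is the tidier layout proposed in the comment.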

@codecov

codecov bot commented Dec 20, 2025

Codecov Report

❌ Patch coverage is 91.45299% with 20 lines in your changes missing coverage. Please review.

Files with missing lines               Patch %   Lines
pyfixest/estimation/api/quantreg.py    82.60%    8 Missing ⚠️
pyfixest/__init__.py                   80.95%    4 Missing ⚠️
pyfixest/estimation/api/utils.py       95.06%    4 Missing ⚠️
pyfixest/estimation/api/feglm.py       89.65%    3 Missing ⚠️
pyfixest/estimation/api/fepois.py      96.00%    1 Missing ⚠️

Flag              Coverage Δ
core-tests        75.04% <91.45%> (+0.15%) ⬆️
tests-extended    ?
tests-vs-r        17.51% <30.76%> (+0.55%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines               Coverage Δ
pyfixest/did/did2s.py                  89.62% <100.00%> (ø)
pyfixest/did/lpdid.py                  94.62% <100.00%> (ø)
pyfixest/did/saturated_twfe.py         16.55% <100.00%> (ø)
pyfixest/did/twfe.py                   80.76% <100.00%> (ø)
pyfixest/estimation/__init__.py        100.00% <100.00%> (ø)
pyfixest/estimation/api/__init__.py    100.00% <100.00%> (ø)
pyfixest/estimation/api/feols.py       100.00% <100.00%> (ø)
pyfixest/estimation/ccv.py             10.20% <100.00%> (-85.72%) ⬇️
pyfixest/estimation/feiv_.py           86.79% <ø> (ø)
pyfixest/estimation/feols_.py          86.80% <ø> (-4.59%) ⬇️
... and 6 more

... and 6 files with indirect coverage changes


@s3alfisc s3alfisc requested a review from leostimpfle December 21, 2025 17:23
    if name in _submodules:
        return _importlib.import_module(f"pyfixest.{name}")
    if name in _lazy_imports:
        module = _importlib.import_module(_lazy_imports[name])
Collaborator


Why do we load the whole parent module rather than name? Couldn't we use _importlib.import_module(f"{_lazy_imports[name]}.{name}")?
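Whether the dotted form works depends on whether that path names a module or only an attribute of one. The sketch below shows the general behaviour of importlib.import_module using standard-library stand-ins; it makes no claim about how the pyfixest modules are actually laid out.

```python
import importlib
import sys

# A dotted path that names a function, not a module, cannot be imported.
try:
    importlib.import_module("json.dumps")
    direct_import_ok = True
except ModuleNotFoundError:
    direct_import_ok = False

# The usual alternative: import the defining module, then fetch the attribute.
dumps = getattr(importlib.import_module("json"), "dumps")

# Importing a nested module also imports every parent package on the way,
# so the parent's __init__.py runs in either case.
importlib.import_module("xml.dom.minidom")
parents_loaded = all(name in sys.modules for name in ("xml", "xml.dom"))

print(direct_import_ok)   # False
print(dumps([1]))         # [1]
print(parents_loaded)     # True
```

If each API function lives in its own module, importing that module directly would skip its siblings, but the parent package's __init__.py would still run.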


@leostimpfle leostimpfle self-requested a review January 1, 2026 11:00
Member Author

s3alfisc commented Jan 1, 2026

The API change is breaking the maketables integration ... 😶‍🌫️ will have to open a PR there before merging this.


Development

Successfully merging this pull request may close these issues.

speed up library imports with laziness

3 participants