Fix memory blowup in GAM.gridsearch over large lambda grids #581

Open
RogerPR wants to merge 1 commit into dswah:main from RogerPR:gridsearch-memory-fix

RogerPR commented May 13, 2026

Summary

Closes issue #242

GAM.gridsearch previously kept every fitted candidate model in memory
for the whole duration of the search, even when the caller did not need
them. For large lam grids this caused memory usage to grow linearly
with the grid size and could exhaust available RAM.

This PR keeps only the models that are actually needed:

  • best_model — to copy back into self when keep_best=True.
  • last_model — used as the warm-start coefficients for the next fit.
  • The full models list is only populated when return_scores=True,
    since that branch returns it to the caller.
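
The retention rule above can be sketched with a toy stand-in. This is not the pygam code: ToyModel, toy_gridsearch, and the scoring rule are hypothetical names invented for illustration, and only the bookkeeping (best_model, last_model, and the optional models mapping) mirrors the description.

```python
from collections import OrderedDict


class ToyModel:
    """Hypothetical stand-in for one fitted candidate (not a pygam class)."""

    def __init__(self, lam):
        self.lam = lam
        self.coef_ = None

    def fit(self, init_coef=None):
        # Pretend the warm start influences the fitted coefficient.
        start = 0.0 if init_coef is None else init_coef
        self.coef_ = start + self.lam
        return self

    def score(self):
        # Toy objective, lower is better, minimized at lam == 3.
        return abs(self.lam - 3.0)


def toy_gridsearch(lams, return_scores=False):
    best_model, best_score = None, float("inf")
    last_model = None
    # The full mapping exists only when the caller asked for it.
    models = OrderedDict() if return_scores else None

    for lam in lams:
        init = None if last_model is None else last_model.coef_
        model = ToyModel(lam).fit(init_coef=init)
        score = model.score()

        if score < best_score:
            best_model, best_score = model, score
        # Rebinding last_model drops the previous candidate unless it is
        # also the best one or is held in `models`.
        last_model = model
        if models is not None:
            models[model] = score

    return models if return_scores else best_model


best = toy_gridsearch([1, 2, 3, 4])
scores = toy_gridsearch([1, 2, 3, 4], return_scores=True)
```

With return_scores=False at most two candidates are referenced at any time (three while a new one is being fitted), regardless of the grid size.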

In addition, fresh candidate models are now instantiated via
self.__class__(**self.get_params()) instead of deepcopy(self).
deepcopy was eagerly copying any large fitted state attached to
self, contributing to peak memory.
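
A minimal sketch of the difference, using a hypothetical ToyEstimator rather than a real GAM. The only assumption carried over from the description is that get_params() returns the constructor arguments; the coef_ list stands in for whatever large fitted state is attached to self.

```python
import copy


class ToyEstimator:
    """Hypothetical estimator with a get_params() in the style described."""

    def __init__(self, lam=0.6, n_splines=20):
        self.lam = lam
        self.n_splines = n_splines

    def get_params(self):
        return {"lam": self.lam, "n_splines": self.n_splines}

    def fit(self):
        # Stand-in for large fitted state attached to the instance.
        self.coef_ = [0.0] * 100_000
        return self


fitted = ToyEstimator(lam=1.0).fit()

# deepcopy duplicates everything on the instance, fitted state included.
clone_deep = copy.deepcopy(fitted)

# Rebuilding from the constructor parameters yields an unfitted instance.
clone_fresh = fitted.__class__(**fitted.get_params())
```

Here clone_deep carries its own copy of coef_, while clone_fresh has the same hyperparameters and no fitted state.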

The public API is unchanged: gridsearch returns self by default and
the OrderedDict[model -> score] when return_scores=True, exactly as
before.

Changes

  • pygam/pygam.py (GAM.gridsearch): only retain models when
    return_scores=True; track last_model separately for warm-start;
    build candidates from ModelClass(**base_params) instead of
    deepcopy(self).
  • pygam/tests/test_memory_leak_gridsearch.py — new regression test
    that runs a 100-point gridsearch and asserts peak RSS stays bounded
    and end-of-run RSS does not grow beyond a tolerance.
  • pyproject.toml — adds psutil to the [dev] extras (used by the
    new test).
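
The shape of such a bounded-memory check can be sketched as follows. This is not the test added in this PR: it swaps psutil RSS for the standard library's tracemalloc (which tracks Python-level allocations only), and run_search, peak_bytes, and the 100 kB bytearray per candidate are hypothetical stand-ins.

```python
import tracemalloc


def run_search(n_points, keep_all):
    """Toy loop: each candidate is a 100 kB buffer standing in for a model."""
    kept = []
    last = None
    for _ in range(n_points):
        candidate = bytearray(100_000)
        if keep_all:
            kept.append(candidate)  # old behaviour: retain every candidate
        last = candidate            # new behaviour: keep only the latest
    return last


def peak_bytes(n_points, keep_all):
    """Return the peak traced allocation size while the toy search runs."""
    tracemalloc.start()
    try:
        run_search(n_points, keep_all)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


peak_bounded = peak_bytes(100, keep_all=False)
peak_unbounded = peak_bytes(100, keep_all=True)
```

A regression test in this style would assert that peak_bounded stays under a fixed ceiling that does not scale with the number of grid points.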

Test plan

  • pytest — 163 passed, 1 skipped (full suite green locally).
  • pre-commit run --files <changed files> — all hooks pass
    (ruff-format, ruff, trailing-whitespace, end-of-file-fixer).
  • Existing return_scores=True callers (covered by
    test_gridsearch_returns_scores, test_gridsearch_keep_best,
    test_no_cartesian_product, etc.) still pass — the
    OrderedDict[model -> score] return contract is preserved.
