Skip to content

fix(core): return deepcopy from get_params to prevent state mutation#524

Open
pankajbaid567 wants to merge 4 commits into
dswah:mainfrom
pankajbaid567:fix/issue-522-get-params-mutable-references
Open

fix(core): return deepcopy from get_params to prevent state mutation#524
pankajbaid567 wants to merge 4 commits into
dswah:mainfrom
pankajbaid567:fix/issue-522-get-params-mutable-references

Conversation

@pankajbaid567
Copy link
Copy Markdown

PR Title: Fix: get_params exposes mutable nested objects (#522)

Description

This PR fixes issue #522, where GAM.get_params() exposes direct references to the Term and TermList objects through the parameter dictionary. Because of this, modifications to the returned dictionary such as params["terms"][0].lam = ... would silently mutate the underlying model's state, violating the scikit-learn estimator contract.

This is fixed by updating Core.get_params() to return copy.deepcopy() of the dictionary and nested lists/objects rather than returning direct referential dictionaries.

Changes Made

  • pygam/core.py: Added import copy and wrapped returned variables from Core.get_params() in copy.deepcopy().
  • pygam/tests/test_get_params.py: Introduced a new test file validating that external mutations to the dictionary returned by get_params do not change the underlying structure.

Testing

  • Extended the test suite with test_get_params.py (verified it appropriately captures parameter changes without model changes).
  • Ran full regression testing suite (pytest pygam/tests); all passes.
  • Fully compatible with sklearn's clone logic.

Closes #522

Fixes dswah#522 where get_params exposed direct references to nested objects
like Term and TermList causing params['terms'][0].lam = ... to silently
mutate the underlying estimator configuration.

Returns copy.deepcopy of parameter dictionary and objects.
@dkstlzk
Copy link
Copy Markdown

dkstlzk commented Mar 13, 2026

Nice fix and good regression test!

Also thanks for picking up the issue reported in #522.

One potential concern is the use of copy.deepcopy on all parameters.
For larger models this may become expensive since get_params() is often called inside model selection utilities.
Would it make sense to deepcopy only the nested mutable objects (e.g. terms) instead of the entire parameter dictionary?

Address PR review feedback: replace blanket copy.deepcopy() on all
parameter values with a _maybe_deepcopy() helper that only copies
mutable types (list, dict, ndarray, Core subclasses), returning
immutable scalars (int, float, bool, str, None) as-is.

This avoids unnecessary overhead when get_params() is called
frequently during model selection (e.g. GridSearchCV).

Also add tests verifying scalar identity and list copy behavior.
@pankajbaid567
Copy link
Copy Markdown
Author

Thanks for the thoughtful review, @dkstlzk ! Great point

deepcopy on every parameter is indeed wasteful, especially in model selection loops like GridSearchCV where get_params() is called frequently.
I've updated the implementation (commit d47dcc7) to use a _maybe_deepcopy()helper that only deepcopies mutable types (list, dict , ndarray, and Core subclasses like Term/ TermList).
Immutable scalars (int, float, bool, str, None) are now returned as-is with zero copy overhead.

Also added two new tests to verify:
Scalar params preserve identity (no unnecessary copy)
Mutable params like callbacks are still independent copies
Full test suite passes (165/165). Let me know if you have any further suggestions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] get_params() exposes mutable nested objects allowing external state mutation

2 participants