Fix Spline Identifiability using Sum-to-Zero Constraints (QR re-parameterization) #531
Open
Subham-KRLX wants to merge 3 commits into
hritikkumarpradhan
approved these changes
Apr 9, 2026
hritikkumarpradhan
left a comment
Hi @Subham-KRLX, fantastic work implementing the Wood (2017) QR re-parameterization here. I had this flagged as a major mathematical bottleneck for overparameterized models, so it's great to see this constraint being natively applied!
I pulled your branch locally to run a mathematical audit against varied data distributions.
Testing Performed:
- Identifiability Check: Ran the updated SplineTerm against a highly collinear, rank-deficient matrix. The legacy coefficient-centering warnings are successfully eliminated.
- GCV Stability: Cross-checked the Generalized Cross-Validation scores against the main branch. The re-parameterization holds the Sum-to-Zero constraint perfectly without artificially biasing the underlying model fit or tuning parameters.
- Summary Outputs: Verified that the .summary() outputs remain stable.
The matrix math looks incredibly solid. I've approved the review from my end!
The Problem
Overlapping spline terms and intercepts cause rank-deficient model matrices and numerical instability. The current "coefficient centering" hack is unreliable and triggers persistent identifiability warnings.
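To illustrate the rank deficiency described here, the following is a minimal sketch that does not use pyGAM at all. It assumes a simple piecewise-linear "hat" basis (the helper `hat_basis` is hypothetical and only for illustration). Like a B-spline basis, its rows sum to one, so adding an intercept column introduces an exact linear dependency.

```python
import numpy as np


def hat_basis(x, knots):
    """Piecewise-linear hat functions; every row sums to 1 (illustrative only)."""
    eye = np.eye(len(knots))
    # Column j interpolates the j-th unit vector over the knots.
    return np.column_stack(
        [np.interp(x, knots, eye[j]) for j in range(len(knots))]
    )


x = np.linspace(0.0, 1.0, 40)
knots = np.linspace(0.0, 1.0, 5)

X = hat_basis(x, knots)                          # 40 x 5 spline-like basis
design = np.column_stack([np.ones_like(x), X])   # intercept + basis: 40 x 6

# The intercept column equals the sum of the basis columns,
# so the design matrix has fewer independent columns than it has columns.
rank = np.linalg.matrix_rank(design)
print("columns:", design.shape[1], "rank:", rank)
```

The design matrix has six columns but only five are linearly independent, which is the kind of overlap between intercept and spline term that the constraint is meant to remove.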
The Fix
Implements Sum-to-Zero constraints via QR re-parameterization (Wood, 2017).
- Transforms both the basis matrix $X$ and penalty matrix $P$ into a reduced-rank space ($n-1$).
- Ensures a full-rank model matrix and a uniquely identified intercept.
- Suppresses unnecessary identifiability warnings in summary().
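The transformation listed above can be sketched in plain NumPy. This is not the code in terms.py; it is a minimal illustration of the standard construction, assuming the constraint is that the column sums of $X$ applied to the coefficients equal zero. The function name `sum_to_zero_reparam`, the random row-normalized basis, and the second-difference penalty are all assumptions made for the example.

```python
import numpy as np


def sum_to_zero_reparam(X, P):
    """Absorb a sum-to-zero constraint into X and P via QR (illustrative sketch).

    The constraint is C @ beta = 0 with C = 1^T X (the column sums of X).
    A full QR of C^T gives Q = [q1 | Z]; the columns of Z span the null
    space of C, so beta = Z @ gamma satisfies the constraint for any gamma.
    """
    C = X.sum(axis=0, keepdims=True)            # 1 x k constraint row
    Q, _ = np.linalg.qr(C.T, mode="complete")   # k x k orthogonal matrix
    Z = Q[:, 1:]                                # k x (k - 1) null-space basis
    return X @ Z, Z.T @ P @ Z, Z


rng = np.random.default_rng(0)
X = rng.random((50, 6))
X = X / X.sum(axis=1, keepdims=True)   # rows sum to 1, like a B-spline basis

D = np.diff(np.eye(6), n=2, axis=0)    # assumed second-difference penalty
P = D.T @ D

Xc, Pc, Z = sum_to_zero_reparam(X, P)

ones = np.ones((50, 1))
rank_before = np.linalg.matrix_rank(np.hstack([ones, X]))   # 7 columns
rank_after = np.linalg.matrix_rank(np.hstack([ones, Xc]))   # 6 columns
print("before:", rank_before, "of 7 columns")
print("after: ", rank_after, "of 6 columns")
```

In this sketch the constrained basis has one fewer column, each of its columns sums to zero over the data, and the intercept column is no longer in the span of the spline columns, so the combined design matrix is full rank.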
Key Changes
- terms.py & pygam.py: Automated detection of intercept/spline overlap, QR transformation logic, and improved p-value calculations.
- Verification: All 162 pytest cases passed; reproduction script confirmed no more rank-deficiency warnings.
Fixes #530