Fix Spline Identifiability using Sum-to-Zero Constraints (QR re-parameterization) #531

Open
Subham-KRLX wants to merge 3 commits into dswah:main from Subham-KRLX:esoc-2026-identifiability-fix

@Subham-KRLX

The Problem: Overlapping spline terms and intercepts cause rank-deficient model matrices and numerical instability. The current "coefficient centering" hack is unreliable and triggers persistent identifiability warnings.
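The rank deficiency described above can be reproduced outside pyGAM. The following is a minimal sketch, not code from this PR: it assumes a hand-rolled piecewise-linear basis (the hypothetical helper `hat_basis`) as a stand-in for a B-spline basis. Like B-splines, its rows sum to 1, so an explicit intercept column lies in the span of the spline columns.

```python
import numpy as np

def hat_basis(x, knots):
    """Piecewise-linear (degree-1 B-spline-like) basis; every row sums to 1."""
    eye = np.eye(len(knots))
    # Column j interpolates the j-th unit vector over the knots.
    return np.column_stack([np.interp(x, knots, eye[j]) for j in range(len(knots))])

x = np.linspace(0.0, 1.0, 50)
knots = np.linspace(0.0, 1.0, 6)

B = hat_basis(x, knots)                    # 50 x 6 spline block
X = np.column_stack([np.ones_like(x), B])  # intercept + spline: 50 x 7

# The intercept column equals B @ ones, so one column of X is redundant.
rank = np.linalg.matrix_rank(X)
print(f"columns: {X.shape[1]}, rank: {rank}")
```

The model matrix has one more column than its rank, which is the condition the identifiability warning reports.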

The Fix: Implements Sum-to-Zero constraints via QR re-parameterization (Wood, 2017).

  • Ensures a full-rank model matrix and a uniquely identified intercept.
  • Transforms both the basis matrix $X$ and the penalty matrix $P$ into a reduced space with one fewer dimension ($n-1$ instead of $n$ coefficients).
  • Suppresses unnecessary identifiability warnings in summary().
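For readers unfamiliar with the technique, here is a minimal NumPy sketch of a sum-to-zero re-parameterization via QR, following the general recipe in Wood (2017). It is an illustration under stated assumptions, not the code in terms.py or pygam.py: the function name `sum_to_zero_reparam` is hypothetical, and the basis is a random matrix normalized so its rows sum to 1, mimicking a B-spline basis.

```python
import numpy as np

def sum_to_zero_reparam(X, P):
    """Absorb the constraint 1^T X beta = 0 into the basis X and penalty P."""
    C = X.sum(axis=0, keepdims=True)           # 1 x k constraint row (column sums)
    Q, _ = np.linalg.qr(C.T, mode="complete")  # k x k orthogonal factor of C^T
    Z = Q[:, 1:]                               # k x (k-1), columns span null(C)
    return X @ Z, Z.T @ P @ Z, Z

rng = np.random.default_rng(0)
X = rng.random((40, 6))
X /= X.sum(axis=1, keepdims=True)    # rows sum to 1 (illustrative assumption)
D = np.diff(np.eye(6), n=2, axis=0)  # second-difference operator, 4 x 6
P = D.T @ D                          # difference penalty, 6 x 6

X_c, P_c, Z = sum_to_zero_reparam(X, P)

# With the constraint absorbed, intercept + spline block is full rank.
design = np.column_stack([np.ones(40), X_c])
print(X_c.shape, P_c.shape, np.linalg.matrix_rank(design))
```

Coefficients fitted in the reduced space map back to the original basis as `Z @ beta_c`, and each column of `X_c` sums to zero, so the intercept alone carries the overall level.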

Key Changes: In terms.py and pygam.py, automated detection of intercept/spline overlap, QR transformation logic, and improved p-value calculations.

Verification: All 162 pytest cases passed; reproduction script confirmed no more rank-deficiency warnings.

Fixes #530


@hritikkumarpradhan hritikkumarpradhan left a comment

Hi @Subham-KRLX, fantastic work implementing the Wood (2017) QR re-parameterization here. I had this flagged as a major mathematical bottleneck for overparameterized models, so it's great to see this constraint being natively applied!

I pulled your branch locally to run a mathematical audit against varied data distributions.

Testing Performed:

  • Identifiability Check: Ran the updated SplineTerm against a highly collinear, rank-deficient matrix. The legacy coefficient-centering warnings are successfully eliminated.
  • GCV Stability: Cross-checked the Generalized Cross-Validation scores against the main branch. The re-parameterization holds the Sum-to-Zero constraint perfectly without artificially biasing the underlying model fit or tuning parameters.
  • Summary Outputs: Verified that the .summary() outputs remain stable.
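A check along the lines of the list above can be sketched in plain NumPy. This is an assumed setup rather than the script used for this review: it uses a piecewise-linear basis whose rows sum to 1, a second-difference penalty, and a hypothetical helper `penalized_fit`. It compares fitted values from the rank-deficient parameterization (minimum-norm solution) against the constrained one.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, lam = 60, 6, 0.5
x = np.linspace(0.0, 1.0, n)
knots = np.linspace(0.0, 1.0, k)
B = np.column_stack([np.interp(x, knots, e) for e in np.eye(k)])  # rows sum to 1
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)
D = np.diff(np.eye(k), n=2, axis=0)  # second-difference operator

def penalized_fit(basis, pen_root):
    """Penalized least squares with an unpenalized intercept (augmented system)."""
    X = np.column_stack([np.ones(n), basis])
    R = np.column_stack([np.zeros(len(pen_root)), np.sqrt(lam) * pen_root])
    A = np.vstack([X, R])
    b = np.concatenate([y, np.zeros(len(pen_root))])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)  # min-norm if rank deficient
    return X @ theta, np.linalg.matrix_rank(X)

# Null-space basis of the sum-to-zero constraint, obtained via QR.
C = B.sum(axis=0, keepdims=True)
Z = np.linalg.qr(C.T, mode="complete")[0][:, 1:]

fit_old, rank_old = penalized_fit(B, D)          # k + 1 columns, rank k
fit_new, rank_new = penalized_fit(B @ Z, D @ Z)  # k columns, rank k

print(rank_old, rank_new, np.max(np.abs(fit_old - fit_new)))
```

The fitted values agree here because a difference penalty does not penalize constant shifts of the coefficients. A penalty that does penalize such shifts would not leave the fit unchanged.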

The matrix math looks incredibly solid. I've approved the review from my end!

Development

Successfully merging this pull request may close these issues.

Proposing Sum-to-Zero Constraints for Spline Identifiability