Fix over-rejecting p-values for penalized terms (Wood 2013 Tr statistic, fixes #163) by RogerPR · Pull Request #583 · dswah/pyGAM

RogerPR · 2026-05-18T12:50:14Z

Summary

Fixes the long-standing p-value miscalibration tracked in #163, which is behind the warning
"KNOWN BUG: p-values computed in this summary are likely much smaller than they should be" that currently appears on every gam.summary() call.

GAM._compute_p_value previously referenced the test statistic against a
chi-square (or F) distribution with df = rank(cov_term), i.e. the number
of basis functions for the term. For penalized fits with estimated
smoothing parameters this over-counts the effective degrees of freedom and
makes the null distribution far too tight, so noise features routinely
report p ≈ 0.

This PR replaces the implementation with the Tr statistic from Wood
(2013), "On p-values for smooth components of an extended generalized
additive model" (Biometrika 100(1), 221–228), which is what
mgcv::summary.gam uses:

1. Build the rank-r pseudoinverse of the term's posterior covariance from
   the top-r eigencomponents, with r = round(edof_term).
2. Compute T_r = β_term^T V^{-r} β_term, where V^{-r} denotes that rank-r pseudoinverse.
3. Reference T_r against χ²(r).
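
The three steps above can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration of the procedure as described here, not the code in the PR: the function name `tr_p_value` and its arguments are hypothetical, and clipping r to at least 1 is an added assumption to keep the χ² reference well defined.

```python
import numpy as np
from scipy import stats


def tr_p_value(beta, cov, edof):
    """Hypothetical helper: rank-r Wald-type test for one term.

    beta : coefficients of the term, shape (k,)
    cov  : posterior covariance of those coefficients, shape (k, k)
    edof : effective degrees of freedom of the term
    """
    beta = np.asarray(beta, dtype=float)
    cov = np.asarray(cov, dtype=float)

    # r = round(edof), clipped to [1, k] (the clipping is an assumption).
    r = int(np.clip(round(float(edof)), 1, beta.shape[0]))

    # eigh returns eigenvalues in ascending order, so the last r
    # columns are the top-r eigencomponents.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top_vals = eigvals[-r:]
    top_vecs = eigvecs[:, -r:]

    # T_r = beta^T V^{-r} beta, with V^{-r} = U_r diag(1 / d_r) U_r^T.
    proj = top_vecs.T @ beta
    t_r = float(np.sum(proj**2 / top_vals))

    # Reference T_r against chi-square with r degrees of freedom.
    p_value = float(stats.chi2.sf(t_r, r))
    return t_r, r, p_value


t_r, r, p = tr_p_value(
    beta=[2.0, 1.0, 5.0],
    cov=np.diag([4.0, 1.0, 0.01]),
    edof=2.2,
)
```

In this toy call the third direction has a tiny variance and would dominate a full-rank statistic; truncating to r = 2 drops it, which is the effect the rank truncation is meant to have.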

edof_term is read from the existing statistics_["edof_per_coef"],
which is already computed during fit. When edof_per_coef is shorter
than coef_ (intercept term, or more splines than samples) we fall back
to the nominal coefficient count, preserving existing behavior in those
edge cases.
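
The lookup and its fallback might look roughly like the sketch below. The helper name `term_edof` and the `coef_idx` argument are hypothetical, and summing the per-coefficient values over the term's indices is an assumption about how edof_term is aggregated; only the fallback rule itself is taken from the description above.

```python
import numpy as np


def term_edof(edof_per_coef, coef_idx, n_coefs):
    """Hypothetical helper: effective degrees of freedom of one term.

    edof_per_coef : per-coefficient EDoF, e.g. statistics_["edof_per_coef"]
    coef_idx      : indices of the term's coefficients within coef_
    n_coefs       : len(coef_)
    """
    edof_per_coef = np.asarray(edof_per_coef, dtype=float)
    coef_idx = np.asarray(coef_idx, dtype=int)

    if edof_per_coef.shape[0] < n_coefs:
        # The per-coefficient vector does not line up with coef_,
        # so fall back to the nominal coefficient count.
        return float(coef_idx.size)

    # Assumed aggregation: sum the per-coefficient EDoF over the term.
    return float(edof_per_coef[coef_idx].sum())


aligned = term_edof([0.9, 0.8, 0.3, 0.1], coef_idx=[1, 2, 3], n_coefs=4)
fallback = term_edof([0.9, 0.8], coef_idx=[1, 2, 3], n_coefs=4)
```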

Empirical validation

Pre-existing regression test (test_pvalue_rejects_useless_feature):
the "useless" np.arange feature on the wage dataset now reports
p ≈ 0.84 (was p ≈ 5 × 10⁻²⁹⁷ on the buggy implementation that originally
prompted #163).

Real signals stay highly significant: on wage, s(year) + s(age) + f(education) reports p-values < 10⁻¹² for every real term.

Calibration under H₀ — new pygam/tests/test_pvalue.py runs 100
seeded simulations per scenario and checks that the false-positive rate
sits in [1%, 10%] around the nominal 5%:

| scenario | FPR / Power |
| --- | --- |
| univariate noise, `lam=0` | ~5% |
| univariate noise, `lam=0.6` | ~5% |
| univariate noise, `n_splines=25` | ~5% |
| two-term noise, both terms | ~5% |
| spline + factor on noise | ~5% |
| near-collinear predictors | ~5% |
| strong sine signal (power) | ≥95% |
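
For context on the [1%, 10%] acceptance band: with 100 trials, the number of rejections under an exactly calibrated 5% test is Binomial(100, 0.05), so the band corresponds to 1 to 10 rejections. The sketch below computes how often such a test would land inside the band. It assumes independent trials and exact calibration, and is not part of the PR's test suite.

```python
from scipy import stats

n_trials = 100  # simulations per scenario
alpha = 0.05    # nominal level
lo, hi = 1, 10  # [1%, 10%] of 100 trials, as rejection counts

# P(lo <= X <= hi) for X ~ Binomial(n_trials, alpha)
in_band = (
    stats.binom.cdf(hi, n_trials, alpha)
    - stats.binom.cdf(lo - 1, n_trials, alpha)
)
print(f"P(FPR in band | calibrated test) = {in_band:.4f}")
```

A band this wide rarely fails a well-calibrated test, while still catching the gross over-rejection described in #163.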

Cross-check against mgcv (50 trials × 3 penalties on y = 0.3·sin(x) + N(0,1)
with n=200, x ∈ [0,10]; for each trial R's sp was optimized so
sum(model$edf) matched pyGAM's statistics_["edof"], then
summary.gam()$s.table[,"p-value"] was read off):

| lam | mean \|p_pygam − p_R\| | decision agreement | Pearson r |
| --- | --- | --- | --- |
| 0.6 | 0.053 | 84% | 0.974 |
| 1.0 | 0.073 | 92% | 0.904 |
| 3.0 | 0.092 | 78% | 0.893 |

(harness not included in the PR — happy to share on request)
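
For readers who want to reproduce the three summary columns from their own paired p-values, one way to compute them is sketched below. This is not the author's harness: the function name `compare_p_values` is hypothetical, and using α = 0.05 as the decision threshold for "decision agreement" is an assumption.

```python
import numpy as np


def compare_p_values(p_a, p_b, alpha=0.05):
    """Hypothetical helper: agreement metrics for paired p-values."""
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)

    # Mean absolute difference between the two sets of p-values.
    mean_abs_diff = float(np.mean(np.abs(p_a - p_b)))

    # Fraction of trials where both reject or both fail to reject at alpha.
    agreement = float(np.mean((p_a < alpha) == (p_b < alpha)))

    # Pearson correlation between the two sets of p-values.
    pearson_r = float(np.corrcoef(p_a, p_b)[0, 1])
    return mean_abs_diff, agreement, pearson_r


# Made-up numbers purely to exercise the function.
diff, agree, corr = compare_p_values(
    [0.01, 0.20, 0.60, 0.04],
    [0.03, 0.10, 0.50, 0.06],
)
```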

Residual gap to mgcv comes from differences in basis/penalty
parameterization and from mgcv's frequentist-covariance refinement; both
are out of scope for this PR.

Test plan

pytest pygam/: 170 passed, 1 skipped (previously 169 passed, 1 failed, 1 skipped)
pytest pygam/tests/test_pvalue.py -v: 8 passed
ruff check pygam/pygam.py pygam/tests/test_pvalue.py: clean
ruff format --check pygam/pygam.py pygam/tests/test_pvalue.py: clean
Spot-checked gam.summary() on the wage dataset — output is sensible

Replaces _compute_p_value with the Tr test from Wood (2013, Biometrika 100(1)), as used by mgcv::summary.gam. The rank of the term's covariance pseudoinverse is taken as round(edof_term) instead of the matrix rank; the statistic is referenced against chi-square with that many df. Resolves the over-rejection described in dswah#163: previously, terms with estimated smoothing parameters could report p ≈ 0 even when the underlying effect was pure noise. With the corrected df, FPR on realistic multi-term fits is back near the nominal 5% level, and a 50-trial comparison against mgcv shows Pearson correlations of 0.89-0.97 with mean |p_pygam - p_R| of 0.05-0.09 across lam in {0.6, 1.0, 3.0}. Adds pygam/tests/test_pvalue.py with FPR/power calibration tests.

RogerPR wants to merge 1 commit into dswah:main from RogerPR:pval_update
