Description
Describe the bug
ChiSquared._energy_x in skpro/distributions/chi_squared.py uses a mathematically incorrect identity in its closed-form energy (CRPS) formula. The code on line 186 computes chi2.cdf(xi, k + 1), but the correct partial expectation identity for the chi-squared distribution requires chi2.cdf(xi, k + 2).
This causes the cross-energy E[|X - x|] to return silently wrong values with errors up to 48.8%, directly corrupting CRPS scores for any chi-squared-based probabilistic prediction.
The _energy_self method, which uses numerical quadrature, is not affected; only the closed-form _energy_x is wrong.
To Reproduce
import numpy as np
from scipy.integrate import quad
from scipy.stats import chi2
from skpro.distributions.chi_squared import ChiSquared
k, x_val = 5, 5.0
d = ChiSquared(dof=k)
skpro_val = float(np.asarray(d._energy_x(np.array([[x_val]]))).flat[0])
truth, _ = quad(lambda t: abs(t - x_val) * chi2.pdf(t, k), 0, np.inf, limit=500)
print(f"skpro _energy_x: {skpro_val:.6f}") # 1.279329
print(f"numerical truth: {truth:.6f}") # 2.440830
print(f"error: {abs(skpro_val - truth)/truth*100:.1f}%") # 47.6%
Full reproduction with multiple (dof, x) pairs is attached as a screenshot below.
Expected behavior
_energy_x should match direct numerical integration. The corrected formula (using k+2) matches to machine precision (< 1e-12 relative error).
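As an illustration of that comparison without depending on skpro, the sketch below writes the closed form directly with SciPy and checks it against quadrature. The helper names `energy_x_closed_form` and `energy_x_quad` and the `shift` parameter are hypothetical and are not part of skpro. The closed form is assumed to be x · (2F(x; k) - 1) + k · (1 - 2F(x; k + shift)), where `shift=2` follows the identity in this report and `shift=1` mimics the reported behaviour.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import chi2


def energy_x_closed_form(x, k, shift=2):
    """Closed-form E|X - x| for X ~ chi-squared(k).

    Hypothetical helper, not skpro code. `shift` selects the degrees of
    freedom used in the partial-expectation CDF: shift=2 follows the
    identity in this report, shift=1 mimics the reported behaviour.
    """
    cdf_k = chi2.cdf(x, k)
    cdf_shifted = chi2.cdf(x, k + shift)
    # E|X - x| = x * (2 F(x; k) - 1) + k * (1 - 2 F(x; k + shift))
    return x * (2.0 * cdf_k - 1.0) + k * (1.0 - 2.0 * cdf_shifted)


def energy_x_quad(x, k):
    """Reference value by quadrature, split at the kink t = x."""
    lower, _ = quad(lambda t: (x - t) * chi2.pdf(t, k), 0.0, x)
    upper, _ = quad(lambda t: (t - x) * chi2.pdf(t, k), x, np.inf)
    return lower + upper


# Compare both variants against quadrature for a few (dof, x) pairs.
for k, x in [(2, 3.0), (5, 5.0), (10, 4.0)]:
    truth = energy_x_quad(x, k)
    fixed = energy_x_closed_form(x, k, shift=2)
    buggy = energy_x_closed_form(x, k, shift=1)
    print(f"k={k}, x={x}: quad={truth:.6f}, k+2={fixed:.6f}, k+1={buggy:.6f}")
```

Splitting the integral at t = x avoids asking the quadrature routine to resolve the kink of |t - x| on its own, which keeps the reference value reliable.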
Environment
- OS: macOS
- Python: 3.11
- skpro: latest main branch
Additional context
The bug is on [line 186 of chi_squared.py](https://github.com/sktime/skpro/blob/main/skpro/distributions/chi_squared.py#L186):
cdf_k1 = chi2.cdf(xi, k + 1) # WRONG: should be k + 2
Why k+2 is correct:
The chi-squared PDF satisfies:
t · f(t; k) = k · f(t; k + 2)
This follows from Γ(k/2 + 1) = (k/2) · Γ(k/2) and 2^((k+2)/2) = 2 · 2^(k/2). Therefore:
∫₀ˣ t · f(t; k) dt = k · F(x; k+2) ← NOT k+1
The code uses F(x; k+1), which has no mathematical justification.
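For reference, the steps can be written out in full. This is a derivation sketch from the identity above, with f and F denoting the chi-squared PDF and CDF and E[X] = k. It does not quote the skpro source, so the exact arrangement of terms in the code may differ.

```latex
\begin{align*}
t\, f(t; k)
  &= \frac{t^{k/2}\, e^{-t/2}}{2^{k/2}\, \Gamma(k/2)}
   = k \cdot \frac{t^{(k+2)/2 - 1}\, e^{-t/2}}{2^{(k+2)/2}\, \Gamma\!\big((k+2)/2\big)}
   = k\, f(t; k+2), \\[4pt]
\mathbb{E}\,|X - x|
  &= \int_0^x (x - t)\, f(t; k)\, dt + \int_x^\infty (t - x)\, f(t; k)\, dt \\
  &= x\, F(x; k) - k\, F(x; k+2) + \big(k - k\, F(x; k+2)\big) - x\, \big(1 - F(x; k)\big) \\
  &= x\, \big(2 F(x; k) - 1\big) + k\, \big(1 - 2 F(x; k+2)\big).
\end{align*}
```

At k = 5, x = 5 this evaluates to about 2.4408, which agrees with the numerical truth in the reproduction above.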