Skip to content

Add GenPareto and ExtGenPareto (Generalized Pareto) distributions#689

Open
maresb wants to merge 37 commits into
pymc-devs:mainfrom
maresb:add-gpd-extgpd-distributions
Open

Add GenPareto and ExtGenPareto (Generalized Pareto) distributions#689
maresb wants to merge 37 commits into
pymc-devs:mainfrom
maresb:add-gpd-extgpd-distributions

Conversation

@maresb
Copy link
Copy Markdown
Collaborator

@maresb maresb commented May 31, 2026

Summary

Adds two continuous distributions to pymc_extras.distributions:

  • GenPareto — the Generalized Pareto distribution (GPD), the peaks-over-threshold counterpart of GenExtreme. Parametrized as in Coles (2001) / SciPy's genpareto (c = ξ).
  • ExtGenPareto — the extended GPD of Naveau et al. (2016): F = H**κ with H the GPD CDF, adding a lower-tail shape κ while ξ keeps controlling the upper tail (κ = 1 recovers the GPD).

Each provides stable pure-PyTensor logp / logcdf / logccdf / icdf kernels, a SymbolicRandomVariable (inverse-CDF sampling), a support_point, and a probability-integral default transform so they work as latent variables. Both are exported from pymc_extras.distributions and added to the Sphinx API reference.

Supersedes #638

This reimplements #638 following the review there: the RVs are now SymbolicRandomVariables doing inverse-CDF sampling, so the random methods work across backends without SciPy. (The original GPD attempt was #294.)

Earlier review feedback addressed

Numerical behaviour

The kernels stay finite and accurate deep into the tails and near the bounded ξ < 0 wall (down to the documented float64 representability limits); the probability-integral default transform avoids the ξ = 0 Jacobian kink an Interval transform would introduce (the headline reason for it); and boundary / non-finite-parameter behaviour follows PyMC's own conventions (e.g. a non-finite shape gives NaN like pm.Gamma, a non-finite scale gives -inf like pm.Normal). All of this is covered in tests/distributions/test_continuous.py.

maresb and others added 2 commits May 31, 2026 11:14
…stributions

Adds two continuous distributions to pymc_extras.distributions, structured in
three layers so the probability math is forward-facing and reusable:

1. Pure PyTensor functions (gen_pareto_logp/_logcdf/_icdf and the ExtGenPareto
   variants) that depend only on pytensor, so they can be reused directly or
   lifted into pytensor-distributions unchanged.
2. SymbolicRandomVariable Ops that sample by inverse-CDF on a uniform draw, so
   the random methods work on every backend without a SciPy object-mode path.
3. Thin Continuous distribution classes wiring the two together.

The xi -> 0 limit (GPD collapsing to the exponential) is handled with
C1-smooth log1p(u)/u and expm1(u)/u helpers instead of switch(isclose(xi, 0)),
keeping both the value and its gradient continuous through the limit -- the
property NUTS relies on near xi = 0.

ExtGenPareto is the Naveau et al. (2016) extended GPD with carrier G(v) = v**k;
kappa = 1 recovers the plain GPD.

Includes logp/logcdf/icdf checks against scipy.genpareto and numpy references,
support-point (median) tests, RV size/param and KS goodness-of-fit tests, and
explicit C1-smoothness-through-xi=0 tests. Registers both in
distributions/__init__.py and docs/api_reference.rst.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pymc-extras groups distribution definitions by kind (all continuous
distributions live in distributions/continuous.py, discrete in discrete.py,
etc.), not one module per distribution. The initial commit added a standalone
generalized_pareto.py / test_generalized_pareto.py, which broke that taxonomy.

Fold both distributions (their RV Ops, pure-PyTensor helper functions, and
tests) into continuous.py / test_continuous.py alongside GenExtreme, the closest
sibling. No behavioural change: the three-layer structure (pure functions ->
SymbolicRandomVariable -> Continuous class) is preserved verbatim, and the test
additions append to the existing module, leaving the GenExtreme/Chi/Maxwell
tests untouched (the module-level pytestmark gains only filterwarnings entries).

Full tests/distributions/ suite: 169 passed, 8 skipped; ruff check + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.42%. Comparing base (86fac3c) to head (a22433e).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main     #689       +/-   ##
===========================================
+ Coverage   51.60%   90.42%   +38.81%     
===========================================
  Files          73       73               
  Lines        8003     8268      +265     
===========================================
+ Hits         4130     7476     +3346     
+ Misses       3873      792     -3081     
Files with missing lines Coverage Δ
pymc_extras/distributions/__init__.py 100.00% <100.00%> (ø)
pymc_extras/distributions/continuous.py 98.78% <100.00%> (+42.44%) ⬆️

... and 31 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@maresb maresb marked this pull request as draft May 31, 2026 17:54
Acts on an external review of the GenPareto / ExtGenPareto distributions:

1. Boundary numerics.
   - ``_safe_mul`` maps the IEEE ``0 * inf -> nan`` of ``xi * z`` back to ``0`` so
     the xi = 0 path is finite at x = inf.
   - logp / logcdf pin ``z = +inf`` explicitly to ``-inf`` / ``0`` (for xi > 0 the
     in-support branch is ``log1p(inf)/inf -> nan``).
   - ``_gpd_upper_bound`` gives ``icdf`` the finite upper endpoint ``mu - sigma/xi``
     at ``q = 1`` for xi < 0 (was ``nan`` from ``inf * 0``); ``q = 0`` returns ``mu``.
   - icdf evaluates its interior formula on a safe value so the discarded q = 0/1
     branch never produces a nan that the graph rewriter can surface.

2. Scalar icdf. ``value`` is coerced with ``pt.as_tensor_variable`` and the methods
   use PyMC's ``check_icdf_value`` / ``check_icdf_parameters``; direct calls like
   ``GenPareto.icdf(0.0, ...)`` no longer raise ``TypeError``. q outside [0, 1]
   returns ``nan`` (the standard PyMC icdf contract).

3. Pure-PyTensor kernels. ``gen_pareto_*`` / ``ext_gen_pareto_*`` no longer call
   ``check_parameters``; validation lives entirely in the ``Continuous`` wrapper
   methods, so the kernels are genuinely reusable (e.g. in pytensor-distributions).

4. Docs. Register GenPareto and ExtGenPareto in docs/api_reference.rst.

5. Tests. Stop skipping the parameter outside-edge test: xi carries explicit
   ``(None, None)`` edges (unconstrained, no invalid probe) while sigma / kappa
   keep 0 / inf sentinels so the harness probes sigma <= 0 and kappa <= 0 (must
   raise). The xi grid stays within (-1, 1]: at xi <= -1 the GPD has a finite
   density at its closed upper endpoint (a measure-zero point the open-support
   convention here legitimately handles as -inf), and at xi > 1 the q = 0.99
   quantile (~1e10) exceeds ``check_icdf``'s absolute tolerance though it is
   correct to ~1e-15 relative. Scope the FPE RuntimeWarning filters to the GPD
   test classes via per-class ``pytestmark`` instead of muting the whole module.
   Add ``TestGenParetoBoundaries`` covering x = inf, q = 0, q = 1, q outside
   [0, 1], and invalid sigma / kappa.

tests/distributions/test_continuous.py: 39 passed; full tests/distributions/:
174 passed, 8 skipped (stable across repeated runs); ruff check + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@maresb maresb force-pushed the add-gpd-extgpd-distributions branch from 494cafd to 46cd2bc Compare May 31, 2026 22:11
…heavy tail

1. ``_safe_mul`` now repairs ONLY the indeterminate ``0 * inf`` (the two exact
   ``{0} x {+-inf}`` cases) instead of every NaN product. A genuine ``nan``
   parameter (e.g. ``xi = nan`` from bad input) again propagates instead of
   being silently turned into the ``xi = 0`` exponential branch. Added
   ``test_nan_xi_does_not_become_exponential``.

2. Documented the open-support convention explicitly. The Support table lists the
   bounded tail as closed, but the implementation returns ``-inf`` / ``0`` at the
   xi < 0 right endpoint. A Notes section on GenPareto (referenced from
   ExtGenPareto) now states this is exact as a density limit for -1 < xi < 0 and
   only differs from SciPy at the measure-zero boundary for xi <= -1.

3. Added ``TestGenParetoHeavyTail`` with *relative*-tolerance checks for xi > 1
   (the infinite-mean regime the absolute-tolerance ``check_icdf`` harness can't
   reach): logp / logcdf / icdf vs SciPy at rtol, plus median ``support_point``
   for GenPareto and ExtGenPareto where the mean is infinite.

4. Softened the migration wording: the pure functions are portable math kernels
   (density / CDF / quantile), not a full functional API (no pdf/sf/isf/rvs/
   moments), so they are "portable kernels", not a distribution "lifted unchanged".

tests/distributions/test_continuous.py: 46 passed; full tests/distributions/:
181 passed, 8 skipped (stable across repeated runs); ruff check + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@maresb maresb force-pushed the add-gpd-extgpd-distributions branch from aeed57b to f8e2483 Compare June 1, 2026 06:37
maresb and others added 10 commits June 1, 2026 07:14
Deep-dive hardening, prioritising portability to pytensor-distributions and
following the conventions of existing pymc / pymc-extras continuous distributions.

logccdf (log complementary CDF = log survival function):
- Add ``gen_pareto_logccdf`` / ``ext_gen_pareto_logccdf`` pure kernels and wire
  ``logccdf`` classmethods on both distributions. PyMC auto-registers it (like
  ``logcdf``) and ``pm.logccdf`` now works for both.
- The survival exponent is computed directly (``-m`` for GPD; ``log1mexp(kappa *
  log H)`` for the extended GPD), so it is exact in the heavy upper tail -- the
  regime where PyMC's generic ``log1mexp(logcdf)`` fallback collapses (logcdf -> 0
  there). This is the natural ``logsf`` primitive for a peaks-over-threshold model
  and the direct counterpart of pytensor-distributions' ``logsf``.
- Tests: ``check_logccdf`` vs ``scipy.genpareto.logsf`` (GPD) and a tail-stable
  reference (ExtGPD), an explicit tail-stability test out to x = 1e8,
  ``check_selfconsistency_icdf`` (cdf(icdf(q)) == q), and a kappa = 1 reduction.

Moments documentation (no methods -- matching the pymc/pymc-extras convention of
documenting moments in the Support/Mean/Variance table and exposing only
``support_point``):
- GPD table gains Median, Mode and Entropy rows (all closed-form, validated
  against scipy).
- ExtGPD table gains exact closed-form Mean and Variance via the Beta function
  ``kappa * B(kappa, 1 - xi)`` (valid xi < 1 and xi < 1/2 respectively), Median,
  and the Mode (T* = (1 + xi)/(kappa + xi) for kappa > 1, else mu), with the
  xi -> 0 digamma / trigamma limits noted. These are exact Naveau-construction
  results -- not numerical-truncation approximations -- verified numerically
  against integration to <= 1e-8. ``support_point`` remains the median.

tests/distributions/test_continuous.py: 52 passed; full tests/distributions/:
187 passed, 8 skipped (stable across repeated runs); ruff check + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ms, docs

Acts on a third-pass review of the GenPareto / ExtGenPareto distributions.

1. ExtGPD logccdf far-tail collapse (blocker). ``log1mexp(kappa * log H)``
   returned -inf deep in the tail because ``log H -> 0`` underflows there, even
   though the survival is finite. Recomputed from the GPD log survival ``a =
   log(1 - H) = -m`` (exact in the tail): generic ``log1mexp(kappa * log1mexp(a))``
   when ``a`` is not tiny, and a second-order series ``log(kappa) + a +
   log1p(-(kappa-1)/2 S + ...)`` for ``a < -30``. Verified against 60-digit
   reference to <= 4e-15 across xi/kappa; e.g. xi=0, kappa=2.5, x=1000 now gives
   -999.084, not -inf. Factored out ``_gpd_log_S``.

2. Non-finite ``xi`` now propagates as nan (blocker). The support switch masked
   nan to -inf ("valid parameter, impossible value"). logp now propagates nan,
   matching logcdf/logccdf and pymc's own distributions (Normal propagates nan
   for a nan parameter; pymc never uses isfinite parameter checks). Strengthened
   the test to assert nan across logp/logcdf/logccdf, not merely "non-finite".

3. Default transforms (production blocker). Registered an ``Interval`` default
   transform on both distributions via ``bounds_fn`` returning ``(mu,
   switch(xi < 0, mu - sigma/xi, +inf))`` -- a single parameter-dependent
   expression that covers the heavy (xi >= 0, one-sided) and bounded (xi < 0,
   two-sided) regimes, and works when mu/sigma/xi are themselves random. Without
   it a latent GPD samples on all of R (constant -inf below mu). Tests confirm
   latent sampling stays in support with zero divergences and observed RVs are
   unaffected.

4. Docs corrected to match the implementation / domain:
   - Support table now shows the open right endpoint ``mu <= x < mu - sigma/xi``.
   - GPD/ExtGPD Mode noted as valid only for xi > -1 (for xi <= -1 the supremum
     is at the right endpoint, not mu).
   - ExtGPD median wording fixed: it is the GPD quantile at CDF probability
     H = 2^(-1/kappa) (equivalently survival 1 - 2^(-1/kappa)), not "survival
     2^(-1/kappa)".

tests/distributions/test_continuous.py: 58 passed; full tests/distributions/:
193 passed, 8 skipped (stable across repeated runs); ruff check + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previously committed ``Interval`` default transform has a log-Jacobian that
is discontinuous in xi at 0: the bounded two-sided (sigmoid) map and the
one-sided (exp) map do not connect as the upper endpoint ``mu - sigma/xi``
diverges, so the transformed density has a ~1e12 gradient kink at xi = 0. With
xi a *random* variable this triggers divergences -- measured 163/1600 with xi
sampled across 0, versus 65 for a lower-bound-only transform. (The earlier
"verified zero divergences" only exercised *fixed* xi, where the kink never
activates -- a real gap in that check.)

Replace it with a probability-integral transform ``y = logit(F(x))``. Because F
is the family's own CDF, the transformed prior density is exactly Logistic and
free of mu/sigma/xi, so it is C1 in every parameter (no kink anywhere), while
the inverse ``icdf`` enforces the correct support -- including the moving upper
wall for xi < 0 -- for all xi. Implemented as ``_GPDProbabilityIntegralTransform``
with kappa-aware GenPareto / ExtGenPareto subclasses, registered as the default
transform for both.

Measured divergences (xi random, 1600 draws): across 0 -> 3 (was 163); xi < 0
-> 0 (was 33); xi > 0 -> 0 (was 33). Validated: transformed density integrates
to 1 for every xi, ``forward(backward(y)) == y`` exactly, and the latent
posterior mean matches the untransformed target (so the divergence reduction is
not from sampling a different distribution).

Tests gain a Jacobian-integrates-to-1 check across both classes and a direct
gradient-continuity-through-xi=0 assertion (the property an Interval transform
fails) -- the regression guard the original check lacked.

tests/distributions/test_continuous.py: 64 passed; full tests/distributions/:
199 passed, 8 skipped (stable across repeated runs); ruff check + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… NaN; docs

Acts on a fourth-pass review.

Blocking -- PIT transform invalid in the positive unconstrained tail. The old
``backward`` did ``icdf(sigmoid(y))``; ``sigmoid(y)`` rounds to exactly 1 for
y >= 37 in float64, so ``icdf(1)`` returned inf/the wall and the transformed
logp became nan/-inf on valid unconstrained values. Reworked the transform to
stay in survival/log space: ``backward`` builds the GPD excess
``m = softplus(y)`` (base family) / ``-log(-expm1(-softplus(-y)/kappa))``
(extended) directly, and ``log_jac_det`` is the analytic ``log F + log S - logp``
rather than autodiffing the quantile graph. The transformed density is now
exactly Logistic and finite far into the tail. Isolated root cause for the
residual bounded-xi case: once ``backward`` materialises x at the finite wall,
``1 + xi z`` cancels to 0 (its true value is ``exp(-m)``); for xi >= 0 there is
no wall and the transform is finite to |y| ~ 1000, while the bounded xi < 0 case
is limited by float64 representability near the wall (|y| ~ 70), far past any
sampler's reach.

High -- ExtGenPareto.logccdf wrong for large kappa. The tail branch fired on
``a < -30`` but assumed ``kappa * S_gpd`` small; for large kappa it returned a
positive "log probability". Now keyed on ``log(kappa) + a < -30`` (i.e. on
``kappa * S_gpd``), verified <= 0 and accurate to <= 5e-16 vs an 80-digit
reference for kappa up to 1e20.

Medium -- NaN propagation made consistent. A non-finite xi now yields nan from
logp/logcdf/logccdf at every value (below support, in support, above support)
via a shared ``_propagate_nonfinite_shape`` helper, instead of being masked to
-inf/0 at the boundaries.

Medium -- tests now probe the tails (y up to +-45 bounded, +-400 unbounded) and
forward(backward(y)) round-trip, plus a large-kappa logccdf check -- the
coverage the previous "integrate over [-30, 30]" tests lacked.

Medium -- ExtGenPareto mode docs give the xi = 0 limit (mu - sigma ln T*).

tests/distributions/test_continuous.py: 74 passed; full tests/distributions/:
209 passed, 8 skipped (stable across repeated runs); ruff check + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nd xi=+-inf

Fifth-pass review cleanup -- no new mechanism, just honesty and a policy fix.

PIT transform range of validity (was overclaimed). The framework adds
logp(backward(y)) to log_jac_det(y), so the map is only valid where the quantile
x = backward(y) is *representable*. For xi <= 0 (no upper wall) x grows at most
linearly and the map is exact essentially everywhere (verified past |y| = 1000).
For a heavy upper tail x ~ exp(xi * m), so it overflows float64 at roughly
y ~ 709 / xi (e.g. y ~ 142 for xi = 5); the bounded xi < 0 case is limited the
other way by the wall's ULP near |y| ~ 60. This is a representability limit on x,
not a bug, and is far past any sampler's reach -- but the docstring no longer
claims the map is finite on all of R for every parameter. Replaced the
"finite arbitrarily far" test with one that probes each regime to its real reach
and an explicit test that the heavy-tail map overflows only past y ~ 709 / xi.

Non-finite xi policy (was inconsistent). ``_propagate_nonfinite_shape`` keyed on
``isnan(1 + xi * z)``, which is finite at x = mu (z = 0) and for several regions
when xi = +-inf, so a bad shape leaked the ordinary boundary values there. Now
keys on ``isfinite(xi)`` directly, so xi in {nan, +inf, -inf} yields nan from
logp / logcdf / logccdf at *every* value -- in support, at the endpoint x = mu,
below, and above. Test parametrized over all three non-finite shapes and all
four value regions.

tests/distributions/test_continuous.py: 79 passed; full tests/distributions/:
214 passed, 8 skipped; ruff check + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sixth-pass review. The ExtGenPareto transform's backward built the GPD excess
m = -log(S_F) from log F_ext = -softplus(-y). In the upper tail softplus(-y)
rounds to exactly 0 near y ~ 745, so log F_ext -> -0, S_F -> 0 and m -> inf even
though the quantile is still perfectly representable -- e.g. ExtGenPareto(xi=0,
kappa=2) returned -inf at y = 1000 (true x ~ 1000.69) and ExtGenPareto(xi=0.3,
kappa=2) returned nan at y = 1000 (well below the y ~ 709/xi ~ 2363 overflow
boundary). This was an early underflow, not the documented representability
limit.

Recover m from the survival side instead. With t = softplus(y) = -log S_ext
(stable for all y) and S_ext = exp(-t),
  S_F = 1 - (1 - S_ext)**(1/kappa) ~ S_ext / kappa  in the far tail,
so m -> t + log(kappa) - log(_log1p_div(-exp(-t))). That form never builds
exp(-y) before a log: when exp(-t) underflows to 0 the C1 helper returns its
limit 1 and m = t + log(kappa) stays finite until t itself overflows float64.
The exact bulk form (m = -log(1 - F_ext**(1/kappa))) and this tail form agree to
O(exp(-t)), so they are switched at t = 40 with both branches evaluated on
clamped inputs to keep the discarded branch (and its gradient) finite.

Verified against the previous (correct-in-bulk) implementation: ABS error
<= 2.2e-16 across kappa in [0.5, 100] wherever both are valid; the switch is C1
(value and gradient match to float64, gradient exactly 1 either side); the
reviewer's probes are now exact (y = 1000, xi = 0 -> 1000 + log 2; y = 1e6 still
finite); round-trip forward(backward(y)) holds to 2e-13 (xi=0.3) and 1.9e-9
(xi=0) far past y = 1000. Bumped the ExtGenPareto tail tests from y_max 700/400
to 1000 so the formerly-failing probes are now in the standing suite, and
tightened the transform docstring's "Range of validity" (xi < 0 is bounded, not
"light/exponential"; ExtGPD now reaches as far as the base family).

tests/distributions/test_continuous.py: 79 passed; full tests/distributions/:
214 passed, 8 skipped (stable x2); ruff check + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Seventh-pass review. The ExtGenPareto transform's bulk/tail switch keyed on
t = softplus(y) alone, ignoring kappa. The tail approximation S_F ~ S_ext / kappa
is only valid when that ratio is small, so for tiny kappa the tail branch was
selected too early and returned a *negative* excess -- e.g. ExtGenPareto(xi=0,
kappa=1e-20) at y=40 gave an excess of ~ -6, an x below mu, and logp = -inf
instead of the Logistic value. Same class of bug as the earlier large-kappa
logccdf: the branch condition was on the wrong small quantity.

Two changes:

1. Recover the bulk excess with -log1mexp(log F_ext / kappa) instead of
   -log(-expm1(log F_ext / kappa)). They are algebraically identical, but
   log1mexp's log1p branch never forms 1 - H, so a tiny GPD survival no longer
   rounds the excess to 0. This makes the inverse exact across kappa (the bulk
   now matches the prior implementation to <= 2.2e-16 for every kappa in
   [0.5, 100], including kappa = 100 where the intermediate form had drifted to
   ~5e-6), and keeps the excess >= 0 -- in support -- for *all* kappa > 0. The
   bulk/tail switch moved to value = 700 (the bulk is exact up to the log F_ext
   underflow near 745, so it now covers every sampler-reachable y for any kappa).

2. Documented the transform's kappa domain on the class. Inverting the carrier
   needs H = F_ext ** (1/kappa) resolvable from 1, i.e. ~exp(-|y|) of headroom in
   kappa: exact for kappa down to ~exp(-|y|) (kappa >~ 1e-15 at |y| = 40,
   >~ 1e-32 at |y| = 80), so any fixed kappa > 0 is exact out to |y| ~ log(1/kappa)
   -- far past a sampler's reach. Smaller kappa underflows the carrier and the
   excess rounds toward 0 (x -> mu, still in support, logp -> -inf); it never
   leaves the support. (Small kappa also has essentially no representable lower
   tail -- |y| / kappa >~ 745 -- which is the ExtGPD concentrating toward mu.)

Tests: parametrized exactness across ten orders of magnitude in kappa (1e-2 .. 1e8,
including the kappa < 1 upper tail that used to go out of support), and a
regression that tiny kappa (1e-20 .. 1e-300) stays >= mu and never NaN.

tests/distributions/test_continuous.py: 87 passed; full tests/distributions/:
222 passed, 8 skipped (stable x2); ruff check + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Eighth-pass review. Last round fixed only the default transform's excess
recovery; ext_gen_pareto_icdf, ExtGenParetoRV.rv_op and support_point still used
the unstable -log(-expm1(log q / kappa)) route, so the distribution's quantile,
sampler, init point and transform disagreed in exactly the small-kappa regime the
previous commit documented. The naive form rounds the tiny GPD survival
1 - q ** (1/kappa) to 1, collapsing the excess to 0 and the quantile to mu.

Extracted the stable inverse into a shared helper
_ext_gpd_excess_from_log_prob(log_q, kappa) = -log1mexp(log_q / kappa) (whose
log1p branch never forms 1 - H) and routed all four paths through it. Concrete
effects (mu = 0):

  * icdf(0.5, kappa=0.01) now returns 7.89e-31 (= 0.5 ** 100), not 0; matches
    ref_ext_icdf to rtol 1e-9 and the transform's backward(0) exactly.
  * support_point is the representable median again, so the default initial point
    has a finite logp (was -inf) wherever the median clears mu in float64.
  * random draws stop collapsing onto mu: at kappa=0.01 the fraction landing
    exactly at mu drops from ~0.69 to ~0.001 (kappa=0.001: ~0.96 -> ~0.47, the
    remainder being genuine mass below the smallest denormal).

Ordinary kappa is unchanged (icdf still matches SciPy to ~1e-16; the support_point
and KS RNG tests pass untouched). For kappa so small that the median is itself
float64-indistinguishable from mu (or mu's ULP swamps the tiny excess) the
distribution is a numerical point mass at mu and the init logp is -inf -- the same
representability floor documented for the transform's kappa domain, now shared
consistently by all four inverses rather than disagreeing.

Tests: parametrized agreement of icdf / transform-backward / support_point / init
logp for small kappa, and a draws-do-not-collapse regression.

tests/distributions/test_continuous.py: 91 passed; full tests/distributions/:
226 passed, 8 skipped (stable x2); ruff check + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ninth-pass review. support_point returned the ExtGPD median; for small kappa that
median is sub-ULP from mu (or the tiny excess is swamped by mu's ULP), so it
rounds onto mu, whose transformed value is -inf -- a broken default initial point
for a valid parameter such as kappa = 1e-4.

A support point only has to be a usable initialization, not the exact median when
the median is unrepresentable. When the median collapses onto mu, fall back to the
underlying GPD median (carrier H = 1/2, excess = log 2) -- a higher ExtGPD
quantile (F = 0.5 ** kappa) that is representably interior (mu + O(sigma)) for any
kappa. Because the transform's forward map is logcdf - logccdf in log space (never
logit(F) with F rounded to 1), the resulting initial logp is finite across the
whole kappa > 0 domain (e.g. kappa = 1e-300 starts at y ~ 691, logp ~ -691, not
-inf). At kappa = 1 the fallback equals the ExtGPD median, and for ordinary kappa
the median never collapses, so support_point is unchanged there (the existing
support_point and the small-kappa median tests pass untouched).

Documented on the transform that the kappa underflow floor affects only deep-tail
*evaluation* accuracy, not the initial point, which is now finite for all kappa.

Tests: support_point falls back to the GPD median and yields a finite initial logp
for kappa in {1e-4, 1e-8, 1e-300}.

tests/distributions/test_continuous.py: 94 passed; full tests/distributions/:
229 passed, 8 skipped (stable x2); ruff check + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tenth-pass review (docs only, no behaviour change).

The ExtGenPareto class docstring still said the support_point "is the median"; it
now falls back to the underlying GPD median when the ExtGPD median rounds onto mu.
Updated the class docstring to: the ExtGPD median when representable, otherwise the
GPD-median fallback.

Softened the transform's "finite initial point for every kappa > 0" claim. That
holds at ordinary scales, but the fallback mu + O(sigma) still rounds back to mu in
the general representability limit where the support has no distinct interior point
-- sigma far below ulp(mu) (e.g. mu=1e12, sigma=1e-12), or a bounded xi < 0 whose
whole width is sub-ULP -- where the initial logp is -inf. Noted as the (parameter-
agnostic) representability caveat it is.

tests/distributions/test_continuous.py: 94 passed; ruff check + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a Generalized Pareto distribution family to pymc_extras, including a base GPD (GenPareto) and an extended carrier version (ExtGenPareto), along with API exports, documentation entries, and an extensive test suite validating logp/CDF/CCDF/ICDF behavior, boundary handling, RNG behavior, and default transforms.

Changes:

  • Implement GenPareto / ExtGenPareto distributions with stable logp/logcdf/logccdf/icdf kernels, inverse-CDF SymbolicRandomVariable RNG ops, and a probability-integral default transform.
  • Export new distributions via pymc_extras.distributions and include them in the Sphinx API reference.
  • Add comprehensive tests for correctness, boundary cases, tail stability, transform behavior, and sampling behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
pymc_extras/distributions/continuous.py Implements GenPareto/ExtGenPareto math, RNG ops, distribution wrappers, and PIT default transforms.
tests/distributions/test_continuous.py Adds extensive validation tests for the new GPD family (logp/CDF/CCDF/ICDF, transforms, RNG, edge cases).
pymc_extras/distributions/__init__.py Re-exports GenPareto and ExtGenPareto in the public distributions namespace.
docs/api_reference.rst Adds GenPareto and ExtGenPareto to the autosummary API reference list.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pymc_extras/distributions/continuous.py
Comment thread pymc_extras/distributions/continuous.py
Comment thread pymc_extras/distributions/continuous.py
Comment thread tests/distributions/test_continuous.py Outdated
maresb and others added 9 commits June 1, 2026 22:24
…idate s

Numerical-robustness pass for the bounded (xi < 0) tail, where the family's only
data-dependent quantity s = 1 + xi*z cancels toward 0 as value approaches the wall
mu - sigma/xi. The log-density *value* survives (it is ~|1+1/xi|*log s), but its
sigma/xi gradient carries ~z/s terms that inherit s's lost low bits, so they are
accurate only to ~ulp/s. This is a representability limit of the float64 input
value, not of the formula -- so this commit documents and pins it rather than
pretending to recover precision the inputs no longer carry.

* Consolidate: form (t, log_s) = (xi*z, log(1 + xi*z)) once per call via a new
  _gpd_tail(z, xi) and thread it through logp/logcdf/logccdf for both families, so
  the single cancellation site is explicit and the support mask, density and
  survival never re-derive s inconsistently. Pure refactor -- bit-identical to the
  prior graph on a boundary-spanning grid (verified by hash); all existing tests
  unchanged.

* Tests (TestGenParetoBoundaryPrecision): sweep value -> the xi<0 wall for
  xi in {-0.05, -0.3, -0.7} and margins s in {1e-2 .. 1e-12} against a 100-digit
  decimal reference built from the exact s. Assert logp & d/d(sigma,xi,kappa) stay
  finite, correctly signed and monotone in s; the logp value stays accurate to
  <1e-4 relative at every margin; the sigma/xi gradient matches to the achievable
  ~ulp/s bound (and to machine precision far from the wall); and the kappa gradient
  -- whose carrier term has no 1/s factor -- stays exact even at the wall.

* Docs: a precision note on GenPareto (and a cross-reference + the kappa-exempt
  detail on ExtGenPareto) stating the relative-precision limit near the wall and
  pointing boundary-critical callers to parameterize in terms of the margin s.

tests/distributions/test_continuous.py: 98 passed; full tests/distributions/:
233 passed, 8 skipped; ruff check + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…/ExtGenPareto

The previous Examples used `sigma ~ HalfNormal`, `xi ~ Normal` -- a free (sigma,xi)
pair, which is exactly the parameterization that diverges as the support wall
mu - sigma/xi approaches max(data) (the support edge depends on the parameters).

Replace both Examples with the idiomatic, sampler-stable peaks-over-threshold
recipe: keep natural priors on the GPD scale (Exponential) and shape, but bound
the shape below by the data-in-support constraint,

    max(data) < mu - sigma/xi   <=>   xi > -sigma / (max(data) - mu),

with no upper bound (xi >= 0 is an unbounded tail that always contains the data).
A TruncatedNormal on xi with that derived lower bound lets NUTS sample the whole
shape range -- bounded and unbounded tails alike -- with zero support-wall
divergences (its transform stretches the wall to infinity in unconstrained space),
recovering the SciPy MLE; verified for GPD and ExtGPD across xi in [-0.9, +0.3],
mu fixed or free. GPD parameters are named pareto_* to keep them distinct from the
mu/sigma *of the prior distributions*. A note flags that this is sampling
stability, separate from the deeper float64 precision limit right at the wall.

Docs only; no code change. ruff check + format clean; tests unchanged (98 passed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ppa logp

test_extgenpareto_transform_stays_in_support_for_tiny_kappa asserted the
transformed logp is not NaN at kappa in {1e-20, 1e-100, 1e-300}. At that depth the
carrier excess underflows to exactly 0, so the quantile collapses onto mu, where a
kappa < 1 GPD density diverges (logp = +inf). PyMC forms the transformed logp as
logp(backward(y)) + log_jac_det(y) = (+inf) + (-inf) -- a genuine indeterminate
that pytensor resolves to -inf on Linux but NaN on Windows (the CI failure), the
documented sub-~1e-15 kappa floor (a numerical point mass at mu), not a defect the
transform can resolve. Practical kappa keep the excess nonzero (e.g. kappa=1e-2
gives a ~4e-218 excess) so x stays > mu and the transformed logp is finite -- the
exact-across-kappa test (kappa >= 1e-2) is unaffected.

Drop the logp assertion; keep the robust, meaningful guarantee this test exists for
(the regression for the earlier negative-excess bug): the quantile is finite and
>= mu (in support) at every probed y. Comment explains why the logp is not asserted.

tests/distributions/test_continuous.py: 98 passed; ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
An Exponential prior on kappa has its mode at 0, so it does not regularize away
from kappa -> 0, where the ExtGPD collapses toward a point mass at mu and sigma can
run off to compensate (a kappa->0, sigma->inf ridge). Gamma(2, 1) has density -> 0
at the origin (mode at 1, mean 2), keeping kappa off that degenerate corner without
being informative about the bulk. Verified: 0 divergences, kappa recovered and held
well away from 0 (posterior min ~0.75) across xi in [-0.5, +0.2].

Docs only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…kappa tails

Addresses three review findings on the (Ext)GPD tail/boundary numerics.

1. Transform NaN for small kappa (blocker). A latent ExtGenPareto with
   kappa < ~|y|/745 (e.g. kappa=0.01 at y=-10, not just absurdly small kappa) hit
   a NaN transformed logp: the carrier excess underflows, the quantile collapses
   onto mu, and the kappa<1 density diverges there (logp(mu)=+inf), so PyMC's
   logp(backward(y)) + log_jac_det(y) became +inf - inf (-inf on Linux, NaN on
   Windows -- the prior CI failure). Fix: the PIT transformed density is exactly
   Logistic(y) for all parameters, so log_jac_det now returns that target directly
   (-softplus(y) - softplus(-y) - logp(backward)); the logp terms cancel and
   backward floors x just inside the open support so logp(backward) stays finite.
   The transformed logp is now finite and exactly Logistic over all of R -- this
   also removes the old heavy-tail (xi>0) "overflow past y~709/xi" limitation;
   only the *recovered quantile* backward(y) saturates to +inf / the endpoints
   there (the latent value is genuinely unrepresentable, which is honest).
   Verified: latent ExtGenPareto(kappa in {0.5, 0.05}) samples with 0 divergences,
   no NaN, in support; integrate-to-1 holds for bounded/unbounded, GPD/ExtGPD.

2. logccdf NaN at huge finite kappa. The tail series formed (kappa-1)(kappa-2),
   which overflows float64 (~1e310) for kappa>~1e155 and then multiplies an
   underflowed S^2 -> NaN. Keep only the first-order correction (single kappa
   factor): the tail branch is taken when kappa*S < e^-30, so the dropped
   second-order term is < e^-60, below the result's float64 precision -- bit-
   identical for ordinary kappa, finite and correct for kappa up to 1e300.

3. Examples: assert pareto_mu < xmin (strict). For kappa < 1 the density diverges
   at x = mu, so a datum exactly at the threshold gives +inf logp; exceedances are
   strictly above the threshold anyway.

Docs: rewrote the transform's "Range of validity" -- the transformed logp is always
finite Logistic; it is only x = backward(y) that saturates in the far tails.
Tests: heavy-tail test now asserts finite-Logistic-with-overflowing-quantile;
small-kappa test restored to assert finite == Logistic (down to kappa=1e-300);
large-kappa logccdf test extended to 1e155 / 1e300.

tests/distributions/test_continuous.py: 101 passed; full tests/distributions/:
236 passed, 8 skipped; ruff + format + doctest clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tear-down round. Three real defects in 652a865.

1. logccdf tail expansion was mis-gated. The tail series is a Taylor expansion in
   *both* S_gpd and kappa*S_gpd, but it was gated only on kappa*S_gpd < e^-30 --
   true for tiny kappa even when S_gpd ~ 1 (the body), so it used the wrong
   asymptotic there (log(kappa)+log(1-H) instead of log(kappa)+log(-log H)); e.g.
   kappa=1e-100 at x=0.01 was off by ~1.1. Now gate on BOTH a < -30 (S_gpd small)
   AND log(kappa)+a < -30, so the body uses the exact generic branch. The
   correction is written in r = kappa*S_gpd and s = S_gpd (with the quadratic
   r^2-3rs+2s^2 term) -- bounded, no kappa^k powers -- so it is valid where both
   are small and never overflows. New test pins small-kappa body values to the
   exact reference; the >=1e155 large-kappa cases stay finite.

2. The "finite everywhere" PIT logp relied on the optimizer cancelling
   logp(backward) against log_jac_det; an unoptimized (FAST_COMPILE) graph
   evaluated -inf + inf = NaN at the unreachable upper saturation (xi>0 quantile
   overflow at y~709/xi; the bounded wall). Fix: floor the subtracted logp at
   -1e300 in log_jac_det, so there the term stays finite and the transformed logp
   is -inf (robust, mode-independent), never NaN -- without relying on cancellation.
   No reachable x has logp below ~-1e3, so the floor never perturbs a valid value.
   The transformed logp is exactly Logistic for the whole reachable range and -inf
   only at the unreachable saturation; claim corrected accordingly. New test checks
   no NaN and == Logistic across the reachable range under FAST_COMPILE, for
   bounded/unbounded GPD and small-kappa ExtGPD.

3. The lower floor underflowed. floor = |mu|*8eps + sigma*1e-300 vanished for tiny
   sigma (e.g. sigma=1e-50 -> 1e-350 -> 0), so x stayed at mu and logp(mu) = +inf
   for kappa<1 -> NaN. Use a fixed absolute term: floor = |mu|*8eps + 1e-300 (1e-300
   is a normal double; covers mu=0 and tiny sigma).

The kappa<1 lower collapse (the blocker) is now robust even in FAST_COMPILE
(logp(backward) finite via the floor, so the cancellation is between finite numbers).

tests/distributions/test_continuous.py: 105 passed; full tests/distributions/:
240 passed, 8 skipped; ruff + format + doctest clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… docs

Follow-up review. The logccdf series fix stood; these address the rest.

1. Stale docstrings (the comment that re-breaks the next person). The logccdf
   docstring still described the OLD gating ("keyed on log(kappa)+a ... correct for
   any kappa") and the old single-variable series; rewritten to match the code --
   gated on BOTH a < -30 and log(kappa)+a < -30, correction in r = kappa*S and
   s = S, with why (the body needs the generic branch: log(kappa)+log(-log H), not
   log(kappa)+log(1-H)). A stale "first-order single-kappa-factor" test comment and
   the over-broad transform "covers every parameter" claim are corrected too.

2. dtype portability (float32). The transform floors were hard-coded float64:
   1e-300 casts to 0 under float32 (and -1e300 to -inf), so the small-kappa
   collapse left logp(mu) = +inf and the transformed logp NaN in an unoptimized
   float32 graph. Now dtype-aware: the lower floor is |mu|*8*eps + finfo.tiny, and
   log_jac_det's guard uses finfo(dtype).eps. New float32 test (kappa = 1e-20).

3. Scale: mu != 0, sigma << ulp(mu). The lower floor lifts x by ~ulp(mu), so for a
   near-delta sigma the floored x sits at z ~ 1/ulp and the Logistic residue is lost
   to cancellation -- previously a silent wrong finite value. log_jac_det now
   returns -inf wherever |logp(backward)| > 1/sqrt(eps) (the cancellation would lose
   more than half the digits), so this regime -- and the unreachable upper
   saturation -- give a clean -inf reject, never a wrong finite or a NaN, and
   without relying on the optimizer cancelling logp. The "finite and Logistic"
   claim is narrowed to the representable range; the limits are documented.

New tests: small-kappa logccdf body vs exact reference; transformed logp under
FAST_COMPILE (no NaN, == Logistic) over the reachable range; upper saturation and
degenerate near-delta give -inf not NaN; float32 small-kappa collapse is finite.

tests/distributions/test_continuous.py: 107 passed; full tests/distributions/:
242 passed, 8 skipped; ruff + format + doctest clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…non-finite kappa

Addresses the remaining review findings on fa003da.

- Non-finite kappa (finding 4): kappa = inf passes the `kappa > 0` check but is
  not a valid shape, and leaked inconsistent nan / -inf / 0 across logp /
  logcdf / logccdf / icdf. Extend `_propagate_nonfinite_shape` with a `kappa`
  arg and map result -> nan wherever kappa is non-finite, so all four agree.
  (kappa = nan is already caught by `kappa > 0`, which raises.) New test
  `test_nonfinite_kappa_propagates_consistently`.

- Transform contract (findings 2 & 3): add a "Bijectivity and the saturated
  readout" paragraph stating plainly that once the floor/wall binds the map is
  no longer invertible -- `forward(backward(y)) != y`, the saturated `backward`
  has derivative 0, and `log_jac_det` is then the exact-PIT density correction
  (log F + log S - logp(x)), NOT the Jacobian of the floored readout. The
  transformed density the sampler targets is still the correct Logistic; only
  the recovered latent x is quantized at the boundary. New test
  `test_transform_is_a_saturated_readout_when_the_quantile_collapses` proves
  both y collapse to the same x (so forward(backward(y)) != y) while the
  transformed logp stays Logistic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e NaN in icdf endpoints

The previous round's non-finite-shape propagation was bypassed by the PIT
transform: log_jac_det returns `logistic - logp(x)` and the framework adds
`logp(backward(y))`, so the optimizer cancels the two `logp(x)` terms. For a
non-finite xi/kappa that `logp(x)` is NaN, and the cancellation made the NaN
disappear -- the transformed latent logp came back as a *finite* Logistic,
silently accepting an invalid parameter as a valid latent.

Fixes, all matching PyMC's own conventions (verified against pm.Normal /
pm.Gamma / pm.StudentT):

- log_jac_det: route a NaN `logp(x)` through a switch whose other branch is a
  bare constant (no `logp_x` to cancel), so the transformed logp stays NaN for a
  non-finite shape. PyMC's check_start_vals then rejects it loudly -- exactly as
  pm.Gamma(alpha=inf) / pm.StudentT(nu=inf) do (their transformed latent logp is
  also NaN, not a spurious finite density).

- icdf (both GPD and ExtGPD): wrap the result in `_propagate_nonfinite_shape` so
  the q = 0 / 1 endpoints propagate a non-finite shape as NaN too, instead of
  handing back mu / the upper bound while the interior is NaN.

- sigma = inf: confirmed it already matches pm.Normal's scale-of-infinity
  convention (logp = -inf, a clean reject, not a finite-density leak). The shape
  inf -> NaN / scale inf -> -inf asymmetry is PyMC's, not ours; documented and
  tested rather than diverged from.

Tests: a parametrized regression that the transformed logp is non-finite (and
check_start_vals raises SamplingError) for xi=+-inf and kappa=inf on both
classes; icdf endpoint coverage added to the non-finite xi/kappa tests; an
infinite-scale test asserting parity with pm.Normal. test_continuous 114 passed;
full tests/distributions/ 249 passed, 8 skipped; ruff + format + doctest clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…xi=-inf

Test sharpness only (no behaviour change). The transformed logp for a non-finite
shape is specifically NaN (verified across all cases), so assert np.isnan rather
than `not np.isfinite` -- a future regression that turned it into a -inf clean
reject would otherwise pass. Also add GenPareto xi=-inf to the matrix so both
signs are covered for the no-kappa family.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

Comment thread pymc_extras/distributions/continuous.py
Comment thread pymc_extras/distributions/continuous.py
Comment thread pymc_extras/distributions/continuous.py
Comment thread pymc_extras/distributions/continuous.py
Comment thread pymc_extras/distributions/continuous.py
Comment thread pymc_extras/distributions/continuous.py Outdated
Comment thread tests/distributions/test_continuous.py
Comment thread tests/distributions/test_continuous.py
maresb and others added 6 commits June 2, 2026 14:53
The integrand is the smooth, exactly-Logistic transformed density, so the
quadrature error is dominated by the +-30 tail truncation (~2e-13), not the grid
spacing -- 2001 points already integrate to ~1e-13, far inside the 1e-3
tolerance, without evaluating the compiled function 30k times in a Python loop
(x5 parametrized cases).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- _ExtGenParetoPIT._excess_from_y: the bulk/tail crossover was a hard-coded float64
  700.0. Under float32 that routed y in ~[88, 700) through the bulk branch where
  log F_ext has already underflowed, blowing the excess (and transformed logp) up
  to +inf well inside the reachable range. Derive the cutoff from
  finfo(value.dtype) instead -- ~700 for float64 (min(700, 708.4 - 8) = 700, so
  float64 is bit-identical), ~80 for float32. New regression
  test_excess_from_y_upper_tail_is_finite_under_float32.

- test_latent_sampling_stays_in_support: this runs real NUTS x3 cases; cut
  draws/tune 200/300 -> 100/200 and force cores=1 so it stays cheap and
  deterministic in CI (the clean PIT geometry still gives 0 divergences in support).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Profiling tests/distributions/ showed our GPD/ExtGPD tests were ~55% of the
suite runtime, dominated by (a) tests that recompiled a fresh pytensor function
on every .eval() inside a loop, and (b) one real-NUTS test that compiled the
sampler three times. Both are compile-bound, not compute-bound.

- test_nonfinite_xi / test_nonfinite_kappa: probe all points in one vectorised
  eval per method (value/q as a vector) instead of recompiling per (point,
  method). ~93 compiles -> ~9 for the xi test (2.9s -> 1.8s).
- test_logp_logcdf_at_infinity / test_icdf_endpoints: batch the three xi into one
  dist so each method compiles once instead of once per xi.
- test_latent_sampling_stays_in_support: sample the three geometries (heavy,
  exponential, bounded) as three independent latents in ONE model. The cost is
  the per-model sampler compile, not the draws, so one combined model is far
  cheaper than three parametrized cases and still checks each geometry stays in
  support with zero divergences.

Net: tests/distributions/ 57s -> 52s; test_continuous.py 34.6s -> 30.6s. All
pass; ruff + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uplicate params

The transform suite had four tests all asserting "transformed logp == Logistic(y),
finite where representable" -- finite-in-tails, deep-into-tail, exact-across-kappa,
small-kappa -- accumulated one per review round (~17 parametrized cases). Fold them
into one parametrized test with one row per *distinct* regime (saturation at y~37,
the y~745 deep-tail underflow, the y~709/xi overflow boundary, large kappa, the
kappa<1 collapse onto mu), so each mechanism is still guarded but with 8 model
builds instead of 17. The merged test also checks the recovered quantile stays in
support (x >= mu), which the deep-tail cases previously skipped.

Also drop two duplicate parametrize cases on kept tests: logccdf large-kappa 1e6
(subsumed by 1e20; the overflow guards are 1e155/1e300) and heavy-tail xi=3.0
(bracketed by 1.5 and 5.0).

Net: our tests 102 -> 89; tests/distributions/ 249 -> 238. The wall-clock gain is
small (~1s) -- the per-test cost is a fixed pytensor compile the merged test still
pays per case -- but the suite is less redundant and each transform regime is now
documented in one place.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two small per-test-compile reductions found while profiling:

- test_nonfinite_shape_does_not_leak...: the parametrized (x5) test recompiled the
  model + initial point for check_start_vals on every case. The NaN-logp check is
  the real per-shape assertion; the loud-init symptom is one mechanism, so check it
  once in a separate test instead of five times.
- test_nonfinite_xi_propagates_consistently: drop the -inf case. The guard is
  pt.isfinite(xi), which is sign-agnostic, so -inf exercises the exact same path as
  +inf; nan + inf already cover both IEEE kinds.

Profiling note (not a code change): the suite is pytensor-compile-bound, so the
only large lever is parallelism -- tests/distributions/ runs ~50s serially vs
~18s under `pytest -n auto` (xdist, already a dev dependency). The single longest
test is the real-NUTS test_latent_sampling_stays_in_support (~5s of sampler
compile), which becomes the long pole once tests run in parallel.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sform merge cases

- TestGenParetoBoundaryPrecision pins a documented *numerical limit* (the sigma/xi
  gradient degrades to ~ulp/s at the xi<0 wall), not a feature. The s-cancellation
  mechanism is xi-independent, so one representative xi=-0.3 over a margin sweep
  (1e-2 .. 1e-12) documents it fully: the largest margin is the machine-accurate
  anchor away from the wall (subsuming the separate far-from-boundary test, now
  dropped), the smallest is the ~ulp/s limit. Drops 3 of the 4 instances of the
  slowest non-NUTS group (gradient-graph + Decimal compiles).

- Give the consolidated test_transformed_logp_is_logistic_where_representable
  descriptive parametrize ids (bounded-gpd, ext-collapse, ...) so a failure names
  the regime instead of reporting opaque [builder3-kwargs3-...] indices -- the one
  real downside of having merged the four tail/kappa transform tests.

tests/distributions/ 238 -> 235 tests, all pass; ruff + format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@maresb maresb changed the title Add gpd extgpd distributions Add GenPareto and ExtGenPareto (Generalized Pareto) distributions Jun 2, 2026
_uniform_draw was a two-call-site one-liner around
`uniform(size=size, rng=rng, return_next_rng=True)` whose only content was a
docstring. Inlining it at the two rv_op sites is the same length and clearer --
the reader sees the idiom instead of chasing a helper for it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@maresb maresb marked this pull request as ready for review June 2, 2026 22:08
@maresb
Copy link
Copy Markdown
Collaborator Author

maresb commented Jun 2, 2026

@ricardoV94, this is a monster, but it's been battle-tested. I've also tried to ensure that all your feedback from previous PRs has been taken into account.

@maresb maresb requested a review from ricardoV94 June 2, 2026 22:09
@maresb maresb force-pushed the add-gpd-extgpd-distributions branch from 797ef29 to 9198af3 Compare June 2, 2026 22:15
maresb and others added 6 commits June 3, 2026 06:47
The bounded-POT example priors bounded xi only from below by the data-fit
constraint -sigma/(xmax-mu). That permits xi <= -1 whenever sigma > xmax-mu (which
Exponential(1) always allows), and for xi <= -1 the GPD density diverges at the
upper endpoint -> unbounded likelihood -> improper posterior. The sampler runs
away: on small-range exceedances the chain pins the upper wall onto max(data) with
xi well below -1 (observed 47 GPD / 18 ExtGPD divergences, xi down to -5.7, wall
margin ~1e-9). Fix: floor the prior at xi > -1, the standard regularity condition,
via lower=pm.math.maximum(-1.0, -sigma/(xmax-mu)). With the floor the posterior is
proper and the runaway is gone (divergences 47->2 / 18->4, no xi < -1).

Also document why mu is held fixed: the lower endpoint is mu, where a kappa < 1
ExtGPD density diverges. Fixing the threshold keeps that singularity away from the
data; a free mu would slide up to min(data) and sit on it (the lower-endpoint
mirror of the xi <= -1 wall). The GPD endpoint density is finite, so a free mu
there is only the harder 3-parameter problem, not a blow-up.

Docstring examples only; no kernel change. The kernels are mathematically correct
(the densities genuinely diverge in those regimes) -- the flaw was the example
priors permitting the non-regular region.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…u caveat

The example priors used Exponential(1) on sigma, the one dimensional parameter.
That hard-codes sigma ~ O(1), so the dimensionless xi inference is not invariant to
the data's units: rescaling the same data x1->x1e6 drove xi-hat from 0.33 to 8.84
(true 0.2), with clean sampling -- silent, confident garbage. A log-scale prior
(LogNormal(0, 10)) makes xi invariant to scale across ~9 orders of magnitude
without the prior depending on the data (the Jeffreys 1/sigma limit is the exact,
improper version).

Also correct the GPD free-mu caveat: a free threshold is NOT safe for the GPD --
it has the classic three-parameter unbounded likelihood (mu -> min(data), sigma ->
0) whenever xi > n - 1. ExtGPD just hits it far more easily (mu -> min(data) alone
for kappa < 1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_ExtGenParetoPIT._excess_from_y used the asymptotic S_F ~ S_ext/kappa in the
tail branch, valid only when exp(-y)/kappa << 1. For subnormal kappa that ratio
is O(1) at representable y (e.g. y=710, kappa=4e-309: exp(-y)/kappa ~ 1.1), so it
returned a negative "excess" (-0.112 vs the true 0.395), collapsing backward onto
the floor and breaking the round-trip (forward(backward(710)) = 703.5, not 710).

Invert the carrier exactly instead: a = -log(F_ext)/kappa, m = -log(1 - exp(-a)),
carried as log_a = -t + log(_log1p_div(-S_ext)) - log(kappa) so it survives S_ext
underflow, with the a->0 (m ~ -log_a) and a->inf (m ~ 0) limits split out. This
preserves the large-kappa tail (m ~ y + log kappa) and fixes the subnormal case;
regression test asserts excess(710, 4e-309) ~ 0.3954 and a clean round-trip.

Doc fixes: the GPD wall density diverges for xi < -1, not xi <= -1 (xi = -1 is the
finite uniform boundary); the LogNormal(0,10) sigma prior is robust across scales,
not scale-invariant (its location is fixed in the data's units).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maresb content looks great, delivery leaves a lot to be desired.

Can we:

  1. Move functionality to new file to keep it organized (can still allow importing from distributions/continuous)
  2. Aggressivepy pair down comment and docstring verbosity. When everything is emphasized ad nauseum nothing is. If a human or llm can infer what the comments are saying from the code we don't need it. Keep what's actually tricky/hard to recover.
  1. Measure the total CI time incurred by the new tests? It's hard to gauge locally because of caching/different backends/GA being single core by default. Gotta assess if reasonable.
  2. Some guards seem overly defensive and beyond what we do elsewhere or have felt the need for. Specially the fear of masking infinite/nan values. May be fine or may ne too much. Need your own opinion here.
  3. Whenever I bash on style I'm bashing on the tool not you. All grievances are my own problem.

)


# ===========================================================================
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please no dangling comments in global space. Put them inside class/functions where they make sense.

Assuming they are even needed. They don't seem to be

Only ``xi * z`` needs this: at ``xi = 0`` with an infinite observation the
product is mathematically ``0`` (the exponential GPD carries no shape term),
but ``0.0 * inf`` is ``nan`` under IEEE, and a ``nan`` in the unused branch of
a ``switch`` can still leak (the backend may lower it to ``cond*a + ...``).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What backend does that?

return pt.and_(z >= 0, 1 + t > 0)


def _propagate_nonfinite_shape(result, xi, kappa=None):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really needed? We don't worry about say inf sigma in normal likelihood and things like that



def ext_gen_pareto_logp(value, mu, sigma, xi, kappa):
"""Pure-PyTensor extended-GPD log-density; out-of-support values map to ``-inf``."""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pure PyTensor is such a useless statement. Really can we trim 90% of these comments and docstrings. Just visual noise

wall noted above (reached only when the wall is pinned within ``~1e-13`` of the
data).

The threshold ``mu`` is held fixed -- the standard peaks-over-threshold setup,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this paragraph sound less like llm speech? At least remove the load bearing. I believe there are exactly 0 occurrences of this term in the first 2 decades of PyMC history :)



class _GPDProbabilityIntegralTransform(Transform):
"""Default transform for the GPD family: ``y = logit(F(x))``.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting

return pt.switch(pt.isnan(logp_x), np.nan, jac)


class _GenParetoPIT(_GPDProbabilityIntegralTransform):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these classes about/for?


# The Generalized Pareto family legitimately evaluates to +/-inf at the
# (measure-zero) support boundary and at probabilities 0 / 1; NumPy flags those
# as divide-by-zero / invalid / overflow during ``.eval()``. They are the correct
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we suppress these warnings are you sure we are the ones firing them, not the test? if so suppress in the tests with np.errstate context

if xi < 0:
assert np.all(xs <= mu - sigma / xi + 1e-9), f"x{i} (xi={xi}) past wall"

def test_observed_is_unaffected(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is stupid

@ricardoV94
Copy link
Copy Markdown
Member

PyMC's own conventions (e.g. a non-finite shape gives NaN like pm.Gamma, a non-finite scale gives -inf like pm.Normal).

There aren't so strict conventions. Parameter bounds are usually wrapped in a check_parameter_values that can be ignored (Model(check_bounds=False)) or converted to a ninf switch when compile_logp is called.

@maresb
Copy link
Copy Markdown
Collaborator Author

maresb commented Jun 3, 2026

Thanks a lot @ricardoV94 for the speedy and savage review. I really appreciate the time you put in to read over this, and of course I don't take the criticism personally since I didn't really "write" this stuff myself. And probably I should have put in more effort before asking for review, but I had to stop somewhere. (I did put in a lot of time doing fairly rigorous correctness testing though; it's probably not perfect but I think it's close.)

This is probably overengineered, but I was going for correctness first. I actually feel like you're much better qualified to give an opinion on the various nan/inf checks than I am. I'm curious if the checks will get in the way of rewrites.

Regarding the tests, I did some extensive profiling, and I don't think there are likely many obvious wins beyond removing certain tests altogether. I'll reproduce the profiling for you, but from memory, no individual test went beyond a few seconds, and the time to run test_continuous raised from ~25 sec. to ~50 sec. I'll split out both the functions and the tests into separate files to make this easier to see.

Thanks so much for your time and patience, I really appreciate it!!!

@maresb
Copy link
Copy Markdown
Collaborator Author

maresb commented Jun 3, 2026

While some of the complicated numerics were motivated purely by aggressive parameter scans, a very large proportion were actually necessary to preserve stability that was leading to real divergences during sampling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants