Add GenPareto and ExtGenPareto (Generalized Pareto) distributions by maresb · Pull Request #689 · pymc-devs/pymc-extras

maresb · 2026-05-31T17:32:26Z

Summary

Adds two continuous distributions to pymc_extras.distributions:

GenPareto — the Generalized Pareto distribution (GPD), the peaks-over-threshold counterpart of GenExtreme. Parametrized as in Coles (2001) / SciPy's genpareto (c = ξ).
ExtGenPareto — the extended GPD of Naveau et al. (2016): F = H**κ with H the GPD CDF, adding a lower-tail shape κ while ξ keeps controlling the upper tail (κ = 1 recovers the GPD).

Each provides stable pure-PyTensor logp / logcdf / logccdf / icdf kernels, a SymbolicRandomVariable (inverse-CDF sampling), a support_point, and a probability-integral default transform so they work as latent variables. Both are exported from pymc_extras.distributions and added to the Sphinx API reference.

Supersedes #638

This reimplements #638 following the review there: the RVs are now SymbolicRandomVariables doing inverse-CDF sampling, so the random methods work across backends without SciPy. (The original GPD attempt was #294.)

Earlier review feedback addressed

Implement the RV as a SymbolicRandomVariable / inverse-CDF (Add Generalized Pareto (GPD) and Extended GPD distributions #638) — done for both distributions.
Add a docs API entry (Implement Generalized Pareto distribution #294) — both classes are in docs/api_reference.rst.
Don't skip the out-of-support edge test (Implement Generalized Pareto distribution #294) — parameter domains use real edges (no skip_paramdomain_outside_edge_test); boundary behaviour is covered explicitly in TestGenParetoBoundaries.
support_point only needs a finite-logp point (Implement Generalized Pareto distribution #294) — returns the median (finite for every ξ, unlike the mean, which diverges for ξ ≥ 1), broadcast over size.
The ξ = 0 limit (Implement Generalized Pareto distribution #294) — the whole family is routed through log1p(ξz)/ξ, keeping value and gradient C¹ through ξ = 0 (regression-tested).

Numerical behaviour

The kernels stay finite and accurate deep into the tails and near the bounded ξ < 0 wall (down to the documented float64 representability limits); the probability-integral default transform avoids the ξ = 0 Jacobian kink an Interval transform would introduce (the headline reason for it); and boundary / non-finite-parameter behaviour follows PyMC's own conventions (e.g. a non-finite shape gives NaN like pm.Gamma, a non-finite scale gives -inf like pm.Normal). All of this is covered in tests/distributions/test_continuous.py.

…stributions Adds two continuous distributions to pymc_extras.distributions, structured in three layers so the probability math is forward-facing and reusable: 1. Pure PyTensor functions (gen_pareto_logp/_logcdf/_icdf and the ExtGenPareto variants) that depend only on pytensor, so they can be reused directly or lifted into pytensor-distributions unchanged. 2. SymbolicRandomVariable Ops that sample by inverse-CDF on a uniform draw, so the random methods work on every backend without a SciPy object-mode path. 3. Thin Continuous distribution classes wiring the two together. The xi -> 0 limit (GPD collapsing to the exponential) is handled with C1-smooth log1p(u)/u and expm1(u)/u helpers instead of switch(isclose(xi, 0)), keeping both the value and its gradient continuous through the limit -- the property NUTS relies on near xi = 0. ExtGenPareto is the Naveau et al. (2016) extended GPD with carrier G(v) = v**k; kappa = 1 recovers the plain GPD. Includes logp/logcdf/icdf checks against scipy.genpareto and numpy references, support-point (median) tests, RV size/param and KS goodness-of-fit tests, and explicit C1-smoothness-through-xi=0 tests. Registers both in distributions/__init__.py and docs/api_reference.rst. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

pymc-extras groups distribution definitions by kind (all continuous distributions live in distributions/continuous.py, discrete in discrete.py, etc.), not one module per distribution. The initial commit added a standalone generalized_pareto.py / test_generalized_pareto.py, which broke that taxonomy. Fold both distributions (their RV Ops, pure-PyTensor helper functions, and tests) into continuous.py / test_continuous.py alongside GenExtreme, the closest sibling. No behavioural change: the three-layer structure (pure functions -> SymbolicRandomVariable -> Continuous class) is preserved verbatim, and the test additions append to the existing module, leaving the GenExtreme/Chi/Maxwell tests untouched (the module-level pytestmark gains only filterwarnings entries). Full tests/distributions/ suite: 169 passed, 8 skipped; ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov-commenter · 2026-05-31T17:38:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.42%. Comparing base (86fac3c) to head (a22433e).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff             @@
##             main     #689       +/-   ##
===========================================
+ Coverage   51.60%   90.42%   +38.81%     
===========================================
  Files          73       73               
  Lines        8003     8268      +265     
===========================================
+ Hits         4130     7476     +3346     
+ Misses       3873      792     -3081

Files with missing lines	Coverage Δ
pymc_extras/distributions/__init__.py	`100.00% <100.00%> (ø)`
pymc_extras/distributions/continuous.py	`98.78% <100.00%> (+42.44%)`	⬆️

... and 31 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Acts on an external review of the GenPareto / ExtGenPareto distributions: 1. Boundary numerics. - ``_safe_mul`` maps the IEEE ``0 * inf -> nan`` of ``xi * z`` back to ``0`` so the xi = 0 path is finite at x = inf. - logp / logcdf pin ``z = +inf`` explicitly to ``-inf`` / ``0`` (for xi > 0 the in-support branch is ``log1p(inf)/inf -> nan``). - ``_gpd_upper_bound`` gives ``icdf`` the finite upper endpoint ``mu - sigma/xi`` at ``q = 1`` for xi < 0 (was ``nan`` from ``inf * 0``); ``q = 0`` returns ``mu``. - icdf evaluates its interior formula on a safe value so the discarded q = 0/1 branch never produces a nan that the graph rewriter can surface. 2. Scalar icdf. ``value`` is coerced with ``pt.as_tensor_variable`` and the methods use PyMC's ``check_icdf_value`` / ``check_icdf_parameters``; direct calls like ``GenPareto.icdf(0.0, ...)`` no longer raise ``TypeError``. q outside [0, 1] returns ``nan`` (the standard PyMC icdf contract). 3. Pure-PyTensor kernels. ``gen_pareto_*`` / ``ext_gen_pareto_*`` no longer call ``check_parameters``; validation lives entirely in the ``Continuous`` wrapper methods, so the kernels are genuinely reusable (e.g. in pytensor-distributions). 4. Docs. Register GenPareto and ExtGenPareto in docs/api_reference.rst. 5. Tests. Stop skipping the parameter outside-edge test: xi carries explicit ``(None, None)`` edges (unconstrained, no invalid probe) while sigma / kappa keep 0 / inf sentinels so the harness probes sigma <= 0 and kappa <= 0 (must raise). The xi grid stays within (-1, 1]: at xi <= -1 the GPD has a finite density at its closed upper endpoint (a measure-zero point the open-support convention here legitimately handles as -inf), and at xi > 1 the q = 0.99 quantile (~1e10) exceeds ``check_icdf``'s absolute tolerance though it is correct to ~1e-15 relative. Scope the FPE RuntimeWarning filters to the GPD test classes via per-class ``pytestmark`` instead of muting the whole module. Add ``TestGenParetoBoundaries`` covering x = inf, q = 0, q = 1, q outside [0, 1], and invalid sigma / kappa. tests/distributions/test_continuous.py: 39 passed; full tests/distributions/: 174 passed, 8 skipped (stable across repeated runs); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…heavy tail 1. ``_safe_mul`` now repairs ONLY the indeterminate ``0 * inf`` (the two exact ``{0} x {+-inf}`` cases) instead of every NaN product. A genuine ``nan`` parameter (e.g. ``xi = nan`` from bad input) again propagates instead of being silently turned into the ``xi = 0`` exponential branch. Added ``test_nan_xi_does_not_become_exponential``. 2. Documented the open-support convention explicitly. The Support table lists the bounded tail as closed, but the implementation returns ``-inf`` / ``0`` at the xi < 0 right endpoint. A Notes section on GenPareto (referenced from ExtGenPareto) now states this is exact as a density limit for -1 < xi < 0 and only differs from SciPy at the measure-zero boundary for xi <= -1. 3. Added ``TestGenParetoHeavyTail`` with *relative*-tolerance checks for xi > 1 (the infinite-mean regime the absolute-tolerance ``check_icdf`` harness can't reach): logp / logcdf / icdf vs SciPy at rtol, plus median ``support_point`` for GenPareto and ExtGenPareto where the mean is infinite. 4. Softened the migration wording: the pure functions are portable math kernels (density / CDF / quantile), not a full functional API (no pdf/sf/isf/rvs/ moments), so they are "portable kernels", not a distribution "lifted unchanged". tests/distributions/test_continuous.py: 46 passed; full tests/distributions/: 181 passed, 8 skipped (stable across repeated runs); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Deep-dive hardening, prioritising portability to pytensor-distributions and following the conventions of existing pymc / pymc-extras continuous distributions. logccdf (log complementary CDF = log survival function): - Add ``gen_pareto_logccdf`` / ``ext_gen_pareto_logccdf`` pure kernels and wire ``logccdf`` classmethods on both distributions. PyMC auto-registers it (like ``logcdf``) and ``pm.logccdf`` now works for both. - The survival exponent is computed directly (``-m`` for GPD; ``log1mexp(kappa * log H)`` for the extended GPD), so it is exact in the heavy upper tail -- the regime where PyMC's generic ``log1mexp(logcdf)`` fallback collapses (logcdf -> 0 there). This is the natural ``logsf`` primitive for a peaks-over-threshold model and the direct counterpart of pytensor-distributions' ``logsf``. - Tests: ``check_logccdf`` vs ``scipy.genpareto.logsf`` (GPD) and a tail-stable reference (ExtGPD), an explicit tail-stability test out to x = 1e8, ``check_selfconsistency_icdf`` (cdf(icdf(q)) == q), and a kappa = 1 reduction. Moments documentation (no methods -- matching the pymc/pymc-extras convention of documenting moments in the Support/Mean/Variance table and exposing only ``support_point``): - GPD table gains Median, Mode and Entropy rows (all closed-form, validated against scipy). - ExtGPD table gains exact closed-form Mean and Variance via the Beta function ``kappa * B(kappa, 1 - xi)`` (valid xi < 1 and xi < 1/2 respectively), Median, and the Mode (T* = (1 + xi)/(kappa + xi) for kappa > 1, else mu), with the xi -> 0 digamma / trigamma limits noted. These are exact Naveau-construction results -- not numerical-truncation approximations -- verified numerically against integration to <= 1e-8. ``support_point`` remains the median. tests/distributions/test_continuous.py: 52 passed; full tests/distributions/: 187 passed, 8 skipped (stable across repeated runs); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ms, docs Acts on a third-pass review of the GenPareto / ExtGenPareto distributions. 1. ExtGPD logccdf far-tail collapse (blocker). ``log1mexp(kappa * log H)`` returned -inf deep in the tail because ``log H -> 0`` underflows there, even though the survival is finite. Recomputed from the GPD log survival ``a = log(1 - H) = -m`` (exact in the tail): generic ``log1mexp(kappa * log1mexp(a))`` when ``a`` is not tiny, and a second-order series ``log(kappa) + a + log1p(-(kappa-1)/2 S + ...)`` for ``a < -30``. Verified against 60-digit reference to <= 4e-15 across xi/kappa; e.g. xi=0, kappa=2.5, x=1000 now gives -999.084, not -inf. Factored out ``_gpd_log_S``. 2. Non-finite ``xi`` now propagates as nan (blocker). The support switch masked nan to -inf ("valid parameter, impossible value"). logp now propagates nan, matching logcdf/logccdf and pymc's own distributions (Normal propagates nan for a nan parameter; pymc never uses isfinite parameter checks). Strengthened the test to assert nan across logp/logcdf/logccdf, not merely "non-finite". 3. Default transforms (production blocker). Registered an ``Interval`` default transform on both distributions via ``bounds_fn`` returning ``(mu, switch(xi < 0, mu - sigma/xi, +inf))`` -- a single parameter-dependent expression that covers the heavy (xi >= 0, one-sided) and bounded (xi < 0, two-sided) regimes, and works when mu/sigma/xi are themselves random. Without it a latent GPD samples on all of R (constant -inf below mu). Tests confirm latent sampling stays in support with zero divergences and observed RVs are unaffected. 4. Docs corrected to match the implementation / domain: - Support table now shows the open right endpoint ``mu <= x < mu - sigma/xi``. - GPD/ExtGPD Mode noted as valid only for xi > -1 (for xi <= -1 the supremum is at the right endpoint, not mu). - ExtGPD median wording fixed: it is the GPD quantile at CDF probability H = 2^(-1/kappa) (equivalently survival 1 - 2^(-1/kappa)), not "survival 2^(-1/kappa)". tests/distributions/test_continuous.py: 58 passed; full tests/distributions/: 193 passed, 8 skipped (stable across repeated runs); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The previously committed ``Interval`` default transform has a log-Jacobian that is discontinuous in xi at 0: the bounded two-sided (sigmoid) map and the one-sided (exp) map do not connect as the upper endpoint ``mu - sigma/xi`` diverges, so the transformed density has a ~1e12 gradient kink at xi = 0. With xi a *random* variable this triggers divergences -- measured 163/1600 with xi sampled across 0, versus 65 for a lower-bound-only transform. (The earlier "verified zero divergences" only exercised *fixed* xi, where the kink never activates -- a real gap in that check.) Replace it with a probability-integral transform ``y = logit(F(x))``. Because F is the family's own CDF, the transformed prior density is exactly Logistic and free of mu/sigma/xi, so it is C1 in every parameter (no kink anywhere), while the inverse ``icdf`` enforces the correct support -- including the moving upper wall for xi < 0 -- for all xi. Implemented as ``_GPDProbabilityIntegralTransform`` with kappa-aware GenPareto / ExtGenPareto subclasses, registered as the default transform for both. Measured divergences (xi random, 1600 draws): across 0 -> 3 (was 163); xi < 0 -> 0 (was 33); xi > 0 -> 0 (was 33). Validated: transformed density integrates to 1 for every xi, ``forward(backward(y)) == y`` exactly, and the latent posterior mean matches the untransformed target (so the divergence reduction is not from sampling a different distribution). Tests gain a Jacobian-integrates-to-1 check across both classes and a direct gradient-continuity-through-xi=0 assertion (the property an Interval transform fails) -- the regression guard the original check lacked. tests/distributions/test_continuous.py: 64 passed; full tests/distributions/: 199 passed, 8 skipped (stable across repeated runs); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… NaN; docs Acts on a fourth-pass review. Blocking -- PIT transform invalid in the positive unconstrained tail. The old ``backward`` did ``icdf(sigmoid(y))``; ``sigmoid(y)`` rounds to exactly 1 for y >= 37 in float64, so ``icdf(1)`` returned inf/the wall and the transformed logp became nan/-inf on valid unconstrained values. Reworked the transform to stay in survival/log space: ``backward`` builds the GPD excess ``m = softplus(y)`` (base family) / ``-log(-expm1(-softplus(-y)/kappa))`` (extended) directly, and ``log_jac_det`` is the analytic ``log F + log S - logp`` rather than autodiffing the quantile graph. The transformed density is now exactly Logistic and finite far into the tail. Isolated root cause for the residual bounded-xi case: once ``backward`` materialises x at the finite wall, ``1 + xi z`` cancels to 0 (its true value is ``exp(-m)``); for xi >= 0 there is no wall and the transform is finite to |y| ~ 1000, while the bounded xi < 0 case is limited by float64 representability near the wall (|y| ~ 70), far past any sampler's reach. High -- ExtGenPareto.logccdf wrong for large kappa. The tail branch fired on ``a < -30`` but assumed ``kappa * S_gpd`` small; for large kappa it returned a positive "log probability". Now keyed on ``log(kappa) + a < -30`` (i.e. on ``kappa * S_gpd``), verified <= 0 and accurate to <= 5e-16 vs an 80-digit reference for kappa up to 1e20. Medium -- NaN propagation made consistent. A non-finite xi now yields nan from logp/logcdf/logccdf at every value (below support, in support, above support) via a shared ``_propagate_nonfinite_shape`` helper, instead of being masked to -inf/0 at the boundaries. Medium -- tests now probe the tails (y up to +-45 bounded, +-400 unbounded) and forward(backward(y)) round-trip, plus a large-kappa logccdf check -- the coverage the previous "integrate over [-30, 30]" tests lacked. Medium -- ExtGenPareto mode docs give the xi = 0 limit (mu - sigma ln T*). tests/distributions/test_continuous.py: 74 passed; full tests/distributions/: 209 passed, 8 skipped (stable across repeated runs); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…nd xi=+-inf Fifth-pass review cleanup -- no new mechanism, just honesty and a policy fix. PIT transform range of validity (was overclaimed). The framework adds logp(backward(y)) to log_jac_det(y), so the map is only valid where the quantile x = backward(y) is *representable*. For xi <= 0 (no upper wall) x grows at most linearly and the map is exact essentially everywhere (verified past |y| = 1000). For a heavy upper tail x ~ exp(xi * m), so it overflows float64 at roughly y ~ 709 / xi (e.g. y ~ 142 for xi = 5); the bounded xi < 0 case is limited the other way by the wall's ULP near |y| ~ 60. This is a representability limit on x, not a bug, and is far past any sampler's reach -- but the docstring no longer claims the map is finite on all of R for every parameter. Replaced the "finite arbitrarily far" test with one that probes each regime to its real reach and an explicit test that the heavy-tail map overflows only past y ~ 709 / xi. Non-finite xi policy (was inconsistent). ``_propagate_nonfinite_shape`` keyed on ``isnan(1 + xi * z)``, which is finite at x = mu (z = 0) and for several regions when xi = +-inf, so a bad shape leaked the ordinary boundary values there. Now keys on ``isfinite(xi)`` directly, so xi in {nan, +inf, -inf} yields nan from logp / logcdf / logccdf at *every* value -- in support, at the endpoint x = mu, below, and above. Test parametrized over all three non-finite shapes and all four value regions. tests/distributions/test_continuous.py: 79 passed; full tests/distributions/: 214 passed, 8 skipped; ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Sixth-pass review. The ExtGenPareto transform's backward built the GPD excess m = -log(S_F) from log F_ext = -softplus(-y). In the upper tail softplus(-y) rounds to exactly 0 near y ~ 745, so log F_ext -> -0, S_F -> 0 and m -> inf even though the quantile is still perfectly representable -- e.g. ExtGenPareto(xi=0, kappa=2) returned -inf at y = 1000 (true x ~ 1000.69) and ExtGenPareto(xi=0.3, kappa=2) returned nan at y = 1000 (well below the y ~ 709/xi ~ 2363 overflow boundary). This was an early underflow, not the documented representability limit. Recover m from the survival side instead. With t = softplus(y) = -log S_ext (stable for all y) and S_ext = exp(-t), S_F = 1 - (1 - S_ext)**(1/kappa) ~ S_ext / kappa in the far tail, so m -> t + log(kappa) - log(_log1p_div(-exp(-t))). That form never builds exp(-y) before a log: when exp(-t) underflows to 0 the C1 helper returns its limit 1 and m = t + log(kappa) stays finite until t itself overflows float64. The exact bulk form (m = -log(1 - F_ext**(1/kappa))) and this tail form agree to O(exp(-t)), so they are switched at t = 40 with both branches evaluated on clamped inputs to keep the discarded branch (and its gradient) finite. Verified against the previous (correct-in-bulk) implementation: ABS error <= 2.2e-16 across kappa in [0.5, 100] wherever both are valid; the switch is C1 (value and gradient match to float64, gradient exactly 1 either side); the reviewer's probes are now exact (y = 1000, xi = 0 -> 1000 + log 2; y = 1e6 still finite); round-trip forward(backward(y)) holds to 2e-13 (xi=0.3) and 1.9e-9 (xi=0) far past y = 1000. Bumped the ExtGenPareto tail tests from y_max 700/400 to 1000 so the formerly-failing probes are now in the standing suite, and tightened the transform docstring's "Range of validity" (xi < 0 is bounded, not "light/exponential"; ExtGPD now reaches as far as the base family). tests/distributions/test_continuous.py: 79 passed; full tests/distributions/: 214 passed, 8 skipped (stable x2); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Seventh-pass review. The ExtGenPareto transform's bulk/tail switch keyed on t = softplus(y) alone, ignoring kappa. The tail approximation S_F ~ S_ext / kappa is only valid when that ratio is small, so for tiny kappa the tail branch was selected too early and returned a *negative* excess -- e.g. ExtGenPareto(xi=0, kappa=1e-20) at y=40 gave an excess of ~ -6, an x below mu, and logp = -inf instead of the Logistic value. Same class of bug as the earlier large-kappa logccdf: the branch condition was on the wrong small quantity. Two changes: 1. Recover the bulk excess with -log1mexp(log F_ext / kappa) instead of -log(-expm1(log F_ext / kappa)). They are algebraically identical, but log1mexp's log1p branch never forms 1 - H, so a tiny GPD survival no longer rounds the excess to 0. This makes the inverse exact across kappa (the bulk now matches the prior implementation to <= 2.2e-16 for every kappa in [0.5, 100], including kappa = 100 where the intermediate form had drifted to ~5e-6), and keeps the excess >= 0 -- in support -- for *all* kappa > 0. The bulk/tail switch moved to value = 700 (the bulk is exact up to the log F_ext underflow near 745, so it now covers every sampler-reachable y for any kappa). 2. Documented the transform's kappa domain on the class. Inverting the carrier needs H = F_ext ** (1/kappa) resolvable from 1, i.e. ~exp(-|y|) of headroom in kappa: exact for kappa down to ~exp(-|y|) (kappa >~ 1e-15 at |y| = 40, >~ 1e-32 at |y| = 80), so any fixed kappa > 0 is exact out to |y| ~ log(1/kappa) -- far past a sampler's reach. Smaller kappa underflows the carrier and the excess rounds toward 0 (x -> mu, still in support, logp -> -inf); it never leaves the support. (Small kappa also has essentially no representable lower tail -- |y| / kappa >~ 745 -- which is the ExtGPD concentrating toward mu.) Tests: parametrized exactness across ten orders of magnitude in kappa (1e-2 .. 1e8, including the kappa < 1 upper tail that used to go out of support), and a regression that tiny kappa (1e-20 .. 1e-300) stays >= mu and never NaN. tests/distributions/test_continuous.py: 87 passed; full tests/distributions/: 222 passed, 8 skipped (stable x2); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Eighth-pass review. Last round fixed only the default transform's excess recovery; ext_gen_pareto_icdf, ExtGenParetoRV.rv_op and support_point still used the unstable -log(-expm1(log q / kappa)) route, so the distribution's quantile, sampler, init point and transform disagreed in exactly the small-kappa regime the previous commit documented. The naive form rounds the tiny GPD survival 1 - q ** (1/kappa) to 1, collapsing the excess to 0 and the quantile to mu. Extracted the stable inverse into a shared helper _ext_gpd_excess_from_log_prob(log_q, kappa) = -log1mexp(log_q / kappa) (whose log1p branch never forms 1 - H) and routed all four paths through it. Concrete effects (mu = 0): * icdf(0.5, kappa=0.01) now returns 7.89e-31 (= 0.5 ** 100), not 0; matches ref_ext_icdf to rtol 1e-9 and the transform's backward(0) exactly. * support_point is the representable median again, so the default initial point has a finite logp (was -inf) wherever the median clears mu in float64. * random draws stop collapsing onto mu: at kappa=0.01 the fraction landing exactly at mu drops from ~0.69 to ~0.001 (kappa=0.001: ~0.96 -> ~0.47, the remainder being genuine mass below the smallest denormal). Ordinary kappa is unchanged (icdf still matches SciPy to ~1e-16; the support_point and KS RNG tests pass untouched). For kappa so small that the median is itself float64-indistinguishable from mu (or mu's ULP swamps the tiny excess) the distribution is a numerical point mass at mu and the init logp is -inf -- the same representability floor documented for the transform's kappa domain, now shared consistently by all four inverses rather than disagreeing. Tests: parametrized agreement of icdf / transform-backward / support_point / init logp for small kappa, and a draws-do-not-collapse regression. tests/distributions/test_continuous.py: 91 passed; full tests/distributions/: 226 passed, 8 skipped (stable x2); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Ninth-pass review. support_point returned the ExtGPD median; for small kappa that median is sub-ULP from mu (or the tiny excess is swamped by mu's ULP), so it rounds onto mu, whose transformed value is -inf -- a broken default initial point for a valid parameter such as kappa = 1e-4. A support point only has to be a usable initialization, not the exact median when the median is unrepresentable. When the median collapses onto mu, fall back to the underlying GPD median (carrier H = 1/2, excess = log 2) -- a higher ExtGPD quantile (F = 0.5 ** kappa) that is representably interior (mu + O(sigma)) for any kappa. Because the transform's forward map is logcdf - logccdf in log space (never logit(F) with F rounded to 1), the resulting initial logp is finite across the whole kappa > 0 domain (e.g. kappa = 1e-300 starts at y ~ 691, logp ~ -691, not -inf). At kappa = 1 the fallback equals the ExtGPD median, and for ordinary kappa the median never collapses, so support_point is unchanged there (the existing support_point and the small-kappa median tests pass untouched). Documented on the transform that the kappa underflow floor affects only deep-tail *evaluation* accuracy, not the initial point, which is now finite for all kappa. Tests: support_point falls back to the GPD median and yields a finite initial logp for kappa in {1e-4, 1e-8, 1e-300}. tests/distributions/test_continuous.py: 94 passed; full tests/distributions/: 229 passed, 8 skipped (stable x2); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Tenth-pass review (docs only, no behaviour change). The ExtGenPareto class docstring still said the support_point "is the median"; it now falls back to the underlying GPD median when the ExtGPD median rounds onto mu. Updated the class docstring to: the ExtGPD median when representable, otherwise the GPD-median fallback. Softened the transform's "finite initial point for every kappa > 0" claim. That holds at ordinary scales, but the fallback mu + O(sigma) still rounds back to mu in the general representability limit where the support has no distinct interior point -- sigma far below ulp(mu) (e.g. mu=1e12, sigma=1e-12), or a bounded xi < 0 whose whole width is sub-ULP -- where the initial logp is -inf. Noted as the (parameter- agnostic) representability caveat it is. tests/distributions/test_continuous.py: 94 passed; ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds a Generalized Pareto distribution family to pymc_extras, including a base GPD (GenPareto) and an extended carrier version (ExtGenPareto), along with API exports, documentation entries, and an extensive test suite validating logp/CDF/CCDF/ICDF behavior, boundary handling, RNG behavior, and default transforms.

Changes:

Implement GenPareto / ExtGenPareto distributions with stable logp/logcdf/logccdf/icdf kernels, inverse-CDF SymbolicRandomVariable RNG ops, and a probability-integral default transform.
Export new distributions via pymc_extras.distributions and include them in the Sphinx API reference.
Add comprehensive tests for correctness, boundary cases, tail stability, transform behavior, and sampling behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
`pymc_extras/distributions/continuous.py`	Implements GenPareto/ExtGenPareto math, RNG ops, distribution wrappers, and PIT default transforms.
`tests/distributions/test_continuous.py`	Adds extensive validation tests for the new GPD family (logp/CDF/CCDF/ICDF, transforms, RNG, edge cases).
`pymc_extras/distributions/__init__.py`	Re-exports `GenPareto` and `ExtGenPareto` in the public distributions namespace.
`docs/api_reference.rst`	Adds `GenPareto` and `ExtGenPareto` to the autosummary API reference list.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…idate s Numerical-robustness pass for the bounded (xi < 0) tail, where the family's only data-dependent quantity s = 1 + xi*z cancels toward 0 as value approaches the wall mu - sigma/xi. The log-density *value* survives (it is ~|1+1/xi|*log s), but its sigma/xi gradient carries ~z/s terms that inherit s's lost low bits, so they are accurate only to ~ulp/s. This is a representability limit of the float64 input value, not of the formula -- so this commit documents and pins it rather than pretending to recover precision the inputs no longer carry. * Consolidate: form (t, log_s) = (xi*z, log(1 + xi*z)) once per call via a new _gpd_tail(z, xi) and thread it through logp/logcdf/logccdf for both families, so the single cancellation site is explicit and the support mask, density and survival never re-derive s inconsistently. Pure refactor -- bit-identical to the prior graph on a boundary-spanning grid (verified by hash); all existing tests unchanged. * Tests (TestGenParetoBoundaryPrecision): sweep value -> the xi<0 wall for xi in {-0.05, -0.3, -0.7} and margins s in {1e-2 .. 1e-12} against a 100-digit decimal reference built from the exact s. Assert logp & d/d(sigma,xi,kappa) stay finite, correctly signed and monotone in s; the logp value stays accurate to <1e-4 relative at every margin; the sigma/xi gradient matches to the achievable ~ulp/s bound (and to machine precision far from the wall); and the kappa gradient -- whose carrier term has no 1/s factor -- stays exact even at the wall. * Docs: a precision note on GenPareto (and a cross-reference + the kappa-exempt detail on ExtGenPareto) stating the relative-precision limit near the wall and pointing boundary-critical callers to parameterize in terms of the margin s. tests/distributions/test_continuous.py: 98 passed; full tests/distributions/: 233 passed, 8 skipped; ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…/ExtGenPareto The previous Examples used `sigma ~ HalfNormal`, `xi ~ Normal` -- a free (sigma,xi) pair, which is exactly the parameterization that diverges as the support wall mu - sigma/xi approaches max(data) (the support edge depends on the parameters). Replace both Examples with the idiomatic, sampler-stable peaks-over-threshold recipe: keep natural priors on the GPD scale (Exponential) and shape, but bound the shape below by the data-in-support constraint, max(data) < mu - sigma/xi <=> xi > -sigma / (max(data) - mu), with no upper bound (xi >= 0 is an unbounded tail that always contains the data). A TruncatedNormal on xi with that derived lower bound lets NUTS sample the whole shape range -- bounded and unbounded tails alike -- with zero support-wall divergences (its transform stretches the wall to infinity in unconstrained space), recovering the SciPy MLE; verified for GPD and ExtGPD across xi in [-0.9, +0.3], mu fixed or free. GPD parameters are named pareto_* to keep them distinct from the mu/sigma *of the prior distributions*. A note flags that this is sampling stability, separate from the deeper float64 precision limit right at the wall. Docs only; no code change. ruff check + format clean; tests unchanged (98 passed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ppa logp test_extgenpareto_transform_stays_in_support_for_tiny_kappa asserted the transformed logp is not NaN at kappa in {1e-20, 1e-100, 1e-300}. At that depth the carrier excess underflows to exactly 0, so the quantile collapses onto mu, where a kappa < 1 GPD density diverges (logp = +inf). PyMC forms the transformed logp as logp(backward(y)) + log_jac_det(y) = (+inf) + (-inf) -- a genuine indeterminate that pytensor resolves to -inf on Linux but NaN on Windows (the CI failure), the documented sub-~1e-15 kappa floor (a numerical point mass at mu), not a defect the transform can resolve. Practical kappa keep the excess nonzero (e.g. kappa=1e-2 gives a ~4e-218 excess) so x stays > mu and the transformed logp is finite -- the exact-across-kappa test (kappa >= 1e-2) is unaffected. Drop the logp assertion; keep the robust, meaningful guarantee this test exists for (the regression for the earlier negative-excess bug): the quantile is finite and >= mu (in support) at every probed y. Comment explains why the logp is not asserted. tests/distributions/test_continuous.py: 98 passed; ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

An Exponential prior on kappa has its mode at 0, so it does not regularize away from kappa -> 0, where the ExtGPD collapses toward a point mass at mu and sigma can run off to compensate (a kappa->0, sigma->inf ridge). Gamma(2, 1) has density -> 0 at the origin (mode at 1, mean 2), keeping kappa off that degenerate corner without being informative about the bulk. Verified: 0 divergences, kappa recovered and held well away from 0 (posterior min ~0.75) across xi in [-0.5, +0.2]. Docs only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…kappa tails Addresses three review findings on the (Ext)GPD tail/boundary numerics. 1. Transform NaN for small kappa (blocker). A latent ExtGenPareto with kappa < ~|y|/745 (e.g. kappa=0.01 at y=-10, not just absurdly small kappa) hit a NaN transformed logp: the carrier excess underflows, the quantile collapses onto mu, and the kappa<1 density diverges there (logp(mu)=+inf), so PyMC's logp(backward(y)) + log_jac_det(y) became +inf - inf (-inf on Linux, NaN on Windows -- the prior CI failure). Fix: the PIT transformed density is exactly Logistic(y) for all parameters, so log_jac_det now returns that target directly (-softplus(y) - softplus(-y) - logp(backward)); the logp terms cancel and backward floors x just inside the open support so logp(backward) stays finite. The transformed logp is now finite and exactly Logistic over all of R -- this also removes the old heavy-tail (xi>0) "overflow past y~709/xi" limitation; only the *recovered quantile* backward(y) saturates to +inf / the endpoints there (the latent value is genuinely unrepresentable, which is honest). Verified: latent ExtGenPareto(kappa in {0.5, 0.05}) samples with 0 divergences, no NaN, in support; integrate-to-1 holds for bounded/unbounded, GPD/ExtGPD. 2. logccdf NaN at huge finite kappa. The tail series formed (kappa-1)(kappa-2), which overflows float64 (~1e310) for kappa>~1e155 and then multiplies an underflowed S^2 -> NaN. Keep only the first-order correction (single kappa factor): the tail branch is taken when kappa*S < e^-30, so the dropped second-order term is < e^-60, below the result's float64 precision -- bit- identical for ordinary kappa, finite and correct for kappa up to 1e300. 3. Examples: assert pareto_mu < xmin (strict). For kappa < 1 the density diverges at x = mu, so a datum exactly at the threshold gives +inf logp; exceedances are strictly above the threshold anyway. Docs: rewrote the transform's "Range of validity" -- the transformed logp is always finite Logistic; it is only x = backward(y) that saturates in the far tails. Tests: heavy-tail test now asserts finite-Logistic-with-overflowing-quantile; small-kappa test restored to assert finite == Logistic (down to kappa=1e-300); large-kappa logccdf test extended to 1e155 / 1e300. tests/distributions/test_continuous.py: 101 passed; full tests/distributions/: 236 passed, 8 skipped; ruff + format + doctest clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Tear-down round. Three real defects in 652a865. 1. logccdf tail expansion was mis-gated. The tail series is a Taylor expansion in *both* S_gpd and kappa*S_gpd, but it was gated only on kappa*S_gpd < e^-30 -- true for tiny kappa even when S_gpd ~ 1 (the body), so it used the wrong asymptotic there (log(kappa)+log(1-H) instead of log(kappa)+log(-log H)); e.g. kappa=1e-100 at x=0.01 was off by ~1.1. Now gate on BOTH a < -30 (S_gpd small) AND log(kappa)+a < -30, so the body uses the exact generic branch. The correction is written in r = kappa*S_gpd and s = S_gpd (with the quadratic r^2-3rs+2s^2 term) -- bounded, no kappa^k powers -- so it is valid where both are small and never overflows. New test pins small-kappa body values to the exact reference; the >=1e155 large-kappa cases stay finite. 2. The "finite everywhere" PIT logp relied on the optimizer cancelling logp(backward) against log_jac_det; an unoptimized (FAST_COMPILE) graph evaluated -inf + inf = NaN at the unreachable upper saturation (xi>0 quantile overflow at y~709/xi; the bounded wall). Fix: floor the subtracted logp at -1e300 in log_jac_det, so there the term stays finite and the transformed logp is -inf (robust, mode-independent), never NaN -- without relying on cancellation. No reachable x has logp below ~-1e3, so the floor never perturbs a valid value. The transformed logp is exactly Logistic for the whole reachable range and -inf only at the unreachable saturation; claim corrected accordingly. New test checks no NaN and == Logistic across the reachable range under FAST_COMPILE, for bounded/unbounded GPD and small-kappa ExtGPD. 3. The lower floor underflowed. floor = |mu|*8eps + sigma*1e-300 vanished for tiny sigma (e.g. sigma=1e-50 -> 1e-350 -> 0), so x stayed at mu and logp(mu) = +inf for kappa<1 -> NaN. Use a fixed absolute term: floor = |mu|*8eps + 1e-300 (1e-300 is a normal double; covers mu=0 and tiny sigma). The kappa<1 lower collapse (the blocker) is now robust even in FAST_COMPILE (logp(backward) finite via the floor, so the cancellation is between finite numbers). tests/distributions/test_continuous.py: 105 passed; full tests/distributions/: 240 passed, 8 skipped; ruff + format + doctest clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… docs Follow-up review. The logccdf series fix stood; these address the rest. 1. Stale docstrings (the comment that re-breaks the next person). The logccdf docstring still described the OLD gating ("keyed on log(kappa)+a ... correct for any kappa") and the old single-variable series; rewritten to match the code -- gated on BOTH a < -30 and log(kappa)+a < -30, correction in r = kappa*S and s = S, with why (the body needs the generic branch: log(kappa)+log(-log H), not log(kappa)+log(1-H)). A stale "first-order single-kappa-factor" test comment and the over-broad transform "covers every parameter" claim are corrected too. 2. dtype portability (float32). The transform floors were hard-coded float64: 1e-300 casts to 0 under float32 (and -1e300 to -inf), so the small-kappa collapse left logp(mu) = +inf and the transformed logp NaN in an unoptimized float32 graph. Now dtype-aware: the lower floor is |mu|*8*eps + finfo.tiny, and log_jac_det's guard uses finfo(dtype).eps. New float32 test (kappa = 1e-20). 3. Scale: mu != 0, sigma << ulp(mu). The lower floor lifts x by ~ulp(mu), so for a near-delta sigma the floored x sits at z ~ 1/ulp and the Logistic residue is lost to cancellation -- previously a silent wrong finite value. log_jac_det now returns -inf wherever |logp(backward)| > 1/sqrt(eps) (the cancellation would lose more than half the digits), so this regime -- and the unreachable upper saturation -- give a clean -inf reject, never a wrong finite or a NaN, and without relying on the optimizer cancelling logp. The "finite and Logistic" claim is narrowed to the representable range; the limits are documented. New tests: small-kappa logccdf body vs exact reference; transformed logp under FAST_COMPILE (no NaN, == Logistic) over the reachable range; upper saturation and degenerate near-delta give -inf not NaN; float32 small-kappa collapse is finite. tests/distributions/test_continuous.py: 107 passed; full tests/distributions/: 242 passed, 8 skipped; ruff + format + doctest clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…non-finite kappa Addresses the remaining review findings on fa003da. - Non-finite kappa (finding 4): kappa = inf passes the `kappa > 0` check but is not a valid shape, and leaked inconsistent nan / -inf / 0 across logp / logcdf / logccdf / icdf. Extend `_propagate_nonfinite_shape` with a `kappa` arg and map result -> nan wherever kappa is non-finite, so all four agree. (kappa = nan is already caught by `kappa > 0`, which raises.) New test `test_nonfinite_kappa_propagates_consistently`. - Transform contract (findings 2 & 3): add a "Bijectivity and the saturated readout" paragraph stating plainly that once the floor/wall binds the map is no longer invertible -- `forward(backward(y)) != y`, the saturated `backward` has derivative 0, and `log_jac_det` is then the exact-PIT density correction (log F + log S - logp(x)), NOT the Jacobian of the floored readout. The transformed density the sampler targets is still the correct Logistic; only the recovered latent x is quantized at the boundary. New test `test_transform_is_a_saturated_readout_when_the_quantile_collapses` proves both y collapse to the same x (so forward(backward(y)) != y) while the transformed logp stays Logistic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…e NaN in icdf endpoints The previous round's non-finite-shape propagation was bypassed by the PIT transform: log_jac_det returns `logistic - logp(x)` and the framework adds `logp(backward(y))`, so the optimizer cancels the two `logp(x)` terms. For a non-finite xi/kappa that `logp(x)` is NaN, and the cancellation made the NaN disappear -- the transformed latent logp came back as a *finite* Logistic, silently accepting an invalid parameter as a valid latent. Fixes, all matching PyMC's own conventions (verified against pm.Normal / pm.Gamma / pm.StudentT): - log_jac_det: route a NaN `logp(x)` through a switch whose other branch is a bare constant (no `logp_x` to cancel), so the transformed logp stays NaN for a non-finite shape. PyMC's check_start_vals then rejects it loudly -- exactly as pm.Gamma(alpha=inf) / pm.StudentT(nu=inf) do (their transformed latent logp is also NaN, not a spurious finite density). - icdf (both GPD and ExtGPD): wrap the result in `_propagate_nonfinite_shape` so the q = 0 / 1 endpoints propagate a non-finite shape as NaN too, instead of handing back mu / the upper bound while the interior is NaN. - sigma = inf: confirmed it already matches pm.Normal's scale-of-infinity convention (logp = -inf, a clean reject, not a finite-density leak). The shape inf -> NaN / scale inf -> -inf asymmetry is PyMC's, not ours; documented and tested rather than diverged from. Tests: a parametrized regression that the transformed logp is non-finite (and check_start_vals raises SamplingError) for xi=+-inf and kappa=inf on both classes; icdf endpoint coverage added to the non-finite xi/kappa tests; an infinite-scale test asserting parity with pm.Normal. test_continuous 114 passed; full tests/distributions/ 249 passed, 8 skipped; ruff + format + doctest clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…xi=-inf Test sharpness only (no behaviour change). The transformed logp for a non-finite shape is specifically NaN (verified across all cases), so assert np.isnan rather than `not np.isfinite` -- a future regression that turned it into a -inf clean reject would otherwise pass. Also add GenPareto xi=-inf to the matrix so both signs are covered for the no-kappa family. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

The integrand is the smooth, exactly-Logistic transformed density, so the quadrature error is dominated by the +-30 tail truncation (~2e-13), not the grid spacing -- 2001 points already integrate to ~1e-13, far inside the 1e-3 tolerance, without evaluating the compiled function 30k times in a Python loop (x5 parametrized cases). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- _ExtGenParetoPIT._excess_from_y: the bulk/tail crossover was a hard-coded float64 700.0. Under float32 that routed y in ~[88, 700) through the bulk branch where log F_ext has already underflowed, blowing the excess (and transformed logp) up to +inf well inside the reachable range. Derive the cutoff from finfo(value.dtype) instead -- ~700 for float64 (min(700, 708.4 - 8) = 700, so float64 is bit-identical), ~80 for float32. New regression test_excess_from_y_upper_tail_is_finite_under_float32. - test_latent_sampling_stays_in_support: this runs real NUTS x3 cases; cut draws/tune 200/300 -> 100/200 and force cores=1 so it stays cheap and deterministic in CI (the clean PIT geometry still gives 0 divergences in support). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Profiling tests/distributions/ showed our GPD/ExtGPD tests were ~55% of the suite runtime, dominated by (a) tests that recompiled a fresh pytensor function on every .eval() inside a loop, and (b) one real-NUTS test that compiled the sampler three times. Both are compile-bound, not compute-bound. - test_nonfinite_xi / test_nonfinite_kappa: probe all points in one vectorised eval per method (value/q as a vector) instead of recompiling per (point, method). ~93 compiles -> ~9 for the xi test (2.9s -> 1.8s). - test_logp_logcdf_at_infinity / test_icdf_endpoints: batch the three xi into one dist so each method compiles once instead of once per xi. - test_latent_sampling_stays_in_support: sample the three geometries (heavy, exponential, bounded) as three independent latents in ONE model. The cost is the per-model sampler compile, not the draws, so one combined model is far cheaper than three parametrized cases and still checks each geometry stays in support with zero divergences. Net: tests/distributions/ 57s -> 52s; test_continuous.py 34.6s -> 30.6s. All pass; ruff + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…uplicate params The transform suite had four tests all asserting "transformed logp == Logistic(y), finite where representable" -- finite-in-tails, deep-into-tail, exact-across-kappa, small-kappa -- accumulated one per review round (~17 parametrized cases). Fold them into one parametrized test with one row per *distinct* regime (saturation at y~37, the y~745 deep-tail underflow, the y~709/xi overflow boundary, large kappa, the kappa<1 collapse onto mu), so each mechanism is still guarded but with 8 model builds instead of 17. The merged test also checks the recovered quantile stays in support (x >= mu), which the deep-tail cases previously skipped. Also drop two duplicate parametrize cases on kept tests: logccdf large-kappa 1e6 (subsumed by 1e20; the overflow guards are 1e155/1e300) and heavy-tail xi=3.0 (bracketed by 1.5 and 5.0). Net: our tests 102 -> 89; tests/distributions/ 249 -> 238. The wall-clock gain is small (~1s) -- the per-test cost is a fixed pytensor compile the merged test still pays per case -- but the suite is less redundant and each transform regime is now documented in one place. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Two small per-test-compile reductions found while profiling: - test_nonfinite_shape_does_not_leak...: the parametrized (x5) test recompiled the model + initial point for check_start_vals on every case. The NaN-logp check is the real per-shape assertion; the loud-init symptom is one mechanism, so check it once in a separate test instead of five times. - test_nonfinite_xi_propagates_consistently: drop the -inf case. The guard is pt.isfinite(xi), which is sign-agnostic, so -inf exercises the exact same path as +inf; nan + inf already cover both IEEE kinds. Profiling note (not a code change): the suite is pytensor-compile-bound, so the only large lever is parallelism -- tests/distributions/ runs ~50s serially vs ~18s under `pytest -n auto` (xdist, already a dev dependency). The single longest test is the real-NUTS test_latent_sampling_stays_in_support (~5s of sampler compile), which becomes the long pole once tests run in parallel. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…sform merge cases - TestGenParetoBoundaryPrecision pins a documented *numerical limit* (the sigma/xi gradient degrades to ~ulp/s at the xi<0 wall), not a feature. The s-cancellation mechanism is xi-independent, so one representative xi=-0.3 over a margin sweep (1e-2 .. 1e-12) documents it fully: the largest margin is the machine-accurate anchor away from the wall (subsuming the separate far-from-boundary test, now dropped), the smallest is the ~ulp/s limit. Drops 3 of the 4 instances of the slowest non-NUTS group (gradient-graph + Decimal compiles). - Give the consolidated test_transformed_logp_is_logistic_where_representable descriptive parametrize ids (bounded-gpd, ext-collapse, ...) so a failure names the regime instead of reporting opaque [builder3-kwargs3-...] indices -- the one real downside of having merged the four tail/kappa transform tests. tests/distributions/ 238 -> 235 tests, all pass; ruff + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

_uniform_draw was a two-call-site one-liner around `uniform(size=size, rng=rng, return_next_rng=True)` whose only content was a docstring. Inlining it at the two rv_op sites is the same length and clearer -- the reader sees the idiom instead of chasing a helper for it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

maresb · 2026-06-02T22:09:29Z

@ricardoV94, this is a monster, but it's been battle-tested. I've also tried to ensure that all your feedback from previous PRs has been taken into account.

The bounded-POT example priors bounded xi only from below by the data-fit constraint -sigma/(xmax-mu). That permits xi <= -1 whenever sigma > xmax-mu (which Exponential(1) always allows), and for xi <= -1 the GPD density diverges at the upper endpoint -> unbounded likelihood -> improper posterior. The sampler runs away: on small-range exceedances the chain pins the upper wall onto max(data) with xi well below -1 (observed 47 GPD / 18 ExtGPD divergences, xi down to -5.7, wall margin ~1e-9). Fix: floor the prior at xi > -1, the standard regularity condition, via lower=pm.math.maximum(-1.0, -sigma/(xmax-mu)). With the floor the posterior is proper and the runaway is gone (divergences 47->2 / 18->4, no xi < -1). Also document why mu is held fixed: the lower endpoint is mu, where a kappa < 1 ExtGPD density diverges. Fixing the threshold keeps that singularity away from the data; a free mu would slide up to min(data) and sit on it (the lower-endpoint mirror of the xi <= -1 wall). The GPD endpoint density is finite, so a free mu there is only the harder 3-parameter problem, not a blow-up. Docstring examples only; no kernel change. The kernels are mathematically correct (the densities genuinely diverge in those regimes) -- the flaw was the example priors permitting the non-regular region. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…u caveat The example priors used Exponential(1) on sigma, the one dimensional parameter. That hard-codes sigma ~ O(1), so the dimensionless xi inference is not invariant to the data's units: rescaling the same data x1->x1e6 drove xi-hat from 0.33 to 8.84 (true 0.2), with clean sampling -- silent, confident garbage. A log-scale prior (LogNormal(0, 10)) makes xi invariant to scale across ~9 orders of magnitude without the prior depending on the data (the Jeffreys 1/sigma limit is the exact, improper version). Also correct the GPD free-mu caveat: a free threshold is NOT safe for the GPD -- it has the classic three-parameter unbounded likelihood (mu -> min(data), sigma -> 0) whenever xi > n - 1. ExtGPD just hits it far more easily (mu -> min(data) alone for kappa < 1). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

_ExtGenParetoPIT._excess_from_y used the asymptotic S_F ~ S_ext/kappa in the tail branch, valid only when exp(-y)/kappa << 1. For subnormal kappa that ratio is O(1) at representable y (e.g. y=710, kappa=4e-309: exp(-y)/kappa ~ 1.1), so it returned a negative "excess" (-0.112 vs the true 0.395), collapsing backward onto the floor and breaking the round-trip (forward(backward(710)) = 703.5, not 710). Invert the carrier exactly instead: a = -log(F_ext)/kappa, m = -log(1 - exp(-a)), carried as log_a = -t + log(_log1p_div(-S_ext)) - log(kappa) so it survives S_ext underflow, with the a->0 (m ~ -log_a) and a->inf (m ~ 0) limits split out. This preserves the large-kappa tail (m ~ y + log kappa) and fixes the subnormal case; regression test asserts excess(710, 4e-309) ~ 0.3954 and a clean round-trip. Doc fixes: the GPD wall density diverges for xi < -1, not xi <= -1 (xi = -1 is the finite uniform boundary); the LogNormal(0,10) sigma prior is robust across scales, not scale-invariant (its location is fixed in the data's units). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…hreshold-stability role

ricardoV94

@maresb content looks great, delivery leaves a lot to be desired.

Can we:

Move functionality to new file to keep it organized (can still allow importing from distributions/continuous)
Aggressivepy pair down comment and docstring verbosity. When everything is emphasized ad nauseum nothing is. If a human or llm can infer what the comments are saying from the code we don't need it. Keep what's actually tricky/hard to recover.

Measure the total CI time incurred by the new tests? It's hard to gauge locally because of caching/different backends/GA being single core by default. Gotta assess if reasonable.
Some guards seem overly defensive and beyond what we do elsewhere or have felt the need for. Specially the fear of masking infinite/nan values. May be fine or may ne too much. Need your own opinion here.
Whenever I bash on style I'm bashing on the tool not you. All grievances are my own problem.

ricardoV94 · 2026-06-03T12:25:43Z

        )
+
+
+# ===========================================================================


Please no dangling comments in global space. Put them inside class/functions where they make sense.

Assuming they are even needed. They don't seem to be

ricardoV94 · 2026-06-03T12:28:47Z

+    Only ``xi * z`` needs this: at ``xi = 0`` with an infinite observation the
+    product is mathematically ``0`` (the exponential GPD carries no shape term),
+    but ``0.0 * inf`` is ``nan`` under IEEE, and a ``nan`` in the unused branch of
+    a ``switch`` can still leak (the backend may lower it to ``cond*a + ...``).


What backend does that?

ricardoV94 · 2026-06-03T12:32:20Z

+    return pt.and_(z >= 0, 1 + t > 0)
+
+
+def _propagate_nonfinite_shape(result, xi, kappa=None):


is this really needed? We don't worry about say inf sigma in normal likelihood and things like that

ricardoV94 · 2026-06-03T12:33:49Z

+
+
+def ext_gen_pareto_logp(value, mu, sigma, xi, kappa):
+    """Pure-PyTensor extended-GPD log-density; out-of-support values map to ``-inf``."""


pure PyTensor is such a useless statement. Really can we trim 90% of these comments and docstrings. Just visual noise

ricardoV94 · 2026-06-03T12:38:28Z

+    wall noted above (reached only when the wall is pinned within ``~1e-13`` of the
+    data).
+
+    The threshold ``mu`` is held fixed -- the standard peaks-over-threshold setup,


Can we make this paragraph sound less like llm speech? At least remove the load bearing. I believe there are exactly 0 occurrences of this term in the first 2 decades of PyMC history :)

ricardoV94 · 2026-06-03T12:39:48Z

+
+
+class _GPDProbabilityIntegralTransform(Transform):
+    """Default transform for the GPD family: ``y = logit(F(x))``.


interesting

ricardoV94 · 2026-06-03T12:41:14Z

+        return pt.switch(pt.isnan(logp_x), np.nan, jac)
+
+
+class _GenParetoPIT(_GPDProbabilityIntegralTransform):


What are these classes about/for?

ricardoV94 · 2026-06-03T12:42:34Z


+# The Generalized Pareto family legitimately evaluates to +/-inf at the
+# (measure-zero) support boundary and at probabilities 0 / 1; NumPy flags those
+# as divide-by-zero / invalid / overflow during ``.eval()``. They are the correct


we suppress these warnings are you sure we are the ones firing them, not the test? if so suppress in the tests with np.errstate context

ricardoV94 · 2026-06-03T12:45:52Z

+            if xi < 0:
+                assert np.all(xs <= mu - sigma / xi + 1e-9), f"x{i} (xi={xi}) past wall"
+
+    def test_observed_is_unaffected(self):


This test is stupid

ricardoV94 · 2026-06-03T12:56:03Z

PyMC's own conventions (e.g. a non-finite shape gives NaN like pm.Gamma, a non-finite scale gives -inf like pm.Normal).

There aren't so strict conventions. Parameter bounds are usually wrapped in a check_parameter_values that can be ignored (Model(check_bounds=False)) or converted to a ninf switch when compile_logp is called.

maresb · 2026-06-03T16:18:25Z

Thanks a lot @ricardoV94 for the speedy and savage review. I really appreciate the time you put in to read over this, and of course I don't take the criticism personally since I didn't really "write" this stuff myself. And probably I should have put in more effort before asking for review, but I had to stop somewhere. (I did put in a lot of time doing fairly rigorous correctness testing though; it's probably not perfect but I think it's close.)

This is probably overengineered, but I was going for correctness first. I actually feel like you're much better qualified to give an opinion on the various nan/inf checks than I am. I'm curious if the checks will get in the way of rewrites.

Regarding the tests, I did some extensive profiling, and I don't think there are likely many obvious wins beyond removing certain tests altogether. I'll reproduce the profiling for you, but from memory, no individual test went beyond a few seconds, and the time to run test_continuous raised from ~25 sec. to ~50 sec. I'll split out both the functions and the tests into separate files to make this easier to see.

Thanks so much for your time and patience, I really appreciate it!!!

maresb · 2026-06-03T16:20:59Z

While some of the complicated numerics were motivated purely by aggressive parameter scans, a very large proportion were actually necessary to preserve stability that was leading to real divergences during sampling.

maresb and others added 2 commits May 31, 2026 11:14

maresb marked this pull request as draft May 31, 2026 17:54

maresb force-pushed the add-gpd-extgpd-distributions branch from 494cafd to 46cd2bc Compare May 31, 2026 22:11

maresb force-pushed the add-gpd-extgpd-distributions branch from aeed57b to f8e2483 Compare June 1, 2026 06:37

maresb and others added 10 commits June 1, 2026 07:14

maresb requested a review from Copilot June 1, 2026 18:09

Copilot started reviewing on behalf of maresb June 1, 2026 18:09 View session

Copilot AI reviewed Jun 1, 2026

View reviewed changes

Comment thread pymc_extras/distributions/continuous.py

Comment thread pymc_extras/distributions/continuous.py

Comment thread pymc_extras/distributions/continuous.py

Comment thread tests/distributions/test_continuous.py Outdated

maresb and others added 9 commits June 1, 2026 22:24

maresb requested a review from Copilot June 2, 2026 14:46

Copilot started reviewing on behalf of maresb June 2, 2026 14:46 View session

Copilot AI reviewed Jun 2, 2026

View reviewed changes

maresb and others added 6 commits June 2, 2026 14:53

maresb changed the title ~~Add gpd extgpd distributions~~ Add GenPareto and ExtGenPareto (Generalized Pareto) distributions Jun 2, 2026

maresb mentioned this pull request Jun 2, 2026

Add Generalized Pareto (GPD) and Extended GPD distributions #638

Closed

4 tasks

maresb marked this pull request as ready for review June 2, 2026 22:08

maresb requested a review from ricardoV94 June 2, 2026 22:09

maresb force-pushed the add-gpd-extgpd-distributions branch from 797ef29 to 9198af3 Compare June 2, 2026 22:15

maresb and others added 6 commits June 3, 2026 06:47

docs: trim verbose sigma-prior comment

7772b6a

docs: make the sigma-prior comment say why (units -> scale-invariant xi)

40f3f8d

docs: name the ExtGPD family (Naveau Type 1 / P&T EGP3) and kappa's t…

a22433e

…hreshold-stability role

ricardoV94 reviewed Jun 3, 2026

View reviewed changes

		)


		# ===========================================================================

		return pt.and_(z >= 0, 1 + t > 0)


		def _propagate_nonfinite_shape(result, xi, kappa=None):



		def ext_gen_pareto_logp(value, mu, sigma, xi, kappa):
		"""Pure-PyTensor extended-GPD log-density; out-of-support values map to ``-inf``."""



		class _GPDProbabilityIntegralTransform(Transform):
		"""Default transform for the GPD family: ``y = logit(F(x))``.

		return pt.switch(pt.isnan(logp_x), np.nan, jac)


		class _GenParetoPIT(_GPDProbabilityIntegralTransform):

Conversation

maresb commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Supersedes #638

Earlier review feedback addressed

Numerical behaviour

Uh oh!

codecov-commenter commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

maresb commented Jun 2, 2026

Uh oh!

ricardoV94 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ricardoV94 commented Jun 3, 2026

Uh oh!

maresb commented Jun 3, 2026

Uh oh!

maresb commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

maresb commented May 31, 2026 •

edited

Loading

codecov-commenter commented May 31, 2026 •

edited

Loading