Add GenPareto and ExtGenPareto (Generalized Pareto) distributions#689
Add GenPareto and ExtGenPareto (Generalized Pareto) distributions#689maresb wants to merge 37 commits into
Conversation
…stributions Adds two continuous distributions to pymc_extras.distributions, structured in three layers so the probability math is forward-facing and reusable: 1. Pure PyTensor functions (gen_pareto_logp/_logcdf/_icdf and the ExtGenPareto variants) that depend only on pytensor, so they can be reused directly or lifted into pytensor-distributions unchanged. 2. SymbolicRandomVariable Ops that sample by inverse-CDF on a uniform draw, so the random methods work on every backend without a SciPy object-mode path. 3. Thin Continuous distribution classes wiring the two together. The xi -> 0 limit (GPD collapsing to the exponential) is handled with C1-smooth log1p(u)/u and expm1(u)/u helpers instead of switch(isclose(xi, 0)), keeping both the value and its gradient continuous through the limit -- the property NUTS relies on near xi = 0. ExtGenPareto is the Naveau et al. (2016) extended GPD with carrier G(v) = v**k; kappa = 1 recovers the plain GPD. Includes logp/logcdf/icdf checks against scipy.genpareto and numpy references, support-point (median) tests, RV size/param and KS goodness-of-fit tests, and explicit C1-smoothness-through-xi=0 tests. Registers both in distributions/__init__.py and docs/api_reference.rst. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pymc-extras groups distribution definitions by kind (all continuous distributions live in distributions/continuous.py, discrete in discrete.py, etc.), not one module per distribution. The initial commit added a standalone generalized_pareto.py / test_generalized_pareto.py, which broke that taxonomy. Fold both distributions (their RV Ops, pure-PyTensor helper functions, and tests) into continuous.py / test_continuous.py alongside GenExtreme, the closest sibling. No behavioural change: the three-layer structure (pure functions -> SymbolicRandomVariable -> Continuous class) is preserved verbatim, and the test additions append to the existing module, leaving the GenExtreme/Chi/Maxwell tests untouched (the module-level pytestmark gains only filterwarnings entries). Full tests/distributions/ suite: 169 passed, 8 skipped; ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #689 +/- ##
===========================================
+ Coverage 51.60% 90.42% +38.81%
===========================================
Files 73 73
Lines 8003 8268 +265
===========================================
+ Hits 4130 7476 +3346
+ Misses 3873 792 -3081
🚀 New features to boost your workflow:
|
Acts on an external review of the GenPareto / ExtGenPareto distributions:
1. Boundary numerics.
- ``_safe_mul`` maps the IEEE ``0 * inf -> nan`` of ``xi * z`` back to ``0`` so
the xi = 0 path is finite at x = inf.
- logp / logcdf pin ``z = +inf`` explicitly to ``-inf`` / ``0`` (for xi > 0 the
in-support branch is ``log1p(inf)/inf -> nan``).
- ``_gpd_upper_bound`` gives ``icdf`` the finite upper endpoint ``mu - sigma/xi``
at ``q = 1`` for xi < 0 (was ``nan`` from ``inf * 0``); ``q = 0`` returns ``mu``.
- icdf evaluates its interior formula on a safe value so the discarded q = 0/1
branch never produces a nan that the graph rewriter can surface.
2. Scalar icdf. ``value`` is coerced with ``pt.as_tensor_variable`` and the methods
use PyMC's ``check_icdf_value`` / ``check_icdf_parameters``; direct calls like
``GenPareto.icdf(0.0, ...)`` no longer raise ``TypeError``. q outside [0, 1]
returns ``nan`` (the standard PyMC icdf contract).
3. Pure-PyTensor kernels. ``gen_pareto_*`` / ``ext_gen_pareto_*`` no longer call
``check_parameters``; validation lives entirely in the ``Continuous`` wrapper
methods, so the kernels are genuinely reusable (e.g. in pytensor-distributions).
4. Docs. Register GenPareto and ExtGenPareto in docs/api_reference.rst.
5. Tests. Stop skipping the parameter outside-edge test: xi carries explicit
``(None, None)`` edges (unconstrained, no invalid probe) while sigma / kappa
keep 0 / inf sentinels so the harness probes sigma <= 0 and kappa <= 0 (must
raise). The xi grid stays within (-1, 1]: at xi <= -1 the GPD has a finite
density at its closed upper endpoint (a measure-zero point the open-support
convention here legitimately handles as -inf), and at xi > 1 the q = 0.99
quantile (~1e10) exceeds ``check_icdf``'s absolute tolerance though it is
correct to ~1e-15 relative. Scope the FPE RuntimeWarning filters to the GPD
test classes via per-class ``pytestmark`` instead of muting the whole module.
Add ``TestGenParetoBoundaries`` covering x = inf, q = 0, q = 1, q outside
[0, 1], and invalid sigma / kappa.
tests/distributions/test_continuous.py: 39 passed; full tests/distributions/:
174 passed, 8 skipped (stable across repeated runs); ruff check + format clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
494cafd to
46cd2bc
Compare
…heavy tail
1. ``_safe_mul`` now repairs ONLY the indeterminate ``0 * inf`` (the two exact
``{0} x {+-inf}`` cases) instead of every NaN product. A genuine ``nan``
parameter (e.g. ``xi = nan`` from bad input) again propagates instead of
being silently turned into the ``xi = 0`` exponential branch. Added
``test_nan_xi_does_not_become_exponential``.
2. Documented the open-support convention explicitly. The Support table lists the
bounded tail as closed, but the implementation returns ``-inf`` / ``0`` at the
xi < 0 right endpoint. A Notes section on GenPareto (referenced from
ExtGenPareto) now states this is exact as a density limit for -1 < xi < 0 and
only differs from SciPy at the measure-zero boundary for xi <= -1.
3. Added ``TestGenParetoHeavyTail`` with *relative*-tolerance checks for xi > 1
(the infinite-mean regime the absolute-tolerance ``check_icdf`` harness can't
reach): logp / logcdf / icdf vs SciPy at rtol, plus median ``support_point``
for GenPareto and ExtGenPareto where the mean is infinite.
4. Softened the migration wording: the pure functions are portable math kernels
(density / CDF / quantile), not a full functional API (no pdf/sf/isf/rvs/
moments), so they are "portable kernels", not a distribution "lifted unchanged".
tests/distributions/test_continuous.py: 46 passed; full tests/distributions/:
181 passed, 8 skipped (stable across repeated runs); ruff check + format clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
aeed57b to
f8e2483
Compare
Deep-dive hardening, prioritising portability to pytensor-distributions and following the conventions of existing pymc / pymc-extras continuous distributions. logccdf (log complementary CDF = log survival function): - Add ``gen_pareto_logccdf`` / ``ext_gen_pareto_logccdf`` pure kernels and wire ``logccdf`` classmethods on both distributions. PyMC auto-registers it (like ``logcdf``) and ``pm.logccdf`` now works for both. - The survival exponent is computed directly (``-m`` for GPD; ``log1mexp(kappa * log H)`` for the extended GPD), so it is exact in the heavy upper tail -- the regime where PyMC's generic ``log1mexp(logcdf)`` fallback collapses (logcdf -> 0 there). This is the natural ``logsf`` primitive for a peaks-over-threshold model and the direct counterpart of pytensor-distributions' ``logsf``. - Tests: ``check_logccdf`` vs ``scipy.genpareto.logsf`` (GPD) and a tail-stable reference (ExtGPD), an explicit tail-stability test out to x = 1e8, ``check_selfconsistency_icdf`` (cdf(icdf(q)) == q), and a kappa = 1 reduction. Moments documentation (no methods -- matching the pymc/pymc-extras convention of documenting moments in the Support/Mean/Variance table and exposing only ``support_point``): - GPD table gains Median, Mode and Entropy rows (all closed-form, validated against scipy). - ExtGPD table gains exact closed-form Mean and Variance via the Beta function ``kappa * B(kappa, 1 - xi)`` (valid xi < 1 and xi < 1/2 respectively), Median, and the Mode (T* = (1 + xi)/(kappa + xi) for kappa > 1, else mu), with the xi -> 0 digamma / trigamma limits noted. These are exact Naveau-construction results -- not numerical-truncation approximations -- verified numerically against integration to <= 1e-8. ``support_point`` remains the median. tests/distributions/test_continuous.py: 52 passed; full tests/distributions/: 187 passed, 8 skipped (stable across repeated runs); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ms, docs
Acts on a third-pass review of the GenPareto / ExtGenPareto distributions.
1. ExtGPD logccdf far-tail collapse (blocker). ``log1mexp(kappa * log H)``
returned -inf deep in the tail because ``log H -> 0`` underflows there, even
though the survival is finite. Recomputed from the GPD log survival ``a =
log(1 - H) = -m`` (exact in the tail): generic ``log1mexp(kappa * log1mexp(a))``
when ``a`` is not tiny, and a second-order series ``log(kappa) + a +
log1p(-(kappa-1)/2 S + ...)`` for ``a < -30``. Verified against 60-digit
reference to <= 4e-15 across xi/kappa; e.g. xi=0, kappa=2.5, x=1000 now gives
-999.084, not -inf. Factored out ``_gpd_log_S``.
2. Non-finite ``xi`` now propagates as nan (blocker). The support switch masked
nan to -inf ("valid parameter, impossible value"). logp now propagates nan,
matching logcdf/logccdf and pymc's own distributions (Normal propagates nan
for a nan parameter; pymc never uses isfinite parameter checks). Strengthened
the test to assert nan across logp/logcdf/logccdf, not merely "non-finite".
3. Default transforms (production blocker). Registered an ``Interval`` default
transform on both distributions via ``bounds_fn`` returning ``(mu,
switch(xi < 0, mu - sigma/xi, +inf))`` -- a single parameter-dependent
expression that covers the heavy (xi >= 0, one-sided) and bounded (xi < 0,
two-sided) regimes, and works when mu/sigma/xi are themselves random. Without
it a latent GPD samples on all of R (constant -inf below mu). Tests confirm
latent sampling stays in support with zero divergences and observed RVs are
unaffected.
4. Docs corrected to match the implementation / domain:
- Support table now shows the open right endpoint ``mu <= x < mu - sigma/xi``.
- GPD/ExtGPD Mode noted as valid only for xi > -1 (for xi <= -1 the supremum
is at the right endpoint, not mu).
- ExtGPD median wording fixed: it is the GPD quantile at CDF probability
H = 2^(-1/kappa) (equivalently survival 1 - 2^(-1/kappa)), not "survival
2^(-1/kappa)".
tests/distributions/test_continuous.py: 58 passed; full tests/distributions/:
193 passed, 8 skipped (stable across repeated runs); ruff check + format clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previously committed ``Interval`` default transform has a log-Jacobian that is discontinuous in xi at 0: the bounded two-sided (sigmoid) map and the one-sided (exp) map do not connect as the upper endpoint ``mu - sigma/xi`` diverges, so the transformed density has a ~1e12 gradient kink at xi = 0. With xi a *random* variable this triggers divergences -- measured 163/1600 with xi sampled across 0, versus 65 for a lower-bound-only transform. (The earlier "verified zero divergences" only exercised *fixed* xi, where the kink never activates -- a real gap in that check.) Replace it with a probability-integral transform ``y = logit(F(x))``. Because F is the family's own CDF, the transformed prior density is exactly Logistic and free of mu/sigma/xi, so it is C1 in every parameter (no kink anywhere), while the inverse ``icdf`` enforces the correct support -- including the moving upper wall for xi < 0 -- for all xi. Implemented as ``_GPDProbabilityIntegralTransform`` with kappa-aware GenPareto / ExtGenPareto subclasses, registered as the default transform for both. Measured divergences (xi random, 1600 draws): across 0 -> 3 (was 163); xi < 0 -> 0 (was 33); xi > 0 -> 0 (was 33). Validated: transformed density integrates to 1 for every xi, ``forward(backward(y)) == y`` exactly, and the latent posterior mean matches the untransformed target (so the divergence reduction is not from sampling a different distribution). Tests gain a Jacobian-integrates-to-1 check across both classes and a direct gradient-continuity-through-xi=0 assertion (the property an Interval transform fails) -- the regression guard the original check lacked. tests/distributions/test_continuous.py: 64 passed; full tests/distributions/: 199 passed, 8 skipped (stable across repeated runs); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… NaN; docs Acts on a fourth-pass review. Blocking -- PIT transform invalid in the positive unconstrained tail. The old ``backward`` did ``icdf(sigmoid(y))``; ``sigmoid(y)`` rounds to exactly 1 for y >= 37 in float64, so ``icdf(1)`` returned inf/the wall and the transformed logp became nan/-inf on valid unconstrained values. Reworked the transform to stay in survival/log space: ``backward`` builds the GPD excess ``m = softplus(y)`` (base family) / ``-log(-expm1(-softplus(-y)/kappa))`` (extended) directly, and ``log_jac_det`` is the analytic ``log F + log S - logp`` rather than autodiffing the quantile graph. The transformed density is now exactly Logistic and finite far into the tail. Isolated root cause for the residual bounded-xi case: once ``backward`` materialises x at the finite wall, ``1 + xi z`` cancels to 0 (its true value is ``exp(-m)``); for xi >= 0 there is no wall and the transform is finite to |y| ~ 1000, while the bounded xi < 0 case is limited by float64 representability near the wall (|y| ~ 70), far past any sampler's reach. High -- ExtGenPareto.logccdf wrong for large kappa. The tail branch fired on ``a < -30`` but assumed ``kappa * S_gpd`` small; for large kappa it returned a positive "log probability". Now keyed on ``log(kappa) + a < -30`` (i.e. on ``kappa * S_gpd``), verified <= 0 and accurate to <= 5e-16 vs an 80-digit reference for kappa up to 1e20. Medium -- NaN propagation made consistent. A non-finite xi now yields nan from logp/logcdf/logccdf at every value (below support, in support, above support) via a shared ``_propagate_nonfinite_shape`` helper, instead of being masked to -inf/0 at the boundaries. Medium -- tests now probe the tails (y up to +-45 bounded, +-400 unbounded) and forward(backward(y)) round-trip, plus a large-kappa logccdf check -- the coverage the previous "integrate over [-30, 30]" tests lacked. Medium -- ExtGenPareto mode docs give the xi = 0 limit (mu - sigma ln T*). tests/distributions/test_continuous.py: 74 passed; full tests/distributions/: 209 passed, 8 skipped (stable across repeated runs); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nd xi=+-inf
Fifth-pass review cleanup -- no new mechanism, just honesty and a policy fix.
PIT transform range of validity (was overclaimed). The framework adds
logp(backward(y)) to log_jac_det(y), so the map is only valid where the quantile
x = backward(y) is *representable*. For xi <= 0 (no upper wall) x grows at most
linearly and the map is exact essentially everywhere (verified past |y| = 1000).
For a heavy upper tail x ~ exp(xi * m), so it overflows float64 at roughly
y ~ 709 / xi (e.g. y ~ 142 for xi = 5); the bounded xi < 0 case is limited the
other way by the wall's ULP near |y| ~ 60. This is a representability limit on x,
not a bug, and is far past any sampler's reach -- but the docstring no longer
claims the map is finite on all of R for every parameter. Replaced the
"finite arbitrarily far" test with one that probes each regime to its real reach
and an explicit test that the heavy-tail map overflows only past y ~ 709 / xi.
Non-finite xi policy (was inconsistent). ``_propagate_nonfinite_shape`` keyed on
``isnan(1 + xi * z)``, which is finite at x = mu (z = 0) and for several regions
when xi = +-inf, so a bad shape leaked the ordinary boundary values there. Now
keys on ``isfinite(xi)`` directly, so xi in {nan, +inf, -inf} yields nan from
logp / logcdf / logccdf at *every* value -- in support, at the endpoint x = mu,
below, and above. Test parametrized over all three non-finite shapes and all
four value regions.
tests/distributions/test_continuous.py: 79 passed; full tests/distributions/:
214 passed, 8 skipped; ruff check + format clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sixth-pass review. The ExtGenPareto transform's backward built the GPD excess m = -log(S_F) from log F_ext = -softplus(-y). In the upper tail softplus(-y) rounds to exactly 0 near y ~ 745, so log F_ext -> -0, S_F -> 0 and m -> inf even though the quantile is still perfectly representable -- e.g. ExtGenPareto(xi=0, kappa=2) returned -inf at y = 1000 (true x ~ 1000.69) and ExtGenPareto(xi=0.3, kappa=2) returned nan at y = 1000 (well below the y ~ 709/xi ~ 2363 overflow boundary). This was an early underflow, not the documented representability limit. Recover m from the survival side instead. With t = softplus(y) = -log S_ext (stable for all y) and S_ext = exp(-t), S_F = 1 - (1 - S_ext)**(1/kappa) ~ S_ext / kappa in the far tail, so m -> t + log(kappa) - log(_log1p_div(-exp(-t))). That form never builds exp(-y) before a log: when exp(-t) underflows to 0 the C1 helper returns its limit 1 and m = t + log(kappa) stays finite until t itself overflows float64. The exact bulk form (m = -log(1 - F_ext**(1/kappa))) and this tail form agree to O(exp(-t)), so they are switched at t = 40 with both branches evaluated on clamped inputs to keep the discarded branch (and its gradient) finite. Verified against the previous (correct-in-bulk) implementation: ABS error <= 2.2e-16 across kappa in [0.5, 100] wherever both are valid; the switch is C1 (value and gradient match to float64, gradient exactly 1 either side); the reviewer's probes are now exact (y = 1000, xi = 0 -> 1000 + log 2; y = 1e6 still finite); round-trip forward(backward(y)) holds to 2e-13 (xi=0.3) and 1.9e-9 (xi=0) far past y = 1000. Bumped the ExtGenPareto tail tests from y_max 700/400 to 1000 so the formerly-failing probes are now in the standing suite, and tightened the transform docstring's "Range of validity" (xi < 0 is bounded, not "light/exponential"; ExtGPD now reaches as far as the base family). tests/distributions/test_continuous.py: 79 passed; full tests/distributions/: 214 passed, 8 skipped (stable x2); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Seventh-pass review. The ExtGenPareto transform's bulk/tail switch keyed on t = softplus(y) alone, ignoring kappa. The tail approximation S_F ~ S_ext / kappa is only valid when that ratio is small, so for tiny kappa the tail branch was selected too early and returned a *negative* excess -- e.g. ExtGenPareto(xi=0, kappa=1e-20) at y=40 gave an excess of ~ -6, an x below mu, and logp = -inf instead of the Logistic value. Same class of bug as the earlier large-kappa logccdf: the branch condition was on the wrong small quantity. Two changes: 1. Recover the bulk excess with -log1mexp(log F_ext / kappa) instead of -log(-expm1(log F_ext / kappa)). They are algebraically identical, but log1mexp's log1p branch never forms 1 - H, so a tiny GPD survival no longer rounds the excess to 0. This makes the inverse exact across kappa (the bulk now matches the prior implementation to <= 2.2e-16 for every kappa in [0.5, 100], including kappa = 100 where the intermediate form had drifted to ~5e-6), and keeps the excess >= 0 -- in support -- for *all* kappa > 0. The bulk/tail switch moved to value = 700 (the bulk is exact up to the log F_ext underflow near 745, so it now covers every sampler-reachable y for any kappa). 2. Documented the transform's kappa domain on the class. Inverting the carrier needs H = F_ext ** (1/kappa) resolvable from 1, i.e. ~exp(-|y|) of headroom in kappa: exact for kappa down to ~exp(-|y|) (kappa >~ 1e-15 at |y| = 40, >~ 1e-32 at |y| = 80), so any fixed kappa > 0 is exact out to |y| ~ log(1/kappa) -- far past a sampler's reach. Smaller kappa underflows the carrier and the excess rounds toward 0 (x -> mu, still in support, logp -> -inf); it never leaves the support. (Small kappa also has essentially no representable lower tail -- |y| / kappa >~ 745 -- which is the ExtGPD concentrating toward mu.) Tests: parametrized exactness across ten orders of magnitude in kappa (1e-2 .. 1e8, including the kappa < 1 upper tail that used to go out of support), and a regression that tiny kappa (1e-20 .. 1e-300) stays >= mu and never NaN. tests/distributions/test_continuous.py: 87 passed; full tests/distributions/: 222 passed, 8 skipped (stable x2); ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Eighth-pass review. Last round fixed only the default transform's excess
recovery; ext_gen_pareto_icdf, ExtGenParetoRV.rv_op and support_point still used
the unstable -log(-expm1(log q / kappa)) route, so the distribution's quantile,
sampler, init point and transform disagreed in exactly the small-kappa regime the
previous commit documented. The naive form rounds the tiny GPD survival
1 - q ** (1/kappa) to 1, collapsing the excess to 0 and the quantile to mu.
Extracted the stable inverse into a shared helper
_ext_gpd_excess_from_log_prob(log_q, kappa) = -log1mexp(log_q / kappa) (whose
log1p branch never forms 1 - H) and routed all four paths through it. Concrete
effects (mu = 0):
* icdf(0.5, kappa=0.01) now returns 7.89e-31 (= 0.5 ** 100), not 0; matches
ref_ext_icdf to rtol 1e-9 and the transform's backward(0) exactly.
* support_point is the representable median again, so the default initial point
has a finite logp (was -inf) wherever the median clears mu in float64.
* random draws stop collapsing onto mu: at kappa=0.01 the fraction landing
exactly at mu drops from ~0.69 to ~0.001 (kappa=0.001: ~0.96 -> ~0.47, the
remainder being genuine mass below the smallest denormal).
Ordinary kappa is unchanged (icdf still matches SciPy to ~1e-16; the support_point
and KS RNG tests pass untouched). For kappa so small that the median is itself
float64-indistinguishable from mu (or mu's ULP swamps the tiny excess) the
distribution is a numerical point mass at mu and the init logp is -inf -- the same
representability floor documented for the transform's kappa domain, now shared
consistently by all four inverses rather than disagreeing.
Tests: parametrized agreement of icdf / transform-backward / support_point / init
logp for small kappa, and a draws-do-not-collapse regression.
tests/distributions/test_continuous.py: 91 passed; full tests/distributions/:
226 passed, 8 skipped (stable x2); ruff check + format clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ninth-pass review. support_point returned the ExtGPD median; for small kappa that
median is sub-ULP from mu (or the tiny excess is swamped by mu's ULP), so it
rounds onto mu, whose transformed value is -inf -- a broken default initial point
for a valid parameter such as kappa = 1e-4.
A support point only has to be a usable initialization, not the exact median when
the median is unrepresentable. When the median collapses onto mu, fall back to the
underlying GPD median (carrier H = 1/2, excess = log 2) -- a higher ExtGPD
quantile (F = 0.5 ** kappa) that is representably interior (mu + O(sigma)) for any
kappa. Because the transform's forward map is logcdf - logccdf in log space (never
logit(F) with F rounded to 1), the resulting initial logp is finite across the
whole kappa > 0 domain (e.g. kappa = 1e-300 starts at y ~ 691, logp ~ -691, not
-inf). At kappa = 1 the fallback equals the ExtGPD median, and for ordinary kappa
the median never collapses, so support_point is unchanged there (the existing
support_point and the small-kappa median tests pass untouched).
Documented on the transform that the kappa underflow floor affects only deep-tail
*evaluation* accuracy, not the initial point, which is now finite for all kappa.
Tests: support_point falls back to the GPD median and yields a finite initial logp
for kappa in {1e-4, 1e-8, 1e-300}.
tests/distributions/test_continuous.py: 94 passed; full tests/distributions/:
229 passed, 8 skipped (stable x2); ruff check + format clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tenth-pass review (docs only, no behaviour change). The ExtGenPareto class docstring still said the support_point "is the median"; it now falls back to the underlying GPD median when the ExtGPD median rounds onto mu. Updated the class docstring to: the ExtGPD median when representable, otherwise the GPD-median fallback. Softened the transform's "finite initial point for every kappa > 0" claim. That holds at ordinary scales, but the fallback mu + O(sigma) still rounds back to mu in the general representability limit where the support has no distinct interior point -- sigma far below ulp(mu) (e.g. mu=1e12, sigma=1e-12), or a bounded xi < 0 whose whole width is sub-ULP -- where the initial logp is -inf. Noted as the (parameter- agnostic) representability caveat it is. tests/distributions/test_continuous.py: 94 passed; ruff check + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR adds a Generalized Pareto distribution family to pymc_extras, including a base GPD (GenPareto) and an extended carrier version (ExtGenPareto), along with API exports, documentation entries, and an extensive test suite validating logp/CDF/CCDF/ICDF behavior, boundary handling, RNG behavior, and default transforms.
Changes:
- Implement
GenPareto/ExtGenParetodistributions with stable logp/logcdf/logccdf/icdf kernels, inverse-CDFSymbolicRandomVariableRNG ops, and a probability-integral default transform. - Export new distributions via
pymc_extras.distributionsand include them in the Sphinx API reference. - Add comprehensive tests for correctness, boundary cases, tail stability, transform behavior, and sampling behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
pymc_extras/distributions/continuous.py |
Implements GenPareto/ExtGenPareto math, RNG ops, distribution wrappers, and PIT default transforms. |
tests/distributions/test_continuous.py |
Adds extensive validation tests for the new GPD family (logp/CDF/CCDF/ICDF, transforms, RNG, edge cases). |
pymc_extras/distributions/__init__.py |
Re-exports GenPareto and ExtGenPareto in the public distributions namespace. |
docs/api_reference.rst |
Adds GenPareto and ExtGenPareto to the autosummary API reference list. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…idate s
Numerical-robustness pass for the bounded (xi < 0) tail, where the family's only
data-dependent quantity s = 1 + xi*z cancels toward 0 as value approaches the wall
mu - sigma/xi. The log-density *value* survives (it is ~|1+1/xi|*log s), but its
sigma/xi gradient carries ~z/s terms that inherit s's lost low bits, so they are
accurate only to ~ulp/s. This is a representability limit of the float64 input
value, not of the formula -- so this commit documents and pins it rather than
pretending to recover precision the inputs no longer carry.
* Consolidate: form (t, log_s) = (xi*z, log(1 + xi*z)) once per call via a new
_gpd_tail(z, xi) and thread it through logp/logcdf/logccdf for both families, so
the single cancellation site is explicit and the support mask, density and
survival never re-derive s inconsistently. Pure refactor -- bit-identical to the
prior graph on a boundary-spanning grid (verified by hash); all existing tests
unchanged.
* Tests (TestGenParetoBoundaryPrecision): sweep value -> the xi<0 wall for
xi in {-0.05, -0.3, -0.7} and margins s in {1e-2 .. 1e-12} against a 100-digit
decimal reference built from the exact s. Assert logp & d/d(sigma,xi,kappa) stay
finite, correctly signed and monotone in s; the logp value stays accurate to
<1e-4 relative at every margin; the sigma/xi gradient matches to the achievable
~ulp/s bound (and to machine precision far from the wall); and the kappa gradient
-- whose carrier term has no 1/s factor -- stays exact even at the wall.
* Docs: a precision note on GenPareto (and a cross-reference + the kappa-exempt
detail on ExtGenPareto) stating the relative-precision limit near the wall and
pointing boundary-critical callers to parameterize in terms of the margin s.
tests/distributions/test_continuous.py: 98 passed; full tests/distributions/:
233 passed, 8 skipped; ruff check + format clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…/ExtGenPareto
The previous Examples used `sigma ~ HalfNormal`, `xi ~ Normal` -- a free (sigma,xi)
pair, which is exactly the parameterization that diverges as the support wall
mu - sigma/xi approaches max(data) (the support edge depends on the parameters).
Replace both Examples with the idiomatic, sampler-stable peaks-over-threshold
recipe: keep natural priors on the GPD scale (Exponential) and shape, but bound
the shape below by the data-in-support constraint,
max(data) < mu - sigma/xi <=> xi > -sigma / (max(data) - mu),
with no upper bound (xi >= 0 is an unbounded tail that always contains the data).
A TruncatedNormal on xi with that derived lower bound lets NUTS sample the whole
shape range -- bounded and unbounded tails alike -- with zero support-wall
divergences (its transform stretches the wall to infinity in unconstrained space),
recovering the SciPy MLE; verified for GPD and ExtGPD across xi in [-0.9, +0.3],
mu fixed or free. GPD parameters are named pareto_* to keep them distinct from the
mu/sigma *of the prior distributions*. A note flags that this is sampling
stability, separate from the deeper float64 precision limit right at the wall.
Docs only; no code change. ruff check + format clean; tests unchanged (98 passed).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ppa logp
test_extgenpareto_transform_stays_in_support_for_tiny_kappa asserted the
transformed logp is not NaN at kappa in {1e-20, 1e-100, 1e-300}. At that depth the
carrier excess underflows to exactly 0, so the quantile collapses onto mu, where a
kappa < 1 GPD density diverges (logp = +inf). PyMC forms the transformed logp as
logp(backward(y)) + log_jac_det(y) = (+inf) + (-inf) -- a genuine indeterminate
that pytensor resolves to -inf on Linux but NaN on Windows (the CI failure), the
documented sub-~1e-15 kappa floor (a numerical point mass at mu), not a defect the
transform can resolve. Practical kappa keep the excess nonzero (e.g. kappa=1e-2
gives a ~4e-218 excess) so x stays > mu and the transformed logp is finite -- the
exact-across-kappa test (kappa >= 1e-2) is unaffected.
Drop the logp assertion; keep the robust, meaningful guarantee this test exists for
(the regression for the earlier negative-excess bug): the quantile is finite and
>= mu (in support) at every probed y. Comment explains why the logp is not asserted.
tests/distributions/test_continuous.py: 98 passed; ruff clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
An Exponential prior on kappa has its mode at 0, so it does not regularize away from kappa -> 0, where the ExtGPD collapses toward a point mass at mu and sigma can run off to compensate (a kappa->0, sigma->inf ridge). Gamma(2, 1) has density -> 0 at the origin (mode at 1, mean 2), keeping kappa off that degenerate corner without being informative about the bulk. Verified: 0 divergences, kappa recovered and held well away from 0 (posterior min ~0.75) across xi in [-0.5, +0.2]. Docs only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…kappa tails
Addresses three review findings on the (Ext)GPD tail/boundary numerics.
1. Transform NaN for small kappa (blocker). A latent ExtGenPareto with
kappa < ~|y|/745 (e.g. kappa=0.01 at y=-10, not just absurdly small kappa) hit
a NaN transformed logp: the carrier excess underflows, the quantile collapses
onto mu, and the kappa<1 density diverges there (logp(mu)=+inf), so PyMC's
logp(backward(y)) + log_jac_det(y) became +inf - inf (-inf on Linux, NaN on
Windows -- the prior CI failure). Fix: the PIT transformed density is exactly
Logistic(y) for all parameters, so log_jac_det now returns that target directly
(-softplus(y) - softplus(-y) - logp(backward)); the logp terms cancel and
backward floors x just inside the open support so logp(backward) stays finite.
The transformed logp is now finite and exactly Logistic over all of R -- this
also removes the old heavy-tail (xi>0) "overflow past y~709/xi" limitation;
only the *recovered quantile* backward(y) saturates to +inf / the endpoints
there (the latent value is genuinely unrepresentable, which is honest).
Verified: latent ExtGenPareto(kappa in {0.5, 0.05}) samples with 0 divergences,
no NaN, in support; integrate-to-1 holds for bounded/unbounded, GPD/ExtGPD.
2. logccdf NaN at huge finite kappa. The tail series formed (kappa-1)(kappa-2),
which overflows float64 (~1e310) for kappa>~1e155 and then multiplies an
underflowed S^2 -> NaN. Keep only the first-order correction (single kappa
factor): the tail branch is taken when kappa*S < e^-30, so the dropped
second-order term is < e^-60, below the result's float64 precision -- bit-
identical for ordinary kappa, finite and correct for kappa up to 1e300.
3. Examples: assert pareto_mu < xmin (strict). For kappa < 1 the density diverges
at x = mu, so a datum exactly at the threshold gives +inf logp; exceedances are
strictly above the threshold anyway.
Docs: rewrote the transform's "Range of validity" -- the transformed logp is always
finite Logistic; it is only x = backward(y) that saturates in the far tails.
Tests: heavy-tail test now asserts finite-Logistic-with-overflowing-quantile;
small-kappa test restored to assert finite == Logistic (down to kappa=1e-300);
large-kappa logccdf test extended to 1e155 / 1e300.
tests/distributions/test_continuous.py: 101 passed; full tests/distributions/:
236 passed, 8 skipped; ruff + format + doctest clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tear-down round. Three real defects in 652a865. 1. logccdf tail expansion was mis-gated. The tail series is a Taylor expansion in *both* S_gpd and kappa*S_gpd, but it was gated only on kappa*S_gpd < e^-30 -- true for tiny kappa even when S_gpd ~ 1 (the body), so it used the wrong asymptotic there (log(kappa)+log(1-H) instead of log(kappa)+log(-log H)); e.g. kappa=1e-100 at x=0.01 was off by ~1.1. Now gate on BOTH a < -30 (S_gpd small) AND log(kappa)+a < -30, so the body uses the exact generic branch. The correction is written in r = kappa*S_gpd and s = S_gpd (with the quadratic r^2-3rs+2s^2 term) -- bounded, no kappa^k powers -- so it is valid where both are small and never overflows. New test pins small-kappa body values to the exact reference; the >=1e155 large-kappa cases stay finite. 2. The "finite everywhere" PIT logp relied on the optimizer cancelling logp(backward) against log_jac_det; an unoptimized (FAST_COMPILE) graph evaluated -inf + inf = NaN at the unreachable upper saturation (xi>0 quantile overflow at y~709/xi; the bounded wall). Fix: floor the subtracted logp at -1e300 in log_jac_det, so there the term stays finite and the transformed logp is -inf (robust, mode-independent), never NaN -- without relying on cancellation. No reachable x has logp below ~-1e3, so the floor never perturbs a valid value. The transformed logp is exactly Logistic for the whole reachable range and -inf only at the unreachable saturation; claim corrected accordingly. New test checks no NaN and == Logistic across the reachable range under FAST_COMPILE, for bounded/unbounded GPD and small-kappa ExtGPD. 3. The lower floor underflowed. floor = |mu|*8eps + sigma*1e-300 vanished for tiny sigma (e.g. sigma=1e-50 -> 1e-350 -> 0), so x stayed at mu and logp(mu) = +inf for kappa<1 -> NaN. Use a fixed absolute term: floor = |mu|*8eps + 1e-300 (1e-300 is a normal double; covers mu=0 and tiny sigma). The kappa<1 lower collapse (the blocker) is now robust even in FAST_COMPILE (logp(backward) finite via the floor, so the cancellation is between finite numbers). tests/distributions/test_continuous.py: 105 passed; full tests/distributions/: 240 passed, 8 skipped; ruff + format + doctest clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… docs
Follow-up review. The logccdf series fix stood; these address the rest.
1. Stale docstrings (the comment that re-breaks the next person). The logccdf
docstring still described the OLD gating ("keyed on log(kappa)+a ... correct for
any kappa") and the old single-variable series; rewritten to match the code --
gated on BOTH a < -30 and log(kappa)+a < -30, correction in r = kappa*S and
s = S, with why (the body needs the generic branch: log(kappa)+log(-log H), not
log(kappa)+log(1-H)). A stale "first-order single-kappa-factor" test comment and
the over-broad transform "covers every parameter" claim are corrected too.
2. dtype portability (float32). The transform floors were hard-coded float64:
1e-300 casts to 0 under float32 (and -1e300 to -inf), so the small-kappa
collapse left logp(mu) = +inf and the transformed logp NaN in an unoptimized
float32 graph. Now dtype-aware: the lower floor is |mu|*8*eps + finfo.tiny, and
log_jac_det's guard uses finfo(dtype).eps. New float32 test (kappa = 1e-20).
3. Scale: mu != 0, sigma << ulp(mu). The lower floor lifts x by ~ulp(mu), so for a
near-delta sigma the floored x sits at z ~ 1/ulp and the Logistic residue is lost
to cancellation -- previously a silent wrong finite value. log_jac_det now
returns -inf wherever |logp(backward)| > 1/sqrt(eps) (the cancellation would lose
more than half the digits), so this regime -- and the unreachable upper
saturation -- give a clean -inf reject, never a wrong finite or a NaN, and
without relying on the optimizer cancelling logp. The "finite and Logistic"
claim is narrowed to the representable range; the limits are documented.
New tests: small-kappa logccdf body vs exact reference; transformed logp under
FAST_COMPILE (no NaN, == Logistic) over the reachable range; upper saturation and
degenerate near-delta give -inf not NaN; float32 small-kappa collapse is finite.
tests/distributions/test_continuous.py: 107 passed; full tests/distributions/:
242 passed, 8 skipped; ruff + format + doctest clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…non-finite kappa Addresses the remaining review findings on fa003da. - Non-finite kappa (finding 4): kappa = inf passes the `kappa > 0` check but is not a valid shape, and leaked inconsistent nan / -inf / 0 across logp / logcdf / logccdf / icdf. Extend `_propagate_nonfinite_shape` with a `kappa` arg and map result -> nan wherever kappa is non-finite, so all four agree. (kappa = nan is already caught by `kappa > 0`, which raises.) New test `test_nonfinite_kappa_propagates_consistently`. - Transform contract (findings 2 & 3): add a "Bijectivity and the saturated readout" paragraph stating plainly that once the floor/wall binds the map is no longer invertible -- `forward(backward(y)) != y`, the saturated `backward` has derivative 0, and `log_jac_det` is then the exact-PIT density correction (log F + log S - logp(x)), NOT the Jacobian of the floored readout. The transformed density the sampler targets is still the correct Logistic; only the recovered latent x is quantized at the boundary. New test `test_transform_is_a_saturated_readout_when_the_quantile_collapses` proves both y collapse to the same x (so forward(backward(y)) != y) while the transformed logp stays Logistic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e NaN in icdf endpoints The previous round's non-finite-shape propagation was bypassed by the PIT transform: log_jac_det returns `logistic - logp(x)` and the framework adds `logp(backward(y))`, so the optimizer cancels the two `logp(x)` terms. For a non-finite xi/kappa that `logp(x)` is NaN, and the cancellation made the NaN disappear -- the transformed latent logp came back as a *finite* Logistic, silently accepting an invalid parameter as a valid latent. Fixes, all matching PyMC's own conventions (verified against pm.Normal / pm.Gamma / pm.StudentT): - log_jac_det: route a NaN `logp(x)` through a switch whose other branch is a bare constant (no `logp_x` to cancel), so the transformed logp stays NaN for a non-finite shape. PyMC's check_start_vals then rejects it loudly -- exactly as pm.Gamma(alpha=inf) / pm.StudentT(nu=inf) do (their transformed latent logp is also NaN, not a spurious finite density). - icdf (both GPD and ExtGPD): wrap the result in `_propagate_nonfinite_shape` so the q = 0 / 1 endpoints propagate a non-finite shape as NaN too, instead of handing back mu / the upper bound while the interior is NaN. - sigma = inf: confirmed it already matches pm.Normal's scale-of-infinity convention (logp = -inf, a clean reject, not a finite-density leak). The shape inf -> NaN / scale inf -> -inf asymmetry is PyMC's, not ours; documented and tested rather than diverged from. Tests: a parametrized regression that the transformed logp is non-finite (and check_start_vals raises SamplingError) for xi=+-inf and kappa=inf on both classes; icdf endpoint coverage added to the non-finite xi/kappa tests; an infinite-scale test asserting parity with pm.Normal. test_continuous 114 passed; full tests/distributions/ 249 passed, 8 skipped; ruff + format + doctest clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…xi=-inf Test sharpness only (no behaviour change). The transformed logp for a non-finite shape is specifically NaN (verified across all cases), so assert np.isnan rather than `not np.isfinite` -- a future regression that turned it into a -inf clean reject would otherwise pass. Also add GenPareto xi=-inf to the matrix so both signs are covered for the no-kappa family. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The integrand is the smooth, exactly-Logistic transformed density, so the quadrature error is dominated by the +-30 tail truncation (~2e-13), not the grid spacing -- 2001 points already integrate to ~1e-13, far inside the 1e-3 tolerance, without evaluating the compiled function 30k times in a Python loop (x5 parametrized cases). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- _ExtGenParetoPIT._excess_from_y: the bulk/tail crossover was a hard-coded float64 700.0. Under float32 that routed y in ~[88, 700) through the bulk branch where log F_ext has already underflowed, blowing the excess (and transformed logp) up to +inf well inside the reachable range. Derive the cutoff from finfo(value.dtype) instead -- ~700 for float64 (min(700, 708.4 - 8) = 700, so float64 is bit-identical), ~80 for float32. New regression test_excess_from_y_upper_tail_is_finite_under_float32. - test_latent_sampling_stays_in_support: this runs real NUTS x3 cases; cut draws/tune 200/300 -> 100/200 and force cores=1 so it stays cheap and deterministic in CI (the clean PIT geometry still gives 0 divergences in support). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Profiling tests/distributions/ showed our GPD/ExtGPD tests were ~55% of the suite runtime, dominated by (a) tests that recompiled a fresh pytensor function on every .eval() inside a loop, and (b) one real-NUTS test that compiled the sampler three times. Both are compile-bound, not compute-bound. - test_nonfinite_xi / test_nonfinite_kappa: probe all points in one vectorised eval per method (value/q as a vector) instead of recompiling per (point, method). ~93 compiles -> ~9 for the xi test (2.9s -> 1.8s). - test_logp_logcdf_at_infinity / test_icdf_endpoints: batch the three xi into one dist so each method compiles once instead of once per xi. - test_latent_sampling_stays_in_support: sample the three geometries (heavy, exponential, bounded) as three independent latents in ONE model. The cost is the per-model sampler compile, not the draws, so one combined model is far cheaper than three parametrized cases and still checks each geometry stays in support with zero divergences. Net: tests/distributions/ 57s -> 52s; test_continuous.py 34.6s -> 30.6s. All pass; ruff + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uplicate params The transform suite had four tests all asserting "transformed logp == Logistic(y), finite where representable" -- finite-in-tails, deep-into-tail, exact-across-kappa, small-kappa -- accumulated one per review round (~17 parametrized cases). Fold them into one parametrized test with one row per *distinct* regime (saturation at y~37, the y~745 deep-tail underflow, the y~709/xi overflow boundary, large kappa, the kappa<1 collapse onto mu), so each mechanism is still guarded but with 8 model builds instead of 17. The merged test also checks the recovered quantile stays in support (x >= mu), which the deep-tail cases previously skipped. Also drop two duplicate parametrize cases on kept tests: logccdf large-kappa 1e6 (subsumed by 1e20; the overflow guards are 1e155/1e300) and heavy-tail xi=3.0 (bracketed by 1.5 and 5.0). Net: our tests 102 -> 89; tests/distributions/ 249 -> 238. The wall-clock gain is small (~1s) -- the per-test cost is a fixed pytensor compile the merged test still pays per case -- but the suite is less redundant and each transform regime is now documented in one place. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two small per-test-compile reductions found while profiling: - test_nonfinite_shape_does_not_leak...: the parametrized (x5) test recompiled the model + initial point for check_start_vals on every case. The NaN-logp check is the real per-shape assertion; the loud-init symptom is one mechanism, so check it once in a separate test instead of five times. - test_nonfinite_xi_propagates_consistently: drop the -inf case. The guard is pt.isfinite(xi), which is sign-agnostic, so -inf exercises the exact same path as +inf; nan + inf already cover both IEEE kinds. Profiling note (not a code change): the suite is pytensor-compile-bound, so the only large lever is parallelism -- tests/distributions/ runs ~50s serially vs ~18s under `pytest -n auto` (xdist, already a dev dependency). The single longest test is the real-NUTS test_latent_sampling_stays_in_support (~5s of sampler compile), which becomes the long pole once tests run in parallel. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sform merge cases - TestGenParetoBoundaryPrecision pins a documented *numerical limit* (the sigma/xi gradient degrades to ~ulp/s at the xi<0 wall), not a feature. The s-cancellation mechanism is xi-independent, so one representative xi=-0.3 over a margin sweep (1e-2 .. 1e-12) documents it fully: the largest margin is the machine-accurate anchor away from the wall (subsuming the separate far-from-boundary test, now dropped), the smallest is the ~ulp/s limit. Drops 3 of the 4 instances of the slowest non-NUTS group (gradient-graph + Decimal compiles). - Give the consolidated test_transformed_logp_is_logistic_where_representable descriptive parametrize ids (bounded-gpd, ext-collapse, ...) so a failure names the regime instead of reporting opaque [builder3-kwargs3-...] indices -- the one real downside of having merged the four tail/kappa transform tests. tests/distributions/ 238 -> 235 tests, all pass; ruff + format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_uniform_draw was a two-call-site one-liner around `uniform(size=size, rng=rng, return_next_rng=True)` whose only content was a docstring. Inlining it at the two rv_op sites is the same length and clearer -- the reader sees the idiom instead of chasing a helper for it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@ricardoV94, this is a monster, but it's been battle-tested. I've also tried to ensure that all your feedback from previous PRs has been taken into account. |
797ef29 to
9198af3
Compare
The bounded-POT example priors bounded xi only from below by the data-fit constraint -sigma/(xmax-mu). That permits xi <= -1 whenever sigma > xmax-mu (which Exponential(1) always allows), and for xi <= -1 the GPD density diverges at the upper endpoint -> unbounded likelihood -> improper posterior. The sampler runs away: on small-range exceedances the chain pins the upper wall onto max(data) with xi well below -1 (observed 47 GPD / 18 ExtGPD divergences, xi down to -5.7, wall margin ~1e-9). Fix: floor the prior at xi > -1, the standard regularity condition, via lower=pm.math.maximum(-1.0, -sigma/(xmax-mu)). With the floor the posterior is proper and the runaway is gone (divergences 47->2 / 18->4, no xi < -1). Also document why mu is held fixed: the lower endpoint is mu, where a kappa < 1 ExtGPD density diverges. Fixing the threshold keeps that singularity away from the data; a free mu would slide up to min(data) and sit on it (the lower-endpoint mirror of the xi <= -1 wall). The GPD endpoint density is finite, so a free mu there is only the harder 3-parameter problem, not a blow-up. Docstring examples only; no kernel change. The kernels are mathematically correct (the densities genuinely diverge in those regimes) -- the flaw was the example priors permitting the non-regular region. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…u caveat The example priors used Exponential(1) on sigma, the one dimensional parameter. That hard-codes sigma ~ O(1), so the dimensionless xi inference is not invariant to the data's units: rescaling the same data x1->x1e6 drove xi-hat from 0.33 to 8.84 (true 0.2), with clean sampling -- silent, confident garbage. A log-scale prior (LogNormal(0, 10)) makes xi invariant to scale across ~9 orders of magnitude without the prior depending on the data (the Jeffreys 1/sigma limit is the exact, improper version). Also correct the GPD free-mu caveat: a free threshold is NOT safe for the GPD -- it has the classic three-parameter unbounded likelihood (mu -> min(data), sigma -> 0) whenever xi > n - 1. ExtGPD just hits it far more easily (mu -> min(data) alone for kappa < 1). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_ExtGenParetoPIT._excess_from_y used the asymptotic S_F ~ S_ext/kappa in the tail branch, valid only when exp(-y)/kappa << 1. For subnormal kappa that ratio is O(1) at representable y (e.g. y=710, kappa=4e-309: exp(-y)/kappa ~ 1.1), so it returned a negative "excess" (-0.112 vs the true 0.395), collapsing backward onto the floor and breaking the round-trip (forward(backward(710)) = 703.5, not 710). Invert the carrier exactly instead: a = -log(F_ext)/kappa, m = -log(1 - exp(-a)), carried as log_a = -t + log(_log1p_div(-S_ext)) - log(kappa) so it survives S_ext underflow, with the a->0 (m ~ -log_a) and a->inf (m ~ 0) limits split out. This preserves the large-kappa tail (m ~ y + log kappa) and fixes the subnormal case; regression test asserts excess(710, 4e-309) ~ 0.3954 and a clean round-trip. Doc fixes: the GPD wall density diverges for xi < -1, not xi <= -1 (xi = -1 is the finite uniform boundary); the LogNormal(0,10) sigma prior is robust across scales, not scale-invariant (its location is fixed in the data's units). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hreshold-stability role
ricardoV94
left a comment
There was a problem hiding this comment.
@maresb content looks great, delivery leaves a lot to be desired.
Can we:
- Move functionality to new file to keep it organized (can still allow importing from distributions/continuous)
- Aggressivepy pair down comment and docstring verbosity. When everything is emphasized ad nauseum nothing is. If a human or llm can infer what the comments are saying from the code we don't need it. Keep what's actually tricky/hard to recover.
- Measure the total CI time incurred by the new tests? It's hard to gauge locally because of caching/different backends/GA being single core by default. Gotta assess if reasonable.
- Some guards seem overly defensive and beyond what we do elsewhere or have felt the need for. Specially the fear of masking infinite/nan values. May be fine or may ne too much. Need your own opinion here.
- Whenever I bash on style I'm bashing on the tool not you. All grievances are my own problem.
| ) | ||
|
|
||
|
|
||
| # =========================================================================== |
There was a problem hiding this comment.
Please no dangling comments in global space. Put them inside class/functions where they make sense.
Assuming they are even needed. They don't seem to be
| Only ``xi * z`` needs this: at ``xi = 0`` with an infinite observation the | ||
| product is mathematically ``0`` (the exponential GPD carries no shape term), | ||
| but ``0.0 * inf`` is ``nan`` under IEEE, and a ``nan`` in the unused branch of | ||
| a ``switch`` can still leak (the backend may lower it to ``cond*a + ...``). |
| return pt.and_(z >= 0, 1 + t > 0) | ||
|
|
||
|
|
||
| def _propagate_nonfinite_shape(result, xi, kappa=None): |
There was a problem hiding this comment.
is this really needed? We don't worry about say inf sigma in normal likelihood and things like that
|
|
||
|
|
||
| def ext_gen_pareto_logp(value, mu, sigma, xi, kappa): | ||
| """Pure-PyTensor extended-GPD log-density; out-of-support values map to ``-inf``.""" |
There was a problem hiding this comment.
pure PyTensor is such a useless statement. Really can we trim 90% of these comments and docstrings. Just visual noise
| wall noted above (reached only when the wall is pinned within ``~1e-13`` of the | ||
| data). | ||
|
|
||
| The threshold ``mu`` is held fixed -- the standard peaks-over-threshold setup, |
There was a problem hiding this comment.
Can we make this paragraph sound less like llm speech? At least remove the load bearing. I believe there are exactly 0 occurrences of this term in the first 2 decades of PyMC history :)
|
|
||
|
|
||
| class _GPDProbabilityIntegralTransform(Transform): | ||
| """Default transform for the GPD family: ``y = logit(F(x))``. |
| return pt.switch(pt.isnan(logp_x), np.nan, jac) | ||
|
|
||
|
|
||
| class _GenParetoPIT(_GPDProbabilityIntegralTransform): |
There was a problem hiding this comment.
What are these classes about/for?
|
|
||
| # The Generalized Pareto family legitimately evaluates to +/-inf at the | ||
| # (measure-zero) support boundary and at probabilities 0 / 1; NumPy flags those | ||
| # as divide-by-zero / invalid / overflow during ``.eval()``. They are the correct |
There was a problem hiding this comment.
we suppress these warnings are you sure we are the ones firing them, not the test? if so suppress in the tests with np.errstate context
| if xi < 0: | ||
| assert np.all(xs <= mu - sigma / xi + 1e-9), f"x{i} (xi={xi}) past wall" | ||
|
|
||
| def test_observed_is_unaffected(self): |
There aren't so strict conventions. Parameter bounds are usually wrapped in a check_parameter_values that can be ignored ( |
|
Thanks a lot @ricardoV94 for the speedy and savage review. I really appreciate the time you put in to read over this, and of course I don't take the criticism personally since I didn't really "write" this stuff myself. And probably I should have put in more effort before asking for review, but I had to stop somewhere. (I did put in a lot of time doing fairly rigorous correctness testing though; it's probably not perfect but I think it's close.) This is probably overengineered, but I was going for correctness first. I actually feel like you're much better qualified to give an opinion on the various nan/inf checks than I am. I'm curious if the checks will get in the way of rewrites. Regarding the tests, I did some extensive profiling, and I don't think there are likely many obvious wins beyond removing certain tests altogether. I'll reproduce the profiling for you, but from memory, no individual test went beyond a few seconds, and the time to run Thanks so much for your time and patience, I really appreciate it!!! |
|
While some of the complicated numerics were motivated purely by aggressive parameter scans, a very large proportion were actually necessary to preserve stability that was leading to real divergences during sampling. |
Summary
Adds two continuous distributions to
pymc_extras.distributions:GenPareto— the Generalized Pareto distribution (GPD), the peaks-over-threshold counterpart ofGenExtreme. Parametrized as in Coles (2001) / SciPy'sgenpareto(c = ξ).ExtGenPareto— the extended GPD of Naveau et al. (2016):F = H**κwithHthe GPD CDF, adding a lower-tail shapeκwhileξkeeps controlling the upper tail (κ = 1recovers the GPD).Each provides stable pure-PyTensor
logp/logcdf/logccdf/icdfkernels, aSymbolicRandomVariable(inverse-CDF sampling), asupport_point, and a probability-integral default transform so they work as latent variables. Both are exported frompymc_extras.distributionsand added to the Sphinx API reference.Supersedes #638
This reimplements #638 following the review there: the RVs are now
SymbolicRandomVariables doing inverse-CDF sampling, so the random methods work across backends without SciPy. (The original GPD attempt was #294.)Earlier review feedback addressed
SymbolicRandomVariable/ inverse-CDF (Add Generalized Pareto (GPD) and Extended GPD distributions #638) — done for both distributions.docs/api_reference.rst.skip_paramdomain_outside_edge_test); boundary behaviour is covered explicitly inTestGenParetoBoundaries.support_pointonly needs a finite-logp point (Implement Generalized Pareto distribution #294) — returns the median (finite for everyξ, unlike the mean, which diverges forξ ≥ 1), broadcast oversize.ξ = 0limit (Implement Generalized Pareto distribution #294) — the whole family is routed throughlog1p(ξz)/ξ, keeping value and gradient C¹ throughξ = 0(regression-tested).Numerical behaviour
The kernels stay finite and accurate deep into the tails and near the bounded
ξ < 0wall (down to the documented float64 representability limits); the probability-integral default transform avoids theξ = 0Jacobian kink anIntervaltransform would introduce (the headline reason for it); and boundary / non-finite-parameter behaviour follows PyMC's own conventions (e.g. a non-finite shape gives NaN likepm.Gamma, a non-finite scale gives-inflikepm.Normal). All of this is covered intests/distributions/test_continuous.py.