[BUG] Fix Pareto _pdf and _log_pdf returning nonzero values outside support by ANANYA542 · Pull Request #968 · sktime/skpro

ANANYA542 · 2026-03-17T21:57:34Z

Description

The Pareto distribution is supported only on x >= scale. However, _pdf and _log_pdf evaluated the closed-form expression for all x, returning large, incorrect positive values for x < scale (e.g., with alpha=3.0 and scale=2.0, pdf(1.0) = 24.0 where it should be 0.0).
This PR adds np.where support-boundary checks to both methods, matching the pattern already used by _cdf in the same file.
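
For illustration, the numbers quoted above can be reproduced with plain NumPy. This is a minimal sketch of the unguarded closed form, not the skpro source; it assumes alpha=3.0 and scale=2.0, the parameters used in the verification run of this PR.

```python
import numpy as np

# Parameters taken from the verification run in this PR.
alpha, scale = 3.0, 2.0

# 0.5 and 1.0 lie below the support boundary (x < scale); 2.0 is on it.
x = np.array([0.5, 1.0, 2.0])

# Closed-form Pareto density evaluated with no support check.
unguarded_pdf = alpha * scale**alpha / x ** (alpha + 1)
unguarded_log_pdf = np.log(unguarded_pdf)

# unguarded_pdf holds 384.0, 24.0 and 1.5: the first two should be 0.0.
# unguarded_log_pdf[0] is about 5.95: it should be -inf.
print(unguarded_pdf)
print(unguarded_log_pdf)
```

Only the last entry (x = 2.0) is a valid density value; the first two are the spurious outputs described above.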

Reference Issues/PRs

Fixes #967

What does this implement/fix? Explain your changes.

Added np.where(x >= scale, ...) guards to _pdf and _log_pdf in pareto.py:

_pdf: returns 0.0 for x < scale (was returning large positive values like 384.0)
_log_pdf: returns -np.inf for x < scale (was returning finite positive values like 5.95)

The _cdf method in the same file (line 145) already handled this correctly:

cdf_arr = np.where(x < scale, 0, 1 - np.power(scale / x, alpha))

The fix simply applies the same pattern to _pdf and _log_pdf.
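
As a rough sketch of what the guarded versions look like, the functions below apply the same np.where pattern to the density and log-density. The function names and signatures are hypothetical stand-ins for illustration; the actual methods in pareto.py operate on the distribution's own parameters and may differ in detail.

```python
import numpy as np


def pareto_pdf(x, alpha, scale):
    """Pareto density that is 0.0 outside the support x >= scale.

    Hypothetical standalone helper, not the skpro method itself.
    """
    x = np.asarray(x, dtype=float)
    # np.where evaluates both branches, so silence warnings that the
    # closed form may raise for x <= 0 before those entries are masked.
    with np.errstate(divide="ignore", invalid="ignore"):
        raw = alpha * np.power(scale, alpha) / np.power(x, alpha + 1)
    return np.where(x >= scale, raw, 0.0)


def pareto_log_pdf(x, alpha, scale):
    """Pareto log-density that is -inf outside the support x >= scale."""
    x = np.asarray(x, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        raw = np.log(alpha) + alpha * np.log(scale) - (alpha + 1) * np.log(x)
    return np.where(x >= scale, raw, -np.inf)
```

The guard is written as x >= scale so that the boundary point itself stays inside the support, which is consistent with the _cdf check (x < scale maps to 0).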
Verification:

========================================================================
VERIFY FIX: Pareto _pdf support boundary check
Parameters: alpha=3.0, scale=2.0
========================================================================
    x   in support?    skpro _pdf     scipy pdf    status
------------------------------------------------------------------------
  0.5            NO        0.0000        0.0000      PASS
  1.0            NO        0.0000        0.0000      PASS
  1.5            NO        0.0000        0.0000      PASS
  2.0           YES        1.5000        1.5000      PASS
  3.0           YES        0.2963        0.2963      PASS
  5.0           YES        0.0384        0.0384      PASS
------------------------------------------------------------------------
VERIFY FIX: Pareto _log_pdf support boundary check
------------------------------------------------------------------------
  x=0.5  in_support=NO  skpro=      -inf  scipy=      -inf  PASS
  x=1.0  in_support=NO  skpro=      -inf  scipy=      -inf  PASS
  x=1.5  in_support=NO  skpro=      -inf  scipy=      -inf  PASS
  x=2.0  in_support=YES  skpro=    0.4055  scipy=    0.4055  PASS
  x=3.0  in_support=YES  skpro=   -1.2164  scipy=   -1.2164  PASS
------------------------------------------------------------------------
ALL TESTS PASSED -- fix verified
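
A comparison of this kind can be approximated with the script below. It is a sketch under assumptions: guarded_pdf and guarded_log_pdf are hypothetical stand-ins for the patched skpro methods (the script does not import skpro), and scipy.stats.pareto is used as the reference with shape b=alpha and scale=scale.

```python
import numpy as np
from scipy.stats import pareto as scipy_pareto

alpha, scale = 3.0, 2.0
xs = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 5.0])


def guarded_pdf(x, alpha, scale):
    # Stand-in for the patched _pdf: closed form inside the support, 0.0 outside.
    with np.errstate(divide="ignore", invalid="ignore"):
        raw = alpha * scale**alpha / np.power(x, alpha + 1)
    return np.where(x >= scale, raw, 0.0)


def guarded_log_pdf(x, alpha, scale):
    # Stand-in for the patched _log_pdf: -inf outside the support.
    with np.errstate(divide="ignore", invalid="ignore"):
        raw = np.log(alpha) + alpha * np.log(scale) - (alpha + 1) * np.log(x)
    return np.where(x >= scale, raw, -np.inf)


# Reference values from SciPy (shape parameter b = alpha).
ref_pdf = scipy_pareto.pdf(xs, alpha, scale=scale)
ref_log_pdf = scipy_pareto.logpdf(xs, alpha, scale=scale)

ours_pdf = guarded_pdf(xs, alpha, scale)
ours_log_pdf = guarded_log_pdf(xs, alpha, scale)

for x, ours, ref in zip(xs, ours_pdf, ref_pdf):
    in_support = "YES" if x >= scale else "NO"
    status = "PASS" if np.isclose(ours, ref) else "FAIL"
    print(f"{x:5.1f}  {in_support:>3}  {ours:12.4f}  {ref:12.4f}  {status}")

for x, ours, ref in zip(xs, ours_log_pdf, ref_log_pdf):
    # np.isclose treats two -inf values as equal.
    status = "PASS" if np.isclose(ours, ref) else "FAIL"
    print(f"x={x}  skpro-like={ours:10.4f}  scipy={ref:10.4f}  {status}")
```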

A screenshot of the same output is attached below:

Does your contribution introduce a new dependency? If yes, which one?

no

What should a reviewer concentrate their feedback on?

Verifying the np.where guard is placed correctly in both _pdf and _log_pdf.
Confirming consistency with the existing _cdf boundary check.

Did you add any tests for the change?

No new tests needed. The existing test suite validates PDF/CDF/PPF correctness for the Pareto distribution.

For all contributions

I've added myself to the list of contributors with any new badges I've earned :-)
How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
See here for full badge reference
The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

For new estimators

I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
dependency isolation, see the estimator dependencies guide.

…upport The Pareto distribution is defined only for x >= scale. However, _pdf and _log_pdf evaluated the formula for all x, returning large wrong positive values for x < scale (e.g., pdf(1.0) = 24.0 when scale=2.0, should be 0.0). Added np.where support boundary checks to both methods, matching the pattern already used by _cdf in the same file.

ANANYA542 requested review from felipeangelimvieira and fkiraly as code owners March 17, 2026 21:57

ANANYA542 wants to merge 1 commit into sktime:main from ANANYA542:fix/pareto-pdf-support-check
