[BUG] Fix Pareto _pdf and _log_pdf returning nonzero values outside support#968

Open
ANANYA542 wants to merge 1 commit into sktime:main from ANANYA542:fix/pareto-pdf-support-check


@ANANYA542

Description

The Pareto distribution has support x >= scale. However, _pdf and _log_pdf evaluated the density formula for every x, returning large, incorrect positive values for x < scale (e.g., pdf(1.0) = 24.0 with alpha=3.0 and scale=2.0, where the correct value is 0.0).
This PR adds np.where support boundary checks to both methods, matching the pattern already used by _cdf in the same file.
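
For reference, the density being guarded can be written as follows, with shape α and scale s (symbols chosen here for illustration). The second expression reproduces the 24.0 figure, assuming alpha=3.0 as in the verification run further down:

```latex
f(x;\alpha,s) =
\begin{cases}
\dfrac{\alpha\, s^{\alpha}}{x^{\alpha+1}}, & x \ge s,\\[4pt]
0, & x < s,
\end{cases}
\qquad
\left.\frac{\alpha\, s^{\alpha}}{x^{\alpha+1}}\right|_{\alpha=3,\ s=2,\ x=1}
= \frac{3\cdot 2^{3}}{1^{4}} = 24 .
```

The unguarded code evaluated the first branch for every x, which is why values below the scale produced large positive numbers instead of 0.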

Reference Issues/PRs

Fixes #967

What does this implement/fix? Explain your changes.

Added np.where(x >= scale, ...) guards to _pdf and _log_pdf in pareto.py:

  • _pdf: returns 0.0 for x < scale (previously returned large positive values, e.g. 384.0 at x=0.5 with alpha=3.0, scale=2.0)
  • _log_pdf: returns -np.inf for x < scale (previously returned finite values, e.g. 5.95 for the same inputs)

The _cdf method in the same file (line 145) already handles this correctly:
cdf_arr = np.where(x < scale, 0, 1 - np.power(scale / x, alpha))

The fix simply applies the same pattern to _pdf and _log_pdf.
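
As a rough illustration of the guard pattern, here is a minimal standalone sketch. The function names `pareto_pdf` and `pareto_log_pdf` are hypothetical stand-ins, not the actual skpro methods, and the `np.errstate` wrapper is an assumption added here because `np.where` evaluates both branches before selecting:

```python
import numpy as np


def pareto_pdf(x, alpha, scale):
    """Pareto density with an explicit support guard (illustrative stand-in)."""
    x = np.asarray(x, dtype=float)
    # np.where computes both branches, so silence warnings from x <= 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        raw = alpha * np.power(scale, alpha) / np.power(x, alpha + 1)
    # Outside the support (x < scale) the density is exactly zero.
    return np.where(x >= scale, raw, 0.0)


def pareto_log_pdf(x, alpha, scale):
    """Pareto log-density with the same guard (illustrative stand-in)."""
    x = np.asarray(x, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        raw = np.log(alpha) + alpha * np.log(scale) - (alpha + 1) * np.log(x)
    # log(0) is -inf, so that is the value outside the support.
    return np.where(x >= scale, raw, -np.inf)


xs = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 5.0])
pdf_vals = pareto_pdf(xs, alpha=3.0, scale=2.0)
log_pdf_vals = pareto_log_pdf(xs, alpha=3.0, scale=2.0)
```

The boundary point x == scale is treated as inside the support, mirroring the `x < scale` test in the `_cdf` line quoted above.
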
Verification:

========================================================================
VERIFY FIX: Pareto _pdf support boundary check
Parameters: alpha=3.0, scale=2.0
========================================================================
    x   in support?    skpro _pdf     scipy pdf    status
------------------------------------------------------------------------
  0.5            NO        0.0000        0.0000      PASS
  1.0            NO        0.0000        0.0000      PASS
  1.5            NO        0.0000        0.0000      PASS
  2.0           YES        1.5000        1.5000      PASS
  3.0           YES        0.2963        0.2963      PASS
  5.0           YES        0.0384        0.0384      PASS
------------------------------------------------------------------------
VERIFY FIX: Pareto _log_pdf support boundary check
------------------------------------------------------------------------
  x=0.5  in_support=NO  skpro=      -inf  scipy=      -inf  PASS
  x=1.0  in_support=NO  skpro=      -inf  scipy=      -inf  PASS
  x=1.5  in_support=NO  skpro=      -inf  scipy=      -inf  PASS
  x=2.0  in_support=YES  skpro=    0.4055  scipy=    0.4055  PASS
  x=3.0  in_support=YES  skpro=   -1.2164  scipy=   -1.2164  PASS
------------------------------------------------------------------------
ALL TESTS PASSED -- fix verified

A screenshot of the same output is attached below:
[image: screenshot of the verification output]

Does your contribution introduce a new dependency? If yes, which one?

no

What should a reviewer concentrate their feedback on?

  • Verifying the np.where guard is placed correctly in both _pdf and _log_pdf.
  • Confirming consistency with the existing _cdf boundary check.

Did you add any tests for the change?

No new tests needed. The existing test suite validates PDF/CDF/PPF correctness for the Pareto distribution.
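
Should a dedicated regression check be wanted later, one possible shape is a consistency check between the density and the CDF: integrating the PDF over an interval that straddles the boundary should match the CDF difference. The sketch below uses hypothetical stand-in functions (`guarded_pdf`, `unguarded_pdf`, `cdf`, `trapezoid`) rather than skpro's classes or test framework, and the parameters alpha=3.0, scale=2.0 are taken from the verification run above:

```python
import numpy as np


def guarded_pdf(x, alpha, scale):
    """Density with the support guard (stand-in for the fixed behaviour)."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= scale, alpha * scale**alpha / x ** (alpha + 1), 0.0)


def unguarded_pdf(x, alpha, scale):
    """Density formula applied to every x (stand-in for the old behaviour)."""
    x = np.asarray(x, dtype=float)
    return alpha * scale**alpha / x ** (alpha + 1)


def cdf(x, alpha, scale):
    """CDF using the same np.where pattern as the _cdf line quoted in the PR."""
    x = np.asarray(x, dtype=float)
    return np.where(x < scale, 0.0, 1.0 - np.power(scale / x, alpha))


def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


alpha, scale = 3.0, 2.0
grid = np.linspace(0.5, 5.0, 200_001)  # interval straddles x = scale

expected_mass = float(cdf(5.0, alpha, scale) - cdf(0.5, alpha, scale))
guarded_mass = trapezoid(guarded_pdf(grid, alpha, scale), grid)
unguarded_mass = trapezoid(unguarded_pdf(grid, alpha, scale), grid)
```

With the guard, the integrated mass agrees with the CDF difference up to discretisation error. Without it, the integral exceeds 1, which no probability density can do.
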

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

…upport

The Pareto distribution is defined only for x >= scale. However,
_pdf and _log_pdf evaluated the formula for all x, returning large
wrong positive values for x < scale (e.g., pdf(1.0) = 24.0 when
scale=2.0, should be 0.0).

Added np.where support boundary checks to both methods, matching
the pattern already used by _cdf in the same file.

Successfully merging this pull request may close these issues.

[BUG] Pareto _pdf and _log_pdf return nonzero values outside support (x < scale)
