Prevent zero-variance instability in BaseProbaRegressor.predict_proba #956

Open
kindler-king wants to merge 5 commits into sktime:main from kindler-king:bugfix-zero-variance-predict-proba

@kindler-king
Contributor

Reference Issues/PRs
Fixes #955

What does this implement/fix?

This PR fixes a numerical instability in BaseProbaRegressor.predict_proba.

When predict_var returns 0, the fallback Normal distribution is constructed with sigma=0, which leads to divide-by-zero warnings and NaN values when evaluating pdf or log_pdf.

To prevent this, the predicted variance is clipped to machine epsilon before computing the standard deviation:

pred_var = np.clip(pred_var, np.finfo(float).eps, None)

This ensures the resulting Normal distribution always has a strictly positive scale, while any predicted variance already at or above machine epsilon is left unchanged.
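
To make the failure mode concrete, here is a minimal sketch of the effect using plain NumPy only. The helper `normal_log_pdf` is a hypothetical stand-in written for illustration and is not the distribution code used by `predict_proba`; the clipping line is the one proposed above.

```python
import warnings
import numpy as np


def normal_log_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2), written out with plain NumPy.

    Hypothetical helper for illustration; not the library's implementation.
    """
    z = (x - mu) / sigma
    return -0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)


pred_mean = np.array([1.0, 2.0])
pred_var = np.array([0.0, 0.25])  # first entry is the problematic zero variance

# Without clipping: sigma = 0 makes the standardised residual 0/0 = nan.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the divide/invalid warnings here
    raw = normal_log_pdf(pred_mean, pred_mean, np.sqrt(pred_var))

# With the clipping step from this PR: sigma is strictly positive.
clipped_var = np.clip(pred_var, np.finfo(float).eps, None)
clipped = normal_log_pdf(pred_mean, pred_mean, np.sqrt(clipped_var))

print(raw)      # first entry is nan
print(clipped)  # both entries are finite
```

The second entry is identical in both cases, since a variance of 0.25 is untouched by the clip. The first entry becomes a large but finite log-density, which is the behaviour reviewers are asked to judge.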

Does your contribution introduce a new dependency?
No.

What should a reviewer concentrate their feedback on?

  • Whether clipping variance at machine epsilon is the appropriate safeguard.
  • Consistency with the existing probabilistic regression design.

Did you add any tests for the change?
Yes.

A regression test was added that uses a mock regressor returning zero variance and verifies that:

  • predict_proba().pdf() and log_pdf() remain finite
  • no numerical warnings are raised
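
For readers without the diff at hand, the shape of such a test might look roughly like the sketch below. It is not the test added in this PR: `ZeroVarianceMock`, `fallback_normal_params` and `normal_log_pdf` are hypothetical names, and the Normal density is written out in NumPy rather than going through `predict_proba`.

```python
import warnings
import numpy as np


class ZeroVarianceMock:
    """Hypothetical stand-in for a regressor whose predicted variance is 0."""

    def predict(self, X):
        return np.zeros(len(X))

    def predict_var(self, X):
        return np.zeros(len(X))


def fallback_normal_params(model, X):
    """Mean and standard deviation of a fallback Normal, with the PR's clip."""
    mu = model.predict(X)
    var = np.clip(model.predict_var(X), np.finfo(float).eps, None)
    return mu, np.sqrt(var)


def normal_log_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2); illustrative helper only."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)


def test_zero_variance_is_finite_and_silent():
    X = np.arange(3.0).reshape(-1, 1)
    mu, sigma = fallback_normal_params(ZeroVarianceMock(), X)
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # any warning becomes an exception
        log_pdf = normal_log_pdf(mu, mu, sigma)
        pdf = np.exp(log_pdf)
    assert np.all(sigma > 0)
    assert np.all(np.isfinite(log_pdf))
    assert np.all(np.isfinite(pdf))


test_zero_variance_is_finite_and_silent()
```

Turning warnings into errors inside the `catch_warnings` block is one way to check the "no numerical warnings" condition without depending on a particular test framework.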

kindler-king and others added 5 commits February 23, 2026 17:40
…nverter_store)

In BaseProbaRegressor._check_C, the censoring indicator C was being
converted using self._y_converter_store instead of the dedicated
self._C_converter_store. This could silently corrupt inverse-transform
state when y and C have different mtypes (e.g. pd.DataFrame vs ndarray),
causing both to share the same converter dictionary.

Also fixes the stale copy-paste comment that said 'convert y to
y_inner_mtype' inside _check_C.

Fixes: sktime#749
@fkiraly added the bug, module:probability&simulation (probability distributions and simulators), and module:regression (probabilistic regression module) labels, then removed module:probability&simulation, on Mar 21, 2026
Collaborator

@fkiraly left a comment

I would say this is a hack. Instead of clipping, I would return a Delta distribution if the variance is below machine epsilon (possibly times a factor).

Also, code formatting tests are failing. Please look at the dev guide, and pre-commit.
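
One possible reading of the Delta-distribution suggestion above, sketched with plain NumPy. Everything here is an assumption for discussion: `is_degenerate`, `choose_fallback`, the `factor` argument and the returned dictionary keys are hypothetical names, and the sketch only decides which kind of distribution to build rather than constructing one.

```python
import numpy as np


def is_degenerate(pred_var, factor=1.0):
    """Mask of entries whose variance is at or below factor * machine epsilon.

    `factor` is a hypothetical knob for the "possibly times a factor" part
    of the suggestion; it is not an existing parameter.
    """
    tol = factor * np.finfo(float).eps
    return np.asarray(pred_var, dtype=float) <= tol


def choose_fallback(pred_mean, pred_var, factor=1.0):
    """Return ("delta", params) or ("normal", params) for the whole batch.

    Assumption: the switch is made per batch, and only when every entry is
    degenerate. How to treat mixed batches is deliberately left open.
    """
    pred_mean = np.asarray(pred_mean, dtype=float)
    pred_var = np.asarray(pred_var, dtype=float)
    if np.all(is_degenerate(pred_var, factor)):
        # Point mass at the predicted mean; the key name is illustrative.
        return "delta", {"location": pred_mean}
    return "normal", {"mu": pred_mean, "sigma": np.sqrt(pred_var)}


kind, params = choose_fallback([1.0, 2.0], [0.0, 0.0])
print(kind)  # delta
```

Compared with clipping, this keeps the zero-variance case explicit instead of approximating it with a very narrow Normal. A batch that mixes zero and nonzero variances would still need a decision that this sketch does not make.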

Labels

bug module:regression probabilistic regression module

Development

Successfully merging this pull request may close these issues.

[BUG] Zero predicted variance causes instability in BaseProbaRegressor.predict_proba

2 participants