Hello, and thank you for your work on this great library!
I'm seeing a pretty big difference in probabilities when using CalibratedClassifierCV with isotonic regression together with RandomForestClassifier.
It seems like it's only happening when the max_depth parameter is set high enough.
I've provided a small snippet to reproduce the issue, using the following library versions:

```
scikit-learn==1.6.0
skl2onnx==1.18.0
onnxruntime==1.20.1
```
```python
import numpy as np
import onnxruntime as ort
from numpy.testing import assert_almost_equal
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=400_000,
    n_features=15,
    n_informative=15,
    n_redundant=0,
    n_classes=2,
    n_clusters_per_class=2,
    random_state=30,
)
X = X.astype(np.float32)

rf = RandomForestClassifier(
    max_depth=10,
    n_jobs=-1,
    random_state=1234,
).fit(X, y)
model = CalibratedClassifierCV(rf, method="isotonic", cv="prefit").fit(X, y)

model_onnx = convert_sklearn(
    model,
    initial_types=[("input", FloatTensorType([None, X.shape[1]]))],
    target_opset=15,
    options={"zipmap": False},
)
session = ort.InferenceSession(model_onnx.SerializeToString())
output = session.run(
    ["probabilities"],
    {"input": X},
)

onnx_probs = output[0][:, 1]
model_probs = model.predict_proba(X)[:, 1].astype(np.float32)
assert_almost_equal(onnx_probs, model_probs, decimal=5)
```

The result is:
```
Mismatched elements: 4485 / 400000 (1.12%)
Max absolute difference among violations: 0.01261032
Max relative difference among violations: 0.11618411
```

I see that IsotonicRegression is not listed as supported on https://onnx.ai/sklearn-onnx/supported.html, but I would expect CalibratedClassifierCV to be supported with both calibration methods.
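A plausible explanation, which is my assumption and not confirmed anywhere in this report: scikit-learn stores split thresholds in float64, while the converted ONNX tree ensemble works in float32, so a sample whose feature value falls between the two representations of a threshold gets routed to different leaves in the two runtimes. Deeper trees mean more splits per sample, hence more chances to cross such a boundary, and a piecewise-constant isotonic mapping can then turn a tiny leaf-value difference into a full calibration-step jump. A minimal sketch of the threshold effect, with made-up values:

```python
import numpy as np

# Hypothetical split threshold as scikit-learn would store it (float64).
threshold64 = np.float64(0.1)
# The same threshold after a cast to float32, as in the converted model.
threshold32 = np.float32(threshold64)
# A float32 feature value: float32(0.1) rounds *above* float64(0.1),
# so it sits between the two threshold representations.
x = np.float32(0.1)

# The same sample takes different branches depending on the precision.
print(bool(x <= threshold64), bool(x <= threshold32))  # → False True
```

If this is the cause, comparing the ONNX output of the uncalibrated forest against `rf.predict_proba` directly would show the divergence already exists upstream of the calibrator.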