
Converted LGBMClassifier gives inconsistent predictions #728

@mkoruszowic

Description

When converting a trained LGBMClassifier to ONNX, the ONNX model produces a different prediction than the original scikit-learn model for the same input.


Reproduction Steps

  1. Unzip the provided model:

lgbm_class.pickle.zip

  2. Run the following code:
import pickle

import numpy as np
import onnxruntime
from lightgbm import LGBMClassifier
from onnxmltools.convert.lightgbm.operator_converters.LightGbm import convert_lightgbm
from skl2onnx import to_onnx
from skl2onnx import update_registered_converter
from skl2onnx.common.data_types import guess_data_type
from skl2onnx.common.shape_calculator import (
    calculate_linear_classifier_output_shapes,
)

update_registered_converter(
    LGBMClassifier,
    "LightGbmLGBMClassifier",
    calculate_linear_classifier_output_shapes,
    convert_lightgbm,
    options={"nocl": [True, False], "zipmap": [True, False, "columns"]},
)

X = np.array([[
         3.0000000e+00,  0.0000000e+00,  4.0000000e+00,  5.0000000e+00,
         2.5000000e+02,  1.0000000e+00,  1.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  1.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  1.0000000e+00,  2.0000000e+00,  0.0000000e+00,
         1.0000000e+00,  0.0000000e+00,  0.0000000e+00,  1.0000000e+00,
        -3.2525266e+03, -1.1329592e+01, -3.5146961e-01, -3.7130871e-01,
         2.6571992e-01, -9.1359066e-03, -5.1521581e-01,  7.3314957e-02,
         0.0000000e+00,  6.2500000e+04,  0.0000000e+00,  1.0578929e+07,
         2.7753743e-03,  1.2353089e-01,  2.5566691e-01,  1.1486175e+00,
         7.0607074e-02,  5.3750831e-03,  4.2937987e-05,  9.2572132e-05]],
      dtype=np.float32)


with open("lgbm_class.pickle", "rb") as f:
    sklearn_model = pickle.load(f)

sklearn_pred = sklearn_model.predict(X)

onnx_model = to_onnx(
    sklearn_model,
    target_opset={"": 19, "ai.onnx.ml": 3},
    initial_types=guess_data_type(X),
)

sess = onnxruntime.InferenceSession(
    onnx_model.SerializeToString(), providers=["CPUExecutionProvider"],
)
# onnxruntime expects numpy arrays as tensor inputs, so pass X directly
input_feed = {sess.get_inputs()[0].name: X}
onnx_pred = sess.run(None, input_feed)[0]

print(f"{sklearn_pred=}")
print(f"{onnx_pred=}")

Output

sklearn_pred=array(['Risk'], dtype=object)
onnx_pred=array(['No Risk'], dtype=object)

Environment:

python 3.11

lightgbm                  4.2.0                    pypi_0    pypi
onnx                      1.16.0                   pypi_0    pypi
onnxconverter-common      1.15.0                   pypi_0    pypi
onnxmltools               1.14.0                   pypi_0    pypi
onnxruntime               1.16.0                   pypi_0    pypi
onnxruntime-extensions    0.13.0                   pypi_0    pypi
skl2onnx                  1.19.1                   pypi_0    pypi


Expected

Both models should return the same prediction.
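
A probability-level comparison can help narrow this down. The sketch below is not part of the original report; it assumes the same sklearn_model, sess, and X from the reproduction script above. If the sample sits close to the decision boundary, small numerical differences introduced by the conversion (for example float32 casting of split thresholds) can be enough to flip the predicted label.

# Diagnostic sketch: compare class probabilities, not just labels.
# Assumes sklearn_model, sess, and X from the reproduction script above.
skl_proba = sklearn_model.predict_proba(X)

# With the default zipmap behaviour, the second ONNX output is a list of
# dicts mapping each class label to its probability.
onnx_label, onnx_proba = sess.run(None, {sess.get_inputs()[0].name: X})

print(f"{skl_proba=}")
print(f"{onnx_proba=}")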
