Variability in results using sklearnex with ExtraTrees and RandomForest classifiers #1916

Open
@YoochanMyung

Describe the bug
ExtraTrees and RandomForest classifiers give different results depending on whether sklearnex patching is turned on or off.
The issue occurs from version 2024.1 onward. I found it with my own dataset, and it is also reproducible with the breast_cancer dataset, but not with the iris dataset.

To Reproduce

  1. Install 'scikit-learn==1.5.1' (any version from 1.2.1 onward)
  2. Install 'scikit-learn-intelex==2024.1' (any version from 2024.1 onward)
  3. Run the following test code:
# Patch before importing any scikit-learn estimators.
# Comment out the next two lines to run with stock scikit-learn.
from sklearnex import patch_sklearn
patch_sklearn()

# RandomForestClassifier is imported so it can be swapped in for ExtraTreesClassifier below.
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import confusion_matrix, matthews_corrcoef
from sklearn.model_selection import cross_val_predict, train_test_split

N_CORES = 16

# Toy Data

from sklearn.datasets import load_iris,load_breast_cancer
data = load_breast_cancer()
X = data['data']
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=1)

# ExtraTrees
classifier_cv = ExtraTreesClassifier(n_estimators=300, random_state=1, n_jobs=N_CORES)
classifier_test = ExtraTreesClassifier(n_estimators=300, random_state=1, n_jobs=N_CORES)

cv_results = cross_val_predict(classifier_cv, X_train, y_train, cv=10)
classifier_test.fit(X_train, y_train)

test_results = classifier_test.predict(X_test)
print("###CV###")
print(matthews_corrcoef(y_train, cv_results))
print(confusion_matrix(y_train,cv_results).ravel())

print("###TEST###")
print(matthews_corrcoef(y_test, test_results))
print(confusion_matrix(y_test,test_results).ravel())

Expected behavior
Identical results with and without sklearnex patching.

Output/Screenshots

ExtraTrees without the sklearnex patch

###CV###
0.935861738490973
[144   5   7 242]
###TEST###
0.9247930594534806
[ 58   5   1 107]

ExtraTrees with the sklearnex patch

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
###CV###
0.9409328452526324
[143   6   5 244]
###TEST###
0.8992907835033845
[ 57   6   2 106]
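
As a sanity check on the numbers above, the printed MCC values can be recomputed from the confusion matrices alone. This is only a minimal sketch: it assumes the flattened matrices are in scikit-learn's binary order (tn, fp, fn, tp), and the `mcc` helper and the `reported` dictionary are names made up for illustration, not part of either library.

```python
from math import sqrt


def mcc(tn, fp, fn, tp):
    """Binary Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Flattened confusion matrices copied from the outputs above,
# assumed to be ordered (tn, fp, fn, tp).
reported = {
    "stock CV": (144, 5, 7, 242),
    "stock TEST": (58, 5, 1, 107),
    "patched CV": (143, 6, 5, 244),
    "patched TEST": (57, 6, 2, 106),
}

for name, (tn, fp, fn, tp) in reported.items():
    print(f"{name:>12}: MCC={mcc(tn, fp, fn, tp):.6f}, misclassified={fp + fn}")
```

The recomputed values agree with the printed MCCs, so the gap comes down to 12 vs. 11 misclassified samples in cross-validation and 6 vs. 8 on the test split.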

Environment:

  • OS: Ubuntu 22.04.04 LTS
  • scikit-learn==1.5.1 (also tested on 1.2.1, 1.3.x, 1.4.x, and others)

Labels: bug (Something isn't working)