Wrong results for single-prediction sparse-matrix input to SVC #1880
Open
Description
Describe the bug
The bug occurs when a single sample in sparse-matrix format is passed to an SVM classifier. The prediction for that single sample differs from the prediction for the same sample in a batch, and the batch prediction is the correct one. The problem affects both predict and predict_proba.
To Reproduce
The following code gives a minimal example:
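# Patch scikit-learn before importing its estimators
from sklearnex import patch_sklearn
patch_sklearn()
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
X = np.array(['hello world there', 'hello planet earth'])
y = np.array([1, 0])
# The same sample, once alone and once repeated as a batch of two
test_1 = ['earth']
test_2 = ['earth', 'earth']
svm = SVC(probability=True)
vectorizer = TfidfVectorizer()
# TfidfVectorizer returns sparse matrices, so fit and predict both use sparse input
svm.fit(vectorizer.fit_transform(X), y)
# Each row printed below should show the same probabilities
print(svm.predict_proba(vectorizer.transform(test_1)))
print(svm.predict_proba(vectorizer.transform(test_2)))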
Expected behavior
The probabilities for a sample should be the same whether it is predicted alone or as part of a batch. Instead, the output shows different probabilities for the two cases. The problem vanishes if sklearnex is disabled.
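The expectation can also be written as a small check. This is only a sketch: `single_vs_batch_mismatches` is a hypothetical helper, not part of scikit-learn or sklearnex, and it assumes the input supports `.shape` and row slicing (as NumPy arrays and SciPy CSR matrices do).

```python
import numpy as np


def single_vs_batch_mismatches(predict_fn, X, atol=1e-8):
    """Return the row indices whose one-at-a-time prediction differs
    from the prediction obtained for the same row in a batch call.

    Hypothetical helper for illustration only.
    """
    # Predict all rows in one call.
    batch = np.asarray(predict_fn(X))
    mismatches = []
    for i in range(X.shape[0]):
        # X[i:i + 1] keeps the input 2-D with a single row.
        single = np.asarray(predict_fn(X[i:i + 1]))
        if not np.allclose(single[0], batch[i], atol=atol):
            mismatches.append(i)
    return mismatches
```

With the reproduction script, it could be called as `single_vs_batch_mismatches(svm.predict_proba, vectorizer.transform(test_2))`. An empty list means single and batch predictions agree, which is the expected behavior.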
Output/Screenshots
This is the output of the minimal example:
[[0.5 0.5]]
[[0.52834581 0.47165419]
[0.52834581 0.47165419]]