
Wrong results for single-prediction sparse-matrix input to SVC #1880

Open
@Simon2496

Description

Describe the bug
The bug occurs when a single sample in sparse-matrix format is passed to an SVM classifier (SVC) with sklearnex patching enabled. The prediction for that single sample differs from the prediction for the same sample within a batch, and the batch prediction is the correct one. The problem affects both predict and predict_proba.

To Reproduce
The following code gives a minimal example:

from sklearnex import patch_sklearn
patch_sklearn()

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

X = np.array(['hello world there', 'hello planet earth'])
y = np.array([1, 0])

test_1 = ['earth']
test_2 = ['earth', 'earth']

svm = SVC(probability=True)
vectorizer = TfidfVectorizer()

svm.fit(vectorizer.fit_transform(X), y)

print(svm.predict_proba(vectorizer.transform(test_1)))
print(svm.predict_proba(vectorizer.transform(test_2)))
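To turn the reproduction into a reusable regression check, a minimal sketch might compare predicting each row on its own against predicting all rows in one call. The helper name max_single_vs_batch_gap is hypothetical and not part of sklearnex or scikit-learn; it only assumes a predict function that accepts a SciPy CSR matrix and returns one output (or one row of outputs) per input row.

```python
import numpy as np
import scipy.sparse as sp


def max_single_vs_batch_gap(predict_fn, X):
    """Hypothetical helper: largest absolute difference between predicting
    each row of X alone and predicting all rows of X together.

    For a deterministic estimator this should be (numerically) zero.
    Works for 1-D outputs (predict) and 2-D outputs (predict_proba).
    """
    X = sp.csr_matrix(X)
    batch = np.asarray(predict_fn(X))
    # X[i:i + 1] keeps the 2-D, single-row sparse shape that triggers the bug.
    singles = np.concatenate(
        [np.asarray(predict_fn(X[i:i + 1])) for i in range(X.shape[0])]
    )
    return float(np.abs(singles - batch).max())
```

Applied to the example above, max_single_vs_batch_gap(svm.predict_proba, vectorizer.transform(test_2)) would be expected to return about 0.028 if the reported numbers reproduce (0.52834581 versus 0.5), and 0 with stock scikit-learn.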

Expected behavior
A sample should receive the same probabilities whether it is predicted alone or as part of a batch. With sklearnex enabled, the single-sample probabilities instead differ from the batch probabilities, as the output below shows. The problem vanishes if sklearnex is disabled.

Output/Screenshots
This is the output of the minimal example:

[[0.5 0.5]]
[[0.52834581 0.47165419]
 [0.52834581 0.47165419]]
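Because the output above shows the batch path returning the expected probabilities, one possible stopgap, untested against sklearnex and offered only as an assumption rather than a confirmed fix, is to pad a single sparse row into a two-row batch and keep the first result. The helper predict_single_via_batch is hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def predict_single_via_batch(predict_fn, x_row):
    """Hypothetical workaround (assumption, not a confirmed fix):
    duplicate a 1-row sparse input so the estimator sees a 2-row batch,
    then return only the first row of the result."""
    if x_row.shape[0] != 1:
        raise ValueError("expected exactly one row")
    padded = sp.vstack([x_row, x_row], format="csr")
    return np.asarray(predict_fn(padded))[:1]
```

In the reproduction this would be called as predict_single_via_batch(svm.predict_proba, vectorizer.transform(test_1)). It doubles the work per call, so it is only reasonable as a temporary measure until the underlying issue is fixed.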

Labels: bug (Something isn't working)