BUG Using as_sklearn can lead to a broken scikit-learn model #6533

Open
@betatim

When using the as_sklearn and from_sklearn functionality to convert a cuml estimator to the equivalent scikit-learn estimator, the handling of default values is inconsistent.

The following works:

import cuml
from sklearn import linear_model
from sklearn.datasets import make_classification

X, y = make_classification(n_features=32, random_state=42)

lr = cuml.LogisticRegression()
lr.fit(X, y)

lr_from_cuml = lr.as_sklearn()
print(lr_from_cuml.solver)  # set to 'lbfgs', scikit-learn default

However, explicitly setting solver leads to a broken scikit-learn model:

lr = cuml.LogisticRegression(solver="qn")
lr.fit(X, y)

lr_from_cuml = lr.as_sklearn()
print(lr_from_cuml.solver)  # set to 'qn', not supported in scikit-learn

There are two problems here. First, doing the same thing (not passing a value versus explicitly passing the default value) leads to different results. Second, the user can end up with a broken scikit-learn model.

Part of the cause is a subtle inconsistency between how as_sklearn and from_sklearn work. The former uses only the explicitly passed constructor arguments when creating the scikit-learn instance, while the latter uses get_params to get all the parameters. In both cases we should use get_params (that is, all constructor arguments), irrespective of whether the user passed them or not.
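To make the difference concrete, here is a minimal sketch in plain Python. ToyEstimator and explicit_only are hypothetical stand-ins invented for illustration; they are not cuml's implementation, and they only mimic the two ways of collecting parameters described above.

```python
import inspect


class ToyEstimator:
    """Hypothetical stand-in for a cuml estimator; not cuml's actual code."""

    def __init__(self, solver="qn", C=1.0):
        self.solver = solver
        self.C = C

    def get_params(self):
        # All constructor arguments, whether or not the user passed them.
        names = inspect.signature(type(self).__init__).parameters
        return {n: getattr(self, n) for n in names if n != "self"}


def explicit_only(cls, **kwargs):
    """Build an estimator and keep only the arguments the caller passed."""
    est = cls(**kwargs)
    return est, dict(kwargs)


# Approach 1: only explicitly passed arguments (differs between the two calls).
_, passed_default = explicit_only(ToyEstimator)
_, passed_qn = explicit_only(ToyEstimator, solver="qn")

# Approach 2: all parameters via get_params (identical for the two calls).
all_default = ToyEstimator().get_params()
all_qn = ToyEstimator(solver="qn").get_params()

print(passed_default, passed_qn)
print(all_default == all_qn)
```

With the first approach the two logically identical estimators hand over different parameter sets, which is the asymmetry reported here. With the second approach they agree, but the cuml-only value is then always forwarded and needs translating.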

Doing this means we need to use our translation machinery "in reverse" or create a new set of translations. The machinery we have right now for cuml.accel works for scikit-learn -> cuml only.
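As a rough sketch of what a separate set of reverse translations could look like: the table name CUML_TO_SKLEARN, the helper translate_params, and the single "qn" -> "lbfgs" entry are all assumptions made for illustration, not the existing cuml.accel machinery or its actual mappings.

```python
# Hypothetical reverse translation table (cuml value -> scikit-learn value).
# The single entry is an assumption for illustration, not cuml's real mapping.
CUML_TO_SKLEARN = {
    "solver": {"qn": "lbfgs"},
}


def translate_params(cuml_params, table=CUML_TO_SKLEARN):
    """Return a copy of cuml_params with known cuml-only values translated.

    Assumes parameter values are hashable; values without an entry in the
    table pass through unchanged.
    """
    translated = {}
    for name, value in cuml_params.items():
        value_map = table.get(name, {})
        translated[name] = value_map.get(value, value)
    return translated


sk_params = translate_params({"solver": "qn", "C": 1.0})
print(sk_params)
```

A per-parameter value map keeps the translation declarative, so it could in principle be applied to the full output of get_params before constructing the scikit-learn estimator.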

Labels

bug: Something isn't working
