Description
When using the as_sklearn and from_sklearn functionality to convert between a cuml estimator and the equivalent scikit-learn estimator, the handling of default values is inconsistent.
The following works:
import cuml
from sklearn import linear_model
from sklearn.datasets import make_classification
X, y = make_classification(n_features=32, random_state=42)
lr = cuml.LogisticRegression()
lr.fit(X, y)
lr_from_cuml = lr.as_sklearn()
print(lr_from_cuml.solver) # set to 'lbfgs', scikit-learn default
However, explicitly setting solver to its cuml default value leads to a broken scikit-learn model:
lr = cuml.LogisticRegression(solver="qn")
lr.fit(X, y)
lr_from_cuml = lr.as_sklearn()
print(lr_from_cuml.solver) # set to 'qn', not supported in scikit-learn
There are two problems here. First, doing the same thing (not passing a value versus passing the default value) leads to different results. Second, the user can end up with a broken scikit-learn model.
Part of the cause for this is a subtle inconsistency between how as_sklearn and from_sklearn work. The former uses only the explicitly passed constructor arguments when creating the scikit-learn instance. The latter uses get_params to get all the parameters. We should use get_params/all constructor arguments in both cases, irrespective of whether the user passed them or not.
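To make the difference concrete, here is a minimal sketch using a toy estimator. ToyEstimator, explicit_params and the default values are hypothetical stand-ins chosen for illustration, not cuml's actual implementation. It shows how "only explicitly passed arguments" and "all parameters via get_params" diverge for two instances that are configured identically.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator; not cuml's real class."""

    # Assumed defaults, for illustration only.
    _defaults = {"solver": "qn", "C": 1.0}

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(self._defaults)
        if unknown:
            raise TypeError(f"unexpected arguments: {sorted(unknown)}")
        # Remember what the user typed, separately from the resolved values.
        self._explicit = dict(kwargs)
        for name, default in self._defaults.items():
            setattr(self, name, kwargs.get(name, default))

    def explicit_params(self):
        """Only the arguments the user passed (the as_sklearn-style view)."""
        return dict(self._explicit)

    def get_params(self):
        """All constructor parameters (the from_sklearn-style view)."""
        return {name: getattr(self, name) for name in self._defaults}


implicit = ToyEstimator()
explicit = ToyEstimator(solver="qn")

# Same configuration, but the "explicit only" view differs ...
print(implicit.explicit_params(), explicit.explicit_params())
# ... while get_params() gives the same answer for both.
print(implicit.get_params() == explicit.get_params())
```

Basing both conversion directions on the get_params-style view would remove the dependence on what the user happened to type.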
Doing this means we need to use our translation machinery "in reverse" or create a new set of translations. The machinery we have right now for cuml.accel works for scikit-learn -> cuml only.
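As a rough sketch of what a reverse translation could look like, assuming a simple per-parameter lookup table: the table names, the to_sklearn_params helper and the "saga" entry are all hypothetical and do not reflect the actual cuml.accel machinery. Only the qn/lbfgs pairing comes from the example above.

```python
# Hypothetical forward table (scikit-learn value -> cuml value).
# The "saga" entry is an assumption, added to show a many-to-one mapping.
SKLEARN_TO_CUML = {
    "solver": {"lbfgs": "qn", "saga": "qn"},
}

# The forward table is many-to-one, so it cannot be inverted mechanically.
# The reverse direction needs an explicit choice of scikit-learn value.
CUML_TO_SKLEARN = {
    "solver": {"qn": "lbfgs"},
}


def to_sklearn_params(cuml_params):
    """Translate a full cuml parameter dict into scikit-learn terms.

    Values without a reverse entry are passed through unchanged.
    """
    translated = {}
    for name, value in cuml_params.items():
        table = CUML_TO_SKLEARN.get(name, {})
        translated[name] = table.get(value, value)
    return translated


print(to_sklearn_params({"solver": "qn", "C": 1.0}))
```

The many-to-one case is one reason a separate set of reverse translations may be simpler than running the existing tables backwards.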