Description
Right now, when testing persistence of classifiers, we create a binary classification task, and all the classifiers pass. However, when switching to a multiclass classification task, `LogisticRegression` and related linear estimators fail. One example is `CalibratedClassifierCV`, which wraps a linear classifier under the hood; by default this is `LinearSVC`.
To reproduce, replace the following lines:
`skops/skops/io/tests/test_persist.py`, lines 421 to 423 in 2c2dd6e:

```python
X, y = make_classification(
    n_samples=N_SAMPLES, n_features=N_FEATURES, random_state=0
)
```
with these lines:
```python
X, y = make_classification(
    n_samples=N_SAMPLES,
    n_features=N_FEATURES,
    random_state=0,
    n_classes=3,
    n_redundant=1,
    n_informative=N_FEATURES - 1,
)
```

Note that `n_redundant` and `n_informative` are irrelevant here; they only need to be changed for `make_classification` to work.
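Why those two arguments have to change: as far as I understand, `make_classification` rejects inputs where the informative, redundant and repeated features do not fit into `n_features`, or where `n_classes * n_clusters_per_class` exceeds `2 ** n_informative`. The sketch below mirrors those two checks in a hypothetical helper, `check_make_classification_args` (not part of scikit-learn), and assumes `N_FEATURES = 20` purely for illustration; the test module defines its own value.

```python
# Hypothetical helper mirroring two input checks that
# sklearn.datasets.make_classification performs; not part of scikit-learn.
def check_make_classification_args(
    n_features,
    n_classes=2,
    n_informative=2,
    n_redundant=2,
    n_repeated=0,
    n_clusters_per_class=2,
):
    problems = []
    # Informative, redundant and repeated features must fit in n_features.
    if n_informative + n_redundant + n_repeated > n_features:
        problems.append("too many informative/redundant/repeated features")
    # Every cluster of every class needs its own hypercube vertex.
    if n_classes * n_clusters_per_class > 2 ** n_informative:
        problems.append("n_classes * n_clusters_per_class > 2 ** n_informative")
    return problems


N_FEATURES = 20  # assumed value for illustration only

# Defaults with n_classes=3: 3 * 2 = 6 clusters, but only 2 ** 2 = 4 vertices.
print(check_make_classification_args(N_FEATURES, n_classes=3))

# Raising n_informative alone overflows n_features (19 + 2 > 20) ...
print(check_make_classification_args(N_FEATURES, n_classes=3, n_informative=N_FEATURES - 1))

# ... so n_redundant is lowered as well, and both checks pass.
print(
    check_make_classification_args(
        N_FEATURES, n_classes=3, n_redundant=1, n_informative=N_FEATURES - 1
    )
)
```

This is also why `n_redundant=1` accompanies `n_informative=N_FEATURES - 1`: with the default `n_redundant=2`, the feature counts would add up to one more than `N_FEATURES`.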
The error is that the contiguity of the `coef_` attribute differs between the original and the loaded estimator. Strangely enough, it is the original estimator that seems to be "wrong":
```
>>> estimator.coef_.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
>>> loaded.coef_.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
```

I haven't investigated further, but I suspect that lr uses a different algorithm under the hood for binary classification, which would explain why this only occurs in the multiclass setting. EDIT: See below; that's not the reason.