Persistence tests fail for LogisticRegression et al. with multiclass classification #233

@BenjaminBossan

Right now, when testing persistence of classifiers, we create a binary classification task, and all classifiers pass. However, when switching to a multiclass classification task, LogisticRegression and related estimators fail (e.g. CalibratedClassifierCV, which uses LogisticRegression under the hood by default).

To reproduce, replace the following lines:

        X, y = make_classification(
            n_samples=N_SAMPLES, n_features=N_FEATURES, random_state=0
        )

by these lines:

        X, y = make_classification(
            n_samples=N_SAMPLES,
            n_features=N_FEATURES,
            random_state=0,
            n_classes=3,
            n_redundant=1,
            n_informative=N_FEATURES - 1,
        )

(Note that the exact values of n_redundant and n_informative are irrelevant here; they only need to be changed so that make_classification accepts n_classes=3.)
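For context on why those two arguments have to change: as far as I understand it, make_classification validates its arguments against two constraints, and the defaults (n_informative=2, n_redundant=2, n_clusters_per_class=2) violate one of them once n_classes=3. The sketch below only mirrors those checks in plain Python. The helper name `check_make_classification_args` and the value N_FEATURES = 20 are made up for illustration and are not part of scikit-learn or of the test suite.

```python
def check_make_classification_args(
    n_features,
    n_informative=2,
    n_redundant=2,
    n_repeated=0,
    n_classes=2,
    n_clusters_per_class=2,
):
    """Return a list of violated constraints (empty if the arguments look valid).

    Hypothetical helper; it only mirrors the argument checks that
    make_classification is assumed to perform.
    """
    problems = []
    # Informative, redundant and repeated features must fit into n_features.
    if n_informative + n_redundant + n_repeated > n_features:
        problems.append("too many informative/redundant/repeated features")
    # There must be enough informative features to place all class clusters.
    if n_classes * n_clusters_per_class > 2 ** n_informative:
        problems.append("too few informative features for the requested classes")
    return problems


N_FEATURES = 20  # illustrative value, not taken from the test suite

# Only adding n_classes=3 to the defaults
defaults_with_three_classes = check_make_classification_args(
    N_FEATURES, n_classes=3
)

# The arguments used in the reproduction above
reproduction_args = check_make_classification_args(
    N_FEATURES, n_classes=3, n_redundant=1, n_informative=N_FEATURES - 1
)

print(defaults_with_three_classes)
print(reproduction_args)
```

Raising n_informative alone would trip the first check with the default n_redundant=2, which is presumably why n_redundant is lowered at the same time.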

The test fails because the contiguity flags of the coef_ attribute differ between the original and the loaded estimator. Strangely enough, it is the original estimator that seems to be "wrong":

>>> estimator.coef_.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

>>> loaded.coef_.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

I haven't investigated further, but I suspect that LogisticRegression uses a different algorithm under the hood when dealing with binary classification, which would explain why the failure only occurs in the multiclass setting. EDIT: See below, that's not the reason.
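To illustrate the flag difference shown above, here is a minimal NumPy-only sketch. It does not reproduce what LogisticRegression does internally; it assumes, purely for illustration, that the original array is a column slice of a larger buffer, which is one way to end up with C_CONTIGUOUS, F_CONTIGUOUS and OWNDATA all False. The restored array is simulated with a plain copy, and the helper `same_values_ignoring_layout` is a hypothetical name, not an existing function in any library.

```python
import numpy as np

# Stand-in for the original coef_: a view onto part of a larger buffer.
# (Assumption for illustration only, not taken from scikit-learn's code.)
buffer = np.arange(12, dtype=np.float64).reshape(3, 4)
original = buffer[:, :3]

# Stand-in for the loaded coef_: a fresh copy with its own C-ordered memory.
restored = original.copy()


def same_values_ignoring_layout(a, b):
    """Compare dtype, shape and values, but not the memory layout flags."""
    return a.dtype == b.dtype and a.shape == b.shape and np.array_equal(a, b)


print(original.flags)
print(restored.flags)
print(same_values_ignoring_layout(original, restored))
```

A persistence test could either compare values only, as in this helper, or first normalize both arrays with np.ascontiguousarray before comparing flags. Which of the two is appropriate depends on whether the memory layout is meant to be part of the round-trip guarantee.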

Labels: bug, persistence