
Model training is incompatible with sklearn >=1.7: "AttributeError: '<model_class>' object has no attribute '_validate_data'" #284

@pedrobslima


I tried running code similar to example_heterogeneous.py, using only the META-DES model:

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
# Importing dataset and preprocessing routines
from sklearn.datasets import fetch_openml
# Base classifier models:
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

from deslib.des import METADES

rng = np.random.RandomState(42)
data = fetch_openml(name='phoneme', cache=False, as_frame=False)
X = data.data
y = data.target

# split the data into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
                                                    random_state=rng)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Split the data into training and DSEL for DS techniques
X_train, X_dsel, y_train, y_dsel = train_test_split(X_train, y_train,
                                                    test_size=0.5,
                                                    random_state=rng)

model_perceptron = CalibratedClassifierCV(Perceptron(max_iter=100,
                                                     random_state=rng),
                                          cv=3)

model_perceptron.fit(X_train, y_train)
model_svc = SVC(probability=True, gamma='auto',
                random_state=rng).fit(X_train, y_train)
model_bayes = GaussianNB().fit(X_train, y_train)
model_tree = DecisionTreeClassifier(random_state=rng,
                                    max_depth=10).fit(X_train, y_train)
model_knn = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)

pool_classifiers = [model_perceptron,
                    model_svc,
                    model_bayes,
                    model_tree,
                    model_knn]


# Initializing the techniques
metades = METADES(pool_classifiers).fit(X_dsel, y_dsel)

But I got the following error:
AttributeError: 'METADES' object has no attribute '_validate_data'

Later I tried running the same code on Google Colab and it worked, but I received this warning:

/usr/local/lib/python3.12/dist-packages/sklearn/base.py:474: FutureWarning: `BaseEstimator._validate_data` is deprecated in 1.6 and will be removed in 1.7. Use `sklearn.utils.validation.validate_data` instead. This function becomes public and is part of the scikit-learn developer API.

I assume this is a compatibility issue with version 1.7 of scikit-learn. I also tried the other models in the example, and the same thing happened with each of them: the AttributeError in my local environment (scikit-learn 1.7.2) and the FutureWarning on Google Colab (scikit-learn 1.6.1).
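For reference, the FutureWarning names `sklearn.utils.validation.validate_data` as the replacement for the private method. A minimal sketch of a compatibility wrapper is below. The helper name `validate_data_compat` is hypothetical and this is not deslib's code; it only assumes that the public function exists from scikit-learn 1.6 onwards (as the warning states) and that older versions still provide `BaseEstimator._validate_data`.

```python
# Hypothetical compatibility helper (not part of deslib or scikit-learn).
try:
    # Public function referenced by the FutureWarning (scikit-learn >= 1.6).
    from sklearn.utils.validation import validate_data as _sk_validate_data
except ImportError:
    # Older scikit-learn: only the private estimator method is available.
    _sk_validate_data = None


def validate_data_compat(estimator, *args, **kwargs):
    """Validate input data using whichever API the installed sklearn offers."""
    if _sk_validate_data is not None:
        # New style: the estimator is passed as the first positional argument.
        return _sk_validate_data(estimator, *args, **kwargs)
    # Old style: private method on the estimator itself.
    return estimator._validate_data(*args, **kwargs)
```

Under that assumption, a call such as `self._validate_data(X, y)` inside an estimator would become `validate_data_compat(self, X, y)`. Whether this is the right fix for deslib depends on how the library calls the method internally, which I have not checked.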


  • Local environment info:

    • Linux-6.17.4-76061704-generic-x86_64-with-glibc2.35
    • Python 3.10.12 (main, Nov 4 2025, 08:48:33) [GCC 11.4.0]
    • NumPy 2.2.6
    • SciPy 1.15.3
    • Scikit-Learn 1.7.2
  • Google Colab environment info:

    • Linux-6.6.105+-x86_64-with-glibc2.35
    • Python 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]
    • NumPy 2.0.2
    • SciPy 1.16.3
    • Scikit-Learn 1.6.1
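
To make the difference between the two environments explicit, here is a small sketch that maps a scikit-learn version string to the behaviour described by the FutureWarning (deprecated in 1.6, removed in 1.7). The function name `validate_data_status` and its return labels are my own, chosen only for illustration; versions older than 1.6 are outside what I tested.

```python
import re


def validate_data_status(sklearn_version: str) -> str:
    """Classify a scikit-learn version by the state of `_validate_data`.

    Based only on the FutureWarning text: deprecated in 1.6, removed in 1.7.
    """
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {sklearn_version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    if major_minor >= (1, 7):
        return "removed"  # AttributeError, as in the local environment
    if major_minor == (1, 6):
        return "deprecated"  # FutureWarning, as on Google Colab
    return "pre-deprecation"  # not covered by this report


# The two versions from the environments listed above.
for env, version in [("local", "1.7.2"), ("Google Colab", "1.6.1")]:
    print(f"{env}: scikit-learn {version} -> {validate_data_status(version)}")
```

If this reading is right, pinning scikit-learn below 1.7 should work as a temporary workaround, since the same code ran on 1.6.1 with only a warning.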
