Unable to pass sample weights properly to MAPIE Underlying classifier and regressor #798

@VishnoiAman777

Describe the bug
Sklearn pipeline estimators cannot be used with SplitConformalRegressor and SplitConformalClassifier when sample weights are passed to fit. In addition, with CrossConformalRegressor and CrossConformalClassifier, sample weights are not passed on to a pipeline estimator during fit.

To Reproduce
Steps to reproduce the behavior:

import numpy as np
from numpy.typing import NDArray
from sklearn.datasets import make_regression  # needed by the make_regression call below
from sklearn.neural_network import MLPRegressor
from mapie.metrics.regression import regression_coverage_score
from mapie.regression import SplitConformalRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from mapie.utils import train_conformalize_test_split

RANDOM_STATE = 42

X, y = make_regression(
    n_samples=1000, n_features=10, noise=20, random_state=RANDOM_STATE
)
(X_train, X_conformalize, X_test,
 y_train, y_conformalize, y_test) = train_conformalize_test_split(
    X, y,
    train_size=0.8, conformalize_size=0.1, test_size=0.1,
    random_state=RANDOM_STATE
)
sample_weight_train = np.random.rand(len(X_train))

pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=2)),
    ('linear_regression', LinearRegression())
])

confidence_level = 0.95
mapie_regressor = SplitConformalRegressor(
    estimator=pipeline, confidence_level=confidence_level, prefit=False
)
mapie_regressor.fit(X_train, y_train, {"sample_weight": sample_weight_train})

Error:

ValueError: Pipeline.fit does not accept the sample_weight parameter. You can pass parameters to specific steps of your pipeline using the stepname__parameter format, e.g. `Pipeline.fit(X, y, logisticregression__sample_weight=sample_weight)`.
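The error message points at scikit-learn's step-prefixed parameter format. As a minimal sketch at the scikit-learn level only (no MAPIE involved), the weights can be routed to the final step by prefixing the key with the step name from the pipeline above. Whether MAPIE forwards such a key unchanged through its own fit is not confirmed here and should be treated as an assumption. The data sizes and the `build_pipeline` helper are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_regression(n_samples=200, n_features=3, noise=20, random_state=42)

# Zero weight on the second half of the data, so a correctly weighted fit
# should behave like a fit on the first half alone.
weights = np.ones(len(X))
weights[100:] = 0.0


def build_pipeline():
    # Same step names as in the reproduction above.
    return Pipeline([
        ("poly_features", PolynomialFeatures(degree=2)),
        ("linear_regression", LinearRegression()),
    ])


# Route the weights to the step named "linear_regression".
weighted = build_pipeline().fit(X, y, linear_regression__sample_weight=weights)
subset = build_pipeline().fit(X[:100], y[:100])

pred_weighted = weighted.predict(X[:5])
pred_subset = subset.predict(X[:5])
```

Comparing `pred_weighted` with `pred_subset` is one way to check that the weights actually reached the regressor instead of being dropped.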

Further, in the case of CrossConformalRegressor and CrossConformalClassifier, sample weights are completely ignored for pipeline estimators, because EnsembleRegressor and EnsembleClassifier fit the estimator using this function from utils.py:

def _fit_estimator(
    estimator: Union[RegressorMixin, ClassifierMixin],
    X: ArrayLike,
    y: ArrayLike,
    sample_weight: Optional[NDArray] = None,
    **fit_params,
) -> Union[RegressorMixin, ClassifierMixin]:
    fit_parameters = signature(estimator.fit).parameters
    supports_sw = "sample_weight" in fit_parameters
    if supports_sw and sample_weight is not None:
        estimator.fit(X, y, sample_weight=sample_weight, **fit_params)
    else:
        estimator.fit(X, y, **fit_params)
    return estimator

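Since a Pipeline object's fit method has no explicit sample_weight parameter in its signature (it only accepts generic keyword arguments), supports_sw is always False for a pipeline estimator, so the else branch runs and the weights are silently dropped.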
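The signature check can be reproduced in isolation. The short sketch below applies the same `"sample_weight" in signature(estimator.fit).parameters` test to a bare regressor and to a pipeline wrapping it. The variable names are illustrative.

```python
from inspect import signature

from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

regressor = LinearRegression()
pipeline = Pipeline([("linear_regression", LinearRegression())])

# Same test that _fit_estimator performs on estimator.fit.
regressor_supports_sw = "sample_weight" in signature(regressor.fit).parameters
pipeline_supports_sw = "sample_weight" in signature(pipeline.fit).parameters

print("LinearRegression:", regressor_supports_sw)
print("Pipeline:", pipeline_supports_sw)
```

The pipeline fails the check even though its final step accepts weights, which is why the weights never reach the underlying regressor.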

Expected behavior
I should be able to pass sample weights to SplitConformalRegressor and SplitConformalClassifier when prefit=False. For CrossConformalRegressor and CrossConformalClassifier, sample_weight should be passed to each estimator that can use it. We should also implement test cases for the above classes.
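One possible direction, sketched under stated assumptions: a fitting helper that detects a Pipeline and rewrites sample_weight into the step-prefixed key of its final step. The function `fit_with_sample_weight` is hypothetical and not part of MAPIE, and it only routes weights to the last step, not to intermediate transformers that might also accept them.

```python
from inspect import signature

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures


def fit_with_sample_weight(estimator, X, y, sample_weight=None, **fit_params):
    """Hypothetical helper (not MAPIE's API): fit with weights when supported."""
    if sample_weight is None:
        return estimator.fit(X, y, **fit_params)
    if isinstance(estimator, Pipeline):
        # Route the weights to the final step using the stepname__parameter format.
        step_name, final_step = estimator.steps[-1]
        accepts_sw = hasattr(final_step, "fit") and (
            "sample_weight" in signature(final_step.fit).parameters
        )
        if accepts_sw:
            fit_params = {**fit_params, f"{step_name}__sample_weight": sample_weight}
        return estimator.fit(X, y, **fit_params)
    if "sample_weight" in signature(estimator.fit).parameters:
        return estimator.fit(X, y, sample_weight=sample_weight, **fit_params)
    # Estimator cannot use weights: fit without them.
    return estimator.fit(X, y, **fit_params)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=100)

# Zero weight on the last 50 samples, so the weighted fit should match
# a fit on the first 50 samples only.
weights = np.ones(100)
weights[50:] = 0.0

pipe = Pipeline([
    ("poly_features", PolynomialFeatures(degree=2)),
    ("linear_regression", LinearRegression()),
])
fitted_pipe = fit_with_sample_weight(pipe, X, y, sample_weight=weights)
pipe_pred = fitted_pipe.predict(X[:5])

reference = Pipeline([
    ("poly_features", PolynomialFeatures(degree=2)),
    ("linear_regression", LinearRegression()),
]).fit(X[:50], y[:50])
reference_pred = reference.predict(X[:5])
```

A test case along these lines (zero weights versus dropping the samples) could serve as a starting point for the requested tests, since it fails whenever the weights are silently ignored.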

Desktop (please complete the following information):

  • OS: Windows 11
  • Browser: Chrome
  • MAPIE Version: v1.1.0

Labels: Backlog, Contributors welcome, Other or internal
