
Add support for returning only X in PipelineElement's transform API #104

@Paul-B98

Description


I'd like to propose a behaviour where only non-falsy elements are returned. The reason is that some transformers, like the `ColumnTransformer`, break when they receive more than `X`, which is what happens with the `PipelineElement`: its `transform` always returns the tuple `(Xt, yt, kwargs)`. This would also be more compatible with scikit-learn's preprocessing transformers, which in most cases also return only `X`.
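A minimal sketch of the proposed filtering, written as a standalone helper. The name `drop_empty_outputs` is hypothetical and not part of the existing code base. "Falsy" is checked explicitly (`yt is None`, empty `kwargs`) rather than with `bool(...)`, because NumPy raises `ValueError` when evaluating the truth value of an array with more than one element.

```python
from typing import Any, Dict, Optional, Tuple, Union

import numpy as np


def drop_empty_outputs(
    Xt: np.ndarray,
    yt: Optional[np.ndarray] = None,
    kwargs: Optional[Dict[str, Any]] = None,
) -> Union[np.ndarray, Tuple[Any, ...]]:
    """Return only the outputs that carry information (hypothetical helper).

    yt is dropped when it is None and kwargs when it is None or empty.
    bool(yt) is deliberately not used: NumPy raises ValueError for the
    truth value of arrays with more than one element.
    """
    outputs = [Xt]
    if yt is not None:
        outputs.append(yt)
    if kwargs:
        outputs.append(kwargs)
    # A single remaining output is returned bare, like sklearn's transform().
    return outputs[0] if len(outputs) == 1 else tuple(outputs)
```

One caveat of this design: a 2-tuple can mean either `(Xt, yt)` or `(Xt, kwargs)`, so callers that need `y` or `kwargs` would still have to inspect the types of the returned elements.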

```python
def transform(self, X: np.ndarray, y: np.ndarray = None, **kwargs) -> (np.ndarray, np.ndarray, dict):
    """
    Calls transform on the base element.
    In case there is no transform method, calls predict.
    This is used if we are using an estimator as a preprocessing step.

    Parameters:
        X:
            The array-like data with shape=[N, D], where N is the
            number of samples and D is the number of features.

        y:
            The truth array-like values with shape=[N], where N is
            the number of samples.

        **kwargs:
            Keyword arguments, passed to base_element.transform.

    Returns:
        (X, y) in transformed version and original kwargs.

    """
    if self.batch_size == 0:
        Xt, yt, kwargs = self.__transform(X, y, **kwargs)
    else:
        Xt, yt, kwargs = self.__batch_transform(X, y, **kwargs)
    if all(hasattr(data, "shape") for data in [X, Xt]) and all(len(data.shape) > 1 for data in [X, Xt]):
        self.reduce_dimension = (Xt.shape[1] < X.shape[1])
    return Xt, yt, kwargs
```
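To illustrate why the bare-array return matters, here is a toy sketch. `ToyPipelineElement` and `ToyScaler` are hypothetical stand-ins, not the real classes: the toy element keeps only the return contract of `transform` (batching, the predict fallback and `reduce_dimension` tracking are omitted). The `np.hstack` call mimics, in simplified form, how a `ColumnTransformer`-style consumer concatenates the outputs of several transformers column-wise, which only works if each output is a single 2-D array.

```python
import numpy as np


class ToyScaler:
    """Hypothetical stand-in for a wrapped sklearn transformer."""

    def transform(self, X):
        return X * 2.0


class ToyPipelineElement:
    """Hypothetical, heavily simplified stand-in for PipelineElement.

    Only the return contract of transform() is modelled here.
    """

    def __init__(self, base_element):
        self.base_element = base_element

    def transform(self, X, y=None, **kwargs):
        Xt = self.base_element.transform(X)
        # Proposed behaviour: with no y and no kwargs there is nothing
        # besides Xt to hand back, so return the bare array.
        if y is None and not kwargs:
            return Xt
        # Otherwise keep the existing (Xt, yt, kwargs) contract.
        return Xt, y, kwargs


X = np.arange(6, dtype=float).reshape(3, 2)
element = ToyPipelineElement(ToyScaler())

# Column-wise concatenation, as a ColumnTransformer-style consumer would do.
combined = np.hstack([element.transform(X), X])
print(combined.shape)  # (3, 4)

# With a target supplied, the full tuple is still returned.
Xt, yt, kw = element.transform(X, np.array([0, 1, 0]))
```

Keeping the tuple whenever `y` or `kwargs` are present leaves existing PHOTONAI-internal callers that unpack three values unaffected in that case.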
