Overview of Issue
The DataSummaryPreset is calculating metrics on columns that are not defined in the data definition but just happen to exist in the dataset.
Sample
data_definition = DataDefinition(
    numerical_columns=numeric_features,  # list
    categorical_columns=categorical_features,  # list
    regression=regression,  # [Regression(name='default', target='acquisition_price', prediction='predicted')]
)

# Convert datasets to evidently objects
reference_evidently = Dataset.from_pandas(
    reference,  # pd.DataFrame
    data_definition=data_definition,  # DataDefinition object
)
current_evidently = Dataset.from_pandas(
    current,  # pd.DataFrame
    data_definition=data_definition,  # DataDefinition object
)

# List of reports
report_list = [
    DataDriftPreset(columns=model_features_list),
    DataDriftPreset(columns=target_predictions_cols),
    DataSummaryPreset(columns=model_features + target_prediction_cols),
    RegressionPreset(),
]

# Report
report = Report(report_list, include_tests=True)
executed_report = report.run(current_data=current_evidently, reference_data=reference_evidently)
Issue
The DataSummaryPreset is calculating summary metrics, and its tests are failing, on columns that are not defined in my data definition. If this happens because the dataset has extra columns that are not listed in model_features or target_prediction_cols, then what is the point of the DataDefinition object? I am hoping I am just doing something wrong and that is why the definition is not being used. Otherwise, since DataDriftPreset already requires me to pass the columns in the Report object, I don't see the need for the DataDefinition object there either.
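To make the mismatch concrete, here is a minimal sketch in plain Python (no Evidently calls) that lists which columns of a DataFrame are not covered by the data definition. The helper names `declared_columns` and `undeclared_columns` and the feature names `mileage`, `age`, `brand`, and `internal_id` are hypothetical, made up for illustration; only `acquisition_price` and `predicted` come from the sample above.

```python
from typing import Iterable, List


def declared_columns(
    numerical: Iterable[str],
    categorical: Iterable[str],
    target: str,
    prediction: str,
) -> List[str]:
    """Collect every column the data definition is meant to cover.

    Duplicates are dropped and first-seen order is preserved.
    """
    ordered = [*numerical, *categorical, target, prediction]
    return list(dict.fromkeys(ordered))


def undeclared_columns(
    frame_columns: Iterable[str], declared: Iterable[str]
) -> List[str]:
    """Return the DataFrame columns that the data definition does not mention."""
    declared_set = set(declared)
    return [col for col in frame_columns if col not in declared_set]


# Hypothetical column names, for illustration only.
frame_columns = [
    "mileage", "age", "brand", "acquisition_price", "predicted", "internal_id",
]
declared = declared_columns(
    numerical=["mileage", "age"],
    categorical=["brand"],
    target="acquisition_price",
    prediction="predicted",
)
extra = undeclared_columns(frame_columns, declared)
print(extra)  # columns present in the data but absent from the definition
```

With real data, `frame_columns` would be `list(reference.columns)`. As a possible stopgap, assuming the extra columns are picked up from the DataFrame itself, passing `reference[declared]` and `current[declared]` to `Dataset.from_pandas` would keep undeclared columns out of the report. Whether the preset is supposed to respect the data definition on its own is still the open question here.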