Data Definition Object Useless or Not Working #1714

@yiannimercer

Overview of Issue

The DataSummaryPreset calculates metrics on columns that are not defined in the data definition but happen to exist in the dataset.

Sample

    from evidently import Dataset, DataDefinition, Report
    from evidently.presets import DataDriftPreset, DataSummaryPreset, RegressionPreset

    # Declare which columns the report should treat as features, target and prediction
    data_definition = DataDefinition(
        numerical_columns=numeric_features,  # list
        categorical_columns=categorical_features,  # list
        regression=regression,  # [Regression(name='default', target='acquisition_price', prediction='predicted')]
    )

    # Convert datasets to evidently objects
    reference_evidently = Dataset.from_pandas(
        reference,  # pd.DataFrame
        data_definition=data_definition,  # DataDefinition object
    )
    current_evidently = Dataset.from_pandas(
        current,  # pd.DataFrame
        data_definition=data_definition,  # DataDefinition object
    )

    # List of reports
    report_list = [
        DataDriftPreset(columns=model_features),
        DataDriftPreset(columns=target_prediction_cols),
        DataSummaryPreset(columns=model_features + target_prediction_cols),
        RegressionPreset(),
    ]

    # Report
    report = Report(report_list, include_tests=True)
    executed_report = report.run(current_data=current_evidently, reference_data=reference_evidently)

Issue

The DataSummaryPreset is calculating summary metrics on columns that are not defined in my data definition, and tests are failing on those columns. If this happens because the dataset has extra columns that are not listed in model_features or target_prediction_cols, what is the point of the DataDefinition object? I am hoping I am just doing something wrong and the preset is not picking the definition up. Otherwise, for the DataDriftPreset I already have to pass the columns in the Report, so I don't see the need for the DataDefinition object there either.
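To make the mismatch concrete, here is a minimal sketch of how the undeclared columns could be listed. It uses plain Python only and does not call Evidently; the helper name `split_columns` and the column names other than `acquisition_price` and `predicted` are hypothetical and only for illustration.

```python
def split_columns(dataset_columns, declared_columns):
    """Split dataset columns into (kept, extra).

    kept  -- columns that appear in the declared list, in dataset order
    extra -- columns present in the dataset but not declared
    """
    declared = set(declared_columns)
    kept = [c for c in dataset_columns if c in declared]
    extra = [c for c in dataset_columns if c not in declared]
    return kept, extra


# Hypothetical example: "mileage", "make" and "internal_id" are made-up names
dataset_columns = ["mileage", "make", "acquisition_price", "predicted", "internal_id"]
declared_columns = ["mileage", "make", "acquisition_price", "predicted"]

kept, extra = split_columns(dataset_columns, declared_columns)
print("declared and present:", kept)
print("present but undeclared:", extra)
```

Assuming the extra columns are what triggers the unwanted metrics, subsetting the DataFrames with something like `reference[kept]` and `current[kept]` before calling `Dataset.from_pandas` might work around it, but I would not expect to need that if the DataDefinition is meant to restrict the columns.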
