IndexError in infer_column_type when column contains only null values
Description
When creating a Dataset from a pandas DataFrame that contains a column with all null/NaN values, Evidently raises an IndexError: single positional indexer is out-of-bounds. This occurs because infer_column_type attempts to access .iloc[0] on an empty series after dropping NA values.
Steps to Reproduce
import pandas as pd
from evidently import Dataset
# Create DataFrame with an all-null column
df = pd.DataFrame({
    "valid_column": [1, 2, 3, 4, 5],
    "all_null_column": [None, None, None, None, None]
})
# This raises IndexError
dataset = Dataset.from_pandas(df)
Expected Behavior
Evidently should gracefully handle columns that are entirely null, either by:
- Skipping them during type inference
- Assigning a default column type
- Raising a descriptive error message indicating which column is problematic
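As a rough illustration of the third option, a guard along these lines would name the offending column instead of surfacing a bare IndexError. The helper `check_not_all_null` is hypothetical and not part of Evidently; it uses only pandas calls.

```python
import pandas as pd


def check_not_all_null(column: pd.Series) -> None:
    """Hypothetical guard (not an Evidently API): fail early and name the column."""
    # A non-empty column whose values are all null gives type inference nothing to inspect.
    if len(column) > 0 and column.isna().all():
        raise ValueError(
            f"Column {column.name!r} contains only null values; "
            "its type cannot be inferred."
        )


df = pd.DataFrame({
    "valid_column": [1, 2, 3, 4, 5],
    "all_null_column": [None, None, None, None, None],
})

check_not_all_null(df["valid_column"])  # passes silently

try:
    check_not_all_null(df["all_null_column"])
    message = None
except ValueError as exc:
    message = str(exc)
```

Whether to raise, skip the column, or fall back to a default type is a design decision for the maintainers; the sketch only shows that the column name is readily available at the point of failure.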
Actual Behavior
IndexError: single positional indexer is out-of-bounds
Full traceback points to evidently/core/datasets.py line 653:
def infer_column_type(column_data):
    # ...
    if column_data.dtype.name == "object":
        without_na = column_data.dropna()
        if isinstance(without_na.iloc[0], str) and isinstance(without_na.iloc[-1], str):  # <- crashes here
Root Cause
In infer_column_type, when the column dtype is object, the function drops NA values and then immediately tries to access iloc[0] and iloc[-1] without checking if the resulting series is empty.
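The failure can be reproduced with pandas alone, without Evidently. The sketch below mirrors the two steps described above (drop NA values, then index position 0) on a standalone Series; the variable names are illustrative.

```python
import pandas as pd

# A column of only None values gets the "object" dtype, so it enters the crashing branch.
column = pd.Series([None, None, None], name="all_null_column")

# Dropping NA values from an all-null column leaves an empty Series.
without_na = column.dropna()

# Positional indexing into an empty Series raises IndexError.
try:
    without_na.iloc[0]
    error = None
except IndexError as exc:
    error = exc
```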
Suggested Fix
Add an empty check before indexing:
if column_data.dtype.name == "object":
    without_na = column_data.dropna()
    if len(without_na) == 0:
        return ColumnType.Unknown  # or handle appropriately
    if isinstance(without_na.iloc[0], str) and isinstance(without_na.iloc[-1], str):
        # ...
Workaround
For anyone encountering this issue, you can drop all-null columns before passing the DataFrame to Evidently:
df_clean = df.dropna(axis=1, how='all')
dataset = Dataset.from_pandas(df_clean)
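If it matters which columns were removed, a variant of the workaround can collect their names first so they can be logged or reported. This is a sketch using only pandas; `all_null_columns` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "valid_column": [1, 2, 3, 4, 5],
    "all_null_column": [None, None, None, None, None],
})

# Names of the columns in which every value is null.
all_null_columns = df.columns[df.isna().all()].tolist()

# Same result as df.dropna(axis=1, how="all"), but the dropped names stay available.
df_clean = df.drop(columns=all_null_columns)
```

`df_clean` can then be passed to `Dataset.from_pandas` as in the workaround above.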