IndexError in infer_column_type when column contains only null values #1764

@yiannimercer

Description

When creating a Dataset from a pandas DataFrame that contains a column with all null/NaN values, Evidently raises an IndexError: single positional indexer is out-of-bounds. This occurs because infer_column_type attempts to access .iloc[0] on an empty series after dropping NA values.

Steps to Reproduce

import pandas as pd
from evidently import Dataset

# Create DataFrame with an all-null column
df = pd.DataFrame({
    "valid_column": [1, 2, 3, 4, 5],
    "all_null_column": [None, None, None, None, None]
})

# This raises IndexError
dataset = Dataset.from_pandas(df)

Expected Behavior

Evidently should gracefully handle columns that are entirely null, either by:

  • Skipping them during type inference
  • Assigning a default column type
  • Raising a descriptive error message indicating which column is problematic

Actual Behavior

IndexError: single positional indexer is out-of-bounds

The full traceback points to evidently/core/datasets.py, line 653:

def infer_column_type(column_data):
    # ...
    if column_data.dtype.name == "object":
        without_na = column_data.dropna()
        if isinstance(without_na.iloc[0], str) and isinstance(without_na.iloc[-1], str):  # <- crashes here

Root Cause

In infer_column_type, when the column dtype is object, the function drops NA values and then immediately tries to access iloc[0] and iloc[-1] without checking if the resulting series is empty.
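The failure can be reproduced with pandas alone, without importing Evidently. This is a minimal sketch that assumes only that the column has object dtype and contains nothing but nulls; the variable names are illustrative and not taken from the Evidently source.

```python
import pandas as pd

# An object-dtype column whose values are all null, as in the report above.
column_data = pd.Series(
    [None, None, None, None, None], dtype=object, name="all_null_column"
)

# Dropping nulls leaves an empty series.
without_na = column_data.dropna()
print(len(without_na))

# Positional access on the empty series is what raises the IndexError.
try:
    without_na.iloc[0]
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```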

Suggested Fix

Add an empty check before indexing:

if column_data.dtype.name == "object":
    without_na = column_data.dropna()
    if len(without_na) == 0:
        return ColumnType.Unknown  # or handle appropriately
    if isinstance(without_na.iloc[0], str) and isinstance(without_na.iloc[-1], str):
        # ...
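To show the guard in a form that runs on its own, here is a sketch that mirrors only the object-dtype branch. `SketchColumnType` and `infer_object_column_type` are hypothetical stand-ins written for this example; they are not Evidently's `ColumnType` or its actual `infer_column_type`, and the value the real function should return for an empty column is left to the maintainers.

```python
from enum import Enum

import pandas as pd


class SketchColumnType(Enum):
    # Hypothetical stand-in for illustration; not Evidently's ColumnType.
    TEXT = "text"
    OTHER = "other"
    UNKNOWN = "unknown"


def infer_object_column_type(column_data: pd.Series) -> SketchColumnType:
    """Classify an object-dtype column without indexing into an empty series."""
    without_na = column_data.dropna()
    # Guard: an all-null column has nothing left to inspect.
    if without_na.empty:
        return SketchColumnType.UNKNOWN
    # Same first/last string check as in the excerpt from the traceback.
    if isinstance(without_na.iloc[0], str) and isinstance(without_na.iloc[-1], str):
        return SketchColumnType.TEXT
    return SketchColumnType.OTHER


all_null = pd.Series([None, None, None], dtype=object)
strings = pd.Series(["a", None, "b"], dtype=object)
print(infer_object_column_type(all_null))
print(infer_object_column_type(strings))
```

Raising a `ValueError` that names the column instead of returning a fallback type would be the equivalent guard for the "descriptive error" option listed under Expected Behavior.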

Workaround

Until this is fixed, anyone encountering the issue can drop all-null columns from the DataFrame before passing it to Evidently:

df_clean = df.dropna(axis=1, how='all')
dataset = Dataset.from_pandas(df_clean)
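Because the workaround drops columns silently, it may help to record which ones were removed. The sketch below uses only pandas and the example DataFrame from the reproduction steps; `all_null_columns` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "valid_column": [1, 2, 3, 4, 5],
    "all_null_column": [None, None, None, None, None],
})

# Names of columns in which every value is null.
all_null_columns = df.columns[df.isna().all()].tolist()
print(all_null_columns)

# Drop them explicitly so the removed columns can be logged or reported.
df_clean = df.drop(columns=all_null_columns)
print(list(df_clean.columns))
```

The resulting `df_clean` can then be passed to `Dataset.from_pandas` as in the workaround above.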
