Pandera MultiIndex tz-aware level loses timezone during validation when full materialization path is used #2205

@Owen-OptiGrid

Describe the bug
When validating a tz‑aware MultiIndex level, Pandera’s full‑materialization path converts the level via .values, which drops the timezone info. This leads to a dtype mismatch error: expected datetime64[ns, ] but got datetime64[ns], even though the MultiIndex level is tz‑aware.
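
The timezone loss can be seen with pandas alone, without Pandera. The following is a minimal sketch that rebuilds the same two-level index as the reproduction below and compares the dtype of the level with the dtype of its .values array. The variable names are illustrative only, and nothing here is taken from Pandera's source.

```python
from zoneinfo import ZoneInfo

import pandas as pd

tz = ZoneInfo("America/New_York")
idx = pd.MultiIndex.from_arrays(
    [
        pd.DatetimeIndex(["2024-01-01 00:00", "2024-01-01 01:00"], tz=tz),
        pd.Index(["A", "B"]),
    ],
    names=["LEVEL_ONE", "LEVEL_TWO"],
)

# get_level_values returns a DatetimeIndex that still carries the timezone.
level = idx.get_level_values("LEVEL_ONE")
print(level.dtype)

# .values hands back a plain NumPy datetime64 array. The instants are kept
# (expressed in UTC), but the timezone itself is gone.
raw = level.values
print(raw.dtype)

# A Series rebuilt from that array is therefore tz-naive.
rebuilt = pd.Series(raw, name="LEVEL_ONE")
print(rebuilt.dtype)
```

Any dtype check run against `rebuilt` sees a tz-naive datetime column, which matches the mismatch described above.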

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera. (0.28.1)
  • (optional) I have confirmed this bug exists on the main branch of pandera.

Code Sample, a copy-pastable example

# /// script
# dependencies = [
#   "pandas>=2.0",
#   "pandera==0.28.1",
# ]
# ///

from __future__ import annotations

from zoneinfo import ZoneInfo

import pandas as pd
import pandera.pandas as pa
from pandera.engines import pandas_engine


def build_data() -> pd.DataFrame:
    tz = ZoneInfo("America/New_York")
    idx = pd.MultiIndex.from_arrays(
        [
            pd.DatetimeIndex(
                ["2024-01-01 00:00", "2024-01-01 01:00"], tz=tz, name="LEVEL_ONE"
            ),
            pd.Index(["A", "B"], name="LEVEL_TWO"),
        ]
    )
    return pd.DataFrame({"value": [1, 2]}, index=idx)


def build_schema(tz: ZoneInfo) -> pa.DataFrameSchema:
    return pa.DataFrameSchema(
        columns={"value": pa.Column(int)},
        index=pa.MultiIndex(
            [
                pa.Index(
                    pandas_engine.DateTime(tz=tz),
                    name="LEVEL_ONE",
                    # Force full-materialization path where Pandera uses .values.
                    checks=pa.Check(lambda s: s.notna()),
                ),
                pa.Index(pa.String, name="LEVEL_TWO"),
            ]
        ),
    )


def main() -> None:
    df = build_data()
    schema = build_schema(ZoneInfo("America/New_York"))
    schema.validate(df)


if __name__ == "__main__":
    main()

Expected behavior

Validation should pass, since the MultiIndex level is tz‑aware and matches the schema’s tz. It should not drop timezone info when validating.

Desktop (please complete the following information):

  • OS: Linux (ubuntu/debian family)
  • Browser: N/A
  • Version: N/A

Screenshots

N/A

Additional context

The failure occurs only when the level carries a check other than a uniqueness‑only constraint, which forces the full‑materialization path. That path calls .values on the tz‑aware level returned by multiindex.get_level_values(...), stripping the timezone and causing a dtype mismatch.
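
One possible direction for a fix, sketched under assumptions: build the per-level Series from the Index object itself instead of from its .values array. Both helper functions below (`level_via_values`, `level_preserving_dtype`) are hypothetical names written for this illustration; they are not Pandera functions and do not reflect how Pandera's code is actually structured.

```python
from zoneinfo import ZoneInfo

import pandas as pd


def level_via_values(mi: pd.MultiIndex, level: str) -> pd.Series:
    # Hypothetical helper: round-trips through a NumPy array, which cannot
    # represent a timezone, so a tz-aware level comes back tz-naive.
    return pd.Series(mi.get_level_values(level).values, name=level)


def level_preserving_dtype(mi: pd.MultiIndex, level: str) -> pd.Series:
    # Hypothetical helper: passes the Index itself to the Series constructor,
    # so the tz-aware dtype is kept.
    return pd.Series(mi.get_level_values(level), name=level)


mi = pd.MultiIndex.from_arrays(
    [
        pd.DatetimeIndex(
            ["2024-01-01 00:00", "2024-01-01 01:00"],
            tz=ZoneInfo("America/New_York"),
        ),
        pd.Index(["A", "B"]),
    ],
    names=["LEVEL_ONE", "LEVEL_TWO"],
)

lossy = level_via_values(mi, "LEVEL_ONE")
kept = level_preserving_dtype(mi, "LEVEL_ONE")
print(lossy.dtype)
print(kept.dtype)
```

Whether this is the right change for Pandera depends on how the materialized levels are consumed downstream, which this sketch does not cover.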

Labels: bug
