Description
Describe the bug
When validating a tz‑aware MultiIndex level, Pandera’s full‑materialization path converts the level via .values, which drops timezone info. This leads to a dtype mismatch error: expected datetime64[ns, ] but got datetime64[ns], even though the MultiIndex level is tz‑aware.
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandera. (0.28.1)
- (optional) I have confirmed this bug exists on the main branch of pandera.
Note: Please read this guide (https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
# /// script
# dependencies = [
#     "pandas>=2.0",
#     "pandera==0.28.1",
# ]
# ///
from __future__ import annotations

from zoneinfo import ZoneInfo

import pandas as pd
import pandera.pandas as pa
from pandera.engines import pandas_engine


def build_data() -> pd.DataFrame:
    tz = ZoneInfo("America/New_York")
    idx = pd.MultiIndex.from_arrays(
        [
            pd.DatetimeIndex(
                ["2024-01-01 00:00", "2024-01-01 01:00"], tz=tz, name="LEVEL_ONE"
            ),
            pd.Index(["A", "B"], name="LEVEL_TWO"),
        ]
    )
    return pd.DataFrame({"value": [1, 2]}, index=idx)


def build_schema(tz: ZoneInfo) -> pa.DataFrameSchema:
    return pa.DataFrameSchema(
        columns={"value": pa.Column(int)},
        index=pa.MultiIndex(
            [
                pa.Index(
                    pandas_engine.DateTime(tz=tz),
                    name="LEVEL_ONE",
                    # Force full-materialization path where Pandera uses .values.
                    checks=pa.Check(lambda s: s.notna()),
                ),
                pa.Index(pa.String, name="LEVEL_TWO"),
            ]
        ),
    )


def main() -> None:
    df = build_data()
    schema = build_schema(ZoneInfo("America/New_York"))
    schema.validate(df)


if __name__ == "__main__":
    main()

Expected behavior
Validation should pass, since the MultiIndex level is tz‑aware and matches the schema’s tz. It should not drop timezone info when validating.
Desktop (please complete the following information):
- OS: Linux (ubuntu/debian family)
- Browser: N/A
- Version: N/A
Screenshots
N/A
Additional context
The failure occurs only when a non‑“unique‑only” check is present, which forces the full‑materialization path. That path calls .values on the tz‑aware level returned by multiindex.get_level_values(...), which strips the timezone and causes the dtype mismatch.
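The timezone loss can be reproduced with pandas alone. The snippet below is a minimal sketch of the suspected mechanism, not Pandera's actual code: it assumes the level is materialized roughly as pd.Series(level.values), and the variable names are illustrative only. It compares that with passing the Index object itself, which keeps the tz-aware dtype.

```python
import pandas as pd

# Build a MultiIndex whose first level is tz-aware, mirroring the report above.
idx = pd.MultiIndex.from_arrays(
    [
        pd.DatetimeIndex(["2024-01-01 00:00", "2024-01-01 01:00"], tz="America/New_York"),
        pd.Index(["A", "B"]),
    ],
    names=["LEVEL_ONE", "LEVEL_TWO"],
)

level = idx.get_level_values("LEVEL_ONE")

# The level itself carries a tz-aware extension dtype.
level_is_tz_aware = isinstance(level.dtype, pd.DatetimeTZDtype)

# .values returns a plain NumPy datetime64 array (UTC instants), so the tz is lost.
# Assumption: this approximates what the full-materialization path does.
via_values = pd.Series(level.values, name=level.name)
values_is_tz_aware = isinstance(via_values.dtype, pd.DatetimeTZDtype)

# Passing the Index itself preserves the extension dtype, timezone included.
via_index = pd.Series(level, name=level.name)
index_is_tz_aware = isinstance(via_index.dtype, pd.DatetimeTZDtype)

print(level.dtype, via_values.dtype, via_index.dtype)
```

If this matches the behavior inside Pandera, building the Series from the Index object (or from its .array) rather than from .values would presumably keep the dtype check consistent with the schema's tz.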