Open
Labels: bug
Bug Description
When add_missing_columns = True is set and a pa.Date field declared with nullable=True and default=pd.NaT is missing from the input DataFrame, Pandera raises a SchemaError stating that the series does not have type date, even though the field is configured to accept null values.
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandera.
- (optional) I have confirmed this bug exists on the main branch of pandera.
Code Sample:
```python
import pandas as pd
import pandera as pa


class BaseValidationSchema(pa.DataFrameModel):
    class Config:
        add_missing_columns = True
        coerce = True
        strict = 'filter'

    index: int = pa.Field(ge=0)
    confirmation_date: pa.Date = pa.Field(nullable=True, coerce=True, default=pd.NaT)


data = [
    {'index': 1},
    {'index': 1}
]

# This raises SchemaError
validated_df = BaseValidationSchema.validate(pd.DataFrame(data))
```

Error Output

```
pandera.errors.SchemaError: expected series 'confirmation_date' to have type date:
failure cases:
   index failure_case
0      0          NaT
1      1          NaT
```
Expected behavior
Validation should pass. When add_missing_columns = True is set and a column is missing from the input DataFrame, Pandera should add that column filled with the specified default value (pd.NaT) and, because nullable=True is specified, accept pd.NaT as a valid null value for the pa.Date type.
The expected output is a DataFrame with both columns:
- index: int64 with values [1, 1]
- confirmation_date: datetime64[ns] (or the appropriate date dtype) with NaT values
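For reference, a pandas-only sketch of the expected output frame, built directly rather than through pandera. Which dtype counts as "appropriate" for pa.Date is left open in the report, so both candidates are shown; treating the object column as the pa.Date form is an assumption.

```python
import pandas as pd

data = [{"index": 1}, {"index": 1}]
df = pd.DataFrame(data)

# Candidate 1: an object column of nulls (assumed shape of a pa.Date column)
expected_object = df.assign(
    confirmation_date=pd.Series([pd.NaT] * len(df), dtype=object)
)

# Candidate 2: a datetime64[ns] column of NaT values
expected_datetime = df.assign(
    confirmation_date=pd.Series([pd.NaT] * len(df), dtype="datetime64[ns]")
)

print(expected_object.dtypes)
print(expected_datetime.dtypes)
```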
Desktop:
- OS: Ubuntu
- Browser: /
- Version: /
Additional context
I am aware of the workaround of declaring the field as a datetime and converting it to a date afterwards, but that is not a viable option in my context.
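The workaround mentioned above (validate as a datetime, then convert to dates) can be sketched in pandas as below. Only the post-validation conversion is shown; the validated frame is a hand-built stand-in, and the to_date_column helper is a hypothetical name, not part of pandera.

```python
import pandas as pd


def to_date_column(series: pd.Series) -> pd.Series:
    # Hypothetical helper: parse to datetime64, then keep only the calendar date.
    # Null entries come back as NaT inside an object column.
    return pd.to_datetime(series).dt.date


# Stand-in for a frame validated with a datetime (not pa.Date) field
validated = pd.DataFrame({
    "index": [1, 1],
    "confirmation_date": pd.to_datetime(pd.Series([None, "2024-01-15"])),
})

converted = validated.assign(
    confirmation_date=to_date_column(validated["confirmation_date"])
)
print(converted["confirmation_date"].tolist())
```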