Open
Labels: bug
Description
Describe the bug
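Validating a PySpark DataFrame with a `DataFrameModel` whose `product_name` field uses `pa.Field(str_matches=...)` does not report the values that fail the pattern. Instead, the error report contains a `CHECK_ERROR` saying that the check function itself failed with `KeyError: <class 'pandera.api.pyspark.types.PysparkDataframeColumnObject'>`.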
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandera.
- (optional) I have confirmed this bug exists on the main branch of pandera.
Code Sample, a copy-pastable example
import json
from decimal import Decimal

import pandera.pyspark as pa
import pyspark.sql.types as T
from pandera.pyspark import DataFrameModel
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()


class PanderaSchema(DataFrameModel):
    id: T.IntegerType() = pa.Field(gt=5)
    # Email-style pattern applied to the product_name column
    product_name: T.StringType() = pa.Field(str_matches="^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$")
    price: T.DecimalType(20, 5) = pa.Field()


data = [
    (5, "fgsdgmaill.com", Decimal("44.4")),  # fails gt=5 and the email pattern
    (15, "someone@example.com", Decimal("99.0")),  # placeholder: the original address was obfuscated in the captured page
]

spark_schema = T.StructType(
    [
        T.StructField("id", T.IntegerType(), False),
        T.StructField("product_name", T.StringType(), False),
        T.StructField("price", T.DecimalType(20, 5), False),
    ]
)

df = spark.createDataFrame(data, spark_schema)
df.show()

df_out = PanderaSchema.validate(check_obj=df)

# Collect the validation error report attached to the validated DataFrame
df_out_errors = df_out.pandera.errors
print(json.dumps(dict(df_out_errors), indent=4))
Expected behavior
I expected the PySpark error report to show that the `product_name` value "fgsdgmaill.com" fails the email `str_matches` check, instead of an error raised while executing the check.
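To make the expected result concrete, here is a plain-Python sketch (no Spark or pandera involved) that applies the same two conditions from `PanderaSchema`, `id > 5` and the `product_name` pattern, to the sample rows. It only illustrates which rows a working report should flag. The helper `failing_rows` and the check labels are hypothetical, the second email is a placeholder, and Python's `re` module stands in for Spark's regex engine, which may differ on edge cases.

```python
import re
from decimal import Decimal

# Same pattern as the str_matches check in PanderaSchema
EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

# Sample rows from the report; the second address is a placeholder
ROWS = [
    (5, "fgsdgmaill.com", Decimal("44.4")),
    (15, "someone@example.com", Decimal("99.0")),
]


def failing_rows(rows):
    """Hypothetical helper: return (check_label, row) pairs for every failed check."""
    failures = []
    for row in rows:
        row_id, product_name, _price = row
        if not row_id > 5:  # mirrors pa.Field(gt=5)
            failures.append(("id > 5", row))
        if EMAIL_PATTERN.match(product_name) is None:  # mirrors str_matches
            failures.append(("product_name regex", row))
    return failures


for label, row in failing_rows(ROWS):
    print(label, row[0], row[1])
```

Only the first row should appear, once for each check, so a correct report would list "fgsdgmaill.com" under the `str_matches` check rather than an execution error.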
Desktop (please complete the following information):
- OS: macOS
- Browser: Chrome
- Version: 142.0.7444.176 (Official Build) (arm64)
Screenshots
"CHECK_ERROR": [
{
"schema": "PanderaSchema",
"column": "product_name",
"check": "str_matches('^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$')",
"error": "Error while executing check function: KeyError(\"<class 'pandera.api.pyspark.types.PysparkDataframeColumnObject'>\")Traceback (most recent call last): File \"/usr/local/lib/python3.11/site-packages/pandera/backends/pyspark/components.py\", line 132, in run_checks self.run_check( File \"/usr/local/lib/python3.11/site-packages/pandera/backends/pyspark/base.py\", line 82, in run_check check_result = check(check_obj, *args) ^^^^^^^^^^^^^^^^^^^^^^^ File \"/usr/local/lib/python3.11/site-packages/pandera/api/checks.py\", line 237, in __call__ return backend(check_obj, column) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File \"/usr/local/lib/python3.11/site-packages/pandera/backends/pyspark/checks.py\", line 92, in __call__ check_output = self.apply(check_obj, key, self.check._check_kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File \"/usr/local/lib/python3.11/site-packages/pandera/backends/pyspark/checks.py\", line 67, in apply return self.check._check_fn(check_obj_and_col_name, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File \"/usr/local/lib/python3.11/site-packages/pandera/api/function_dispatch.py\", line 24, in __call__ fn = self._function_registry[input_data_type] ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^KeyError: <class 'pandera.api.pyspark.types.PysparkDataframeColumnObject'>"
}
]
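The last frame of the traceback is in `pandera/api/function_dispatch.py`, where `self._function_registry[input_data_type]` raises `KeyError`: the built-in check implementation is looked up by the type of its input, and the registry has no entry keyed by `PysparkDataframeColumnObject`. The sketch below reproduces that failure mode with a hypothetical, much-simplified dispatcher. `TypeDispatcher`, `PandasSeriesLike` and `SparkColumnLike` are illustrative stand-ins, not pandera's actual classes.

```python
import re
from typing import Any, Callable, Dict, Type


class TypeDispatcher:
    """Hypothetical dispatcher: picks an implementation by the exact input type."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._function_registry: Dict[Type[Any], Callable[..., Any]] = {}

    def register(self, input_type: Type[Any]) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._function_registry[input_type] = fn
            return fn
        return decorator

    def __call__(self, data: Any, *args: Any, **kwargs: Any) -> Any:
        input_data_type = type(data)
        # Plain dict lookup: an unregistered type raises KeyError,
        # like the last frame of the reported traceback
        fn = self._function_registry[input_data_type]
        return fn(data, *args, **kwargs)


class PandasSeriesLike(list):
    """Stand-in for an input type that has a registered implementation."""


class SparkColumnLike:
    """Stand-in for an input type with no registered implementation."""


str_matches = TypeDispatcher("str_matches")


@str_matches.register(PandasSeriesLike)
def _str_matches_list(data, pattern):
    # Element-wise match, returning one boolean per value
    return [re.match(pattern, value) is not None for value in data]


print(str_matches(PandasSeriesLike(["a@b.io", "nope"]), r".+@.+"))
try:
    str_matches(SparkColumnLike(), r".+@.+")
except KeyError as err:
    print("KeyError:", err)
```

With exact-type lookup, a built-in check cannot run at all unless an implementation is registered for the column wrapper type that the backend passes in, which would explain why the report shows "Error while executing check function" instead of the failing rows.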