README.md (38 additions & 10 deletions)
@@ -6,6 +6,10 @@ First, install `pandantic` by using pip (or any other package managing tool).

```pip install pandantic```

+## Docs
+
+Documentation can be found [here](https://pandantic-rtd.readthedocs.io/en/latest/)
+
## parse_df

To validate `pd.DataFrame`s using Pydantic `BaseModel`s make sure to import the `BaseModel` class from the `pandantic` package.
@@ -14,11 +18,11 @@ To validate `pd.DataFrame`s using Pydantic `BaseModel`s make sure to import the

`pandantic.BaseModel` subclasses the original `pydantic.BaseModel`, so it includes all functionality of the original and adds the `parse_df` class method, which should be used to parse DataFrames.

-## A quick example
+### A quick example

Enough talking; let's make things easier with a small, quick example. Make sure to import the `BaseModel` class from `pandantic` and create a schema as you normally would when using `pydantic`.

-```
+```python
from pydantic.types import StrictInt

from pandantic import BaseModel
@@ -33,7 +37,7 @@ class DataFrameSchema(BaseModel):

Let's try this schema on a simple `pandas.DataFrame`. Call the class method `parse_df` on the freshly defined `DataFrameSchema` and pass the DataFrame to validate via the `dataframe` argument. In this example, we `filter` out the bad records (other options exist, such as the good old `raise`, which raises a `ValueError` after validating the whole DataFrame). Here, only the second record is kept in the returned DataFrame.
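
To make the two `errors` modes concrete, here is a minimal, library-free sketch of the behaviour described above. It is not `pandantic`'s implementation: `validate_rows`, `is_valid_row`, and the sample rows are hypothetical names and data introduced purely for illustration, and rows are modelled as plain dicts instead of a DataFrame.

```python
from typing import Any, Callable

Row = dict[str, Any]


def is_valid_row(row: Row) -> bool:
    """Hypothetical strict check: `example_str` is a str, `example_int` a real int."""
    value = row["example_int"]
    is_strict_int = isinstance(value, int) and not isinstance(value, bool)
    return isinstance(row["example_str"], str) and is_strict_int


def validate_rows(
    rows: list[Row],
    check: Callable[[Row], bool],
    errors: str = "raise",
) -> list[Row]:
    """Validate every row first, then either drop or report the bad ones."""
    bad_indexes = [i for i, row in enumerate(rows) if not check(row)]
    if errors == "filter":
        return [row for i, row in enumerate(rows) if i not in bad_indexes]
    if errors == "raise":
        if bad_indexes:
            raise ValueError(f"{len(bad_indexes)} invalid row(s): {bad_indexes}")
        return rows
    raise ValueError(f"unknown errors mode: {errors!r}")


# Assumed sample data, chosen so that only the second row is valid.
rows = [
    {"example_str": "foo", "example_int": "1"},  # int given as a string
    {"example_str": "bar", "example_int": 2},    # valid
    {"example_str": 1, "example_int": 3.0},      # wrong types in both fields
]

kept = validate_rows(rows, is_valid_row, errors="filter")
print(kept)
```

With `errors="filter"` the sketch keeps only the second row; with `errors="raise"` the same data triggers a `ValueError` once all rows have been checked.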
+## Pandas plugin
+
+Another way to use `pandantic` is via our [`pandas.DataFrame` extension](https://pandas.pydata.org/docs/development/extending.html) plugin. This adds the following methods to `pandas` (once "registered" by `import pandantic.plugins.pandas`):
+* `DataFrame.pydantic.validate(schema: PandanticBaseModel)`, which returns a boolean indicating whether all inputs are valid.
+* `DataFrame.pydantic.filter(schema: PandanticBaseModel)`, which wraps `PandanticBaseModel.parse_obj(errors="filter")` and returns a DataFrame.
df2.pydantic.filter(MyModel) # returns the filtered DataFrame with only the first row
```
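
For readers unfamiliar with how such a plugin hooks into `pandas`, the sketch below shows the general accessor mechanism using `pd.api.extensions.register_dataframe_accessor`. It is an illustration under assumptions, not the `pandantic` plugin itself: the accessor name `demo_checks`, the class `DemoChecksAccessor`, and the `has_even_int` check are all hypothetical, and a plain callable stands in for a schema.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("demo_checks")  # hypothetical name
class DemoChecksAccessor:
    """Toy accessor; stands in for a real plugin and is not pandantic's code."""

    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        self._df = pandas_obj

    def _row_mask(self, check) -> pd.Series:
        # Apply the check to each row (as a dict) and build a boolean mask.
        flags = [bool(check(row)) for row in self._df.to_dict(orient="records")]
        return pd.Series(flags, index=self._df.index, dtype=bool)

    def validate(self, check) -> bool:
        """Return True only if every row passes the check."""
        return bool(self._row_mask(check).all())

    def filter(self, check) -> pd.DataFrame:
        """Return only the rows that pass the check."""
        return self._df.loc[self._row_mask(check)]


def has_even_int(row: dict) -> bool:
    """Hypothetical row check: `example_int` must be even."""
    return row["example_int"] % 2 == 0


df = pd.DataFrame({"example_str": ["foo", "bar"], "example_int": [2, 3]})
all_valid = df.demo_checks.validate(has_even_int)
even_rows = df.demo_checks.filter(has_even_int)
```

Once the decorated class has been defined (or its module imported), every DataFrame exposes the accessor under the registered name, which is why importing a plugin module is enough to "register" it.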
+
+## Custom validator example
+
+One of the great features of Pydantic is the ability to create custom validators. Luckily, those custom validators also work when parsing DataFrames using `pandantic`. Make sure to import the original decorator from the `pydantic` package and keep in mind that `pandantic` uses Pydantic V2 (so `field_validator` it is). In the example below, the `BaseModel` validates the `example_int` field and makes sure it is an even number.
+
+```python
from pydantic import ValidationError, field_validator

@@ -72,7 +103,7 @@ class DataFrameSchema(BaseModel):

By setting the `errors` argument to `raise`, the code raises a `ValueError` after validating every row, because the first row contains an odd number.
Because the DataFrame is parsed into a dict, a `None` value is treated as a `nan` value when the dict also contains other values. Therefore, `Optional` columns (where the value can be empty) should be specified using the custom `pandantic.Optional` type, which is a replacement for `typing.Optional`.
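
The `None`-versus-`nan` point can be seen without any library. Below is a minimal sketch, assuming a hypothetical helper `nan_to_none` (not part of `pandantic`) that normalises a missing value before an optional-style check; it only illustrates why a plain optional check can reject a missing cell.

```python
import math
from typing import Any


def nan_to_none(value: Any) -> Any:
    """Hypothetical helper: map a float NaN back to None, leave the rest alone."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def is_optional_int(value: Any) -> bool:
    """Plain optional-int check: accepts an int or None, nothing else."""
    return value is None or (isinstance(value, int) and not isinstance(value, bool))


missing = float("nan")  # what a missing cell can look like once it is no longer None

rejected = is_optional_int(missing)               # NaN is a float, not None
accepted = is_optional_int(nan_to_none(missing))  # normalised back to None
print(rejected, accepted)
```

The plain check rejects `nan` because it is a float, which is presumably the gap the custom `pandantic.Optional` type is meant to close.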