Commit 0b1a5d8

Merge pull request #22 from wesselhuising/xrn-pandas-plugin
Pandas plugin
2 parents f0d7494 + 5ba7601 commit 0b1a5d8

8 files changed

Lines changed: 1219 additions & 1057 deletions

.pre-commit-config.yaml

Lines changed: 4 additions & 15 deletions
@@ -53,19 +53,8 @@ repos:
         entry: isort
         require_serial: true
         language: system
-      - id: pylint
-        name: pylint
-        entry: pylint
+      - id: safety
+        name: check for vulnerable dependencies with Safety
+        entry: safety
         language: system
-        types: [python]
-        args:
-          [
-            "-rn", # Only display messages
-            "-sn", # Don't display the score
-            "src", # Only source code, skip tests folder
-          ]
-      # - id: safety
-      # name: check for vulnerable dependencies with Safety
-      # entry: safety
-      # language: system
-      # args: [./pyproject.toml]
+        args: [./pyproject.toml]

README.md

Lines changed: 38 additions & 10 deletions
@@ -6,6 +6,10 @@ First, install `pandantic` by using pip (or any other package managing tool).
 
 ```pip install pandantic```
 
+## Docs
+
+Documentation can be found [here](https://pandantic-rtd.readthedocs.io/en/latest/)
+
 ## parse_df
 
 To validate `pd.DataFrame`s using Pydantic `BaseModel`s make sure to import the `BaseModel` class from the `pandantic` package.
@@ -14,11 +18,11 @@ To validate `pd.DataFrame`s using Pydantic `BaseModel`s make sure to import the
 
 `pandantic.BaseModel` subclasses the original `pydantic.BaseModel`, so it includes all of the original's functionality and adds the `parse_df` class method, which should be used to parse DataFrames.
 
-## A quick example
+### A quick example
 
 Enough talking, let's make things easier with a small, quick example. Make sure to import the `BaseModel` class from `pandantic` and create a schema as you normally would when using `pydantic`.
 
-```
+```python
 from pydantic.types import StrictInt
 
 from pandantic import BaseModel
@@ -33,7 +37,7 @@ class DataFrameSchema(BaseModel):
 
 Let's try this schema on a simple `pandas.DataFrame`. Call the `parse_df` class method on the freshly defined `DataFrameSchema` and pass the `dataframe` to validate as an argument. In this example, we `filter` out the bad records (other options exist, such as the good old `raise`, which raises a `ValueError` after validating the whole DataFrame). Here, only the second record is kept in the returned DataFrame.
 
-```
+```python
 df_invalid = pd.DataFrame(
     data={
         "example_str": ["foo", "bar", 1],
@@ -46,11 +50,38 @@ df_filtered = DataFrameSchema.parse_df(
     errors="filter",
 )
 ```
-### Custom validators
 
-One of the great features of Pydantic is the ability to create custom validators. Luckily, those custom validators also work when parsing DataFrames using `pandantic`. Make sure to import the original decorator from the `pydantic` package and keep in mind that `pandantic` uses Pydantic V2 (so `field_validator` it is). In the example below, the `BaseModel` validates the `example_int` field and makes sure it is an even number.
+## Pandas plugin
+
+Another way to use `pandantic` is via our [`pandas.DataFrame` extension](https://pandas.pydata.org/docs/development/extending.html) plugin. This adds the following methods to `pandas` (once "registered" by `import pandantic.plugins.pandas`):
+* `DataFrame.pydantic.validate(schema:PandanticBaseModel)`, which returns a boolean indicating whether all rows are valid.
+* `DataFrame.pydantic.filter(schema:PandanticBaseModel)`, which wraps `PandanticBaseModel.parse_obj(errors="filter")` and returns a DataFrame.
+
+**Example:**
+```python
+from pandantic import BaseModel
+import pandantic.plugins.pandas
+
+df1: pd.DataFrame = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
+class MyModel(BaseModel):
+    a: int
+    b: str
 
+df1.pydantic.validate(MyModel) # returns True
+df1.pydantic.filter(MyModel) # returns the same dataframe
+
+# but if we have a mixed DataFrame
+df2: pd.DataFrame = pd.DataFrame({"a": [1, 2, "3"], "b": ["a", 3, "c"]})
+
+df2.pydantic.validate(MyModel) # returns False
+df2.pydantic.filter(MyModel) # returns the filtered DataFrame with only the first row
 ```
+
+## Custom validator example
+
+One of the great features of Pydantic is the ability to create custom validators. Luckily, those custom validators also work when parsing DataFrames using `pandantic`. Make sure to import the original decorator from the `pydantic` package and keep in mind that `pandantic` uses Pydantic V2 (so `field_validator` it is). In the example below, the `BaseModel` validates the `example_int` field and makes sure it is an even number.
+
+```python
 from pydantic import ValidationError, field_validator
 
 
@@ -72,7 +103,7 @@ class DataFrameSchema(BaseModel):
 
 With the `errors` argument set to `raise`, the code raises a `ValueError` after validating every row, because the first row contains an odd number.
 
-```
+```python
 example_df_invalid = pd.DataFrame(
     data={
         "example_str": ["foo", "bar", "baz"],
@@ -90,7 +121,7 @@ df_raised_error = DataFrameSchema.parse_df(
 ### Optional
 Because the DataFrame is parsed into a dict, a `None` value is treated as a `nan` value when there are other values in the dict. `Optional` columns (where the value can be empty) can therefore be specified using the custom `pandantic.Optional` type, which is a replacement for `typing.Optional`.
 
-```
+```python
 from pandantic import BaseModel, Optional
 
 class Model(BaseModel):
@@ -101,6 +132,3 @@ df_example = pd.DataFrame({"a": [1, None, 2], "b": ["str", 2, 3]})
 
 df_filtered = Model.parse_df(df_example, errors="filter", verbose=True)
 ```
-
-## Docs
-Documentation can be found [here](https://pandantic-rtd.readthedocs.io/en/latest/)
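
The README changes above describe two `errors` modes for `parse_df`: `filter` drops invalid rows, while `raise` validates every row and then raises a `ValueError`. The sketch below illustrates that behaviour with the standard library only. It is not `pandantic`'s implementation: `SCHEMA`, `row_is_valid`, and `parse_rows` are hypothetical names, rows are plain dicts rather than a DataFrame, and the `example_int` values are made up because the diff elides them.

```python
from typing import Any

# Hypothetical schema: column name -> required Python type (strict, no coercion).
SCHEMA: dict[str, type] = {"example_str": str, "example_int": int}


def row_is_valid(row: dict[str, Any], schema: dict[str, type]) -> bool:
    """Return True when every schema column is present with exactly the expected type."""
    return all(
        name in row and type(row[name]) is expected
        for name, expected in schema.items()
    )


def parse_rows(
    rows: list[dict[str, Any]], schema: dict[str, type], errors: str = "raise"
) -> list[dict[str, Any]]:
    """Validate every row first, then either drop or report the invalid ones."""
    bad = [i for i, row in enumerate(rows) if not row_is_valid(row, schema)]
    if errors == "filter":
        # Keep only the rows that passed validation.
        return [row for i, row in enumerate(rows) if i not in bad]
    if errors == "raise":
        # Raise once, after all rows have been checked.
        if bad:
            raise ValueError(f"{len(bad)} invalid row(s) at positions {bad}")
        return list(rows)
    raise ValueError(f"unknown errors mode: {errors!r}")


# Illustrative data: the first row has a string where an int is required,
# the third row has an int where a string is required.
rows = [
    {"example_str": "foo", "example_int": "1"},
    {"example_str": "bar", "example_int": 2},
    {"example_str": 1, "example_int": 3},
]

kept = parse_rows(rows, SCHEMA, errors="filter")
print(kept)  # only the second record survives
```

Validating all rows before deciding what to do keeps the two modes symmetric: both see the full list of failures, and only the final step differs.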
