
Commit a39a391

Add Parquet demo
1 parent 11bc196 commit a39a391

4 files changed, +70 -0 lines changed


data_raw/x-01-parquet.qmd

+6
@@ -4,6 +4,7 @@ html-table-processing: none
 ---
 
 ```{python}
+import pointblank as pb
 import polars as pl
 
 tbl_xyz = pl.DataFrame({"x": [1, 2, 3, 4], "y": [4, 5, 6, 7], "z": [8, 8, 8, 8]})
@@ -17,6 +18,7 @@ tbl_dates_times_text = pl.DataFrame(
         "text": [None, "5-egh-163", "8-kdg-938"],
     }
 )
+tbl_game_revenue = pb.load_dataset(dataset="game_revenue", tbl_type="polars")
 ```
 
 
@@ -32,3 +34,7 @@ tbl_xyz_missing.write_parquet("tbl_xyz_missing.parquet")
 ```{python}
 tbl_dates_times_text.write_parquet("tbl_dates_times_text.parquet")
 ```
+
+```{python}
+tbl_game_revenue.write_parquet("game_revenue.parquet")
+```

docs/demos/data/game_revenue.parquet

39.3 KB
Binary file not shown.

docs/demos/index.qmd

+3
@@ -142,4 +142,7 @@ Use column selector functions in the `columns=` argument to conveniently choose
 [Check the Schema of a Table](./schema-check/index.qmd)<br>
 The schema of a table can be flexibly defined with `Schema` and verified with `col_schema_match()`.
 
+[Using Parquet Data](./using-parquet-data/index.qmd)<br>
+A Parquet dataset can be used for data validation, thanks to Ibis.
+
 </div>
+61
@@ -0,0 +1,61 @@
+---
+pagetitle: "Examples: Using Parquet Data"
+notebook-links: false
+page-navigation: false
+toc: false
+html-table-processing: none
+---
+
+### Using Parquet Data
+
+A Parquet dataset can be used for data validation, thanks to Ibis.
+
+```{python}
+# | echo: false
+
+import pointblank as pb
+import ibis
+
+game_revenue = ibis.read_parquet("../data/game_revenue.parquet")
+
+validation = (
+    pb.Validate(data=game_revenue, label="Example using a Parquet dataset.")
+    .col_vals_lt(columns="item_revenue", value=200)
+    .col_vals_gt(columns="item_revenue", value=0)
+    .col_vals_gt(columns="session_duration", value=5)
+    .col_vals_in_set(columns="item_type", set=["iap", "ad"])
+    .col_vals_regex(columns="player_id", pattern=r"[A-Z]{12}\d{3}")
+    .interrogate()
+)
+
+validation
+```
+
+```python
+import pointblank as pb
+import ibis
+
+game_revenue = ibis.read_parquet("data/game_revenue.parquet")
+
+validation = (
+    pb.Validate(data=game_revenue, label="Example using a Parquet dataset.")
+    .col_vals_lt(columns="item_revenue", value=200)
+    .col_vals_gt(columns="item_revenue", value=0)
+    .col_vals_gt(columns="session_duration", value=5)
+    .col_vals_in_set(columns="item_type", set=["iap", "ad"])
+    .col_vals_regex(columns="player_id", pattern=r"[A-Z]{12}\d{3}")
+    .interrogate()
+)
+
+validation
+```
+
+<details>
+<summary>Preview of Input Table</summary>
+
+```{python}
+# | echo: false
+pb.preview(game_revenue)
+```
+
+</details>
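Each chained `col_vals_*()` call in the demo adds one validation step, and `interrogate()` then checks the values of the named column against that step's rule. The plain-Python sketch below mimics that idea on a few made-up rows (hypothetical values shaped like `game_revenue`, not taken from it). The `steps` list and the `interrogate()` function here are illustrative stand-ins, not pointblank's API or implementation, and the regex step assumes match-anywhere semantics (`re.search`).

```python
import re

# Hypothetical rows shaped like the validated columns; values are invented.
rows = [
    {"player_id": "ABCDEFGHIJKL123", "item_revenue": 1.99, "session_duration": 12.5, "item_type": "iap"},
    {"player_id": "mnopqrstuvwx456", "item_revenue": 0.0, "session_duration": 3.0, "item_type": "ad"},
    {"player_id": "short99", "item_revenue": 250.0, "session_duration": 40.0, "item_type": "iap"},
]

PLAYER_ID = re.compile(r"[A-Z]{12}\d{3}")

# One (name, column, predicate) entry per chained col_vals_*() call in the demo.
steps = [
    ("col_vals_lt", "item_revenue", lambda v: v < 200),
    ("col_vals_gt", "item_revenue", lambda v: v > 0),
    ("col_vals_gt", "session_duration", lambda v: v > 5),
    ("col_vals_in_set", "item_type", lambda v: v in {"iap", "ad"}),
    # Assumption: the pattern may match anywhere in the value (re.search).
    ("col_vals_regex", "player_id", lambda v: PLAYER_ID.search(v) is not None),
]


def interrogate(rows, steps):
    """Hypothetical stand-in: count how many values pass each step."""
    report = []
    for name, column, check in steps:
        passed = sum(1 for row in rows if check(row[column]))
        report.append((name, column, passed, len(rows)))
    return report


report = interrogate(rows, steps)
for name, column, passed, total in report:
    print(f"{name:<16} {column:<17} {passed}/{total} passed")
```

The real report produced by pointblank carries more per-step detail than these pass counts; the sketch only shows which rule each step applies to which column.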

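The pattern `[A-Z]{12}\d{3}` in the `col_vals_regex()` step has no `^`/`$` anchors, so whether a `player_id` with extra characters passes depends on whether the pattern is searched for anywhere in the value or must match the whole value. This commit does not say which reading pointblank applies (and the backend that evaluates the pattern may not be Python's `re`); the stdlib sketch below only shows how the two readings differ on a few hypothetical IDs.

```python
import re

# Pattern used by the demo's col_vals_regex() step on player_id.
PATTERN = re.compile(r"[A-Z]{12}\d{3}")

# Hypothetical IDs, chosen to separate the two readings.
candidates = [
    "ABCDEFGHIJKL123",      # exactly 12 capitals + 3 digits
    "xxABCDEFGHIJKL123yy",  # a valid-looking ID embedded in extra characters
    "ABCDEFGHIJK123",       # only 11 capitals
]

results = {}
for s in candidates:
    found = PATTERN.search(s) is not None     # pattern anywhere in the string
    whole = PATTERN.fullmatch(s) is not None  # entire string must match
    results[s] = (found, whole)
    print(f"{s:<20} search={found} fullmatch={whole}")
```

Writing the pattern as `^[A-Z]{12}\d{3}$` would make an intended whole-value match explicit under either reading.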