Commit cf66d3f

Merge branch 'posit-dev:main' into add-get_dataframe
2 parents 2d7e9b8 + e56ee08 commit cf66d3f

713 files changed

Lines changed: 225702 additions & 29153 deletions

CITATION.cff

Lines changed: 1 addition & 1 deletion
```diff
@@ -3,7 +3,7 @@ message: 'If you wish to cite the "Pointblank" package use:'
 type: software
 license: MIT
 title: "Pointblank: data validation toolkit for assessing and monitoring data quality."
-version: 0.16.0
+version: 0.21.0
 abstract: Validate data in Polars and Pandas DataFrames and database tables.
 Validation pipelines can be made using easily-readable, consecutive validation
 steps. Upon execution of the validation plan, several reporting options are available.
```

README.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -392,6 +392,7 @@ data
 The generator supports sophisticated data generation with these capabilities:
 
 - **Realistic data with presets**: Use built-in presets like `"name"`, `"email"`, `"address"`, `"phone"`, etc.
+- **User agent strings**: Generate highly varied, realistic browser user agent strings from 17 browser categories with over 42,000 unique combinations
 - **50+ country support**: Generate locale-specific data (e.g., `country="DE"` for German addresses)
 - **Field constraints**: Control ranges, patterns, uniqueness, and allowed values
 - **Multiple output formats**: Returns Polars DataFrames by default, but also supports Pandas (`output="pandas"`) or dictionaries (`output="dict"`)
@@ -404,6 +405,7 @@ This makes it easy to generate test data that matches your validation rules, hel
 - **Built for collaboration**: Share results with colleagues through beautiful interactive reports
 - **Practical outputs**: Get exactly what you need: counts, extracts, summaries, or full reports
 - **Flexible deployment**: Use in notebooks, scripts, or data pipelines
+- **Synthetic data generation**: Create realistic test data with 30+ presets, user agent strings, locale-aware formatting, and 50+ country support
 - **Customizable**: Tailor validation steps and reporting to your specific needs
 - **Internationalization**: Reports can be generated in 40 languages, including English, Spanish, French, and German
 
```

docs/_quarto.yml

Lines changed: 13 additions & 1 deletion
```diff
@@ -274,6 +274,16 @@ quartodoc:
 - name: get_data_path
 - name: connect_to_table
 - name: print_database_tables
+- title: Table Pre-checks
+desc: >
+The *Table Pre-checks* group contains helper functions that are designed for use with the
+`active=` parameter of validation methods. These callables inspect the target table before
+a validation step runs and conditionally skip the step when a precondition is not met
+(e.g., a required column is missing or the table does not have enough rows). A descriptive,
+locale-aware note is automatically attached to any step that is skipped.
+contents:
+- name: has_columns
+- name: has_rows
 - title: YAML
 desc: >
 The *YAML* group contains functions that allow for the use of YAML to orchestrate validation
@@ -304,7 +314,8 @@ quartodoc:
 desc: >
 Generate synthetic test data based on schema definitions. Use `generate_dataset()` to
 create data from a `Schema` object. The helper functions define typed fields with
-constraints for realistic test data generation.
+constraints for realistic test data generation. The `profile_fields()` helper creates a
+complete person-profile schema (name, email, address, phone, etc.) in a single call.
 contents:
 - name: generate_dataset
 - name: int_field
@@ -315,6 +326,7 @@ quartodoc:
 - name: datetime_field
 - name: time_field
 - name: duration_field
+- name: profile_fields
 - title: Prebuilt Actions
 desc: >
 The *Prebuilt Actions* group contains a function that can be used to send a Slack
```
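The new *Table Pre-checks* docs group describes callables passed to a validation method's `active=` parameter that inspect the table and skip the step when a precondition fails. The plain-Python sketch below models only that control flow. `has_columns_sketch`, `has_rows_sketch`, `run_step_sketch`, the `min_rows` parameter and the dict-of-lists table are all hypothetical. They are not the signatures or internals of Pointblank's `has_columns()` and `has_rows()`.

```python
from typing import Callable, Dict, Mapping, Sequence, Union

# Assumption: a "table" is a mapping of column name -> column values.
Table = Mapping[str, Sequence[object]]
Precheck = Callable[[Table], bool]


def has_columns_sketch(*required: str) -> Precheck:
    """Build a pre-check that passes only if every required column exists."""
    def check(table: Table) -> bool:
        return all(col in table for col in required)
    return check


def has_rows_sketch(min_rows: int = 1) -> Precheck:
    """Build a pre-check that passes only if the table has at least `min_rows` rows."""
    def check(table: Table) -> bool:
        # Row count is read from the first column; a table with no columns has 0 rows.
        n_rows = len(next(iter(table.values()), []))
        return n_rows >= min_rows
    return check


def run_step_sketch(
    table: Table,
    step: Callable[[Table], bool],
    active: Union[bool, Precheck] = True,
) -> Dict[str, object]:
    """Evaluate `active` first; if it is falsy, skip `step` and attach a note."""
    is_active = active(table) if callable(active) else bool(active)
    if not is_active:
        return {"status": "skipped", "note": "Precondition not met; step was not run."}
    return {"status": "passed" if step(table) else "failed", "note": None}


# Usage: the second step would raise KeyError on the missing "discount"
# column, but its pre-check skips it before it runs.
orders = {"id": [1, 2, 3], "amount": [10.0, 25.5, 7.25]}
ran = run_step_sketch(
    orders, lambda t: all(v > 0 for v in t["amount"]), active=has_columns_sketch("amount")
)
skipped = run_step_sketch(
    orders, lambda t: all(v > 0 for v in t["discount"]), active=has_columns_sketch("discount")
)
```

Returning a closure from a factory keeps each pre-check configurable (which columns, how many rows) while still matching the single-argument callable shape that an `active=`-style hook expects.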

docs/demos/datetime-validations/index.qmd

Lines changed: 12 additions & 12 deletions
```diff
@@ -16,7 +16,7 @@ html-table-processing: none
 import pointblank as pb
 import polars as pl
 from datetime import date, datetime
-import pytz
+from zoneinfo import ZoneInfo
 
 # Create sample data with various temporal data types
 temporal_data = pl.DataFrame({
@@ -33,10 +33,10 @@ temporal_data = pl.DataFrame({
 datetime(2024, 3, 20, 17, 22, 45)
 ],
 "event_time_tz": [
-datetime(2023, 1, 15, 9, 0, tzinfo=pytz.timezone("America/New_York")),
-datetime(2023, 6, 10, 12, 30, tzinfo=pytz.timezone("America/New_York")),
-datetime(2023, 12, 5, 15, 45, tzinfo=pytz.timezone("America/New_York")),
-datetime(2024, 3, 20, 18, 15, tzinfo=pytz.timezone("America/New_York"))
+datetime(2023, 1, 15, 9, 0, tzinfo=ZoneInfo("America/New_York")),
+datetime(2023, 6, 10, 12, 30, tzinfo=ZoneInfo("America/New_York")),
+datetime(2023, 12, 5, 15, 45, tzinfo=ZoneInfo("America/New_York")),
+datetime(2024, 3, 20, 18, 15, tzinfo=ZoneInfo("America/New_York"))
 ],
 "order_id": [1001, 1002, 1003, 1004],
 "amount": [150.0, 275.5, 89.99, 420.00]
@@ -57,7 +57,7 @@ validation = (
 )
 .col_vals_ge(
 columns="event_time_tz",
-value=datetime(2023, 1, 1, 8, 0, tzinfo=pytz.timezone("America/New_York")),
+value=datetime(2023, 1, 1, 8, 0, tzinfo=ZoneInfo("America/New_York")),
 brief="Timezone-aware events after 8 AM Eastern"
 )
 .col_schema_match(
@@ -82,7 +82,7 @@ validation
 import pointblank as pb
 import polars as pl
 from datetime import date, datetime
-import pytz
+from zoneinfo import ZoneInfo
 
 # Create sample data with various temporal data types
 temporal_data = pl.DataFrame({
@@ -99,10 +99,10 @@ temporal_data = pl.DataFrame({
 datetime(2024, 3, 20, 17, 22, 45)
 ],
 "event_time_tz": [
-datetime(2023, 1, 15, 9, 0, tzinfo=pytz.timezone("America/New_York")),
-datetime(2023, 6, 10, 12, 30, tzinfo=pytz.timezone("America/New_York")),
-datetime(2023, 12, 5, 15, 45, tzinfo=pytz.timezone("America/New_York")),
-datetime(2024, 3, 20, 18, 15, tzinfo=pytz.timezone("America/New_York"))
+datetime(2023, 1, 15, 9, 0, tzinfo=ZoneInfo("America/New_York")),
+datetime(2023, 6, 10, 12, 30, tzinfo=ZoneInfo("America/New_York")),
+datetime(2023, 12, 5, 15, 45, tzinfo=ZoneInfo("America/New_York")),
+datetime(2024, 3, 20, 18, 15, tzinfo=ZoneInfo("America/New_York"))
 ],
 "order_id": [1001, 1002, 1003, 1004],
 "amount": [150.0, 275.5, 89.99, 420.00]
@@ -123,7 +123,7 @@ validation = (
 )
 .col_vals_ge(
 columns="event_time_tz",
-value=datetime(2023, 1, 1, 8, 0, tzinfo=pytz.timezone("America/New_York")),
+value=datetime(2023, 1, 1, 8, 0, tzinfo=ZoneInfo("America/New_York")),
 brief="Timezone-aware events after 8 AM Eastern"
 )
 .col_schema_match(
```
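The `index.qmd` hunks swap `pytz.timezone(...)` for `zoneinfo.ZoneInfo(...)` as the `tzinfo=` argument. The commit does not say why. A likely reason is a well-known pytz pitfall: passing a pytz zone directly to the `datetime` constructor attaches the zone's earliest entry (local mean time, UTC-04:56 for New York) instead of EST/EDT, because pytz expects `localize()` to be used instead. `ZoneInfo` (standard library, Python 3.9+) resolves the correct, DST-aware offset from the same constructor call. The check below is a sketch. It assumes the IANA time zone database is available on the system or via the `tzdata` package.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")

# Same construction style as the demo: the zone is passed straight to tzinfo=.
winter = datetime(2023, 1, 15, 9, 0, tzinfo=NY)    # falls in EST
summer = datetime(2023, 6, 10, 12, 30, tzinfo=NY)  # falls in EDT

# ZoneInfo picks the offset in force at each timestamp (DST-aware).
winter_hours = winter.utcoffset() / timedelta(hours=1)
summer_hours = summer.utcoffset() / timedelta(hours=1)

# Threshold used by the demo's col_vals_ge() step: 8 AM Eastern on 2023-01-01.
threshold = datetime(2023, 1, 1, 8, 0, tzinfo=NY)

# Aware datetimes are ordered by the instant they represent.
all_after_threshold = all(ts >= threshold for ts in (winter, summer))

# Ordering an aware datetime against a naive one raises TypeError.
try:
    _ = datetime(2023, 1, 1, 8, 0) < winter
    naive_comparable = True
except TypeError:
    naive_comparable = False
```

The last check is why the demo's threshold value is itself timezone-aware. A naive threshold could not be ordered against the aware `event_time_tz` values in plain Python.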
