Add AzureOpenAI as a model provider to DraftValidation #387

Merged
rich-iannone merged 2 commits into posit-dev:main from howardbaik:main on Apr 22, 2026

Conversation

@howardbaik howardbaik commented Apr 21, 2026

Contributor

Summary

  • pointblank/draft.py: new azure-openai branch in __post_init__ after the bedrock block, mirroring the _utils_ai.py pattern but using self.api_key and the draft-specific system prompt. Docstring updated in three places: class summary, model parameter description, and the "Constructing the model Argument" block (now lists five providers and explains the :deployment_id convention plus required env vars).
  • pointblank/draft.py:254: fixed an unrelated latent Windows bug where api-docs.txt was being read with the platform default encoding (cp1252 on Windows), which crashed on non-latin1 bytes. Now uses encoding="utf-8". This only surfaced because the new tests reach further into __post_init__ than the old ones.
  • user_guide/02-advanced-validation/04-draft-validation.qmd: added Azure OpenAI to the supported-providers list, to the .env sample, and to the model-specification examples.
  • tests/test_draft.py: two new tests covering the missing-endpoint and missing-api-version ValueError paths, following the existing monkeypatch/importorskip style from test__utils_ai.py.
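
The encoding fix in the second bullet can be reproduced without pointblank. The following is a minimal sketch, not the code from draft.py: it writes a temporary stand-in for api-docs.txt containing a curly quote, then reads it back once with encoding="utf-8" and once with cp1252 forced, which simulates the Windows default. The file contents and variable names are made up for illustration.

```python
import tempfile
from pathlib import Path

# U+201D (right double quotation mark) encodes to the UTF-8 bytes E2 80 9D.
# Byte 0x9D has no mapping in Python's cp1252 codec, so decoding it fails.
text = "Use \u201cna_pass\u201d to allow missing values."

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "api-docs.txt"  # stand-in for the bundled docs file
    path.write_bytes(text.encode("utf-8"))

    # Explicit encoding: the same result on every platform.
    utf8_result = path.read_text(encoding="utf-8")

    # Force cp1252 to simulate the platform default on Windows.
    try:
        path.read_text(encoding="cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

print(utf8_result == text)
print(cp1252_failed)
```

Passing the encoding explicitly removes the dependence on the platform default, which is why the failure did not show up on Linux or macOS.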

Example Usage

Reproducible Test Code

import pointblank as pb
import os
from dotenv import load_dotenv

load_dotenv(override=True)

api_key = os.getenv("AZURE_OPENAI_API_KEY")

data = pb.load_dataset(dataset="global_sales", tbl_type="polars")

# Generate a validation plan
pb.DraftValidation(
    data=data,
    model="azure-openai:gpt-5.4",
    api_key=api_key
)
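
Before running the example, it may help to confirm that the model string and the environment are complete. The sketch below is a hypothetical pre-flight check and is not part of pointblank: split_model and missing_settings are illustrative helpers. AZURE_OPENAI_API_KEY and OPENAI_API_VERSION are named in this PR, while AZURE_OPENAI_ENDPOINT is an assumed name for the endpoint variable.

```python
from typing import Mapping

# AZURE_OPENAI_API_KEY and OPENAI_API_VERSION are named in this PR.
# AZURE_OPENAI_ENDPOINT is an assumed name for the endpoint variable.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
)


def split_model(model: str) -> tuple[str, str]:
    """Split "provider:deployment_id" into its two parts."""
    provider, sep, deployment_id = model.partition(":")
    if not sep or not provider or not deployment_id:
        raise ValueError(f"Expected 'provider:deployment_id', got {model!r}")
    return provider, deployment_id


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Hypothetical environment with the API version left out.
example_env = {
    "AZURE_OPENAI_API_KEY": "placeholder-key",
    "AZURE_OPENAI_ENDPOINT": "https://example.invalid",
}

provider, deployment_id = split_model("azure-openai:gpt-5.4")
missing = missing_settings(example_env)
print(provider, deployment_id, missing)
```

Passing os.environ instead of example_env would check the real environment after load_dotenv() has run.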

Expected Output

import pointblank as pb

# Define schema based on column names and dtypes
schema = pb.Schema(
    columns=[
        ("product_id", "String"),
        ("product_category", "String"),
        ("customer_id", "String"),
        ("customer_segment", "String"),
        ("region", "String"),
        ("country", "String"),
        ("city", "String"),
        ("timestamp", "Datetime(time_unit='us', time_zone=None)"),
        ("quarter", "String"),
        ("month", "Int64"),
        ("year", "Int64"),
        ("price", "Float64"),
        ("quantity", "Int64"),
        ("status", "String"),
        ("email", "String"),
        ("revenue", "Float64"),
        ("tax", "Float64"),
        ("total", "Float64"),
        ("payment_method", "String"),
        ("sales_channel", "String"),
    ]
)

# The validation plan
validation = (
    pb.Validate(
        data=your_data,  # Replace your_data with the actual data variable
        label="Draft Validation",
        thresholds=pb.Thresholds(warning=0.10, error=0.25, critical=0.35)
    )
    .col_schema_match(schema=schema)
    .col_count_match(count=20)
    .row_count_match(count=50000)
    .rows_distinct()
    .col_vals_not_null(
        columns=[
            "product_category",
            "customer_segment",
            "region",
            "country",
            "month",
            "year",
            "price",
            "quantity",
            "status",
            "email",
            "revenue",
            "tax",
            "total",
            "payment_method",
            "sales_channel",
        ]
    )
    .col_vals_between(
        columns="month",
        left=1,
        right=12,
        na_pass=True
    )
    .col_vals_between(
        columns="year",
        left=2021,
        right=2023,
        na_pass=True
    )
    .col_vals_gt(
        columns="price",
        value=0,
        na_pass=False
    )
    .col_vals_gt(
        columns="quantity",
        value=0,
        na_pass=False
    )
    .col_vals_in_set(
        columns="status",
        set=[
            "pending",
            "processing",
            "shipped",
            "delivered",
            "returned",
            "cancelled",
        ]
    )
    .col_vals_within_spec(
        columns="email",
        spec="email"
    )
    .col_vals_gt(
        columns="revenue",
        value=0,
        na_pass=False
    )
    .col_vals_ge(
        columns="tax",
        value=0,
        na_pass=False
    )
    .col_vals_ge(
        columns="total",
        value=0,
        na_pass=False
    )
    .col_vals_in_set(
        columns="quarter",
        set=[
            "2021-Q1",
            "2021-Q2",
            "2021-Q3",
            "2021-Q4",
            "2022-Q1",
            "2022-Q2",
            "2022-Q3",
            "2022-Q4",
            "2023-Q1",
            "2023-Q2",
            "2023-Q3",
            "2023-Q4",
        ]
    )
    .col_vals_in_set(
        columns="region",
        set=[
            "North America",
            "Europe",
            "Asia Pacific",
        ]
    )
    .col_vals_in_set(
        columns="payment_method",
        set=[
            "Credit Card",
            "PayPal",
            "Bank Transfer",
            "Apple Pay",
            "Google Pay",
        ]
    )
    .col_vals_in_set(
        columns="sales_channel",
        set=[
            "Online",
            "Retail",
            "Distributor",
            "Partner",
            "Phone",
        ]
    )
    .interrogate()
)

validation

Note

Make sure OPENAI_API_VERSION is set to 2025-03-01-preview or later; the Azure OpenAI Responses API is only available from that version onward.
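
One way to check this requirement up front is to compare the date prefix of the version string. This is a rough sketch under the assumption that the version begins with a YYYY-MM-DD date, as in 2025-03-01-preview; meets_minimum_version is a hypothetical helper, it ignores the suffix after the date, and it raises ValueError for strings that do not start with a date.

```python
from datetime import date

# Earliest version named in the note above.
MIN_VERSION_DATE = date(2025, 3, 1)


def meets_minimum_version(api_version: str) -> bool:
    """Compare the leading YYYY-MM-DD of an API version with the minimum."""
    # Versions such as "2025-03-01-preview" start with an ISO date;
    # anything else makes date.fromisoformat raise ValueError.
    version_date = date.fromisoformat(api_version[:10])
    return version_date >= MIN_VERSION_DATE


print(meets_minimum_version("2025-03-01-preview"))
print(meets_minimum_version("2024-10-21"))
```

The comparison only looks at dates, so it does not confirm that a given later version actually exposes the Responses API.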

Related GitHub Issues and PRs

Fixes: #386

Checklist

  • [x] I understand and agree to the Code of Conduct.
  • [x] I have followed the Style Guide for Python Code as best as possible for the submitted code.
  • [x] I have added pytest unit tests for any new functionality.

@rich-iannone rich-iannone left a comment

Member

LGTM, thanks!

@rich-iannone rich-iannone merged commit 5ad9b0a into posit-dev:main Apr 22, 2026
9 checks passed

Development

Successfully merging this pull request may close these issues.

Add Azure OpenAI as a model provider to DraftValidation

2 participants