Changes from all commits (20 commits)
db35eb8
Pydantic AI eda
htahir1 Aug 21, 2025
b4e3318
Merge remote-tracking branch 'origin/develop' into doc/pydanticaiexample
htahir1 Aug 21, 2025
44e8190
easier example
htahir1 Aug 21, 2025
a665a7a
Refactor quality gate evaluation function
htahir1 Aug 21, 2025
f747738
Update pipeline with prompt experimentation for Pydantic AI agents
htahir1 Aug 22, 2025
008bd17
Update Pydantic AI EDA pipeline README
htahir1 Aug 22, 2025
d669d92
Update pydantic version to be more flexible
htahir1 Aug 22, 2025
29dfe96
Merge remote-tracking branch 'origin/develop' into doc/pydanticaiexample
htahir1 Aug 25, 2025
42864f9
Formattingg
htahir1 Aug 25, 2025
aa6f4c1
Optimize prompt variants and tag the best performer
htahir1 Aug 25, 2025
33c4795
Merge remote-tracking branch 'origin/develop' into doc/pydanticaiexample
htahir1 Aug 26, 2025
ec3d90a
Merge remote-tracking branch 'origin/develop' into doc/pydanticaiexample
htahir1 Aug 30, 2025
61b6079
Merge branch 'develop' into doc/pydanticaiexample
strickvl Sep 29, 2025
f6ba7d9
Use argparse not click
strickvl Sep 29, 2025
dc04d9b
Fix Pydantic validation bug
strickvl Sep 29, 2025
b00bac1
Align prompt optimization prompts
strickvl Sep 29, 2025
4a61bcc
Update Agents file with instructions about commit messages
strickvl Sep 29, 2025
7df9545
Remove partial comment
strickvl Sep 29, 2025
9f4f544
Enhance prompt optimization example with scoring and provider auto-de…
strickvl Sep 29, 2025
0d20f53
Merge branch 'develop' into doc/pydanticaiexample
strickvl Sep 29, 2025
3 changes: 3 additions & 0 deletions AGENTS.md
@@ -107,6 +107,9 @@ Use filesystem navigation tools to explore the codebase structure as needed.
- Use imperative mood: "Add feature" not "Added feature"
- Reference issue numbers when applicable: "Fix user auth bug (#1234)"
- For multi-line messages, add a blank line after the summary

Codex-style agents must review the diff (when not already tracked), craft a concise summary line, and include a detailed body covering the key changes. The body should describe the main code adjustments so reviewers can understand the scope from the commit message alone.

- Example:
```
Add retry logic to artifact upload
```
165 changes: 165 additions & 0 deletions examples/prompt_optimization/README.md
@@ -0,0 +1,165 @@
# Prompt Optimization with ZenML

This example demonstrates **ZenML's artifact management** through a two-stage AI prompt optimization workflow using **Pydantic AI** for exploratory data analysis.

## What This Example Shows

**Stage 1: Prompt Optimization**
- Tests multiple prompt variants against sample data
- Compares performance using data quality scores and execution time
- Emits a scoreboard artifact that summarizes quality, speed, findings, and success per prompt
- Tags the best-performing prompt with an **exclusive ZenML tag**
- Stores the optimized prompt in ZenML's artifact registry

**Stage 2: Production Analysis**
- Automatically attempts to retrieve the latest optimized prompt from the registry
- Falls back to the default system prompt if no optimized prompt is available or retrieval fails
- Runs production EDA analysis using the selected prompt
- Returns a `used_optimized_prompt` boolean to indicate whether the optimized prompt was actually used
- Demonstrates real artifact sharing between pipeline runs

This showcases how ZenML enables **reproducible ML workflows** where optimization results automatically flow into production systems, with safe fallbacks.
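A minimal sketch of the Stage 2 fallback, assuming the optimized prompt is stored as a named ZenML artifact (the artifact name and default prompt below are illustrative, not taken from this example's code):

```python
from zenml import step
from zenml.client import Client

# Illustrative default; the example ships its own default system prompt.
DEFAULT_SYSTEM_PROMPT = "You are a careful data analyst. Summarize data quality issues."


@step
def load_analysis_prompt() -> str:
    """Return the latest optimized prompt, falling back to the default."""
    try:
        # Without an explicit version, this returns the latest version
        # of the named artifact.
        artifact = Client().get_artifact_version("optimized_prompt")
        return artifact.load()
    except Exception:
        # No optimized prompt registered yet (or retrieval failed): fall back.
        return DEFAULT_SYSTEM_PROMPT
```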

## Quick Start

### Prerequisites
```bash
# Install ZenML and initialize
pip install "zenml[server]"
zenml init

# Install example dependencies
pip install -r requirements.txt

# Set your API key (OpenAI or Anthropic)
export OPENAI_API_KEY="your-openai-key"
# OR
export ANTHROPIC_API_KEY="your-anthropic-key"
```

### Run the Example
```bash
# Complete two-stage workflow (default behavior)
python run.py

# Run individual stages
python run.py --optimization-pipeline # Stage 1: Find best prompt
python run.py --production-pipeline # Stage 2: Use optimized prompt

# Force a specific provider/model (override auto-detection)
python run.py --provider openai --model-name gpt-4o-mini
python run.py --provider anthropic --model-name claude-3-haiku-20240307
```

## Data Sources

The example supports multiple data sources:

```bash
# HuggingFace datasets (default)
python run.py --data-source "hf:scikit-learn/iris"
python run.py --data-source "hf:scikit-learn/wine"

# Local CSV files
python run.py --data-source "local:./my_data.csv"

# Specify target column
python run.py --data-source "local:sales.csv" --target-column "revenue"

# Sample a subset of rows for faster iterations
python run.py --data-source "hf:scikit-learn/iris" --sample-size 500
```
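The `hf:` and `local:` prefixes simply select a loader, and `--sample-size` downsamples the result. A rough sketch of how such a source string could be resolved (the helper name and loading details are assumptions, not the example's actual code):

```python
import pandas as pd
from datasets import load_dataset  # HuggingFace `datasets` package


def load_source(data_source: str, sample_size: int | None = None) -> pd.DataFrame:
    """Resolve 'hf:<dataset>' or 'local:<path>' into a pandas DataFrame."""
    kind, _, ref = data_source.partition(":")
    if kind == "hf":
        ds = load_dataset(ref)
        df = ds[next(iter(ds))].to_pandas()  # first available split
    elif kind == "local":
        df = pd.read_csv(ref)
    else:
        raise ValueError(f"Unknown data source prefix: {kind!r}")
    if sample_size and sample_size < len(df):
        df = df.sample(n=sample_size, random_state=42)
    return df
```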

## Key ZenML Concepts Demonstrated

### Artifact Management
- **Exclusive Tagging**: Only one prompt can have the "optimized" tag at a time (see the sketch after this list)
- **Artifact Registry**: Centralized storage for ML artifacts with versioning
- **Cross-Pipeline Sharing**: Production pipeline automatically finds optimization results
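A sketch of the producing side, assuming ZenML's `ArtifactConfig` output annotation (the step name, scoreboard shape, and artifact name are illustrative; the exclusivity itself, i.e. un-tagging the previous winner, is handled by the example's tagging logic and not shown here):

```python
from typing import Annotated

from zenml import ArtifactConfig, step


@step
def select_best_prompt(
    scoreboard: list[dict],
) -> Annotated[str, ArtifactConfig(name="optimized_prompt", tags=["optimized"])]:
    """Pick the highest-scoring prompt and register it as a tagged artifact."""
    best = max(scoreboard, key=lambda row: row["score"])
    return best["prompt"]
```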

### Pipeline Orchestration
- **Multi-Stage Workflows**: Optimization → Production with artifact passing
- **Conditional Execution**: Production pipeline adapts based on available artifacts
- **Lineage Tracking**: Full traceability from prompt testing to production use

### AI Integration
- **Model Flexibility**: Works with OpenAI GPT or Anthropic Claude models
- **Performance Testing**: Systematic comparison of prompt variants
- **Production Deployment**: Seamless transition from experimentation to production

## Configuration Options

### Provider and Model Selection
```bash
# Auto (default): infer provider from model name or environment keys
python run.py

# Force provider explicitly
python run.py --provider openai --model-name gpt-4o-mini
python run.py --provider anthropic --model-name claude-3-haiku-20240307

# Fully-qualified model names are also supported
python run.py --model-name "openai:gpt-4o-mini"
python run.py --model-name "anthropic:claude-3-haiku-20240307"
```
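One plausible shape for the auto-detection described above: infer the provider from a fully-qualified name, an explicit flag, or whichever API key is set (this mirrors the CLI description but is not the example's exact implementation):

```python
import os


def resolve_model(model_name: str, provider: str | None = None) -> str:
    """Build the 'provider:model' string that Pydantic AI accepts."""
    if ":" in model_name:  # already fully qualified, e.g. "openai:gpt-4o-mini"
        return model_name
    if provider:  # an explicit --provider wins over auto-detection
        return f"{provider}:{model_name}"
    if os.getenv("OPENAI_API_KEY"):
        return f"openai:{model_name}"
    if os.getenv("ANTHROPIC_API_KEY"):
        return f"anthropic:{model_name}"
    raise RuntimeError("Set OPENAI_API_KEY or ANTHROPIC_API_KEY, or pass --provider.")
```

The resulting string can be passed straight to `pydantic_ai.Agent`.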

### Scoring Configuration
Configure how prompts are ranked during optimization:
```bash
# Weights for the aggregate score
python run.py --weight-quality 0.7 --weight-speed 0.2 --weight-findings 0.1

# Speed penalty (points per second) applied to compute a speed score
python run.py --speed-penalty-per-second 2.0

# Findings scoring (base points per finding) and cap
python run.py --findings-score-per-item 0.5 --findings-cap 20
```

*Why a cap?* It prevents a variant that simply emits an excessively long list of "findings" from dominating the overall score. Capping keeps the aggregate score bounded and comparable across variants while still rewarding useful coverage.
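For intuition, here is one way the flags above could combine into an aggregate score (a sketch of the scoring idea, not necessarily the exact formula in `run.py`):

```python
def aggregate_score(
    quality: float,    # data quality score from the analysis, assumed 0-100
    seconds: float,    # execution time of the prompt variant
    n_findings: int,   # number of findings the agent reported
    *,
    weight_quality: float = 0.7,
    weight_speed: float = 0.2,
    weight_findings: float = 0.1,
    speed_penalty_per_second: float = 2.0,
    findings_score_per_item: float = 0.5,
    findings_cap: float = 20.0,
) -> float:
    """Weighted blend of quality, a speed score, and a capped findings score."""
    speed_score = max(0.0, 100.0 - speed_penalty_per_second * seconds)
    findings_score = min(findings_cap, findings_score_per_item * n_findings)
    return (
        weight_quality * quality
        + weight_speed * speed_score
        + weight_findings * findings_score
    )
```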

### Custom Prompt Files
Provide your own prompt variants via a file (UTF-8, one prompt per line; blank lines ignored):
```bash
python run.py --prompts-file ./my_prompts.txt
```
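Loading such a file is straightforward; a sketch of the stated format (hypothetical helper, not the example's code):

```python
from pathlib import Path


def load_prompt_variants(path: str) -> list[str]:
    """Read UTF-8 prompts, one per line, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```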

### Sampling
Downsample large datasets to speed up experiments:
```bash
python run.py --sample-size 500
```

### Budgets and Timeouts
Budgets are enforced at the tool boundary for deterministic behavior and cost control:
```bash
# Tool-call budget and overall time budget for the agent
python run.py --max-tool-calls 8 --timeout-seconds 120
```
- The agent's tools check and enforce these budgets during execution, as sketched below.
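A minimal sketch of budget enforcement at the tool boundary, using Pydantic AI's dependency injection (the `Budget` dataclass and tool are assumptions, not the example's actual tools):

```python
import time
from dataclasses import dataclass, field

from pydantic_ai import Agent, RunContext


@dataclass
class Budget:
    max_tool_calls: int = 8
    timeout_seconds: float = 120.0
    started_at: float = field(default_factory=time.monotonic)
    calls: int = 0


agent = Agent("openai:gpt-4o-mini", deps_type=Budget)


@agent.tool
def summarize_column(ctx: RunContext[Budget], column: str) -> str:
    """Hypothetical EDA tool that checks the budget before doing any work."""
    ctx.deps.calls += 1
    if ctx.deps.calls > ctx.deps.max_tool_calls:
        return "Budget exceeded: tool-call limit reached."
    if time.monotonic() - ctx.deps.started_at > ctx.deps.timeout_seconds:
        return "Budget exceeded: time limit reached."
    return f"(summary of column {column!r} would be computed here)"
```

At run time each analysis would pass a fresh budget, e.g. `agent.run_sync(prompt, deps=Budget(max_tool_calls=8, timeout_seconds=120))`.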

### Caching
```bash
# Disable caching for fresh runs
python run.py --no-cache
```
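The `--no-cache` flag presumably disables ZenML's step caching for the run; a sketch of the equivalent pipeline-level switch (placeholder pipeline name and body):

```python
from zenml import pipeline


# Equivalent effect to --no-cache: every step recomputes instead of reusing
# cached outputs.
@pipeline(enable_cache=False)
def eda_pipeline() -> None:
    ...
```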

## Expected Output

When you run the complete workflow, you'll see:

1. **Optimization Stage**: Testing of multiple prompt variants with performance metrics
2. **Scoreboard**: CLI prints a compact top-3 summary (score, time, findings, success) and a short preview of the best prompt; a full scoreboard artifact is saved
3. **Tagging**: Best prompt automatically tagged in ZenML registry (exclusive "optimized" tag)
4. **Production Stage**: Retrieval and use of optimized prompt if available; otherwise falls back to the default
5. **Results**: EDA analysis with data quality scores and recommendations, plus a `used_optimized_prompt` flag indicating whether an optimized prompt was actually used

The ZenML dashboard will show the complete lineage from optimization to production use, including the prompt scoreboard and tagged best prompt.

## Next Steps

- **View Results**: Check the ZenML dashboard for pipeline runs and artifacts
- **Customize Prompts**: Provide your own variants via `--prompts-file`
- **Tune Scoring**: Adjust weights and penalties to match your evaluation criteria
- **Scale Up**: Deploy with remote orchestrators for production workloads
- **Integrate**: Connect to your existing data sources and ML pipelines
5 changes: 5 additions & 0 deletions examples/prompt_optimization/__init__.py
@@ -0,0 +1,5 @@
"""Pydantic AI EDA pipeline example for ZenML.

This example demonstrates how to build an AI-powered Exploratory Data Analysis
pipeline using ZenML and Pydantic AI for automated data analysis and quality assessment.
"""