bec6122
Add Prompt Materializer and related model enhancements
safoinme Jun 23, 2025
ed20aa5
Merge branch 'develop' of https://github.com/zenml-io/zenml into feat…
safoinme Jun 25, 2025
07771be
Add prompt example modules and pipelines for demonstration
safoinme Jun 25, 2025
5e51696
Merge branch 'develop' into feature/prompt-abstraction
safoinme Jul 1, 2025
4ee96c3
Refactor prompt imports and update environment configurations
safoinme Jul 2, 2025
6959830
Add comprehensive ZenML prompt management demo script and update requ…
safoinme Jul 3, 2025
de61436
No code changes made.
safoinme Jul 3, 2025
a37b84e
Remove deprecated prompt evaluation example scripts and related docum…
safoinme Jul 6, 2025
c3bc6d5
Add prompt abstraction implementation plan and example pipelines
safoinme Jul 6, 2025
b5ed4ce
Refactor prompt engineering pipelines and improve code readability
safoinme Jul 6, 2025
5d467f7
Refactor and simplify prompt management in ZenML
safoinme Jul 29, 2025
f2b31b1
Merge branch 'develop' into feature/prompt-abstraction
safoinme Jul 29, 2025
458e0bf
Implement simplified prompt management system
safoinme Jul 29, 2025
a7d8765
Refactor prompt comparison utility and testing scripts
safoinme Jul 29, 2025
2a491d7
Remove `IMPLEMENTATION_SUMMARY.md` file to streamline documentation a…
safoinme Jul 29, 2025
3b4b624
Refactor performance metrics computation in training pipeline
safoinme Jul 29, 2025
b4f3cf8
Update project links in user guide README.md for improved accessibility
safoinme Jul 29, 2025
854f721
Enhance prompt engineering documentation and structure
safoinme Jul 30, 2025
d57edc9
Refactor prompt documentation for clarity and consistency
safoinme Aug 3, 2025
93090f4
Refactor prompt comparison example and pipeline documentation
safoinme Aug 3, 2025
15782b1
Merge branch 'develop' into feature/prompt-abstraction
safoinme Aug 3, 2025
7852930
Update prompt engineering documentation and remove outdated content
safoinme Aug 3, 2025
2a76eae
Refactor helper functions for prompt validation
safoinme Aug 3, 2025
2748ff0
Update src/zenml/prompts/__init__.py
safoinme Aug 4, 2025
cc5a30f
Alex Review
safoinme Aug 4, 2025
02b72a7
Enhance prompt management documentation and introduce PromptType enum
safoinme Aug 4, 2025
c6f5c1a
Add demo script for GitHub-style prompt diffing functionality
safoinme Aug 4, 2025
43e255c
Enhance prompt engineering documentation with quick start guide and a…
safoinme Aug 4, 2025
8f0a17f
Refactor output formatting in prompt comparison scripts for consistency
safoinme Aug 4, 2025
f651a25
Refactor output formatting in text summarization script for improved …
safoinme Aug 4, 2025
c7850bc
Add document extraction project with main pipeline and utilities
safoinme Aug 5, 2025
01292e2
Refactor document extraction prompts and pipeline integration
safoinme Aug 5, 2025
efd7856
Enhance document extraction project with improved prompt management a…
safoinme Aug 5, 2025
9ef3f27
Enhance prompt engineering documentation and features
safoinme Aug 6, 2025
290 changes: 290 additions & 0 deletions PROMPT_ABSTRACTION_LEARNINGS.md
@@ -0,0 +1,290 @@
# ZenML Prompt Abstraction: Learnings and Vision

This document captures key insights from building the prompt abstraction feature and from working with ZenML's architecture, along with learnings for ZenML's evolution into a leading LLMOps platform.

## **Key Learnings from Prompt Implementation**

### **1. The Complexity of Prompt Management**

**What I Learned:**
- Prompts are deceptively simple but operationally complex
- They exist at multiple abstraction levels: templates, instances, variants, evaluations
- The lifecycle is non-linear: create → test → iterate → compare → deploy → monitor → drift

**Implementation Challenge:**
```python
# This looks simple...
prompt = Prompt(template="Answer: {question}")

# But operationally requires:
# - Version management
# - A/B testing infrastructure
# - Performance tracking
# - Lineage tracking
# - Rollback capabilities
# - Multi-model evaluation
```
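To make the gap concrete: even the "simple" version needs a few fields beyond the raw template before any of those operational concerns can be addressed. A minimal, runnable sketch — this `Prompt` is an illustrative stand-in, not ZenML's actual class:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Prompt:
    """Illustrative stand-in: a template plus just enough metadata to manage it."""

    template: str
    version: str = "1.0"
    metadata: Dict[str, str] = field(default_factory=dict)

    def format(self, **variables: str) -> str:
        # str.format-style substitution of {placeholders}
        return self.template.format(**variables)


prompt = Prompt(template="Answer: {question}", version="1.0")
print(prompt.format(question="What is ZenML?"))  # Answer: What is ZenML?
```

Freezing the dataclass is deliberate: it mirrors the "immutable artifact" view discussed below, so any change to the template forces a new object (and, in a real system, a new version).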

### **2. ZenML Philosophy vs. LLMOps Reality**

**ZenML's Core Strength (MLOps):**
- "Everything is a pipeline step"
- Artifacts are immutable and versioned
- Reproducible, traceable workflows

**LLMOps Challenge:**
- Prompts are **both** code and data
- Need real-time iteration (not just batch processing)
- Require human-in-the-loop validation
- Performance is subjective and context-dependent

**The Tension:**
```python
# ZenML way (good for MLOps):
@step
def evaluate_prompt(prompt: Prompt, dataset: Dataset) -> Metrics:
    # Batch evaluation, reproducible
    ...


# LLMOps reality (also needed):
def interactive_prompt_playground(prompt: Prompt) -> None:
    # Real-time testing, human feedback
    # Doesn't fit the pipeline paradigm well
    ...
```

### **3. Artifacts vs. Entities Dilemma**

**What We Discovered:**
The current implementation suffers from an **identity crisis**:

- **As Artifacts**: Immutable, versioned, pipeline-native ✅
- **As Entities**: Need CRUD operations, real-time updates ❌

**Better Approach:**
```python
# Prompt Templates = Entities (mutable, managed)
class PromptTemplate(BaseEntity):
    template: str
    metadata: Dict[str, Any]


# Prompt Instances = Artifacts (immutable, versioned)
class PromptInstance(BaseArtifact):
    template_id: UUID
    variables: Dict[str, Any]
    formatted_text: str
```
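A runnable sketch of how the two halves could fit together — `instantiate` is a hypothetical helper, and plain dataclasses stand in for `BaseEntity`/`BaseArtifact`:

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class PromptTemplate:
    """Entity-like: mutable, addressed by a stable ID."""

    template: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass(frozen=True)
class PromptInstance:
    """Artifact-like: an immutable, fully resolved snapshot."""

    template_id: uuid.UUID
    formatted_text: str


def instantiate(template: PromptTemplate, **variables: Any) -> PromptInstance:
    """Resolve a mutable template into an immutable, versionable instance."""
    return PromptInstance(
        template_id=template.id,
        formatted_text=template.template.format(**variables),
    )


tmpl = PromptTemplate(template="Answer: {question}")
instance = instantiate(tmpl, question="What is lineage?")
print(instance.formatted_text)  # Answer: What is lineage?
```

The key property is the direction of the arrow: templates can be edited freely, but every instance records which template it came from, giving lineage for free.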

## **What Would Be Done Differently**

### **1. Embrace the Dual Nature**

**Current Problem:** Trying to force prompts into pure artifact model
**Better Solution:**
```python
# Management Layer (Entity-like)
@step
def create_prompt_template(template: str) -> PromptTemplate:
    # Lives in the ZenML server, has CRUD operations
    ...


# Execution Layer (Artifact-like)
@step
def instantiate_prompt(template: PromptTemplate, **variables) -> PromptInstance:
    # Immutable, versioned, pipeline-native
    ...
```

### **2. Built-in Evaluation Framework**

**Current:** Examples show manual evaluation steps
**Better:** Native evaluation infrastructure:

```python
@prompt_evaluator(metrics=["accuracy", "relevance", "safety"])
def evaluate_qa_prompt(prompt: PromptInstance, ground_truth: Dataset):
    # Auto-tracked, comparable across experiments
    ...


@pipeline
def prompt_optimization_pipeline():
    variants = generate_prompt_variants(base_template)
    results = evaluate_variants_parallel(variants)  # Built-in parallelization
    best_prompt = select_optimal_variant(results)
    deploy_prompt(best_prompt)  # Integrated deployment
```

### **3. Context-Aware Prompt Management**

**Current:** Static prompt templates
**Better:** Dynamic, context-aware prompts:

```python
class ContextualPrompt(BaseModel):
    base_template: str
    context_adapters: List[ContextAdapter]

    def adapt_for_context(self, context: Context) -> str:
        # Domain adaptation, user personalization, etc.
        ...
```
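As a hedged illustration of the adapter idea (every name here is hypothetical), an adapter can be modeled as a plain function that rewrites the prompt text for a given context, applied in order:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# An adapter is just a function: (prompt_text, context) -> adapted prompt_text.
ContextAdapter = Callable[[str, Dict[str, str]], str]


def domain_adapter(text: str, context: Dict[str, str]) -> str:
    """Prefix the prompt with the caller's domain, if one is known."""
    domain = context.get("domain")
    return f"You are a {domain} expert. {text}" if domain else text


@dataclass
class ContextualPrompt:
    base_template: str
    context_adapters: List[ContextAdapter] = field(default_factory=list)

    def adapt_for_context(self, context: Dict[str, str]) -> str:
        text = self.base_template
        for adapter in self.context_adapters:  # applied in registration order
            text = adapter(text, context)
        return text


prompt = ContextualPrompt(
    base_template="Summarize: {text}",
    context_adapters=[domain_adapter],
)
print(prompt.adapt_for_context({"domain": "legal"}))
# You are a legal expert. Summarize: {text}
```

Because adapters compose, user personalization, tone adjustment, and domain framing can each live in their own small function rather than in one monolithic template.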

## **Vision for ZenML as Leading LLMOps Platform**

### **1. Prompt-Native Architecture**

**What This Means:**
- Prompts are first-class citizens, not afterthoughts
- Native prompt versioning, not generic artifact versioning
- Built-in prompt evaluation, not custom step implementations

**Implementation:**
```python
# Native prompt pipeline decorator
@prompt_pipeline
def optimize_customer_service_prompts():
    # Auto-handles prompt-specific concerns:
    # - A/B testing
    # - Human evaluation collection
    # - Performance monitoring
    # - Automatic rollback on degradation
    ...
```

### **2. Multi-Modal Prompt Management**

**Beyond Text:**
```python
class MultiModalPrompt(BaseModel):
    text_component: str
    image_components: List[ImagePrompt]
    audio_components: List[AudioPrompt]

    # Unified evaluation across modalities
    def evaluate_multimodal_performance(self, test_cases: MultiModalDataset):
        # Cross-modal consistency checking
        ...
```

### **3. Production-Ready LLMOps Features**

**What's Missing (but needed for leadership):**

```python
# 1. Prompt Drift Detection
@step
def detect_prompt_drift(
    current_prompt: PromptInstance,
    production_logs: ConversationLogs,
) -> DriftReport:
    # Automatic detection of performance degradation
    ...


# 2. Prompt Security & Safety
@step
def validate_prompt_safety(prompt: PromptInstance) -> SafetyReport:
    # Built-in jailbreak detection, bias checking
    ...


# 3. Cost Optimization
@step
def optimize_prompt_cost(
    prompt: PromptInstance,
    performance_threshold: float,
) -> OptimizedPrompt:
    # Automatic prompt compression while maintaining quality
    ...
```
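Stripped of the ZenML plumbing, drift detection ultimately reduces to comparing a quality metric across time windows. A deliberately naive sketch — a production system would use proper statistical tests rather than a fixed tolerance:

```python
from statistics import mean
from typing import List


def detect_metric_drift(
    baseline_scores: List[float],
    recent_scores: List[float],
    tolerance: float = 0.05,
) -> bool:
    """Flag drift when the recent mean falls more than `tolerance` below baseline."""
    return mean(baseline_scores) - mean(recent_scores) > tolerance


baseline = [0.91, 0.89, 0.90, 0.92]  # e.g. eval scores captured at deployment time
recent = [0.84, 0.82, 0.85, 0.83]    # e.g. scores sampled from recent production logs
print(detect_metric_drift(baseline, recent))  # True: quality dropped by ~0.07
```

The hard part in practice is not this comparison but producing the scores: for prompts, "quality" usually comes from LLM-as-judge evaluation or human ratings over logged conversations.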

### **4. Human-in-the-Loop Integration**

**Current Gap:** No native human feedback integration
**Vision:**
```python
@human_evaluation_step
def collect_human_feedback(
    prompt_responses: List[Response],
) -> HumanFeedback:
    # Integrated UI for human evaluation
    # Automatic feedback aggregation
    # Bias detection in human evaluations
    ...
```

## **Specific Recommendations for ZenML Leadership**

### **1. Architectural Changes**

**Immediate (6 months):**
- Split prompt management into Template (entity) + Instance (artifact)
- Native prompt evaluation framework
- Built-in A/B testing infrastructure

**Medium-term (1 year):**
- Multi-modal prompt support
- Prompt drift detection
- Cost optimization tools

**Long-term (2+ years):**
- AI-assisted prompt optimization
- Cross-model prompt portability
- Prompt marketplace/sharing

### **2. Developer Experience**

**What Would Make ZenML the Go-To LLMOps Platform:**

```python
# This should be possible with 5 lines of code:
from zenml.llmops import PromptOptimizer

optimizer = PromptOptimizer(
    base_template="Summarize: {text}",
    evaluation_dataset=my_dataset,
    target_metrics=["accuracy", "conciseness"],
)

best_prompt = optimizer.optimize() # Handles everything automatically
```

### **3. Integration Ecosystem**

**Missing Pieces:**
- Native LangChain/LlamaIndex integration
- Built-in vector database connectors
- Prompt sharing/marketplace
- Model provider abstractions

## **Core Insight: The Prompt Paradox**

**The Challenge:** Prompts are simultaneously:
- **Engineering artifacts** (need versioning, testing, deployment)
- **Creative content** (need iteration, human judgment, contextual adaptation)
- **Business logic** (need governance, compliance, performance monitoring)

**ZenML's Opportunity:** Be the first platform to solve this paradox elegantly by:
1. Embracing the complexity rather than oversimplifying
2. Building prompt-native infrastructure, not generic artifact management
3. Integrating human feedback as a first-class citizen
4. Providing end-to-end prompt lifecycle management

## **Critical Review Summary**

### **Current Implementation Issues:**
- **Architectural Inconsistency**: Can't decide if prompts are entities or artifacts
- **Overcomplicated Core Class**: 434 lines of business logic in `Prompt` class
- **Violation of ZenML Philosophy**: Logic that should be in steps is in the core class
- **Poor Server Integration**: Generic artifact handling instead of prompt-specific logic

### **Rating: 4/10** - Needs significant refactoring

**Strengths:**
- Good conceptual foundation
- Comprehensive examples
- Solid utility functions

**Critical Issues:**
- Overcomplicated core class
- Architectural inconsistency
- Security vulnerabilities
- Poor separation of concerns

## **Conclusion**

The current implementation is a good start, but to become the **leading LLMOps platform**, ZenML needs to think bigger and solve the unique challenges of prompt management, not just apply traditional MLOps patterns to a fundamentally different problem.

The path forward requires:
1. **Architectural clarity** - Choose entity vs artifact approach and stick to it
2. **Prompt-native features** - Build for LLMOps, not generic MLOps
3. **Human-in-the-loop integration** - Essential for prompt workflows
4. **Production-ready tooling** - Drift detection, safety validation, cost optimization

ZenML has the opportunity to define the LLMOps category the same way it helped define MLOps, but only if it embraces the unique challenges of prompt management rather than trying to force them into existing MLOps patterns.
2 changes: 1 addition & 1 deletion docs/book/user-guide/README.md
@@ -17,7 +17,7 @@ Step-by-step instructions to help you master ZenML concepts and features.
Complete end-to-end implementations that showcase ZenML in real-world scenarios.\
[See all projects in our website →](https://www.zenml.io/projects)

<table data-view="cards"><thead><tr><th></th><th></th><th data-hidden data-card-cover data-type="files"></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><strong>ZenCoder</strong></td><td>Your Own MLOps Engineer</td><td><a href=".gitbook/assets/zencoder.jpg">zencoder.jpg</a></td><td><a href="https://www.zenml.io/projects/zencoder-your-own-mlops-engineer">https://www.zenml.io/projects/zencoder-your-own-mlops-engineer</a></td></tr><tr><td><strong>LLM-Complete Guide</strong></td><td>Production-ready RAG pipelines from basic retrieval to advanced LLMOps with embeddings finetuning and evals.</td><td><a href=".gitbook/assets/llm-complete-guide.jpg">llm-complete-guide.jpg</a></td><td><a href="https://github.com/zenml-io/zenml-projects/tree/main/llm-complete-guide">https://github.com/zenml-io/zenml-projects/tree/main/llm-complete-guide</a></td></tr><tr><td><strong>NightWatch</strong></td><td>AI Database Summaries While You Sleep</td><td><a href=".gitbook/assets/nightwatch.jpg">nightwatch.jpg</a></td><td><a href="https://www.zenml.io/projects/nightwatch-ai-database-summaries-while-you-sleep">https://www.zenml.io/projects/nightwatch-ai-database-summaries-while-you-sleep</a></td></tr><tr><td><strong>Research Radar</strong></td><td>Automates research paper discovery and classification for specialized research domains.</td><td><a href=".gitbook/assets/researchradar.jpg">researchradar.jpg</a></td><td></td></tr><tr><td><strong>Magic Photobooth</strong></td><td>A personalized AI image generation product that can create your avatars from a selfie.</td><td><a href=".gitbook/assets/magicphoto.jpg">magicphoto.jpg</a></td><td><a href="https://www.zenml.io/projects/magic-photobooth">https://www.zenml.io/projects/magic-photobooth</a></td></tr><tr><td><strong>Sign Language Detection with YOLOv5</strong></td><td>End-to-end computer vision pipeline</td><td><a href=".gitbook/assets/yolo.jpg">yolo.jpg</a></td><td><a 
href="https://www.zenml.io/projects/sign-language-detection-with-yolov5">https://www.zenml.io/projects/sign-language-detection-with-yolov5</a></td></tr><tr><td><strong>ZenML Support Agent</strong></td><td>A production-ready agent that can help you with your ZenML questions.</td><td><a href=".gitbook/assets/support.jpg">support.jpg</a></td><td><a href="https://www.zenml.io/projects/zenml-support-agent">https://www.zenml.io/projects/zenml-support-agent</a></td></tr><tr><td><strong>GameSense</strong></td><td>The LLM That Understands Gamers</td><td><a href=".gitbook/assets/gamesense.jpg">gamesense.jpg</a></td><td><a href="https://www.zenml.io/projects/gamesense-the-llm-that-understands-gamers">https://www.zenml.io/projects/gamesense-the-llm-that-understands-gamers</a></td></tr><tr><td><strong>EuroRate Predictor</strong></td><td>Turn European Central Bank data into actionable interest rate forecasts with this comprehensive MLOps solution.</td><td><a href=".gitbook/assets/eurorate.jpg">eurorate.jpg</a></td><td><a href="https://www.zenml.io/projects/eurorate-predictor">https://www.zenml.io/projects/eurorate-predictor</a></td></tr></tbody></table>
<table data-view="cards"><thead><tr><th></th><th></th><th data-hidden data-card-cover data-type="files"></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><strong>ZenCoder</strong></td><td>Your Own MLOps Engineer</td><td><a href=".gitbook/assets/zencoder.jpg">zencoder.jpg</a></td><td><a href="https://www.zenml.io/projects/zencoder-your-own-mlops-engineer">https://www.zenml.io/projects/zencoder-your-own-mlops-engineer</a></td></tr><tr><td><strong>LLM-Complete Guide</strong></td><td>Production-ready RAG pipelines from basic retrieval to advanced LLMOps with embeddings finetuning and evals.</td><td><a href=".gitbook/assets/llm-complete-guide.jpg">llm-complete-guide.jpg</a></td><td><a href="https://github.com/zenml-io/zenml-projects/blob/main/llm-complete-guide/README.md">https://www.zenml.io/projects/llm-complete-guide</a></td></tr><tr><td><strong>NightWatch</strong></td><td>AI Database Summaries While You Sleep</td><td><a href=".gitbook/assets/nightwatch.jpg">nightwatch.jpg</a></td><td><a href="https://www.zenml.io/projects/nightwatch-ai-database-summaries-while-you-sleep">https://www.zenml.io/projects/nightwatch-ai-database-summaries-while-you-sleep</a></td></tr><tr><td><strong>Research Radar</strong></td><td>Automates research paper discovery and classification for specialized research domains.</td><td><a href=".gitbook/assets/researchradar.jpg">researchradar.jpg</a></td><td></td></tr><tr><td><strong>Magic Photobooth</strong></td><td>A personalized AI image generation product that can create your avatars from a selfie.</td><td><a href=".gitbook/assets/magicphoto.jpg">magicphoto.jpg</a></td><td><a href="https://www.zenml.io/projects/magic-photobooth">https://www.zenml.io/projects/magic-photobooth</a></td></tr><tr><td><strong>Sign Language Detection with YOLOv5</strong></td><td>End-to-end computer vision pipeline</td><td><a href=".gitbook/assets/yolo.jpg">yolo.jpg</a></td><td><a 
href="https://www.zenml.io/projects/sign-language-detection-with-yolov5">https://www.zenml.io/projects/sign-language-detection-with-yolov5</a></td></tr><tr><td><strong>ZenML Support Agent</strong></td><td>A production-ready agent that can help you with your ZenML questions.</td><td><a href=".gitbook/assets/support.jpg">support.jpg</a></td><td><a href="https://www.zenml.io/projects/zenml-support-agent">https://www.zenml.io/projects/zenml-support-agent</a></td></tr><tr><td><strong>GameSense</strong></td><td>The LLM That Understands Gamers</td><td><a href=".gitbook/assets/gamesense.jpg">gamesense.jpg</a></td><td><a href="https://www.zenml.io/projects/gamesense-the-llm-that-understands-gamers">https://www.zenml.io/projects/gamesense-the-llm-that-understands-gamers</a></td></tr><tr><td><strong>EuroRate Predictor</strong></td><td>Turn European Central Bank data into actionable interest rate forecasts with this comprehensive MLOps solution.</td><td><a href=".gitbook/assets/eurorate.jpg">eurorate.jpg</a></td><td><a href="https://www.zenml.io/projects/eurorate-predictor">https://www.zenml.io/projects/eurorate-predictor</a></td></tr></tbody></table>

## Examples

6 changes: 6 additions & 0 deletions docs/book/user-guide/llmops-guide/README.md
@@ -7,6 +7,8 @@ icon: robot

Welcome to the ZenML LLMOps Guide, where we dive into the exciting world of Large Language Models (LLMs) and how to integrate them seamlessly into your MLOps pipelines using ZenML. This guide is designed for ML practitioners and MLOps engineers looking to harness the potential of LLMs while maintaining the robustness and scalability of their workflows.

From foundational prompt engineering practices to advanced RAG implementations, we cover the essential techniques for building production-ready LLM applications with ZenML's streamlined approach.

<figure><img src="../../.gitbook/assets/rag-overview.png" alt=""><figcaption><p>ZenML simplifies the development and deployment of LLM-powered MLOps pipelines.</p></figcaption></figure>

In this guide, we'll explore various aspects of working with LLMs in ZenML, including:
@@ -23,6 +25,10 @@ In this guide, we'll explore various aspects of working with LLMs in ZenML, including:
* [Retrieval evaluation](evaluation/retrieval.md)
* [Generation evaluation](evaluation/generation.md)
* [Evaluation in practice](evaluation/evaluation-in-practice.md)
* [Prompt engineering](prompt-engineering/)
* [Quick start](prompt-engineering/quick-start.md)
* [Understanding prompt management](prompt-engineering/understanding-prompt-management.md)
* [Best practices](prompt-engineering/best-practices.md)
* [Reranking for better retrieval](reranking/)
* [Understanding reranking](reranking/understanding-reranking.md)
* [Implementing reranking in ZenML](reranking/implementing-reranking.md)
59 changes: 59 additions & 0 deletions docs/book/user-guide/llmops-guide/prompt-engineering/README.md
@@ -0,0 +1,59 @@
---
description: Simple prompt engineering with ZenML - version control, A/B testing, and dashboard visualization.
icon: edit
---

# Prompt Engineering

ZenML's prompt engineering focuses on the three things teams actually need: **simple versioning**, **A/B testing**, and **dashboard visualization**.

## Quick Start

1. **Run the example**:
```bash
cd examples/prompt_engineering
python run_simple_comparison.py
```

2. **Check your dashboard** to see prompt artifacts with rich visualizations

## Core Features

### Git-like Versioning
```python
prompt_v1 = Prompt(template="Answer: {question}", version="1.0")
prompt_v2 = Prompt(template="Detailed answer: {question}", version="2.0")
```
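Because versioned prompts are plain strings, the "Git-like" part can be illustrated with nothing but the standard library (this is independent of the dashboard's own diff view):

```python
import difflib

v1 = "Answer: {question}"
v2 = "Detailed answer: {question}"

# unified_diff produces the familiar ---/+++/@@ diff format over line lists.
diff = list(
    difflib.unified_diff(
        v1.splitlines(),
        v2.splitlines(),
        fromfile="prompt v1.0",
        tofile="prompt v2.0",
        lineterm="",
    )
)
print("\n".join(diff))
# --- prompt v1.0
# +++ prompt v2.0
# @@ -1 +1 @@
# -Answer: {question}
# +Detailed answer: {question}
```

The same idea scales to multi-line prompt templates, where line-level diffs make it obvious which instructions changed between versions.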

### A/B Testing
```python
# Pipeline automatically compares versions and determines winner
result = simple_prompt_comparison()
print(f"Winner: {result['winner']}")
```
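For intuition, an explicit version of such a comparison might look like the following hypothetical sketch, where the candidate prompts and the scoring function are passed in directly rather than baked into the pipeline:

```python
from typing import Callable, Dict, List


def compare_prompts(
    prompts: Dict[str, str],
    test_cases: List[dict],
    score: Callable[[str, dict], float],
) -> str:
    """Score every prompt version over the test cases; return the best one's name."""
    totals = {
        name: sum(score(template.format(**case), case) for case in test_cases)
        for name, template in prompts.items()
    }
    return max(totals, key=totals.get)


# Toy scoring function: reward more detailed (longer) formatted prompts.
winner = compare_prompts(
    prompts={"v1": "Answer: {question}", "v2": "Detailed answer: {question}"},
    test_cases=[{"question": "What is a pipeline?"}],
    score=lambda text, case: float(len(text)),
)
print(f"Winner: {winner}")  # Winner: v2
```

In a real pipeline the scoring function would call a model and an evaluator rather than measure string length; the shape of the comparison stays the same.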
> **Reviewer comment (Contributor):** Not sure how this is implemented in the code, but it seems a bit magic to me :) like the two prompts aren't even passed in as arguments, so what's happening here?


### Dashboard Integration
- Syntax-highlighted templates
- Variable tables and validation
- Version tracking across runs
- Side-by-side comparisons

## Why This Approach?

User research shows teams with millions of daily requests use **simple Git-based versioning**, not complex management systems. This approach focuses on what actually works in production.
> **Reviewer comment (Contributor):** I think this page needs a bit more about our philosophy around prompts:
>
> - you should version your prompts
> - version your prompts with git in code, but ZenML is also able to independently track versions
> - why we think this is better than having a 'prompt registry' as a separate thing etc.
> - that we see prompts as just a different type of artifact, and that we store all these versions in your artifact store like normal etc.


## Documentation

* [Quick Start](quick-start.md) - Working example walkthrough
* [Understanding Prompt Management](understanding-prompt-management.md) - Research and philosophy
* [Best Practices](best-practices.md) - Production guidance

## Example Structure

The `examples/prompt_engineering/` directory demonstrates proper organization:
- `pipelines/` - Pipeline definitions
- `steps/` - Individual step implementations
- `utils/` - Helper functions
- Clean separation of concerns

Start with the quick start example to see all features in action.