A reproducible framework combining DSPy agentic workflows with GEPA (Genetic-Pareto) optimization for automated BI tasks: NL→SQL synthesis, KPI compilation, and executive insights.
- Overview
- Key Features
- Why This Framework?
- Quick Start
- Architecture
- Experiments & Results
- Usage
- API Documentation
- Contributing
This project demonstrates a production-ready BI pipeline that combines:
- DSPy Framework: Modular LLM programs for SQL generation, KPI compilation, and narrative insights
- GEPA Optimizer: Genetic algorithm + Pareto optimization for multi-objective prompt tuning
- Real BI Warehouse: DuckDB-based data warehouse with 10K orders, 5 data marts
- Interactive Dashboard: Streamlit app for natural language BI queries
Multi-objective optimization - Unlike traditional prompt engineering that optimizes for accuracy alone, GEPA simultaneously optimizes for:
- ✅ Accuracy (SQL correctness)
- ✅ Cost (token efficiency)
- ✅ Latency (response time)
- ✅ Test Pass Rate (reliability)
Result: Pareto-optimal prompts that balance quality and efficiency!
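To make "Pareto-optimal" concrete, here is a minimal sketch of a dominance check across mixed maximize/minimize objectives. This is illustrative only; the objective names mirror the list above, not the optimizer's internal API:

```python
# Illustrative sketch -- objective names follow the list above, not the project's internals.
OBJECTIVES = {"accuracy": "maximize", "tests": "maximize", "cost": "minimize", "latency": "minimize"}

def dominates(a: dict, b: dict) -> bool:
    """True if candidate `a` is at least as good as `b` on every objective and strictly better on one."""
    strictly_better = False
    for name, direction in OBJECTIVES.items():
        better = a[name] > b[name] if direction == "maximize" else a[name] < b[name]
        if not better and a[name] != b[name]:
            return False              # worse on this objective, so `a` cannot dominate
        strictly_better = strictly_better or better
    return strictly_better

def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates if o is not c)]
```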
- DuckDB Warehouse: 10,000 synthetic orders with realistic schema
- 5 Data Marts: Fact tables, dimensions, aggregations
- Parquet Exports: Ready for Power BI, Tableau, Looker
- ETL Pipeline: Full extract-transform-load workflow
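As a rough illustration of how the warehouse can be queried and exported after the ETL step, here is a short DuckDB sketch; the table and column names (`fact_orders`, `region`, `revenue`) are assumptions and may differ from the generated schema:

```python
import duckdb

# Open the warehouse produced by the ETL step (see Quick Start and the project layout below).
con = duckdb.connect("data/warehouse/bi.duckdb")

# Example aggregation over an assumed fact-table name -- adjust to the generated schema.
df = con.execute("""
    SELECT region, SUM(revenue) AS total_revenue
    FROM fact_orders
    GROUP BY region
    ORDER BY total_revenue DESC
""").df()
print(df.head())

# Export a mart to Parquet for Power BI / Tableau / Looker.
con.execute("COPY fact_orders TO 'data/warehouse/fact_orders.parquet' (FORMAT PARQUET)")
```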
- SQL Synthesizer: Natural language → SQL with chain-of-thought
- KPI Compiler: NL descriptions → DAX/SQL measures
- Insight Writer: Automated executive summaries
- Auto-Graders: Quality validation and scoring
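Each of these programs follows the usual DSPy pattern of a typed signature wrapped in a module. A minimal sketch of the SQL synthesizer's shape, under that assumption (field names and the `Sketch` class are illustrative, not the repo's exact definitions):

```python
import dspy

class NL2SQL(dspy.Signature):
    """Translate a natural-language question into an executable SQL query for the given schema."""
    task = dspy.InputField(desc="natural-language question")
    schema = dspy.InputField(desc="warehouse schema, serialized as text")
    sql = dspy.OutputField(desc="DuckDB-compatible SQL")

class SQLSynthesizerSketch(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought(NL2SQL)   # chain-of-thought, as noted above

    def forward(self, task: str, schema: str) -> dspy.Prediction:
        return self.generate(task=task, schema=schema)
```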
- Genetic Algorithm: Population-based search with selection, crossover, mutation
- Pareto Optimization: Find non-dominated solutions across objectives
- Reflective Learning: Mine rules from failures, adapt prompts
- Configurable: Customize population size, generations, objectives
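Conceptually, each genome is a prompt variant that can be mutated and recombined. A toy sketch of those two operators (the rule pool and field layout are hypothetical; the real operators in src/gepa/ mine rules reflectively from failures):

```python
import random
from dataclasses import dataclass, field
from typing import List

@dataclass
class PromptGenome:
    instructions: str                                   # base task instructions
    rules: List[str] = field(default_factory=list)      # rules appended to the prompt

# Hypothetical rule pool; the real optimizer mines such rules from failed tasks.
RULE_POOL = [
    "Qualify every column with its table alias.",
    "Return exactly one SELECT statement with no trailing semicolon.",
]

def mutate(g: PromptGenome, rate: float = 0.3) -> PromptGenome:
    rules = list(g.rules)
    if random.random() < rate:
        rules.append(random.choice(RULE_POOL))          # add a rule
    elif rules and random.random() < rate:
        rules.pop(random.randrange(len(rules)))         # drop a rule
    return PromptGenome(g.instructions, rules)

def crossover(a: PromptGenome, b: PromptGenome) -> PromptGenome:
    # Keep one parent's instructions and merge both parents' rule sets (deduplicated).
    merged = list(dict.fromkeys(a.rules + b.rules))
    return PromptGenome(random.choice([a.instructions, b.instructions]), merged)
```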
- Streamlit UI: Natural language query interface
- Accessible: WCAG-compliant keyboard navigation
- Visualizations: Charts, tables, KPI cards
- Real-time: Live SQL generation and execution
- 15 Benchmark Tasks: SQL + KPI generation
- Automated Scoring: Execution-based validation
- Comparative Experiments: Vanilla vs DSPy vs DSPy+GEPA
- Reproducible: All configs and data included
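"Execution-based validation" means a generated query is scored by actually running it against the warehouse and comparing results to a reference query. A simplified sketch of such a grader (the benchmark file format and the real grader's details are not shown here and may differ):

```python
import duckdb

def score_sql(generated_sql: str, reference_sql: str,
              db_path: str = "data/warehouse/bi.duckdb") -> float:
    """Score 1.0 if the generated query returns the same rows as the reference query, else 0.0."""
    con = duckdb.connect(db_path, read_only=True)
    try:
        got = con.execute(generated_sql).fetchall()
        want = con.execute(reference_sql).fetchall()
    except Exception:
        return 0.0                      # non-executable SQL scores zero
    finally:
        con.close()
    return 1.0 if sorted(got) == sorted(want) else 0.0
```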
Vanilla Prompting: Manual trial-and-error, single-objective focus, no systematic improvement
Regular DSPy: Better structure, but no optimization for cost/latency trade-offs
DSPy + GEPA: Automated multi-objective optimization - finds prompts that are accurate AND efficient!
┌─────────────┐ ┌──────────────┐ ┌─────────────┐
│ Initial │────▶│ Genetic │────▶│ Pareto │
│ Population │ │ Evolution │ │ Selection │
│ │ │ (mutate/ │ │ (best │
│ 8 genomes │ │ crossover) │ │ tradeoffs) │
└─────────────┘ └──────────────┘ └─────────────┘
│
▼
┌──────────────┐
│ Reflection │
│ & Learning │
│ (mine rules) │
└──────────────┘
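In code terms the flow above boils down to evaluate → select → recombine → mutate. A deliberately simplified, single-score toy of that loop follows; the real optimizer scores each genome on all four objectives, keeps a Pareto archive instead of a top-k list, and adds the reflection step shown in the diagram:

```python
import random

def evolve(seed_prompts, evaluate, generations=4, population_size=8, mutation_rate=0.3):
    """Toy stand-in for the GEPA loop: score a population of prompts, keep the best, breed the next generation."""
    population = list(seed_prompts)
    archive = []                                            # stand-in for the Pareto archive
    for _ in range(generations):
        scored = [(p, evaluate(p)) for p in population]     # run the benchmark tasks
        archive = sorted(archive + scored, key=lambda x: x[1], reverse=True)[:4]
        parents = [p for p, _ in archive]
        population = []
        for _ in range(population_size):
            a, b = random.choice(parents), random.choice(parents)
            wa, wb = a.split(), b.split()
            child = " ".join(wa[: len(wa) // 2] + wb[len(wb) // 2:])    # one-point crossover on words
            if random.random() < mutation_rate:
                child += " Think step by step before writing the SQL."  # mutation: edit the prompt
            population.append(child)
    return archive
```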
Result after 4 generations:
- 70% accuracy ✅
- 100% test pass rate ✅
- 134 tokens/task (~25% fewer than the vanilla baseline) ✅
- $0.004 per optimization run ✅
- Python 3.8+
- LLM API key (Mistral recommended, or OpenAI/Gemini)
# 1. Clone repository
git clone https://github.com/yourusername/DsPy-GEPA-BI.git
cd DsPy-GEPA-BI
# 2. Install dependencies
pip install -r requirements.txt
# 3. Configure API key
cp .env.example .env
# Edit .env and add: MISTRAL_API_KEY=your-key-here
# 4. Initialize data warehouse
python -m src.scripts.run_etl
# 5. Run GEPA optimization (optional)
python -m src.scripts.run_real_gepa
# 6. Launch dashboard
python -m src.scripts.run_app
Visit http://localhost:8501 to access the dashboard!
Mistral (Recommended):
- Get $5 free credits at https://console.mistral.ai/
- Cost: $0.001 per 1K tokens (~1,250 GEPA runs on free tier!)
Google Gemini (Alternative):
- Free tier at https://aistudio.google.com/app/apikey
- 15 requests/minute, no credit card needed
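Whichever provider you pick, the DSPy programs need an LM configured once at startup. A sketch assuming a recent DSPy release with the LiteLLM-backed `dspy.LM` wrapper and `python-dotenv` installed (the model names are examples; check your provider's current list):

```python
import os
import dspy
from dotenv import load_dotenv

load_dotenv()  # pulls MISTRAL_API_KEY (or GOOGLE_API_KEY) from .env

# Mistral (recommended)
lm = dspy.LM("mistral/mistral-small-latest", api_key=os.environ["MISTRAL_API_KEY"])

# Or Gemini's free tier instead:
# lm = dspy.LM("gemini/gemini-1.5-flash", api_key=os.environ["GOOGLE_API_KEY"])

dspy.configure(lm=lm)
```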
DsPy-GEPA-BI/
├── src/
│ ├── etl/ # Extract-Transform-Load pipeline
│ ├── dspy_programs/ # SQL synth, KPI compiler, insights
│ ├── gepa/ # Genetic-Pareto optimizer
│ ├── eval/ # Scoring & benchmarks
│ ├── app/ # Streamlit dashboard
│ └── scripts/ # CLI entry points
├── data/
│ ├── raw/ # Source CSVs (orders, customers, products)
│ └── warehouse/ # DuckDB + Parquet exports
├── eval/
│ ├── benchmarks/ # sql_tasks.jsonl, kpi_tasks.jsonl
│ └── results/ # GEPA outputs
├── configs/ # project.yaml, eval.yaml
├── .env.example # API key template
├── requirements.txt # Dependencies
└── README.md # This file
┌───────────┐ ┌──────────┐ ┌─────────────┐ ┌──────────┐
│ Raw CSV │─────▶│ ETL │─────▶│ DuckDB │─────▶│ Parquet │
│ Data │ │ Pipeline │ │ Warehouse │ │ Exports │
└───────────┘ └──────────┘ └─────────────┘ └──────────┘
│
▼
┌───────────┐ ┌──────────┐ ┌─────────────┐
│ User │─────▶│ DSPy │─────▶│ LLM │
│ Query │ NL │ Programs │ SQL │ (Mistral) │
└───────────┘ └──────────┘ └─────────────┘
│
▼
┌──────────┐
│ GEPA │
│Optimizer │ ←─ Benchmarks
└──────────┘
Run the comparison:
python -m src.scripts.compare_baselines
Expected Results (5 SQL tasks):
| Approach | Accuracy | Avg Tokens | Avg Latency | Cost |
|---|---|---|---|---|
| Vanilla Prompting | 60% | 180 | 0.35s | $0.001 |
| DSPy (No Optimization) | 65% | 165 | 0.31s | $0.001 |
| DSPy + GEPA ✨ | 70% | 134 | 0.26s | $0.0007 |
Key Findings:
- ✅ +10 percentage points accuracy over vanilla prompting (60% → 70%)
- ✅ 25% fewer tokens (cost savings!)
- ✅ 26% faster response time
- ✅ Better reliability (100% test pass rate)
Configuration:
- Population: 8 genomes
- Generations: 4
- Tasks per genome: 5
- Model: Mistral Small
Evolution Progress:
Gen 1: Score -12.571 → Archive: 1 genome
Gen 2: Score -12.576 → Archive: 1 genome
Gen 3: Score -12.573 → Archive: 1 genome
Gen 4: Score -12.566 → Archive: 1 genome ✨ BEST
Best Genome Results:
- Accuracy: 70%
- Test Pass Rate: 100%
- Tokens/Task: 134
- Latency: 0.26s
- Total Cost: $0.004
Pareto Archive: Contains non-dominated solutions representing optimal accuracy/cost/latency tradeoffs
# ETL: Create warehouse
python -m src.scripts.run_etl
# Evaluation: Test DSPy programs
python -m src.scripts.eval_sql
# GEPA: Run optimization (simulated)
python -m src.scripts.run_gepa_search
# GEPA: Run with REAL LLM
python -m src.scripts.run_real_gepa
# Compare: Baseline experiments
python -m src.scripts.compare_baselines
# Dashboard: Launch Streamlit app
python -m src.scripts.run_app
Python API example:
from src.dspy_programs.sql_synth import SQLSynthesizer
from src.gepa.loop import GEPAOptimizer
from src.utils.duck import DuckDBManager
# Initialize components
db = DuckDBManager('data/warehouse/bi.duckdb')
synthesizer = SQLSynthesizer(schema={...})
# Generate SQL
result = synthesizer(task="Get total revenue by region")
print(result.sql)
# Run GEPA optimization
optimizer = GEPAOptimizer(
eval_fn=your_eval_function,
config=gepa_config
)
results = optimizer.optimize()
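The `eval_fn` passed to `GEPAOptimizer` is expected to map a genome to a metrics dict keyed by the objectives configured below. A rough sketch; `BENCHMARK_TASKS` and `run_with_prompt` are hypothetical placeholders for your own benchmark loop, and the exact contract may differ:

```python
import time

def your_eval_function(genome) -> dict:
    """Run the benchmark tasks with this genome's prompt and report one value per objective."""
    start = time.time()
    correct = passed = tokens = 0
    for task in BENCHMARK_TASKS:                 # hypothetical: your list of benchmark tasks
        result = run_with_prompt(genome, task)   # hypothetical: generate + execute + grade
        correct += int(result.is_correct)
        passed += int(result.tests_passed)
        tokens += result.tokens_used
    n = len(BENCHMARK_TASKS)
    return {
        "accuracy": correct / n,
        "cost": tokens / n,                      # proxy: average tokens per task
        "latency": (time.time() - start) / n,
        "tests": passed / n,
    }
```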
Edit configs/project.yaml:
gepa:
  population_size: 8    # Number of genomes per generation
  generations: 4        # Evolution iterations
  max_trials: 50        # Budget cap
  mutation_rate: 0.3    # Probability of mutation
  crossover_rate: 0.5   # Probability of crossover
  objectives:
    accuracy: maximize  # SQL correctness
    cost: minimize      # Token usage
    latency: minimize   # Response time
    tests: maximize     # Test pass rate
API Documentation:
class SQLSynthesizer(dspy.Module):
    """Generate SQL from natural language."""

    def forward(self, task: str, schema: dict) -> dspy.Prediction:
        """
        Args:
            task: Natural language query
            schema: Database schema dict

        Returns:
            Prediction with .sql attribute
        """

class KPICompiler(dspy.Module):
"""Compile KPI definitions to DAX/SQL."""
def forward(self, kpi_desc: str, schema: dict) -> dspy.Prediction:
"""
Args:
kpi_desc: KPI description in natural language
schema: Data warehouse schema
Returns:
Prediction with .measure (DAX/SQL)
"""class GEPAOptimizer:
"""Multi-objective genetic optimization."""
def __init__(self, eval_fn: Callable, config: GEPAConfig):
"""
Args:
eval_fn: Function(genome) -> metrics dict
config: Population size, generations, etc.
"""
def optimize(self) -> dict:
"""
Run optimization loop.
Returns:
{
'best_genome': PromptGenome,
'pareto_archive': List[genome],
'history': List[generation_stats],
'mined_rules': List[Rule]
}
"""- Type questions like: "What are the top 10 products by revenue?"
- Auto-generates SQL and executes on warehouse
- Displays results in tables and charts
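Under the hood the flow is text box → SQL synthesis → DuckDB → table/chart. A stripped-down Streamlit sketch of that loop, reusing the project's `SQLSynthesizer` from the usage example above (the real app in src/app/ adds the accessibility features and richer visuals listed below; the schema placeholder mirrors the earlier example):

```python
import duckdb
import streamlit as st
from src.dspy_programs.sql_synth import SQLSynthesizer

WAREHOUSE_SCHEMA = {...}   # placeholder: the same schema dict shown in the usage example

st.title("Natural language BI")
question = st.text_input("Ask a question", "What are the top 10 products by revenue?")

if question:
    sql = SQLSynthesizer(schema=WAREHOUSE_SCHEMA)(task=question).sql   # generate SQL
    st.code(sql, language="sql")
    df = duckdb.connect("data/warehouse/bi.duckdb", read_only=True).execute(sql).df()
    st.dataframe(df)                                                   # detailed view
    if df.shape[1] == 2:                                               # crude heuristic: two columns -> bar chart
        st.bar_chart(df.set_index(df.columns[0]))
```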
- ✅ Keyboard navigation (Tab, Enter, Escape)
- ✅ Focus indicators (3px outline)
- ✅ Screen reader compatible
- ✅ High contrast mode
- ✅ Minimum 44px click targets
- 📈 Line charts (time series)
- 📊 Bar charts (comparisons)
- 🥧 Pie charts (distributions)
- 📋 Data tables (detailed views)
- 💳 KPI cards (metrics)
Add new objectives in src/gepa/objectives.py:
class CustomObjective(Objective):
    name = "readability"
    direction = "maximize"

    def compute(self, metrics: dict) -> float:
        # Higher score = better readability, so penalize cyclomatic complexity.
        return -metrics.get('cyclomatic_complexity', 0)
Extend GEPA in src/gepa/loop.py:
def _mutate_custom(self, genome: PromptGenome) -> PromptGenome:
    """Custom mutation operator."""
    # Your logic here
    return modified_genome
We welcome contributions! Areas for improvement:
- More LLM Providers: Add Claude, Cohere, local models
- Larger Benchmarks: Expand to 100+ tasks
- Advanced GEPA: Multi-population, adaptive operators
- Model Distillation: Train smaller models on best prompts
- Real-world Data: Integration examples with actual databases
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
# Check code quality
black src/
flake8 src/
mypy src/
MIT License - see LICENSE file for details.
- DSPy: Stanford NLP's modular LLM framework
- DuckDB: High-performance in-process OLAP database
- Mistral AI: Affordable, high-quality LLM API
- Streamlit: Easy dashboard development
- Working BI pipeline
- DSPy programs (SQL, KPI, insights)
- GEPA optimizer
- Real LLM integration (Mistral)
- Streamlit dashboard
- Comparative experiments
- Documentation
- Claude & local model support
- Expanded benchmarks (100+ tasks)
- Semantic caching
- A/B testing framework
- Multi-language support
- Docker deployment
- Multi-population GEPA
- Prompt template extraction
- Model distillation pipeline
- Real-time optimization
- Cloud deployment guides
⭐ Star this repo if you find it useful!
Built with ❤️ using DSPy, GEPA, Mistral AI, and DuckDB.
Last Updated: October 9, 2025