
Commit fb851b9

docs: expand workshops and add examples (#64)
* docs: update workshop chapters with enhanced content and structure

  This commit updates all workshop chapter documentation with improved explanations, better structure, and enhanced learning objectives across chapters 0-7. Changes include:

  - Refined introduction and takeaway messages in docs/index.md and misc sections
  - Enhanced chapter content with clearer explanations and examples
  - Improved learning objectives and key insights throughout all chapters
  - Better organization and flow across the workshop content

  These updates improve the overall learning experience and clarity of the workshop materials.

* chore: add supporting files, backups, and new examples

  Add remaining files including:

  - Documentation and planning files (CONTENT_INTEGRATION_PLAN.md, EDITORIAL_CHANGES.md, etc.)
  - Backup versions of workshop chapters (.bak, .bak2 files)
  - New synthetic relevance example in latest/examples/synthetic_relevance/
  - Chapter 4 assets (cards, judge feedback, logs)
  - Turbopuffer slides PDF
  - Utility scripts and notebooks
  - Updated slide decks for chapters 0-6

* chore: remove scratch files and backup files (de-slop)

* refactor: move all slides to docs/slides/ directory

* chore: remove temporary/one-off Python scripts
1 parent c89af58 commit fb851b9

57 files changed: +4995 −1996 lines

AGENTS.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -4,7 +4,8 @@
 - `latest/`: Current course code and the WildChat case study (`latest/case_study/{core,pipelines}`).
 - `cohort_1/`, `cohort_2/`: Earlier cohort materials kept for reference.
 - `docs/`: MkDocs book sources; site config in `mkdocs.yml`.
-- `docs/workshops/`: Chapter content `chapterN.md` and subparts `chapterN-M.md`, plus `chapterN-slides.md`; entrypoint is `docs/workshops/index.md`.
+- `docs/workshops/`: Chapter content `chapterN.md` and subparts `chapterN-M.md`; entrypoint is `docs/workshops/index.md`.
+- `docs/slides/`: Slide decks `chapterN-slides.md` for workshop chapters.
 - `md/`: Markdown exports of notebooks; images in `images/`.
 - `scripts/`, `build_book.sh`: Utilities for diagrams and building the PDF/ebook.
```

README.md

Lines changed: 79 additions & 116 deletions
````diff
@@ -1,22 +1,25 @@
 # Systematically Improving RAG Applications
 
-A comprehensive course teaching data-driven approaches to building and improving Retrieval-Augmented Generation (RAG) systems. This repository contains course materials, code examples, and a companion book.
+A comprehensive educational resource teaching data-driven approaches to building and improving Retrieval-Augmented Generation systems that get better over time. Learn from real case studies with concrete metrics showing how RAG systems improve from 60% to 85%+ accuracy through systematic measurement and iteration.
 
-## 🎓 Take the Course
+## What You'll Learn
 
-All of this material is supported by the **Systematically Improving RAG Course**.
+Transform RAG from a technical implementation into a continuously improving product through:
 
-[**Click here to get 20% off →**](https://maven.com/applied-llms/rag-playbook?promoCode=EBOOK)
+- **Data-driven evaluation**: Establish metrics before building features
+- **Systematic improvement**: Turn evaluation insights into measurable gains
+- **User feedback loops**: Design systems that learn from real usage
+- **Specialized retrieval**: Build purpose-built retrievers for different content types
+- **Intelligent routing**: Orchestrate multiple specialized components
+- **Production deployment**: Maintain improvement velocity at scale
 
-## Course Overview
+### Real Case Studies Featured
 
-This course teaches you how to systematically improve RAG applications through:
+**Legal Tech Company**: 63% → 87% accuracy over 3 months through systematic error analysis, better chunking, and validation patterns. Generated 50,000+ citation examples for continuous training.
 
-- Data-driven evaluation and metrics
-- Embedding fine-tuning and optimization
-- Query understanding and routing
-- Structured data integration
-- Production deployment strategies
+**Construction Blueprint Search**: 27% → 85% recall in 4 days by using vision models for spatial descriptions. Further improved to 92% for counting queries through bounding box detection.
+
+**Feedback Collection**: 10 → 40 daily submissions (4x improvement) through better UX copy and interactive elements, enabling faster improvement cycles.
 
 ### The RAG Flywheel
 
@@ -31,145 +34,105 @@ The core philosophy centers around the "RAG Flywheel" - a continuous improvement
 
 ```text
 .
-├── cohort_1/        # First cohort materials (6 weeks)
-├── cohort_2/        # Second cohort materials (weeks 0-6)
-├── latest/          # Current course version with latest updates
-│   ├── week0/       # Getting started with Jupyter, LanceDB, and evals
-│   ├── week1/       # RAG evaluation foundations
-│   ├── week2/       # Embedding fine-tuning
-│   ├── week4/       # Query understanding and routing
-│   ├── week5/       # Structured data and metadata
-│   ├── week6/       # Tool selection and product integration
-│   ├── case_study/  # Comprehensive WildChat project
-│   └── extra_kura/  # Advanced notebooks on clustering and classifiers
-├── docs/            # MkDocs documentation source
-│   ├── workshops/   # Detailed chapter guides (0-7) aligned with course weeks
-│   ├── talks/       # Industry expert presentations and case studies
-│   ├── office-hours/# Q&A summaries from cohorts 2 and 3
-│   ├── assets/      # Images and diagrams for documentation
+├── docs/            # Complete workshop series (Chapters 0-7)
+│   ├── workshops/   # Progressive learning path from evaluation to production
+│   ├── talks/       # Industry expert presentations with case studies
+│   ├── office-hours/# Q&A summaries addressing real implementation challenges
 │   └── misc/        # Additional learning resources
-├── data/            # CSV files from industry talks
-├── md/              # Markdown conversions of notebooks
+├── latest/          # Reference implementations and case study code
+│   ├── case_study/  # Comprehensive WildChat project demonstrating concepts
+│   ├── week0-6/     # Code examples aligned with workshop chapters
+│   └── examples/    # Standalone demonstrations
+├── data/            # Real datasets from case studies and talks
 └── mkdocs.yml       # Documentation configuration
 ```
 
-## Course Structure: Weekly Curriculum & Book Chapters
+## Learning Path: Workshop Chapters
 
-The course follows a 6-week structure where each week corresponds to specific workshop chapters in the companion book:
+The workshops follow a systematic progression from evaluation to production:
 
-### Week 1: Starting the Flywheel
+### Chapter 0: Beyond Implementation to Improvement
 
-- **Book Coverage**: Chapter 0 (Introduction) + Chapter 1 (Starting the Flywheel with Data)
-- **Topics**:
-  - Shifting from static implementations to continuously improving products
-  - Overcoming the cold-start problem through synthetic data generation
-  - Establishing meaningful metrics aligned with business goals
-  - RAG as a recommendation engine wrapped around language models
+Mindset shift from technical project to product. See how the legal tech company went from 63% to 87% accuracy by treating RAG as a recommendation engine with continuous feedback loops.
 
-### Week 2: From Evaluation to Enhancement
+### Chapter 1: Starting the Data Flywheel
 
-- **Book Coverage**: Chapter 2 (From Evaluation to Product Enhancement)
-- **Topics**:
-  - Transforming evaluation insights into concrete improvements
-  - Fine-tuning embeddings with Cohere and open-source models
-  - Re-ranking strategies and targeted capability development
+Build evaluation frameworks before you have users. Learn from the blueprint search case: 27% → 85% recall in 4 days through synthetic data and task-specific vision model prompting.
 
-### Week 3: User Experience Design
+### Chapter 2: From Evaluation to Enhancement
 
-- **Book Coverage**: Chapter 3 (UX - 3 parts)
-  - Part 1: Design Principles
-  - Part 2: Feedback Collection
-  - Part 3: Iterative Improvement
-- **Topics**:
-  - Building interfaces that delight users and gather feedback
-  - Creating virtuous cycles of improvement
-  - Continuous refinement based on user interaction
+Turn evaluation insights into measurable improvements. Fine-tuning embeddings delivers 6-10% gains. Learn when to use re-rankers vs custom embeddings based on your data distribution.
 
-### Week 4: Query Understanding & Topic Modeling
+### Chapter 3: User Experience (3 Parts)
 
-- **Book Coverage**: Chapter 4 (Topic Modeling - 2 parts)
-  - Part 1: Analysis - Segmenting users and queries
-  - Part 2: Prioritization - High-value opportunities
-- **Topics**:
-  - Query classification with BERTopic
-  - Pattern discovery in user queries
-  - Creating improvement roadmaps based on usage patterns
+**3.1 - Feedback Collection**: Zapier increased feedback from 10 to 40 submissions/day through better UX copy
+**3.2 - Perceived Performance**: 11% perception improvement equals 40% reduction in perceived wait time
+**3.3 - Quality of Life**: Citations, validation, chain-of-thought delivering 18% accuracy improvements
+
+### Chapter 4: Understanding Users (2 Parts)
+
+**4.1 - Finding Patterns**: Construction company discovered 8% of queries (scheduling) drove 35% of churn
+**4.2 - Prioritization**: Use 2x2 frameworks to choose what to build next based on volume and impact
+
+### Chapter 5: Specialized Retrieval (2 Parts)
+
+**5.1 - Foundations**: Why one-size-fits-all fails. Different queries need different approaches
+**5.2 - Implementation**: Documents, images, tables, SQL - each needs specialized handling
 
-### Week 5: Multimodal & Structured Data
+### Chapter 6: Unified Architecture (3 Parts)
 
-- **Book Coverage**: Chapter 5 (Multimodal - 2 parts)
-  - Part 1: Understanding different content types
-  - Part 2: Implementation strategies
+**6.1 - Query Routing**: Construction company: 65% → 78% through proper routing (95% × 82% = 78%)
+**6.2 - Tool Interfaces**: Clean APIs enable parallel development. 40 examples/tool = 95% routing accuracy
+**6.3 - Performance Measurement**: Two-level metrics separate routing failures from retrieval failures
+
+### Chapter 7: Production Considerations
+
+Maintain improvement velocity at scale. Construction company: 78% → 84% success while scaling 5x query volume and reducing unit costs from $0.09 to $0.04 per query.
+
+  - Part 1: Understanding different content types
+  - Part 2: Implementation strategies
 - **Topics**:
   - Working with documents, images, tables, and structured data
   - Metadata filtering and Text-to-SQL integration
   - PDF parsing and multimodal embeddings
 
 ### Week 6: Architecture & Product Integration
 
-- **Book Coverage**: Chapter 6 (Architecture - 3 parts)
-  - Part 1: Intelligent routing to specialized components
-  - Part 2: Building and integrating specialized tools
-  - Part 3: Creating unified product experiences
-- **Topics**:
-  - Tool evaluation and selection
-  - Performance optimization strategies
-  - Streaming implementations and production deployment
-
-### Capstone Project
-
-A comprehensive project using the WildChat dataset that covers:
+## Technologies & Tools
 
-- Data exploration and understanding
-- Vector database integration (ChromaDB, LanceDB, Turbopuffer)
-- Synthetic question generation
-- Summarization strategies
-- Complete test suite implementation
-
-## Technologies Used
+The workshops use industry-standard tools for production RAG systems:
 
 - **LLM APIs**: OpenAI, Anthropic, Cohere
 - **Vector Databases**: LanceDB, ChromaDB, Turbopuffer
-- **ML/AI Frameworks**: Sentence-transformers, BERTopic, Transformers
-- **Evaluation Tools**: Braintrust, Pydantic-evals
-- **Monitoring**: Logfire, production monitoring strategies
-- **Data Processing**: Pandas, NumPy, BeautifulSoup, SQLModel
-- **Visualization**: Matplotlib, Seaborn, Streamlit
-- **CLI Framework**: Typer + Rich for interactive command-line tools
-- **Document Processing**: Docling for PDF parsing and analysis
+- **Frameworks**: Sentence-transformers, BERTopic, Transformers, Instructor
+- **Evaluation**: Synthetic data generation, precision/recall metrics, A/B testing
+- **Monitoring**: Logfire, production observability patterns
+- **Processing**: Pandas, SQLModel, Docling for PDF parsing
 
-## Course Book & Documentation
+## Documentation
 
-The `/docs` directory contains a comprehensive book built with MkDocs that serves as the primary learning resource:
+The `/docs` directory contains comprehensive workshop materials built with MkDocs:
 
-### Book Structure
+### Content Overview
 
-- **Introduction & Core Concepts**: The RAG Flywheel philosophy and product-first thinking
-- **Workshop Chapters (0-6)**: Detailed guides that map directly to each course week
-- **Office Hours**: Q&A summaries from Cohorts 2 and 3 with real-world implementation insights
-- **Industry Talks**: Expert presentations including:
-  - RAG Anti-patterns in the Wild
-  - Semantic Search Over the Web
-  - Understanding Embedding Performance
-  - Online Evals and Production Monitoring
-  - RAG Without APIs (Browser-based approaches)
+- **Workshop Chapters (0-7)**: Complete learning path from evaluation to production
+- **Office Hours**: Q&A summaries addressing real implementation challenges
+- **Industry Talks**: Expert presentations on RAG anti-patterns, embedding performance, production monitoring
+- **Case Studies**: Detailed examples with specific metrics and timelines
 
-### Key Themes in the Book
+### Core Philosophy
 
-1. **Product-First Thinking**: Treating RAG as an evolving product, not a static implementation
-2. **Data-Driven Improvement**: Using metrics, evaluations, and user feedback to guide development
-3. **Systematic Approach**: Moving from ad-hoc tweaking to structured improvement processes
-4. **User-Centered Design**: Focusing on user value and experience, not just technical capabilities
-5. **Continuous Learning**: Building systems that improve with every interaction
+1. **Product mindset**: RAG as evolving product, not static implementation
+2. **Data-driven improvement**: Metrics and feedback guide development
+3. **Systematic approach**: Structured improvement processes over ad-hoc tweaking
+4. **User-centered design**: Focus on user value, not just technical capabilities
+5. **Continuous learning**: Systems that improve with every interaction
 
-To build and view the documentation:
+Build and view documentation:
 
 ```bash
-# Serve documentation locally (live reload)
-mkdocs serve
-
-# Build static documentation
-mkdocs build
+mkdocs serve  # Local development with live reload
+mkdocs build  # Static site generation
 ```
 
 ## Getting Started
@@ -235,4 +198,4 @@ This course emphasizes:
 
 ## License
 
-This is educational material for the "Systematically Improving RAG Applications" course.
+This is educational material for the "Systematically Improving RAG Applications" course.
````
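The updated Chapter 6 summaries above rest on two ideas worth making concrete: end-to-end success compounds across stages (95% routing × 82% retrieval ≈ 78%), and two-level metrics separate routing failures from retrieval failures. A minimal Python sketch of that bookkeeping (not part of this commit, using hypothetical field names) might look like:

```python
from dataclasses import dataclass


@dataclass
class QueryOutcome:
    # Hypothetical per-query log record: the tool the router picked, the tool
    # a labeler says it should have picked, and whether the chosen retriever
    # returned a relevant document.
    routed_tool: str
    expected_tool: str
    retrieval_hit: bool


def two_level_metrics(outcomes: list[QueryOutcome]) -> dict[str, float]:
    """Separate routing failures from retrieval failures, then compound them."""
    if not outcomes:
        return {"routing_accuracy": 0.0, "retrieval_accuracy": 0.0, "end_to_end": 0.0}
    routed_ok = [o for o in outcomes if o.routed_tool == o.expected_tool]
    routing_acc = len(routed_ok) / len(outcomes)
    # Retrieval quality is judged only on queries that reached the right tool.
    retrieval_acc = (
        sum(o.retrieval_hit for o in routed_ok) / len(routed_ok) if routed_ok else 0.0
    )
    # End-to-end success compounds the two stages, e.g. 0.95 * 0.82 ≈ 0.78.
    return {
        "routing_accuracy": routing_acc,
        "retrieval_accuracy": retrieval_acc,
        "end_to_end": routing_acc * retrieval_acc,
    }
```

Keeping the two numbers separate tells you whether to spend the next iteration on routing examples or on the retrievers themselves.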

all_providers_test.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@

# Embedding Latency Benchmark Results

**Text analyzed:** 100 samples, avg 11.8 tokens each

## Key Finding

Embedding latency dominates RAG pipeline performance:
- Database reads: 8-20ms
- Embedding generation: 100-500ms (10-25x slower!)

## Results

| Provider/Model | Batch Size | P50 (ms) | P95 (ms) | P99 (ms) | Throughput (emb/s) | Embeddings | Status |
|:------------------------------|-------------:|:-------------|:-------------|:--------------|---------------------:|-------------:|:---------|
| Cohere/embed-v4.0 | 1 | 287.4 ±110.5 | 447.8 ±6.7 | 453.2 ±1.3 | 32.1 | 100 | ✅ OK |
| Cohere/embed-v4.0 | 10 | 909.6 ±49.7 | 954.5 ±4.8 | 958.4 ±1.0 | 27.6 | 100 | ✅ OK |
| Cohere/embed-v4.0 | 25 | 187.7 ±19.3 | 580.7 ±31.5 | 621.1 ±31.5 | 3.9 | 100 | ✅ OK |
| Gemini/gemini-embedding-001 | 1 | 334.9 ±282.4 | 634.1 ±12.4 | 644.1 ±2.5 | 24.3 | 100 | ✅ OK |
| Gemini/gemini-embedding-001 | 10 | 515.2 ±145.0 | 646.7 ±13.4 | 657.4 ±2.7 | 48.9 | 100 | ✅ OK |
| Gemini/gemini-embedding-001 | 25 | 305.5 ±21.0 | 482.0 ±103.0 | 625.7 ±453.7 | 3.1 | 100 | ✅ OK |
| Openai/text-embedding-3-large | 1 | 576.1 ±81.9 | 751.9 ±40.8 | 784.5 ±8.2 | 17.4 | 100 | ✅ OK |
| Openai/text-embedding-3-large | 10 | 607.0 ±41.4 | 646.2 ±2.2 | 647.9 ±0.4 | 43.5 | 100 | ✅ OK |
| Openai/text-embedding-3-large | 25 | 337.8 ±20.2 | 476.2 ±51.9 | 563.6 ±57.4 | 2.9 | 100 | ✅ OK |
| Openai/text-embedding-3-small | 1 | 986.3 ±31.9 | 1029.1 ±5.2 | 1033.3 ±1.0 | 10.2 | 100 | ✅ OK |
| Openai/text-embedding-3-small | 10 | 1032.0 ±69.6 | 1094.2 ±7.4 | 1100.2 ±1.5 | 24.4 | 100 | ✅ OK |
| Openai/text-embedding-3-small | 25 | 244.1 ±57.9 | 909.7 ±22.3 | 1133.2 ±793.4 | 2.8 | 100 | ✅ OK |
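The script that produced these numbers is not shown in this commit view. A minimal sketch of how P50/P95/P99 latencies and throughput like those in the table could be collected, assuming a hypothetical `embed(texts)` wrapper around any of the listed providers, is:

```python
import itertools
import statistics
import time


def benchmark_embeddings(embed, texts, batch_size=10, runs=30):
    """Time repeated embedding calls and report latency percentiles in ms.

    `embed` is a hypothetical callable that takes a list of strings and returns
    their embeddings (e.g. a thin wrapper around an API client); it is not part
    of this commit.
    """
    pool = itertools.cycle(texts)
    latencies_ms = []
    for _ in range(runs):
        batch = [next(pool) for _ in range(batch_size)]
        start = time.perf_counter()
        embed(batch)
        latencies_ms.append((time.perf_counter() - start) * 1000)

    cuts = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
    return {
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "throughput_emb_per_s": runs * batch_size / (sum(latencies_ms) / 1000),
    }
```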

benchmark_results.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@

# Embedding Latency Benchmark Results

**Text analyzed:** 25 samples, avg 14.1 tokens each

## Key Finding

Embedding latency dominates RAG pipeline performance:
- Database reads: 8-20ms
- Embedding generation: 100-500ms (10-25x slower!)

## Results

| Provider/Model | Batch Size | P50 (ms) | P95 (ms) | P99 (ms) | Throughput (emb/s) | Embeddings | Status |
|:------------------------------|-------------:|-----------:|-----------:|-----------:|---------------------:|-------------:|:---------|
| Openai/text-embedding-3-large | 1 | 247.8 | 315 | 329.4 | 7.5 | 25 | ✅ OK |
| Openai/text-embedding-3-large | 2 | 312.8 | 940.5 | 1042.6 | 4.5 | 25 | ✅ OK |
| Openai/text-embedding-3-small | 1 | 390.4 | 689 | 751.4 | 2.5 | 25 | ✅ OK |
| Openai/text-embedding-3-small | 2 | 225.5 | 554.8 | 589.5 | 3.5 | 25 | ✅ OK |
