
Commit fb851b9

docs: expand workshops and add examples (#64)
* docs: update workshop chapters with enhanced content and structure

  This commit updates all workshop chapter documentation with improved explanations, better structure, and enhanced learning objectives across chapters 0-7. Changes include:

  - Refined introduction and takeaway messages in docs/index.md and misc sections
  - Enhanced chapter content with clearer explanations and examples
  - Improved learning objectives and key insights throughout all chapters
  - Better organization and flow across the workshop content

  These updates improve the overall learning experience and clarity of the workshop materials.

* chore: add supporting files, backups, and new examples

  Add remaining files including:

  - Documentation and planning files (CONTENT_INTEGRATION_PLAN.md, EDITORIAL_CHANGES.md, etc.)
  - Backup versions of workshop chapters (.bak, .bak2 files)
  - New synthetic relevance example in latest/examples/synthetic_relevance/
  - Chapter 4 assets (cards, judge feedback, logs)
  - Turbopuffer slides PDF
  - Utility scripts and notebooks
  - Updated slide decks for chapters 0-6

* chore: remove scratch files and backup files (de-slop)

* refactor: move all slides to docs/slides/ directory

* chore: remove temporary/one-off Python scripts
1 parent c89af58 commit fb851b9

57 files changed: +4995 −1996 lines

AGENTS.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -4,7 +4,8 @@
 - `latest/`: Current course code and the WildChat case study (`latest/case_study/{core,pipelines}`).
 - `cohort_1/`, `cohort_2/`: Earlier cohort materials kept for reference.
 - `docs/`: MkDocs book sources; site config in `mkdocs.yml`.
-- `docs/workshops/`: Chapter content `chapterN.md` and subparts `chapterN-M.md`, plus `chapterN-slides.md`; entrypoint is `docs/workshops/index.md`.
+- `docs/workshops/`: Chapter content `chapterN.md` and subparts `chapterN-M.md`; entrypoint is `docs/workshops/index.md`.
+- `docs/slides/`: Slide decks `chapterN-slides.md` for workshop chapters.
 - `md/`: Markdown exports of notebooks; images in `images/`.
 - `scripts/`, `build_book.sh`: Utilities for diagrams and building the PDF/ebook.
```

README.md

Lines changed: 79 additions & 116 deletions
````diff
@@ -1,22 +1,25 @@
 # Systematically Improving RAG Applications
 
-A comprehensive course teaching data-driven approaches to building and improving Retrieval-Augmented Generation (RAG) systems. This repository contains course materials, code examples, and a companion book.
+A comprehensive educational resource teaching data-driven approaches to building and improving Retrieval-Augmented Generation systems that get better over time. Learn from real case studies with concrete metrics showing how RAG systems improve from 60% to 85%+ accuracy through systematic measurement and iteration.
 
-## 🎓 Take the Course
+## What You'll Learn
 
-All of this material is supported by the **Systematically Improving RAG Course**.
+Transform RAG from a technical implementation into a continuously improving product through:
 
-[**Click here to get 20% off →**](https://maven.com/applied-llms/rag-playbook?promoCode=EBOOK)
+- **Data-driven evaluation**: Establish metrics before building features
+- **Systematic improvement**: Turn evaluation insights into measurable gains
+- **User feedback loops**: Design systems that learn from real usage
+- **Specialized retrieval**: Build purpose-built retrievers for different content types
+- **Intelligent routing**: Orchestrate multiple specialized components
+- **Production deployment**: Maintain improvement velocity at scale
 
-## Course Overview
+### Real Case Studies Featured
 
-This course teaches you how to systematically improve RAG applications through:
+**Legal Tech Company**: 63% → 87% accuracy over 3 months through systematic error analysis, better chunking, and validation patterns. Generated 50,000+ citation examples for continuous training.
 
-- Data-driven evaluation and metrics
-- Embedding fine-tuning and optimization
-- Query understanding and routing
-- Structured data integration
-- Production deployment strategies
+**Construction Blueprint Search**: 27% → 85% recall in 4 days by using vision models for spatial descriptions. Further improved to 92% for counting queries through bounding box detection.
+
+**Feedback Collection**: 10 → 40 daily submissions (4x improvement) through better UX copy and interactive elements, enabling faster improvement cycles.
 
 ### The RAG Flywheel
 
@@ -31,145 +34,105 @@ The core philosophy centers around the "RAG Flywheel" - a continuous improvement
 
 ```text
 .
-├── cohort_1/        # First cohort materials (6 weeks)
-├── cohort_2/        # Second cohort materials (weeks 0-6)
-├── latest/          # Current course version with latest updates
-│   ├── week0/       # Getting started with Jupyter, LanceDB, and evals
-│   ├── week1/       # RAG evaluation foundations
-│   ├── week2/       # Embedding fine-tuning
-│   ├── week4/       # Query understanding and routing
-│   ├── week5/       # Structured data and metadata
-│   ├── week6/       # Tool selection and product integration
-│   ├── case_study/  # Comprehensive WildChat project
-│   └── extra_kura/  # Advanced notebooks on clustering and classifiers
-├── docs/            # MkDocs documentation source
-│   ├── workshops/   # Detailed chapter guides (0-7) aligned with course weeks
-│   ├── talks/       # Industry expert presentations and case studies
-│   ├── office-hours/# Q&A summaries from cohorts 2 and 3
-│   ├── assets/      # Images and diagrams for documentation
+├── docs/            # Complete workshop series (Chapters 0-7)
+│   ├── workshops/   # Progressive learning path from evaluation to production
+│   ├── talks/       # Industry expert presentations with case studies
+│   ├── office-hours/# Q&A summaries addressing real implementation challenges
 │   └── misc/        # Additional learning resources
-├── data/            # CSV files from industry talks
-├── md/              # Markdown conversions of notebooks
+├── latest/          # Reference implementations and case study code
+│   ├── case_study/  # Comprehensive WildChat project demonstrating concepts
+│   ├── week0-6/     # Code examples aligned with workshop chapters
+│   └── examples/    # Standalone demonstrations
+├── data/            # Real datasets from case studies and talks
 └── mkdocs.yml       # Documentation configuration
 ```
 
-## Course Structure: Weekly Curriculum & Book Chapters
+## Learning Path: Workshop Chapters
 
-The course follows a 6-week structure where each week corresponds to specific workshop chapters in the companion book:
+The workshops follow a systematic progression from evaluation to production:
 
-### Week 1: Starting the Flywheel
+### Chapter 0: Beyond Implementation to Improvement
 
-- **Book Coverage**: Chapter 0 (Introduction) + Chapter 1 (Starting the Flywheel with Data)
-- **Topics**:
-  - Shifting from static implementations to continuously improving products
-  - Overcoming the cold-start problem through synthetic data generation
-  - Establishing meaningful metrics aligned with business goals
-  - RAG as a recommendation engine wrapped around language models
+Mindset shift from technical project to product. See how the legal tech company went from 63% to 87% accuracy by treating RAG as a recommendation engine with continuous feedback loops.
 
-### Week 2: From Evaluation to Enhancement
+### Chapter 1: Starting the Data Flywheel
 
-- **Book Coverage**: Chapter 2 (From Evaluation to Product Enhancement)
-- **Topics**:
-  - Transforming evaluation insights into concrete improvements
-  - Fine-tuning embeddings with Cohere and open-source models
-  - Re-ranking strategies and targeted capability development
+Build evaluation frameworks before you have users. Learn from the blueprint search case: 27% → 85% recall in 4 days through synthetic data and task-specific vision model prompting.
 
-### Week 3: User Experience Design
+### Chapter 2: From Evaluation to Enhancement
 
-- **Book Coverage**: Chapter 3 (UX - 3 parts)
-  - Part 1: Design Principles
-  - Part 2: Feedback Collection
-  - Part 3: Iterative Improvement
-- **Topics**:
-  - Building interfaces that delight users and gather feedback
-  - Creating virtuous cycles of improvement
-  - Continuous refinement based on user interaction
+Turn evaluation insights into measurable improvements. Fine-tuning embeddings delivers 6-10% gains. Learn when to use re-rankers vs custom embeddings based on your data distribution.
 
-### Week 4: Query Understanding & Topic Modeling
+### Chapter 3: User Experience (3 Parts)
 
-- **Book Coverage**: Chapter 4 (Topic Modeling - 2 parts)
-  - Part 1: Analysis - Segmenting users and queries
-  - Part 2: Prioritization - High-value opportunities
-- **Topics**:
-  - Query classification with BERTopic
-  - Pattern discovery in user queries
-  - Creating improvement roadmaps based on usage patterns
+**3.1 - Feedback Collection**: Zapier increased feedback from 10 to 40 submissions/day through better UX copy
+**3.2 - Perceived Performance**: 11% perception improvement equals 40% reduction in perceived wait time
+**3.3 - Quality of Life**: Citations, validation, chain-of-thought delivering 18% accuracy improvements
+
+### Chapter 4: Understanding Users (2 Parts)
+
+**4.1 - Finding Patterns**: Construction company discovered 8% of queries (scheduling) drove 35% of churn
+**4.2 - Prioritization**: Use 2x2 frameworks to choose what to build next based on volume and impact
+
+### Chapter 5: Specialized Retrieval (2 Parts)
+
+**5.1 - Foundations**: Why one-size-fits-all fails. Different queries need different approaches
+**5.2 - Implementation**: Documents, images, tables, SQL - each needs specialized handling
 
-### Week 5: Multimodal & Structured Data
+### Chapter 6: Unified Architecture (3 Parts)
 
-- **Book Coverage**: Chapter 5 (Multimodal - 2 parts)
-  - Part 1: Understanding different content types
-  - Part 2: Implementation strategies
+**6.1 - Query Routing**: Construction company: 65% → 78% through proper routing (95% × 82% = 78%)
+**6.2 - Tool Interfaces**: Clean APIs enable parallel development. 40 examples/tool = 95% routing accuracy
+**6.3 - Performance Measurement**: Two-level metrics separate routing failures from retrieval failures
+
+### Chapter 7: Production Considerations
+
+Maintain improvement velocity at scale. Construction company: 78% → 84% success while scaling 5x query volume and reducing unit costs from $0.09 to $0.04 per query.
+
+  - Part 1: Understanding different content types
+  - Part 2: Implementation strategies
 - **Topics**:
   - Working with documents, images, tables, and structured data
   - Metadata filtering and Text-to-SQL integration
   - PDF parsing and multimodal embeddings
 
 ### Week 6: Architecture & Product Integration
 
-- **Book Coverage**: Chapter 6 (Architecture - 3 parts)
-  - Part 1: Intelligent routing to specialized components
-  - Part 2: Building and integrating specialized tools
-  - Part 3: Creating unified product experiences
-- **Topics**:
-  - Tool evaluation and selection
-  - Performance optimization strategies
-  - Streaming implementations and production deployment
-
-### Capstone Project
-
-A comprehensive project using the WildChat dataset that covers:
+## Technologies & Tools
 
-- Data exploration and understanding
-- Vector database integration (ChromaDB, LanceDB, Turbopuffer)
-- Synthetic question generation
-- Summarization strategies
-- Complete test suite implementation
-
-## Technologies Used
+The workshops use industry-standard tools for production RAG systems:
 
 - **LLM APIs**: OpenAI, Anthropic, Cohere
 - **Vector Databases**: LanceDB, ChromaDB, Turbopuffer
-- **ML/AI Frameworks**: Sentence-transformers, BERTopic, Transformers
-- **Evaluation Tools**: Braintrust, Pydantic-evals
-- **Monitoring**: Logfire, production monitoring strategies
-- **Data Processing**: Pandas, NumPy, BeautifulSoup, SQLModel
-- **Visualization**: Matplotlib, Seaborn, Streamlit
-- **CLI Framework**: Typer + Rich for interactive command-line tools
-- **Document Processing**: Docling for PDF parsing and analysis
+- **Frameworks**: Sentence-transformers, BERTopic, Transformers, Instructor
+- **Evaluation**: Synthetic data generation, precision/recall metrics, A/B testing
+- **Monitoring**: Logfire, production observability patterns
+- **Processing**: Pandas, SQLModel, Docling for PDF parsing
 
-## Course Book & Documentation
+## Documentation
 
-The `/docs` directory contains a comprehensive book built with MkDocs that serves as the primary learning resource:
+The `/docs` directory contains comprehensive workshop materials built with MkDocs:
 
-### Book Structure
+### Content Overview
 
-- **Introduction & Core Concepts**: The RAG Flywheel philosophy and product-first thinking
-- **Workshop Chapters (0-6)**: Detailed guides that map directly to each course week
-- **Office Hours**: Q&A summaries from Cohorts 2 and 3 with real-world implementation insights
-- **Industry Talks**: Expert presentations including:
-  - RAG Anti-patterns in the Wild
-  - Semantic Search Over the Web
-  - Understanding Embedding Performance
-  - Online Evals and Production Monitoring
-  - RAG Without APIs (Browser-based approaches)
+- **Workshop Chapters (0-7)**: Complete learning path from evaluation to production
+- **Office Hours**: Q&A summaries addressing real implementation challenges
+- **Industry Talks**: Expert presentations on RAG anti-patterns, embedding performance, production monitoring
+- **Case Studies**: Detailed examples with specific metrics and timelines
 
-### Key Themes in the Book
+### Core Philosophy
 
-1. **Product-First Thinking**: Treating RAG as an evolving product, not a static implementation
-2. **Data-Driven Improvement**: Using metrics, evaluations, and user feedback to guide development
-3. **Systematic Approach**: Moving from ad-hoc tweaking to structured improvement processes
-4. **User-Centered Design**: Focusing on user value and experience, not just technical capabilities
-5. **Continuous Learning**: Building systems that improve with every interaction
+1. **Product mindset**: RAG as evolving product, not static implementation
+2. **Data-driven improvement**: Metrics and feedback guide development
+3. **Systematic approach**: Structured improvement processes over ad-hoc tweaking
+4. **User-centered design**: Focus on user value, not just technical capabilities
+5. **Continuous learning**: Systems that improve with every interaction
 
-To build and view the documentation:
+Build and view documentation:
 
 ```bash
-# Serve documentation locally (live reload)
-mkdocs serve
-
-# Build static documentation
-mkdocs build
+mkdocs serve  # Local development with live reload
+mkdocs build  # Static site generation
 ```
 
 ## Getting Started
@@ -235,4 +198,4 @@ This course emphasizes:
 
 ## License
 
-This is educational material for the "Systematically Improving RAG Applications" course.
+This is educational material for the "Systematically Improving RAG Applications" course.
````
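The updated Chapter 6 summaries above rest on two ideas worth making concrete: end-to-end success compounds across stages (95% routing × 82% retrieval ≈ 78%), and two-level metrics separate routing failures from retrieval failures. A minimal Python sketch of that bookkeeping (not part of this commit, using hypothetical field names) might look like:

```python
from dataclasses import dataclass


@dataclass
class QueryOutcome:
    # Hypothetical per-query log record: the tool the router picked, the tool
    # a labeler says it should have picked, and whether the chosen retriever
    # returned a relevant document.
    routed_tool: str
    expected_tool: str
    retrieval_hit: bool


def two_level_metrics(outcomes: list[QueryOutcome]) -> dict[str, float]:
    """Separate routing failures from retrieval failures, then compound them."""
    if not outcomes:
        return {"routing_accuracy": 0.0, "retrieval_accuracy": 0.0, "end_to_end": 0.0}
    routed_ok = [o for o in outcomes if o.routed_tool == o.expected_tool]
    routing_acc = len(routed_ok) / len(outcomes)
    # Retrieval quality is judged only on queries that reached the right tool.
    retrieval_acc = (
        sum(o.retrieval_hit for o in routed_ok) / len(routed_ok) if routed_ok else 0.0
    )
    # End-to-end success compounds the two stages, e.g. 0.95 * 0.82 ≈ 0.78.
    return {
        "routing_accuracy": routing_acc,
        "retrieval_accuracy": retrieval_acc,
        "end_to_end": routing_acc * retrieval_acc,
    }
```

Keeping the two numbers separate tells you whether to spend the next iteration on routing examples or on the retrievers themselves.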

all_providers_test.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@

# Embedding Latency Benchmark Results

**Text analyzed:** 100 samples, avg 11.8 tokens each

## Key Finding

Embedding latency dominates RAG pipeline performance:
- Database reads: 8-20ms
- Embedding generation: 100-500ms (10-25x slower!)

## Results

| Provider/Model | Batch Size | P50 (ms) | P95 (ms) | P99 (ms) | Throughput (emb/s) | Embeddings | Status |
|:------------------------------|-------------:|:-------------|:-------------|:--------------|---------------------:|-------------:|:---------|
| Cohere/embed-v4.0 | 1 | 287.4 ±110.5 | 447.8 ±6.7 | 453.2 ±1.3 | 32.1 | 100 | ✅ OK |
| Cohere/embed-v4.0 | 10 | 909.6 ±49.7 | 954.5 ±4.8 | 958.4 ±1.0 | 27.6 | 100 | ✅ OK |
| Cohere/embed-v4.0 | 25 | 187.7 ±19.3 | 580.7 ±31.5 | 621.1 ±31.5 | 3.9 | 100 | ✅ OK |
| Gemini/gemini-embedding-001 | 1 | 334.9 ±282.4 | 634.1 ±12.4 | 644.1 ±2.5 | 24.3 | 100 | ✅ OK |
| Gemini/gemini-embedding-001 | 10 | 515.2 ±145.0 | 646.7 ±13.4 | 657.4 ±2.7 | 48.9 | 100 | ✅ OK |
| Gemini/gemini-embedding-001 | 25 | 305.5 ±21.0 | 482.0 ±103.0 | 625.7 ±453.7 | 3.1 | 100 | ✅ OK |
| Openai/text-embedding-3-large | 1 | 576.1 ±81.9 | 751.9 ±40.8 | 784.5 ±8.2 | 17.4 | 100 | ✅ OK |
| Openai/text-embedding-3-large | 10 | 607.0 ±41.4 | 646.2 ±2.2 | 647.9 ±0.4 | 43.5 | 100 | ✅ OK |
| Openai/text-embedding-3-large | 25 | 337.8 ±20.2 | 476.2 ±51.9 | 563.6 ±57.4 | 2.9 | 100 | ✅ OK |
| Openai/text-embedding-3-small | 1 | 986.3 ±31.9 | 1029.1 ±5.2 | 1033.3 ±1.0 | 10.2 | 100 | ✅ OK |
| Openai/text-embedding-3-small | 10 | 1032.0 ±69.6 | 1094.2 ±7.4 | 1100.2 ±1.5 | 24.4 | 100 | ✅ OK |
| Openai/text-embedding-3-small | 25 | 244.1 ±57.9 | 909.7 ±22.3 | 1133.2 ±793.4 | 2.8 | 100 | ✅ OK |
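The script that produced these numbers is not shown in this commit view. A minimal sketch of how P50/P95/P99 latencies and throughput like those in the table could be collected, assuming a hypothetical `embed(texts)` wrapper around any of the listed providers, is:

```python
import itertools
import statistics
import time


def benchmark_embeddings(embed, texts, batch_size=10, runs=30):
    """Time repeated embedding calls and report latency percentiles in ms.

    `embed` is a hypothetical callable that takes a list of strings and returns
    their embeddings (e.g. a thin wrapper around an API client); it is not part
    of this commit.
    """
    pool = itertools.cycle(texts)
    latencies_ms = []
    for _ in range(runs):
        batch = [next(pool) for _ in range(batch_size)]
        start = time.perf_counter()
        embed(batch)
        latencies_ms.append((time.perf_counter() - start) * 1000)

    cuts = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
    return {
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "throughput_emb_per_s": runs * batch_size / (sum(latencies_ms) / 1000),
    }
```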

benchmark_results.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@

# Embedding Latency Benchmark Results

**Text analyzed:** 25 samples, avg 14.1 tokens each

## Key Finding

Embedding latency dominates RAG pipeline performance:
- Database reads: 8-20ms
- Embedding generation: 100-500ms (10-25x slower!)

## Results

| Provider/Model | Batch Size | P50 (ms) | P95 (ms) | P99 (ms) | Throughput (emb/s) | Embeddings | Status |
|:------------------------------|-------------:|-----------:|-----------:|-----------:|---------------------:|-------------:|:---------|
| Openai/text-embedding-3-large | 1 | 247.8 | 315 | 329.4 | 7.5 | 25 | ✅ OK |
| Openai/text-embedding-3-large | 2 | 312.8 | 940.5 | 1042.6 | 4.5 | 25 | ✅ OK |
| Openai/text-embedding-3-small | 1 | 390.4 | 689 | 751.4 | 2.5 | 25 | ✅ OK |
| Openai/text-embedding-3-small | 2 | 225.5 | 554.8 | 589.5 | 3.5 | 25 | ✅ OK |
