* docs: update workshop chapters with enhanced content and structure
This commit updates all workshop chapter documentation with improved explanations,
better structure, and enhanced learning objectives across chapters 0-7. Changes include:
- Refined introduction and takeaway messages in docs/index.md and misc sections
- Enhanced chapter content with clearer explanations and examples
- Improved learning objectives and key insights throughout all chapters
- Better organization and flow across the workshop content
These updates improve the overall learning experience and clarity of the workshop materials.
* chore: add supporting files, backups, and new examples
Add remaining files including:
- Documentation and planning files (CONTENT_INTEGRATION_PLAN.md, EDITORIAL_CHANGES.md, etc.)
- Backup versions of workshop chapters (.bak, .bak2 files)
- New synthetic relevance example in latest/examples/synthetic_relevance/
- Chapter 4 assets (cards, judge feedback, logs)
- Turbopuffer slides PDF
- Utility scripts and notebooks
- Updated slide decks for chapters 0-6
* chore: remove scratch files and backup files (de-slop)
* refactor: move all slides to docs/slides/ directory
* chore: remove temporary/one-off Python scripts
-A comprehensive course teaching data-driven approaches to building and improving Retrieval-Augmented Generation (RAG) systems. This repository contains course materials, code examples, and a companion book.
+A comprehensive educational resource teaching data-driven approaches to building and improving Retrieval-Augmented Generation systems that get better over time. Learn from real case studies with concrete metrics showing how RAG systems improve from 60% to 85%+ accuracy through systematic measurement and iteration.

-## 🎓 Take the Course
+## What You'll Learn

-All of this material is supported by the **Systematically Improving RAG Course**.
+Transform RAG from a technical implementation into a continuously improving product through:

-[**Click here to get 20% off →**](https://maven.com/applied-llms/rag-playbook?promoCode=EBOOK)
+- **Data-driven evaluation**: Establish metrics before building features
+- **Systematic improvement**: Turn evaluation insights into measurable gains
+- **User feedback loops**: Design systems that learn from real usage
+- **Specialized retrieval**: Build purpose-built retrievers for different content types
+- **Production deployment**: Maintain improvement velocity at scale

-## Course Overview
+### Real Case Studies Featured

-This course teaches you how to systematically improve RAG applications through:
+**Legal Tech Company**: 63% → 87% accuracy over 3 months through systematic error analysis, better chunking, and validation patterns. Generated 50,000+ citation examples for continuous training.

-- Data-driven evaluation and metrics
-- Embedding fine-tuning and optimization
-- Query understanding and routing
-- Structured data integration
-- Production deployment strategies
+**Construction Blueprint Search**: 27% → 85% recall in 4 days by using vision models for spatial descriptions. Further improved to 92% for counting queries through bounding box detection.
+
+**Feedback Collection**: 10 → 40 daily submissions (4x improvement) through better UX copy and interactive elements, enabling faster improvement cycles.

 ### The RAG Flywheel

@@ -31,145 +34,105 @@ The core philosophy centers around the "RAG Flywheel" - a continuous improvement

 ```text
 .
-├── cohort_1/          # First cohort materials (6 weeks)
-├── cohort_2/          # Second cohort materials (weeks 0-6)
-├── latest/            # Current course version with latest updates
-│   ├── week0/         # Getting started with Jupyter, LanceDB, and evals
-│   ├── week1/         # RAG evaluation foundations
-│   ├── week2/         # Embedding fine-tuning
-│   ├── week4/         # Query understanding and routing
-│   ├── week5/         # Structured data and metadata
-│   ├── week6/         # Tool selection and product integration
+│   ├── week0-6/       # Code examples aligned with workshop chapters
+│   └── examples/      # Standalone demonstrations
+├── data/              # Real datasets from case studies and talks
 └── mkdocs.yml         # Documentation configuration
 ```

-## Course Structure: Weekly Curriculum & Book Chapters
+## Learning Path: Workshop Chapters

-The course follows a 6-week structure where each week corresponds to specific workshop chapters in the companion book:
+The workshops follow a systematic progression from evaluation to production:

-### Week 1: Starting the Flywheel
+### Chapter 0: Beyond Implementation to Improvement

-- **Book Coverage**: Chapter 0 (Introduction) + Chapter 1 (Starting the Flywheel with Data)
-- **Topics**:
-  - Shifting from static implementations to continuously improving products
-  - Overcoming the cold-start problem through synthetic data generation
-  - Establishing meaningful metrics aligned with business goals
-  - RAG as a recommendation engine wrapped around language models
+Mindset shift from technical project to product. See how the legal tech company went from 63% to 87% accuracy by treating RAG as a recommendation engine with continuous feedback loops.

-### Week 2: From Evaluation to Enhancement
+### Chapter 1: Starting the Data Flywheel

-- **Book Coverage**: Chapter 2 (From Evaluation to Product Enhancement)
-- **Topics**:
-  - Transforming evaluation insights into concrete improvements
-  - Fine-tuning embeddings with Cohere and open-source models
-  - Re-ranking strategies and targeted capability development
+Build evaluation frameworks before you have users. Learn from the blueprint search case: 27% → 85% recall in 4 days through synthetic data and task-specific vision model prompting.

-### Week 3: User Experience Design
+### Chapter 2: From Evaluation to Enhancement

-- **Book Coverage**: Chapter 3 (UX - 3 parts)
-  - Part 1: Design Principles
-  - Part 2: Feedback Collection
-  - Part 3: Iterative Improvement
-- **Topics**:
-  - Building interfaces that delight users and gather feedback
-  - Creating virtuous cycles of improvement
-  - Continuous refinement based on user interaction
+Turn evaluation insights into measurable improvements. Fine-tuning embeddings delivers 6-10% gains. Learn when to use re-rankers vs custom embeddings based on your data distribution.
+**6.3 - Performance Measurement**: Two-level metrics separate routing failures from retrieval failures
+
+### Chapter 7: Production Considerations
+
+Maintain improvement velocity at scale. Construction company: 78% → 84% success while scaling 5x query volume and reducing unit costs from $0.09 to $0.04 per query.
+
+- Part 1: Understanding different content types
+- Part 2: Implementation strategies
 - **Topics**:
   - Working with documents, images, tables, and structured data