Skip to content

Commit ad40fa9

Browse files
committed
Enhance documentation clarity and consistency across multiple files
- Updated AGENT.md and CLAUDE.md to emphasize the use of `uv` for package management and installation commands, ensuring consistency in dependency management instructions. - Improved formatting in mkdocs.yml by streamlining plugin and markdown extension listings for better readability. - Added new sections and improved existing content in various talks to enhance clarity and user engagement, including better organization of tags and descriptions. - Ensured consistent use of spacing and formatting across all documentation files to align with established style guidelines.
1 parent a91e610 commit ad40fa9

25 files changed

+325
-200
lines changed

AGENT.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,22 @@
11
# AGENT.md - Systematically Improving RAG Applications
22

33
## Build/Test Commands
4+
45
- Install dependencies: `uv install` or `pip install -e .`
56
- Run tests: `pytest` (pytest>=8.4.1 with pytest-asyncio>=1.0.0)
67
- Build docs: `mkdocs serve` (local) or `mkdocs build` (static)
78
- Package management: Always use `uv` instead of `pip` (e.g., `uv add <package>`)
89

910
## Architecture & Structure
11+
1012
- **Main course content**: `latest/` (current version), `cohort_1/`, `cohort_2/` (previous versions)
1113
- **Weekly modules**: `week0/` (setup), `week1/` (evaluation), `week2/` (embedding tuning), `week4/` (query understanding), `week5/` (multimodal), `week6/` (architecture)
1214
- **Capstone project**: `latest/capstone_project/` - comprehensive WildChat dataset implementation
1315
- **Documentation**: `docs/` (MkDocs book), `md/` (notebook conversions)
1416
- **Technologies**: ChromaDB/LanceDB/Turbopuffer (vector DBs), OpenAI/Anthropic/Cohere (LLMs), Sentence-transformers, BERTopic
1517

1618
## Code Style & Conventions
19+
1720
- **Reading level**: Write at 9th-grade level for educational content
1821
- **CLI tools**: Use `typer` + `rich` for command-line interfaces, never emojis in code
1922
- **Async preferred**: Use async/await over synchronous code when possible
@@ -23,6 +26,7 @@
2326
- **Dependencies**: Core ML stack includes sentence-transformers, transformers, pandas, numpy, scikit-learn
2427

2528
## Key Patterns
29+
2630
- RAG Flywheel: Measure → Analyze → Improve → Iterate
2731
- Evaluation-first approach using Braintrust/pydantic-evals
2832
- Vector database integration with filtering and metadata

CLAUDE.md

Lines changed: 16 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,37 +9,45 @@ This is an educational course repository for "Systematically Improving RAG Appli
99
## Common Development Commands
1010

1111
### Package Management
12+
1213
- **Install dependencies**: `uv install` or `pip install -e .`
1314
- **Add new packages**: `uv add <package>` (preferred over pip)
1415
- **Sync dependencies**: `uv sync`
1516

1617
### Documentation
18+
1719
- **Local development**: `mkdocs serve` (serves at localhost:8000)
1820
- **Build static site**: `mkdocs build`
1921

2022
### Code Quality
23+
2124
- **Lint and fix**: `uv run ruff check --fix --unsafe-fixes .`
2225
- **Format code**: `uv run ruff format .`
2326

2427
### Running Tests
28+
2529
- **Test runner**: `pytest` (no test files currently in repository)
2630
- **With coverage**: `pytest --cov`
2731

2832
## High-Level Architecture
2933

3034
### Course Structure
35+
3136
The repository follows a progressive learning path organized by cohorts and weeks:
37+
3238
- **latest/**: Current course version with weeks 0-6
3339
- **cohort_1/, cohort_2/**: Previous versions showing course evolution
3440
- **docs/**: MkDocs documentation with workshops, office hours, and talks
3541

3642
### Core RAG Flywheel Philosophy
43+
3744
1. **Measure**: Establish baseline metrics using evaluation frameworks
3845
2. **Analyze**: Identify failure modes and improvement opportunities
3946
3. **Improve**: Implement targeted solutions (embeddings, reranking, etc.)
4047
4. **Iterate**: Continuous improvement based on data
4148

4249
### Key Technologies Stack
50+
4351
- **Vector Databases**: ChromaDB, LanceDB, Turbopuffer
4452
- **LLM APIs**: OpenAI, Anthropic, Cohere
4553
- **Embeddings**: Sentence-transformers, OpenAI text-embedding models
@@ -48,14 +56,17 @@ The repository follows a progressive learning path organized by cohorts and week
4856
- **Async Operations**: asyncio, httpx for concurrent processing
4957

5058
### Case Study Architecture (latest/case_study/)
59+
5160
The WildChat case study demonstrates a complete RAG system:
61+
5262
1. **Data Loading**: Processes conversation data from HuggingFace datasets
5363
2. **Question Generation**: Creates synthetic queries for evaluation
5464
3. **Embedding Pipeline**: Multiple strategies (v1-v5) for document processing
5565
4. **Evaluation Framework**: Comprehensive metrics using Braintrust
5666
5. **Reranking**: Cohere and other reranking models for result optimization
5767

5868
### Code Patterns
69+
5970
- **Async-first**: Use async/await for I/O operations
6071
- **CLI Commands**: Typer commands with Rich console output
6172
- **Evaluation-driven**: Always measure before and after changes
@@ -64,24 +75,28 @@ The WildChat case study demonstrates a complete RAG system:
6475
## Development Guidelines
6576

6677
### When Working on Workshops
78+
6779
- Each week's workshop corresponds to book chapters
6880
- Notebooks should be educational with clear markdown explanations
6981
- Use the latest/ directory for new content
7082

7183
### When Working on the Case Study
84+
7285
- Follow the existing CLI command structure in main.py
7386
- Use Braintrust for evaluation tracking
7487
- Implement new embedding strategies as v6, v7, etc.
7588
- Always include evaluation metrics for new approaches
7689

7790
### Adding New Features
91+
7892
1. Check existing patterns in latest/case_study/
7993
2. Use async operations for API calls
8094
3. Add CLI commands using Typer decorators
8195
4. Include progress bars with Rich for long operations
8296
5. Store results in appropriate formats (Parquet, JSON)
8397

8498
### Documentation Updates
99+
85100
- Update relevant workshop notebooks
86101
- Add entries to docs/ if creating new tutorials
87-
- Keep markdown conversions in md/ directory synchronized
102+
- Keep markdown conversions in md/ directory synchronized

README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ A comprehensive course teaching data-driven approaches to building and improving
44

55
## 🎓 Take the Course
66

7-
All of this material is supported by the **Systematically Improving RAG Course**.
7+
All of this material is supported by the **Systematically Improving RAG Course**.
88

99
[**Click here to get 20% off →**](https://maven.com/applied-llms/rag-playbook?promoCode=EBOOK)
1010

@@ -54,7 +54,7 @@ The course follows a 6-week structure where each week corresponds to specific wo
5454
### Week 1: Starting the Flywheel
5555

5656
- **Book Coverage**: Chapter 0 (Introduction) + Chapter 1 (Starting the Flywheel with Data)
57-
- **Topics**:
57+
- **Topics**:
5858
- Shifting from static implementations to continuously improving products
5959
- Overcoming the cold-start problem through synthetic data generation
6060
- Establishing meaningful metrics aligned with business goals

docs/talks/AGENTS.md

Lines changed: 18 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2,37 +2,43 @@
22

33
> **Project Setup Note:**
44
> To install all dependencies and extras for building and working with this documentation, always use:
5-
>
5+
>
66
> ```sh
77
> uv sync --all-extras
88
> ```
99
1010
## Overview
11+
1112
This directory contains industry talks and presentations from the Systematically Improving RAG Applications series. Each talk provides insights from experts at companies like ChromaDB, Zapier, Glean, Exa, and others, covering practical RAG implementation strategies and lessons learned.
1213
1314
## File Structure
15+
1416
- **Industry expert talks**: 15+ markdown files covering specific RAG topics
1517
- **Organized by chapter**: Talks align with workshop chapters (evaluation, training, UX, etc.)
1618
- **Consistent format**: YAML frontmatter with catchy titles, descriptions, tags, speakers, and dates
1719
- **Study notes format**: Key takeaways and technical insights highlighted
1820
1921
## Title Format Standards
22+
2023
All talk titles follow a **catchy, conversational format** designed to grab attention and communicate value:
2124
2225
### Title Pattern Examples:
26+
2327
- **"Why I Stopped Using RAG for Coding Agents (And You Should Too)"** - Personal story + actionable advice
2428
- **"The RAG Mistakes That Are Killing Your AI (Lessons from Google & LinkedIn)"** - Problem identification + company credibility
2529
- **"Stop Trusting MTEB Rankings - Here's How Chroma Actually Tests Embeddings"** - Contrarian take + insider knowledge
2630
- **"The 12% RAG Performance Boost You're Missing (LanceDB's Re-ranking Secrets)"** - Specific benefit + insider secrets
2731
2832
### Title Principles:
33+
2934
- **Conversational tone**: Use "I", "You", "Why", "How" to make it personal
3035
- **Specific benefits**: Include numbers, percentages, or concrete outcomes when possible
3136
- **Company attribution**: Reference the company/organization for credibility
3237
- **Controversial hooks**: Challenge conventional wisdom or common practices
3338
- **Actionable implications**: Suggest there's something readers should do differently
3439
3540
## Content Standards
41+
3642
- **YAML frontmatter**: catchy title, speaker with company, description, tags, date
3743
- **H1 title**: Matches the YAML title exactly for consistency
3844
- **Study notes structure**: Technical insights with `**Key Takeaway:**` summaries
@@ -41,14 +47,19 @@ All talk titles follow a **catchy, conversational format** designed to grab atte
4147
- **Performance metrics**: Specific numbers and improvements mentioned
4248
4349
## Question Formatting Guidelines
50+
4451
**Main Section Questions**: Use proper markdown headers for navigable sections:
52+
4553
```markdown
4654
## Why is accurate document parsing so critical for AI applications?
55+
4756
## How should you evaluate document parsing performance?
57+
4858
## What are the most challenging document elements to parse correctly?
4959
```
5060
5161
**FAQ Section Questions**: Use bold emphasis within FAQ content:
62+
5263
```markdown
5364
## FAQs
5465
@@ -61,11 +72,13 @@ Document ingestion refers to the process of extracting...
6172
Accurate parsing is critical because...
6273
```
6374
64-
**Key Distinction**:
75+
**Key Distinction**:
76+
6577
- `## Question?` = Main section headers (navigable, structured content)
6678
- `**Question?**` = FAQ emphasis (within content sections only)
6779
6880
## Key Topics Covered
81+
6982
- **"Why I Stopped Using RAG for Coding Agents (And You Should Too)"** - Nik Pash (Cline)
7083
- **"The RAG Mistakes That Are Killing Your AI (Lessons from Google & LinkedIn)"** - Skylar Payne
7184
- **"Stop Trusting MTEB Rankings - Here's How Chroma Actually Tests Embeddings"** - Kelly Hong (Chroma)
@@ -78,6 +91,7 @@ Accurate parsing is critical because...
7891
- **"The 12% RAG Performance Boost You're Missing (LanceDB's Re-ranking Secrets)"** - Ayush (LanceDB)
7992
8093
## Writing Style
94+
8195
- **9th-grade reading level** for accessibility
8296
- **Technical depth** with practical examples
8397
- **Actionable insights** over theoretical concepts
@@ -86,6 +100,7 @@ Accurate parsing is critical because...
86100
- **Conversational tone** that matches the catchy titles
87101
88102
## Formatting Standards
103+
89104
- **Consistent H1 titles**: Match YAML frontmatter exactly
90105
- **Proper markdown structure**: Use ## for main sections, ### for subsections
91106
- **Question headers**: Use `## Question?` format for main section questions (NOT `**Question?**`)
@@ -96,6 +111,7 @@ Accurate parsing is critical because...
96111
- **Company attribution**: Always include company names in titles and content
97112
98113
## Tags and Organization
114+
99115
Common tags include: RAG, coding agents, embeddings, evaluation, feedback systems, enterprise search, query routing, performance optimization, user experience, production monitoring, document parsing, fine-tuning, re-ranking
100116
101117
---

docs/talks/chromadb-anton-chunking.md

Lines changed: 20 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,15 @@ title: "Text Chunking Strategies for RAG Applications"
33
speaker: Anton
44
cohort: 3
55
description: "Technical session with Anton from ChromaDB on text chunking fundamentals, evaluation methods, and practical tips for improving retrieval performance"
6-
tags: [text chunking, ChromaDB, retrieval performance, semantic chunking, heuristic chunking, evaluation]
6+
tags:
7+
[
8+
text chunking,
9+
ChromaDB,
10+
retrieval performance,
11+
semantic chunking,
12+
heuristic chunking,
13+
evaluation,
14+
]
715
date: 2025-01-01
816
---
917

@@ -12,6 +20,7 @@ date: 2025-01-01
1220
I hosted a special session with Anton from ChromaDB to discuss their latest technical research on text chunking for RAG applications. This session covers the fundamentals of chunking strategies, evaluation methods, and practical tips for improving retrieval performance in your AI systems.
1321

1422
## What is chunking and why is it important for RAG systems?
23+
1524
Chunking is the process of splitting documents into smaller components to enable effective retrieval of relevant information. Despite what many believe, chunking remains critical even as LLM context windows grow larger.
1625

1726
The fundamental purpose of chunking is to find the relevant text for a given query among all the divisions we've created from our documents. This becomes especially important when the information needed to answer a query spans multiple documents.
@@ -23,9 +32,10 @@ There are several compelling reasons why chunking matters regardless of context
2332
3. Information accuracy - Effective chunking eliminates distractors that could confuse the model
2433
4. Retrieval performance - Proper chunking significantly improves your system's ability to find all relevant information
2534

26-
***Key Takeaway:*** Chunking will remain important regardless of how large context windows become because it addresses fundamental challenges in retrieval efficiency, accuracy, and cost management.
35+
**_Key Takeaway:_** Chunking will remain important regardless of how large context windows become because it addresses fundamental challenges in retrieval efficiency, accuracy, and cost management.
2736

2837
## What approaches exist for text chunking?
38+
2939
There are two broad categories of chunking approaches that are currently being used:
3040

3141
Heuristic approaches rely on separator characters (like newlines, question marks, periods) to divide documents based on their existing structure. The most widely used implementation is the recursive character text splitter, which uses a hierarchy of splitting characters to subdivide documents into pieces not exceeding a specified maximum length.
@@ -36,9 +46,10 @@ Semantic approaches are more experimental but promising. These use embedding or
3646

3747
What's particularly interesting is that you can use the same embedding model for both chunking and retrieval, potentially finding an embedding-optimal chunking strategy. Since embeddings are relatively cheap, this approach is becoming more viable.
3848

39-
***Key Takeaway:*** While heuristic approaches like recursive character text splitters are most common today, semantic chunking methods that identify natural topic boundaries show promise for more robust performance across diverse document types.
49+
**_Key Takeaway:_** While heuristic approaches like recursive character text splitters are most common today, semantic chunking methods that identify natural topic boundaries show promise for more robust performance across diverse document types.
4050

4151
## Does chunking strategy actually matter for performance?
52+
4253
According to Anton's research, chunking strategy matters tremendously. Their technical report demonstrates significant performance variations based solely on chunking approach, even when using the same embedding model and retrieval system.
4354

4455
They discovered two fundamental rules of thumb that exist in tension with each other:
@@ -50,9 +61,10 @@ The most important insight, however, is that you must always examine your data.
5061

5162
By looking at your actual chunks, you can develop intuition about how your chunking strategy is working for your specific use case. This is critical because there's likely no universal "best" chunking strategy - the optimal approach depends on your data and task.
5263

53-
***Key Takeaway:*** There's no one-size-fits-all chunking strategy. The best approach depends on your specific data and task, which is why examining your actual chunks is essential for diagnosing retrieval problems.
64+
**_Key Takeaway:_** There's no one-size-fits-all chunking strategy. The best approach depends on your specific data and task, which is why examining your actual chunks is essential for diagnosing retrieval problems.
5465

5566
## How should we evaluate chunking strategies?
67+
5668
When evaluating chunking strategies, focus on the retriever itself rather than the generative output. This differs from traditional information retrieval benchmarks in several important ways:
5769

5870
Recall is the single most important metric. Modern models are increasingly good at ignoring irrelevant information, but they cannot complete a task if you haven't retrieved all the relevant information in the first place.
@@ -63,9 +75,10 @@ Ranking metrics like NDCG (which consider the order of retrieved documents) are
6375

6476
The ChromaDB team has released code for their generative benchmark, which can help evaluate chunking strategies against your specific data.
6577

66-
***Key Takeaway:*** Focus on passage-level recall rather than document-level metrics or ranking-sensitive measures. The model can handle irrelevant information, but it can't work with information that wasn't retrieved.
78+
**_Key Takeaway:_** Focus on passage-level recall rather than document-level metrics or ranking-sensitive measures. The model can handle irrelevant information, but it can't work with information that wasn't retrieved.
6779

6880
## What practical advice can improve our chunking implementation?
81+
6982
The most emphatic advice from Anton was: "Always, always, always look at your data." This point was stressed repeatedly throughout the presentation.
7083

7184
Many retrieval problems stem from poor chunking that isn't apparent until you actually examine the chunks being produced. Default settings in popular libraries often produce surprisingly poor results for specific datasets.
@@ -81,7 +94,7 @@ While better tooling is being developed to help with this process, in the meanti
8194

8295
This approach acknowledges that we're in an interesting era of software development where AI application builders are being forced to learn machine learning best practices that have evolved over decades.
8396

84-
***Key Takeaway:*** No amount of sophisticated algorithms can compensate for not understanding your data. Examining your chunks and evaluating them against representative queries is the most reliable path to improving retrieval performance.
97+
**_Key Takeaway:_** No amount of sophisticated algorithms can compensate for not understanding your data. Examining your chunks and evaluating them against representative queries is the most reliable path to improving retrieval performance.
8598

8699
**Final thoughts on chunking for RAG applications**
87100
The fundamental tension in chunking is between maximizing the use of the embedding model's context window and avoiding the grouping of unrelated information. Finding the right balance requires understanding your specific data and use case.
@@ -92,8 +105,7 @@ As Anton emphasized, retrieval is not a general system but a task-specific one.
92105

93106
The ChromaDB team is developing better tooling to help with this process, but in the meantime, the most reliable approach is to manually examine your chunks and measure passage-level recall against representative queries.
94107

95-
By focusing on these fundamentals rather than blindly applying frameworks or following defaults, you can significantly improve the performance of your RAG applications and deliver better results to your users.
96-
---
108+
## By focusing on these fundamentals rather than blindly applying frameworks or following defaults, you can significantly improve the performance of your RAG applications and deliver better results to your users.
97109

98110
IF you want to get discounts and 6 day email source on the topic make sure to subscribe to
99111

0 commit comments

Comments
 (0)