jxnl
diff --git a/‎AGENT.md‎
Lines changed: 4 additions & 0 deletions b/‎AGENT.md‎
Lines changed: 4 additions & 0 deletions
diff --git a/‎CLAUDE.md‎
Lines changed: 16 additions & 1 deletion b/‎CLAUDE.md‎
Lines changed: 16 additions & 1 deletion
diff --git a/‎README.md‎
Lines changed: 2 additions & 2 deletions b/‎README.md‎
Lines changed: 2 additions & 2 deletions
diff --git a/‎docs/talks/AGENTS.md‎
Lines changed: 18 additions & 2 deletions b/‎docs/talks/AGENTS.md‎
Lines changed: 18 additions & 2 deletions
diff --git a/‎docs/talks/chromadb-anton-chunking.md‎
Lines changed: 20 additions & 8 deletions b/‎docs/talks/chromadb-anton-chunking.md‎
Lines changed: 20 additions & 8 deletions
@@ -1,19 +1,22 @@
 # AGENT.md - Systematically Improving RAG Applications
 
 ## Build/Test Commands
+
 - Install dependencies: `uv install` or `pip install -e .`
 - Run tests: `pytest` (pytest>=8.4.1 with pytest-asyncio>=1.0.0)
 - Build docs: `mkdocs serve` (local) or `mkdocs build` (static)
 - Package management: Always use `uv` instead of `pip` (e.g., `uv add <package>`)
 
 ## Architecture & Structure
+
 - **Main course content**: `latest/` (current version), `cohort_1/`, `cohort_2/` (previous versions)
 - **Weekly modules**: `week0/` (setup), `week1/` (evaluation), `week2/` (embedding tuning), `week4/` (query understanding), `week5/` (multimodal), `week6/` (architecture)
 - **Capstone project**: `latest/capstone_project/` - comprehensive WildChat dataset implementation
 - **Documentation**: `docs/` (MkDocs book), `md/` (notebook conversions)
 - **Technologies**: ChromaDB/LanceDB/Turbopuffer (vector DBs), OpenAI/Anthropic/Cohere (LLMs), Sentence-transformers, BERTopic
 
 ## Code Style & Conventions
+
 - **Reading level**: Write at 9th-grade level for educational content
 - **CLI tools**: Use `typer` + `rich` for command-line interfaces, never emojis in code
 - **Async preferred**: Use async/await over synchronous code when possible
@@ -23,6 +26,7 @@
 - **Dependencies**: Core ML stack includes sentence-transformers, transformers, pandas, numpy, scikit-learn
 
 ## Key Patterns
+
 - RAG Flywheel: Measure → Analyze → Improve → Iterate
 - Evaluation-first approach using Braintrust/pydantic-evals
 - Vector database integration with filtering and metadata
 
@@ -9,37 +9,45 @@ This is an educational course repository for "Systematically Improving RAG Appli
 ## Common Development Commands
 
 ### Package Management
+
 - **Install dependencies**: `uv install` or `pip install -e .`
 - **Add new packages**: `uv add <package>` (preferred over pip)
 - **Sync dependencies**: `uv sync`
 
 ### Documentation
+
 - **Local development**: `mkdocs serve` (serves at localhost:8000)
 - **Build static site**: `mkdocs build`
 
 ### Code Quality
+
 - **Lint and fix**: `uv run ruff check --fix --unsafe-fixes .`
 - **Format code**: `uv run ruff format .`
 
 ### Running Tests
+
 - **Test runner**: `pytest` (no test files currently in repository)
 - **With coverage**: `pytest --cov`
 
 ## High-Level Architecture
 
 ### Course Structure
+
 The repository follows a progressive learning path organized by cohorts and weeks:
+
 - **latest/**: Current course version with weeks 0-6
 - **cohort_1/, cohort_2/**: Previous versions showing course evolution
 - **docs/**: MkDocs documentation with workshops, office hours, and talks
 
 ### Core RAG Flywheel Philosophy
+
 1. **Measure**: Establish baseline metrics using evaluation frameworks
 2. **Analyze**: Identify failure modes and improvement opportunities
 3. **Improve**: Implement targeted solutions (embeddings, reranking, etc.)
 4. **Iterate**: Continuous improvement based on data
 
 ### Key Technologies Stack
+
 - **Vector Databases**: ChromaDB, LanceDB, Turbopuffer
 - **LLM APIs**: OpenAI, Anthropic, Cohere
 - **Embeddings**: Sentence-transformers, OpenAI text-embedding models
@@ -48,14 +56,17 @@ The repository follows a progressive learning path organized by cohorts and week
 - **Async Operations**: asyncio, httpx for concurrent processing
 
 ### Case Study Architecture (latest/case_study/)
+
 The WildChat case study demonstrates a complete RAG system:
+
 1. **Data Loading**: Processes conversation data from HuggingFace datasets
 2. **Question Generation**: Creates synthetic queries for evaluation
 3. **Embedding Pipeline**: Multiple strategies (v1-v5) for document processing
 4. **Evaluation Framework**: Comprehensive metrics using Braintrust
 5. **Reranking**: Cohere and other reranking models for result optimization
 
 ### Code Patterns
+
 - **Async-first**: Use async/await for I/O operations
 - **CLI Commands**: Typer commands with Rich console output
 - **Evaluation-driven**: Always measure before and after changes
@@ -64,24 +75,28 @@ The WildChat case study demonstrates a complete RAG system:
 ## Development Guidelines
 
 ### When Working on Workshops
+
 - Each week's workshop corresponds to book chapters
 - Notebooks should be educational with clear markdown explanations
 - Use the latest/ directory for new content
 
 ### When Working on the Case Study
+
 - Follow the existing CLI command structure in main.py
 - Use Braintrust for evaluation tracking
 - Implement new embedding strategies as v6, v7, etc.
 - Always include evaluation metrics for new approaches
 
 ### Adding New Features
+
 1. Check existing patterns in latest/case_study/
 2. Use async operations for API calls
 3. Add CLI commands using Typer decorators
 4. Include progress bars with Rich for long operations
 5. Store results in appropriate formats (Parquet, JSON)
 
 ### Documentation Updates
+
 - Update relevant workshop notebooks
 - Add entries to docs/ if creating new tutorials
-- Keep markdown conversions in md/ directory synchronized
+- Keep markdown conversions in md/ directory synchronized
@@ -4,7 +4,7 @@ A comprehensive course teaching data-driven approaches to building and improving
 
 ## 🎓 Take the Course
 
-All of this material is supported by the **Systematically Improving RAG Course**. 
+All of this material is supported by the **Systematically Improving RAG Course**.
 
 [**Click here to get 20% off →**](https://maven.com/applied-llms/rag-playbook?promoCode=EBOOK)
 
@@ -54,7 +54,7 @@ The course follows a 6-week structure where each week corresponds to specific wo
 ### Week 1: Starting the Flywheel
 
 - **Book Coverage**: Chapter 0 (Introduction) + Chapter 1 (Starting the Flywheel with Data)
-- **Topics**: 
+- **Topics**:
   - Shifting from static implementations to continuously improving products
   - Overcoming the cold-start problem through synthetic data generation
   - Establishing meaningful metrics aligned with business goals
 
@@ -2,37 +2,43 @@
 
 > **Project Setup Note:**
 > To install all dependencies and extras for building and working with this documentation, always use:
-> 
+>
 > ```sh
 > uv sync --all-extras
 > ```
 
 ## Overview
+
 This directory contains industry talks and presentations from the Systematically Improving RAG Applications series. Each talk provides insights from experts at companies like ChromaDB, Zapier, Glean, Exa, and others, covering practical RAG implementation strategies and lessons learned.
 
 ## File Structure
+
 - **Industry expert talks**: 15+ markdown files covering specific RAG topics
 - **Organized by chapter**: Talks align with workshop chapters (evaluation, training, UX, etc.)
 - **Consistent format**: YAML frontmatter with catchy titles, descriptions, tags, speakers, and dates
 - **Study notes format**: Key takeaways and technical insights highlighted
 
 ## Title Format Standards
+
 All talk titles follow a **catchy, conversational format** designed to grab attention and communicate value:
 
 ### Title Pattern Examples:
+
 - **"Why I Stopped Using RAG for Coding Agents (And You Should Too)"** - Personal story + actionable advice
 - **"The RAG Mistakes That Are Killing Your AI (Lessons from Google & LinkedIn)"** - Problem identification + company credibility
 - **"Stop Trusting MTEB Rankings - Here's How Chroma Actually Tests Embeddings"** - Contrarian take + insider knowledge
 - **"The 12% RAG Performance Boost You're Missing (LanceDB's Re-ranking Secrets)"** - Specific benefit + insider secrets
 
 ### Title Principles:
+
 - **Conversational tone**: Use "I", "You", "Why", "How" to make it personal
 - **Specific benefits**: Include numbers, percentages, or concrete outcomes when possible
 - **Company attribution**: Reference the company/organization for credibility
 - **Controversial hooks**: Challenge conventional wisdom or common practices
 - **Actionable implications**: Suggest there's something readers should do differently
 
 ## Content Standards
+
 - **YAML frontmatter**: catchy title, speaker with company, description, tags, date
 - **H1 title**: Matches the YAML title exactly for consistency
 - **Study notes structure**: Technical insights with `**Key Takeaway:**` summaries
@@ -41,14 +47,19 @@ All talk titles follow a **catchy, conversational format** designed to grab atte
 - **Performance metrics**: Specific numbers and improvements mentioned
 
 ## Question Formatting Guidelines
+
 **Main Section Questions**: Use proper markdown headers for navigable sections:
+
 ```markdown
 ## Why is accurate document parsing so critical for AI applications?
+
 ## How should you evaluate document parsing performance?
+
 ## What are the most challenging document elements to parse correctly?
 ```
 
 **FAQ Section Questions**: Use bold emphasis within FAQ content:
+
 ```markdown
 ## FAQs
 
@@ -61,11 +72,13 @@ Document ingestion refers to the process of extracting...
 Accurate parsing is critical because...
 ```
 
-**Key Distinction**: 
+**Key Distinction**:
+
 - `## Question?` = Main section headers (navigable, structured content)
 - `**Question?**` = FAQ emphasis (within content sections only)
 
 ## Key Topics Covered
+
 - **"Why I Stopped Using RAG for Coding Agents (And You Should Too)"** - Nik Pash (Cline)
 - **"The RAG Mistakes That Are Killing Your AI (Lessons from Google & LinkedIn)"** - Skylar Payne
 - **"Stop Trusting MTEB Rankings - Here's How Chroma Actually Tests Embeddings"** - Kelly Hong (Chroma)
@@ -78,6 +91,7 @@ Accurate parsing is critical because...
 - **"The 12% RAG Performance Boost You're Missing (LanceDB's Re-ranking Secrets)"** - Ayush (LanceDB)
 
 ## Writing Style
+
 - **9th-grade reading level** for accessibility
 - **Technical depth** with practical examples
 - **Actionable insights** over theoretical concepts
@@ -86,6 +100,7 @@ Accurate parsing is critical because...
 - **Conversational tone** that matches the catchy titles
 
 ## Formatting Standards
+
 - **Consistent H1 titles**: Match YAML frontmatter exactly
 - **Proper markdown structure**: Use ## for main sections, ### for subsections
 - **Question headers**: Use `## Question?` format for main section questions (NOT `**Question?**`)
@@ -96,6 +111,7 @@ Accurate parsing is critical because...
 - **Company attribution**: Always include company names in titles and content
 
 ## Tags and Organization
+
 Common tags include: RAG, coding agents, embeddings, evaluation, feedback systems, enterprise search, query routing, performance optimization, user experience, production monitoring, document parsing, fine-tuning, re-ranking
 
 ---
 
@@ -3,7 +3,15 @@ title: "Text Chunking Strategies for RAG Applications"
 speaker: Anton
 cohort: 3
 description: "Technical session with Anton from ChromaDB on text chunking fundamentals, evaluation methods, and practical tips for improving retrieval performance"
-tags: [text chunking, ChromaDB, retrieval performance, semantic chunking, heuristic chunking, evaluation]
+tags:
+  [
+    text chunking,
+    ChromaDB,
+    retrieval performance,
+    semantic chunking,
+    heuristic chunking,
+    evaluation,
+  ]
 date: 2025-01-01
 ---
 
@@ -12,6 +20,7 @@ date: 2025-01-01
 I hosted a special session with Anton from ChromaDB to discuss their latest technical research on text chunking for RAG applications. This session covers the fundamentals of chunking strategies, evaluation methods, and practical tips for improving retrieval performance in your AI systems.
 
 ## What is chunking and why is it important for RAG systems?
+
 Chunking is the process of splitting documents into smaller components to enable effective retrieval of relevant information. Despite what many believe, chunking remains critical even as LLM context windows grow larger.
 
 The fundamental purpose of chunking is to find the relevant text for a given query among all the divisions we've created from our documents. This becomes especially important when the information needed to answer a query spans multiple documents.
@@ -23,9 +32,10 @@ There are several compelling reasons why chunking matters regardless of context
 3. Information accuracy - Effective chunking eliminates distractors that could confuse the model
 4. Retrieval performance - Proper chunking significantly improves your system's ability to find all relevant information
 
-***Key Takeaway:*** Chunking will remain important regardless of how large context windows become because it addresses fundamental challenges in retrieval efficiency, accuracy, and cost management.
+**_Key Takeaway:_** Chunking will remain important regardless of how large context windows become because it addresses fundamental challenges in retrieval efficiency, accuracy, and cost management.
 
 ## What approaches exist for text chunking?
+
 There are two broad categories of chunking approaches that are currently being used:
 
 Heuristic approaches rely on separator characters (like newlines, question marks, periods) to divide documents based on their existing structure. The most widely used implementation is the recursive character text splitter, which uses a hierarchy of splitting characters to subdivide documents into pieces not exceeding a specified maximum length.
@@ -36,9 +46,10 @@ Semantic approaches are more experimental but promising. These use embedding or
 
 What's particularly interesting is that you can use the same embedding model for both chunking and retrieval, potentially finding an embedding-optimal chunking strategy. Since embeddings are relatively cheap, this approach is becoming more viable.
 
-***Key Takeaway:*** While heuristic approaches like recursive character text splitters are most common today, semantic chunking methods that identify natural topic boundaries show promise for more robust performance across diverse document types.
+**_Key Takeaway:_** While heuristic approaches like recursive character text splitters are most common today, semantic chunking methods that identify natural topic boundaries show promise for more robust performance across diverse document types.
 
 ## Does chunking strategy actually matter for performance?
+
 According to Anton's research, chunking strategy matters tremendously. Their technical report demonstrates significant performance variations based solely on chunking approach, even when using the same embedding model and retrieval system.
 
 They discovered two fundamental rules of thumb that exist in tension with each other:
@@ -50,9 +61,10 @@ The most important insight, however, is that you must always examine your data.
 
 By looking at your actual chunks, you can develop intuition about how your chunking strategy is working for your specific use case. This is critical because there's likely no universal "best" chunking strategy - the optimal approach depends on your data and task.
 
-***Key Takeaway:*** There's no one-size-fits-all chunking strategy. The best approach depends on your specific data and task, which is why examining your actual chunks is essential for diagnosing retrieval problems.
+**_Key Takeaway:_** There's no one-size-fits-all chunking strategy. The best approach depends on your specific data and task, which is why examining your actual chunks is essential for diagnosing retrieval problems.
 
 ## How should we evaluate chunking strategies?
+
 When evaluating chunking strategies, focus on the retriever itself rather than the generative output. This differs from traditional information retrieval benchmarks in several important ways:
 
 Recall is the single most important metric. Modern models are increasingly good at ignoring irrelevant information, but they cannot complete a task if you haven't retrieved all the relevant information in the first place.
@@ -63,9 +75,10 @@ Ranking metrics like NDCG (which consider the order of retrieved documents) are
 
 The ChromaDB team has released code for their generative benchmark, which can help evaluate chunking strategies against your specific data.
 
-***Key Takeaway:*** Focus on passage-level recall rather than document-level metrics or ranking-sensitive measures. The model can handle irrelevant information, but it can't work with information that wasn't retrieved.
+**_Key Takeaway:_** Focus on passage-level recall rather than document-level metrics or ranking-sensitive measures. The model can handle irrelevant information, but it can't work with information that wasn't retrieved.
 
 ## What practical advice can improve our chunking implementation?
+
 The most emphatic advice from Anton was: "Always, always, always look at your data." This point was stressed repeatedly throughout the presentation.
 
 Many retrieval problems stem from poor chunking that isn't apparent until you actually examine the chunks being produced. Default settings in popular libraries often produce surprisingly poor results for specific datasets.
@@ -81,7 +94,7 @@ While better tooling is being developed to help with this process, in the meanti
 
 This approach acknowledges that we're in an interesting era of software development where AI application builders are being forced to learn machine learning best practices that have evolved over decades.
 
-***Key Takeaway:*** No amount of sophisticated algorithms can compensate for not understanding your data. Examining your chunks and evaluating them against representative queries is the most reliable path to improving retrieval performance.
+**_Key Takeaway:_** No amount of sophisticated algorithms can compensate for not understanding your data. Examining your chunks and evaluating them against representative queries is the most reliable path to improving retrieval performance.
 
 **Final thoughts on chunking for RAG applications**
 The fundamental tension in chunking is between maximizing the use of the embedding model's context window and avoiding the grouping of unrelated information. Finding the right balance requires understanding your specific data and use case.
@@ -92,8 +105,7 @@ As Anton emphasized, retrieval is not a general system but a task-specific one.
 
 The ChromaDB team is developing better tooling to help with this process, but in the meantime, the most reliable approach is to manually examine your chunks and measure passage-level recall against representative queries.
 
-By focusing on these fundamentals rather than blindly applying frameworks or following defaults, you can significantly improve the performance of your RAG applications and deliver better results to your users.
----
+## By focusing on these fundamentals rather than blindly applying frameworks or following defaults, you can significantly improve the performance of your RAG applications and deliver better results to your users.
 
 IF you want to get discounts and 6 day email source on the topic make sure to subscribe to