
Commit 0891a6b

jxnl and claude committed

Align chapters 5-7 with conversational style and transcript content

- Rewrite chapters 5-1, 5-2 to follow transcript examples (hardware store, fiscal years, image descriptions)
- Update chapters 6-1, 6-2, 6-3 to incorporate construction company examples and specific metrics from transcript
- Revise chapter 7 for production considerations with personal anecdotes and practical advice
- Add chapter 5 and 6 transcripts for reference
- Maintain conversational tone throughout with specific numbers and real examples

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

1 parent f809926 commit 0891a6b

File tree

8 files changed: +1458 -1676 lines changed


docs/workshops/chapter5-1.md

Lines changed: 61 additions & 174 deletions
Large diffs are not rendered by default.

docs/workshops/chapter5-2.md

Lines changed: 87 additions & 440 deletions

docs/workshops/chapter5-transcript.txt

Lines changed: 323 additions & 0 deletions

docs/workshops/chapter6-1.md

Lines changed: 65 additions & 60 deletions
@@ -12,85 +12,90 @@ tags:
 
 # Query Routing Foundations: Building a Cohesive RAG System
 
-## What This Chapter Covers
+Welcome back to the last core session of systematically improving RAG applications. Last session, we talked mostly about how we improved our search indices one by one. This could mean extracting data, generating summaries, and then using that to combine both lexical search filters and semantic search to create a single index.
 
-- Building unified RAG architectures with query routing
-- Designing tool interfaces for specialized retrievers
-- Implementing effective routing between components
-- Measuring system-level performance
+And today we'll talk about how we can combine them to create a cohesive application.
 
-## The Query Routing Problem
+## The Goal of This Week
 
-In Chapter 5, we built specialized retrievers for different content types. Now we need to decide when to use each one.
+The goal of this week is to talk a little bit more about query routing versus building these specific indices. We'll discuss the challenges we have, how we can improve things with more testing, and, more importantly, think about what the UI could look like to improve the way we collect data and improve our search indices.
 
-**Query routing** means directing user queries to the right retrieval components. Without it, even excellent specialized retrievers become useless if they're never called for the right queries.
+Overall, this should be a pretty quick session. We'll conclude with some food for thought, and then conclude the course as a whole.
 
-The architecture we'll build:
+## Recap: Two Ways to Improve Search
 
-1. Uses specialized retrievers built from user segmentation data
-2. Routes queries to appropriate components
-3. Provides clear interfaces for both models and users
-4. Collects feedback to improve routing accuracy
+In the previous week, we talked about the idea that there are two ways of improving our ability to search. One is to turn chunk data into more structured data, and the other is to build text summaries of our data so that we can fully represent it in full-text search or embedding search.
 
-## Tools as APIs Pattern
+## The Construction Company Blueprint Example
 
-Treat each specialized retriever as an API that language models can call. This creates separation between:
+As a hypothetical, imagine we are a construction company and we want to be able to search over images of different blueprints. One thing we can do is define a blueprint extractor that pulls out a description of the blueprint, and potentially the blueprint's date. In this situation, we know that in this data set all of the images are blueprints and each has some date in the OCR.
 
-1. **Tool Interfaces**: Definitions of what each tool does and its parameters
-2. **Tool Implementations**: The actual retrieval code
-3. **Routing Logic**: Code that selects which tools to call
+Once we extract that and put it into a database, we can think about querying that database.
 
-This is similar to building microservices, except the primary client is a language model rather than another service. The pattern evolved from simple function calling in LLM APIs to more sophisticated tool selection frameworks.
+In the first example, we define a blueprint extractor, which saves a date and a description. Now we can build a search blueprint model that searches the description and potentially takes start and end dates. Then we have to define an execute method that builds the query and sends it off to the database.
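A minimal sketch of that search model, with a plain dataclass standing in for what would more likely be a Pydantic response model, and a toy in-memory list standing in for the database (every name and row here is hypothetical):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Toy "database" of extracted blueprints: each row was produced by the
# blueprint extractor (a description plus the date found in the OCR).
BLUEPRINTS = [
    {"description": "city hall main floor plan", "date": date(2010, 6, 1)},
    {"description": "123 Main Street elevation", "date": date(2018, 3, 15)},
]


@dataclass
class SearchBlueprint:
    description: str
    start_date: Optional[date] = None
    end_date: Optional[date] = None

    def execute(self) -> list[dict]:
        # Build the query: naive keyword match plus optional date filters.
        results = []
        for row in BLUEPRINTS:
            if not all(w in row["description"] for w in self.description.split()):
                continue
            if self.start_date and row["date"] < self.start_date:
                continue
            if self.end_date and row["date"] > self.end_date:
                continue
            results.append(row)
        return results


hits = SearchBlueprint("city hall", date(2010, 1, 1), date(2010, 12, 31)).execute()
```

The two ideas from the text map directly onto the code: the extractor populates `BLUEPRINTS`, and `execute` is the search method that queries it.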
 
-### Benefits of the API Approach
+## Building APIs for Your Language Model
 
-- **Clear Boundaries**: Teams work independently on different tools
-- **Testability**: Components can be tested in isolation
-- **Reusability**: Tools work for both LLMs and direct API calls
-- **Scalability**: Add new capabilities without changing existing code
-- **Performance**: Enable parallel execution
-- **Team Structure**: Different teams own different components
+With this simple tool, we can start testing whether or not the document we're looking for is returned for the arguments we specified.
 
-!!! example "Organizational Structure"
-    One effective team structure:
-    - **Interface Team**: Designs the API contracts and tool specifications based on user needs
-    - **Implementation Team**: Builds and optimizes individual retrievers for specific content types
-    - **Router Team**: Creates and optimizes the query routing system
-    - **Evaluation Team**: Tests the performance of the entire system and identifies bottlenecks
+What we've basically done is two things: we've defined the extractor that saves into a database, and we've defined the search method that calls that database. These are the two primary ideas. And now, if we want a language model to use this tool, we just have to specify it in the response model.
 
-```mermaid
-graph TD
-    A[User Query] --> B[Query Router]
-    B --> C[Tool Selection]
-    C --> D[Document Tool]
-    C --> E[Image Tool]
-    C --> F[Table Tool]
-    D --> G[Ranking]
-    E --> G
-    F --> G
-    G --> H[Context Assembly]
-    H --> I[Response Generation]
-    I --> J[User Interface]
-```
+I have a prompt that tells the language model that it has access to the search blueprint tool, and I've given it a couple of examples: "Find blueprints for the city hall built in 2010." You can imagine we then search "city hall blueprints" with a start and end date.
 
-This architecture resembles modern microservice patterns where specialized services handle specific tasks. The difference is that the "client" making API calls is often a language model rather than another service.
+The idea is that these few-shot examples help the model understand how to use this response model.
 
-### Moving from Monolithic to Modular
+Now, when a user asks something like, "Can you help me find the plans for the 123 Main Street building?", it's gonna be able to use the search blueprint model.
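One way to picture the prompt and the structured output it should produce. The JSON below is hand-written to show the shape the model would return for the "123 Main Street" question, not a real model response, and the tool name is hypothetical:

```python
import json

SYSTEM_PROMPT = """\
You have access to a search_blueprint tool with arguments:
  description: str, start_date: date | None, end_date: date | None

Example:
  "Find blueprints for the city hall built in 2010"
  -> {"description": "city hall", "start_date": "2010-01-01", "end_date": "2010-12-31"}
"""

# What we would expect the model to emit for:
# "Can you help me find the plans for the 123 Main Street building?"
model_output = '{"description": "123 Main Street", "start_date": null, "end_date": null}'

# Parse the arguments and hand them to the tool's execute method.
args = json.loads(model_output)
```

In practice a library like instructor would validate this JSON against the Pydantic model for you rather than leaving you to parse it by hand.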
 
-Most RAG systems start monolithic: one vector database, one chunking strategy, one retrieval method. This breaks down as content types diversify.
+Each of these retrieval requests should feel very much like a GET request or a POST request in a REST API. It's about building an index, defining some kind of API over that index, and often defining many APIs to query that same database.
 
-Typical migration path:
+## Model Context Protocol (MCP)
 
-1. **Recognition**: Different queries need different retrieval
-2. **Separation**: Break into specialized components
-3. **Interface**: Define clear contracts between components
-4. **Orchestration**: Build routing layer
+This is something that the Model Context Protocol could support in the future: you build the tool once, expose it through the protocol once, and make it available to many interfaces, whether that's the Claude API or potentially Cursor.
 
-**Example**: A financial services client migrated from a single vector database to specialized components:
+## Modular API Development Benefits
 
-- Development velocity: 40% faster feature delivery
-- Retrieval quality: 25-35% improvement by query type
-- Team coordination: Fewer cross-team dependencies
-- Scaling: New content types added without disrupting existing features
+By defining these APIs, we separate our concerns, make the system a lot more modular, and allow bigger teams to work together.
 
-The key was treating each retriever as a service with a clear API contract.
+Individual teams can work on specific APIs, whether it's our ability to search emails versus blueprints or schedules or something else. You realize you're effectively a framework developer for the language model.
+
+From my own experience, I spent many years developing multiple microservices to do retrieval for other teams, and I think moving forward it's gonna feel a lot like building distributed microservices.
+
+## Adding More Capabilities
+
+Now we can take this a lot further.
+
+Originally we had a search blueprint tool with a description to search against and start and end dates to filter on. But we might also want to define some more text search abilities. Here we have a search query and also a filter by type, where we can filter by contracts, proposals, and bids.
+
+We may have discovered that bids, proposals, and contracts were the important kinds of filters through our data analysis and segmentation. And again, you can build a very simple filter to do this.
+
+Here we are running some search query, applying some filter, and then executing the search query. It doesn't really matter whether it's LanceDB or Chroma or Turbopuffer or Postgres. The point is that you're likely gonna be running these queries yourself.
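A sketch of that second tool under the same assumptions as before (toy in-memory store, hypothetical names); only the body of `execute` would change if you swapped in LanceDB, Chroma, Turbopuffer, or Postgres:

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Toy document store with the type labels we discovered during segmentation.
DOCUMENTS = [
    {"text": "bid for 123 Main Street renovation", "type": "bid"},
    {"text": "proposal for city hall annex", "type": "proposal"},
    {"text": "signed contract for 123 Main Street", "type": "contract"},
]


@dataclass
class SearchText:
    query: str
    doc_type: Optional[Literal["contract", "proposal", "bid"]] = None

    def execute(self) -> list[dict]:
        # Apply the type filter first, then run the text query.
        rows = DOCUMENTS if self.doc_type is None else [
            d for d in DOCUMENTS if d["type"] == self.doc_type
        ]
        return [d for d in rows if self.query.lower() in d["text"].lower()]


contracts = SearchText("123 Main Street", doc_type="contract").execute()
```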
+
+## Building the Gateway Router
+
+Now we can define a gateway or router. We can have a system prompt that says you can search documents and blueprints, and we can include a bunch of different examples of how we use search blueprints and how we use search text. But this time we're gonna use parallel function calling.
+
+I want you to notice one thing here. I've just talked about two tools, search blueprint and search text, but more likely you're gonna have a lot more tools at your disposal. You might want a tool that is specifically prompted to clarify what the customer is asking, and other tools to give answers, as in previous examples.
+
+An answer could contain not only the response, but citations, sources, and follow-up questions. And this can all be defined in a single model.
+
+Now when we execute a search query, we can send it to the search function, return a list of queries, and gather all the results. Then we can pass those results back into a language model that answers the question, and you can go forward from here.
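The execute-and-gather step might be sketched with `asyncio.gather`. The two coroutines below are stubs standing in for real tool implementations; in production the language model would pick these calls via parallel function calling rather than them being hard-coded:

```python
import asyncio


async def search_blueprints(query: str) -> list[str]:
    await asyncio.sleep(0)  # stand-in for a real async DB call
    return [f"blueprint match for {query!r}"]


async def search_text(query: str) -> list[str]:
    await asyncio.sleep(0)  # stand-in for a real async DB call
    return [f"text match for {query!r}"]


async def route(question: str) -> list[str]:
    # In production the router model returns a list of tool calls;
    # here we hard-code the two calls it might have selected.
    calls = [search_blueprints(question), search_text(question)]
    results = await asyncio.gather(*calls)
    # Flatten and hand back to the model that writes the final answer.
    return [item for sub in results for item in sub]


results = asyncio.run(route("plans for 123 Main Street"))
```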
+
+## The Classic Architecture Pattern
+
+This might harken back to the old-school way of doing things: interfaces that we can experiment with, which define the interactions of the tools with our client and with our backend; then implementations of the individual tools; and lastly, a gateway that puts it all together.
+
+These boundaries will ultimately help you figure out how to split your team and your resources. Each team can experiment with a different aspect of the interface, the implementation, and the gateway. One team could explore the segmentation of the tools and figure out what the right interfaces are. Another can run experiments to improve the implementation of each one, improving the per-tool recall. And the last team, for example, can test how the tools can be connected and put together through the gateway router system.
+
+And obviously we talked about the first two in sessions four and five.
+
+## What's Next
+
+This week we're mostly gonna be talking about how we can think about testing. And again, we're gonna go back to the same concepts of precision and recall. You can imagine creating a simple data set in the beginning that just records, for a certain question, which tools were called.
+
+Once we have a data set that looks like this, we can go back to just doing precision and recall of tool selection.
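Scoring tool selection against such a data set is then a few lines; the expected and called tool sets below are made up for illustration:

```python
def tool_precision_recall(expected: set[str], called: set[str]) -> tuple[float, float]:
    # Precision: of the tools we called, how many should have been called?
    # Recall: of the tools that should have been called, how many were?
    if not called or not expected:
        return 0.0, 0.0
    hits = len(expected & called)
    return hits / len(called), hits / len(expected)


precision, recall = tool_precision_recall(
    expected={"search_blueprint", "search_text"},
    called={"search_blueprint", "clarify_question"},
)
```

Averaging these per-query scores over the whole data set gives you the system-level tool-selection metrics to track as you add tools.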
+
+---
+
+If you want discounts and a 6-day email course on this topic, make sure to subscribe below.
+
+<script async data-uid="010fd9b52b" src="https://fivesixseven.kit.com/010fd9b52b/index.js"></script>
