
Commit 0891a6b

jxnl and claude committed

Align chapters 5-7 with conversational style and transcript content

- Rewrite chapters 5-1, 5-2 to follow transcript examples (hardware store, fiscal years, image descriptions)
- Update chapters 6-1, 6-2, 6-3 to incorporate construction company examples and specific metrics from transcript
- Revise chapter 7 for production considerations with personal anecdotes and practical advice
- Add chapter 5 and 6 transcripts for reference
- Maintain conversational tone throughout with specific numbers and real examples

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

1 parent f809926 commit 0891a6b

File tree

8 files changed: +1458 -1676 lines changed


docs/workshops/chapter5-1.md

Lines changed: 61 additions & 174 deletions
Large diffs are not rendered by default.

docs/workshops/chapter5-2.md

Lines changed: 87 additions & 440 deletions

docs/workshops/chapter5-transcript.txt

Lines changed: 323 additions & 0 deletions

docs/workshops/chapter6-1.md

Lines changed: 65 additions & 60 deletions
@@ -12,85 +12,90 @@ tags:
 
 # Query Routing Foundations: Building a Cohesive RAG System
 
-## What This Chapter Covers
+Welcome back to the last core session of systematically improving RAG applications. Last session, we talked mostly about how we improved our search indices one by one. This could mean extracting data, generating summaries, and then using that to combine both lexical search filters and semantic search to create a single index.
 
-- Building unified RAG architectures with query routing
-- Designing tool interfaces for specialized retrievers
-- Implementing effective routing between components
-- Measuring system-level performance
+And today we'll talk about how we can combine them to create a cohesive application.
 
-## The Query Routing Problem
+## The Goal of This Week
 
-In Chapter 5, we built specialized retrievers for different content types. Now we need to decide when to use each one.
+The goal of this week is to talk a little bit more about query routing versus building these specific indices. We'll discuss the challenges we have, how we can improve things with more testing, and, more importantly, think about what the UI could look like to improve the way we collect data and improve our search indices.
 
-**Query routing** means directing user queries to the right retrieval components. Without it, even excellent specialized retrievers become useless if they're never called for the right queries.
+Overall, this should be a pretty quick session. We'll conclude with some food for thought, and then conclude the course as a whole.
 
-The architecture we'll build:
+## Recap: Two Ways to Improve Search
 
-1. Uses specialized retrievers built from user segmentation data
-2. Routes queries to appropriate components
-3. Provides clear interfaces for both models and users
-4. Collects feedback to improve routing accuracy
+In the previous week, we talked about the idea that there are two ways of improving our ability to search. One is to turn chunk data into more structured data, and the other is to build text summaries of our data so that we can fully represent it in full-text search or embedding search.
 
-## Tools as APIs Pattern
+## The Construction Company Blueprint Example
 
-Treat each specialized retriever as an API that language models can call. This creates separation between:
+As a hypothetical, imagine we are a construction company and we want to be able to search over images of different blueprints. One thing we can do is define a blueprint extractor that pulls out a description of the blueprint, and potentially the blueprint's date. In this situation, we know that in this data set all of the images are blueprints and each has some date in the OCR.
 
-1. **Tool Interfaces**: Definitions of what each tool does and its parameters
-2. **Tool Implementations**: The actual retrieval code
-3. **Routing Logic**: Code that selects which tools to call
+Once we extract that and put it into a database, we can think about querying that database.
 
-This is similar to building microservices, except the primary client is a language model rather than another service. The pattern evolved from simple function calling in LLM APIs to more sophisticated tool selection frameworks.
+In the first example, we define a blueprint extractor, which saves a date and a description. Now we can build a search blueprint model that searches the description and potentially takes start and end dates. Then we have to define an execute method that builds the query and sends it off to the database.
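A minimal sketch of that search model, with a plain dataclass standing in for what would more likely be a Pydantic response model, and a toy in-memory list standing in for the database (every name and row here is hypothetical):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Toy "database" of extracted blueprints: each row was produced by the
# blueprint extractor (a description plus the date found in the OCR).
BLUEPRINTS = [
    {"description": "city hall main floor plan", "date": date(2010, 6, 1)},
    {"description": "123 Main Street elevation", "date": date(2018, 3, 15)},
]


@dataclass
class SearchBlueprint:
    description: str
    start_date: Optional[date] = None
    end_date: Optional[date] = None

    def execute(self) -> list[dict]:
        # Build the query: naive keyword match plus optional date filters.
        results = []
        for row in BLUEPRINTS:
            if not all(w in row["description"] for w in self.description.split()):
                continue
            if self.start_date and row["date"] < self.start_date:
                continue
            if self.end_date and row["date"] > self.end_date:
                continue
            results.append(row)
        return results


hits = SearchBlueprint("city hall", date(2010, 1, 1), date(2010, 12, 31)).execute()
```

The two ideas from the text map directly onto the code: the extractor populates `BLUEPRINTS`, and `execute` is the search method that queries it.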
 
-### Benefits of the API Approach
+## Building APIs for Your Language Model
 
-- **Clear Boundaries**: Teams work independently on different tools
-- **Testability**: Components can be tested in isolation
-- **Reusability**: Tools work for both LLMs and direct API calls
-- **Scalability**: Add new capabilities without changing existing code
-- **Performance**: Enable parallel execution
-- **Team Structure**: Different teams own different components
+With this simple tool, we can start testing whether or not the document we're looking for is returned for the arguments we specified.
 
-!!! example "Organizational Structure"
-    One effective team structure:
-    - **Interface Team**: Designs the API contracts and tool specifications based on user needs
-    - **Implementation Team**: Builds and optimizes individual retrievers for specific content types
-    - **Router Team**: Creates and optimizes the query routing system
-    - **Evaluation Team**: Tests the performance of the entire system and identifies bottlenecks
+What we've basically done is two things: we've defined the extractor that saves into a database, and we've defined the search method that calls that database. These are the two primary ideas. And now, if we want a language model to use this tool, we just have to specify it in the response model.
 
-```mermaid
-graph TD
-    A[User Query] --> B[Query Router]
-    B --> C[Tool Selection]
-    C --> D[Document Tool]
-    C --> E[Image Tool]
-    C --> F[Table Tool]
-    D --> G[Ranking]
-    E --> G
-    F --> G
-    G --> H[Context Assembly]
-    H --> I[Response Generation]
-    I --> J[User Interface]
-```
+I have a prompt that tells the language model that it has access to the search blueprint tool, and I've given it a couple of examples: "Find blueprints for the city hall built in 2010." You can imagine we then search "city hall blueprints" with a start and end date.
 
-This architecture resembles modern microservice patterns where specialized services handle specific tasks. The difference is that the "client" making API calls is often a language model rather than another service.
+The idea is that these few-shot examples help the model understand how to use this response model.
 
-### Moving from Monolithic to Modular
+Now, when a user asks something like, "Can you help me find the plans for the 123 Main Street building?", it's gonna be able to use the search blueprint model.
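One way to picture the prompt and the structured output it should produce. The JSON below is hand-written to show the shape the model would return for the "123 Main Street" question, not a real model response, and the tool name is hypothetical:

```python
import json

SYSTEM_PROMPT = """\
You have access to a search_blueprint tool with arguments:
  description: str, start_date: date | None, end_date: date | None

Example:
  "Find blueprints for the city hall built in 2010"
  -> {"description": "city hall", "start_date": "2010-01-01", "end_date": "2010-12-31"}
"""

# What we would expect the model to emit for:
# "Can you help me find the plans for the 123 Main Street building?"
model_output = '{"description": "123 Main Street", "start_date": null, "end_date": null}'

# Parse the arguments and hand them to the tool's execute method.
args = json.loads(model_output)
```

In practice a library like instructor would validate this JSON against the Pydantic model for you rather than leaving you to parse it by hand.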
 
-Most RAG systems start monolithic: one vector database, one chunking strategy, one retrieval method. This breaks down as content types diversify.
+Each of these retrieval requests should feel very much like a GET request or a POST request in a REST API. It's about building an index, defining some kind of API over that index, and often defining many APIs to query that same database.
 
-Typical migration path:
+## Model Context Protocol (MCP)
 
-1. **Recognition**: Different queries need different retrieval
-2. **Separation**: Break into specialized components
-3. **Interface**: Define clear contracts between components
-4. **Orchestration**: Build routing layer
+This is something that the Model Context Protocol could support in the future: you build the tool once, expose it through the protocol once, and make it available to many interfaces, whether that's the Claude API or potentially Cursor.
 
-**Example**: A financial services client migrated from a single vector database to specialized components:
+## Modular API Development Benefits
 
-- Development velocity: 40% faster feature delivery
-- Retrieval quality: 25-35% improvement by query type
-- Team coordination: Fewer cross-team dependencies
-- Scaling: New content types added without disrupting existing features
+By defining these APIs, we separate our concerns, make the system a lot more modular, and allow bigger teams to work together.
 
-The key was treating each retriever as a service with a clear API contract.
+Individual teams can work on specific APIs, whether it's our ability to search emails versus blueprints or schedules or something else. You realize you're effectively a framework developer for the language model.
+
+From my own experience, I spent many years developing multiple microservices to do retrieval for other teams, and I think moving forward it's gonna feel a lot like building distributed microservices.
+
+## Adding More Capabilities
+
+Now we can take this a lot further.
+
+Originally we had a search blueprint tool with a description to search against and start and end dates to filter on. But we might also want to define some more text search abilities. Here we have a search query and also a filter by type, where we can filter by contracts, proposals, and bids.
+
+We may have discovered that bids, proposals, and contracts were the important kinds of filters through our data analysis and segmentation. And again, you can build a very simple filter to do this.
+
+Here we are running some search query, applying some filter, and then executing the search query. It doesn't really matter whether it's LanceDB or Chroma or Turbopuffer or Postgres. The point is that you're likely gonna be running these queries yourself.
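A sketch of that second tool under the same assumptions as before (toy in-memory store, hypothetical names); only the body of `execute` would change if you swapped in LanceDB, Chroma, Turbopuffer, or Postgres:

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Toy document store with the type labels we discovered during segmentation.
DOCUMENTS = [
    {"text": "bid for 123 Main Street renovation", "type": "bid"},
    {"text": "proposal for city hall annex", "type": "proposal"},
    {"text": "signed contract for 123 Main Street", "type": "contract"},
]


@dataclass
class SearchText:
    query: str
    doc_type: Optional[Literal["contract", "proposal", "bid"]] = None

    def execute(self) -> list[dict]:
        # Apply the type filter first, then run the text query.
        rows = DOCUMENTS if self.doc_type is None else [
            d for d in DOCUMENTS if d["type"] == self.doc_type
        ]
        return [d for d in rows if self.query.lower() in d["text"].lower()]


contracts = SearchText("123 Main Street", doc_type="contract").execute()
```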
+
+## Building the Gateway Router
+
+Now we can define a gateway or router. We can have a system prompt that says you can search documents and blueprints, and we can include a bunch of different examples of how we use search blueprints and how we use search text. But this time we're gonna use parallel function calling.
+
+I want you to notice one thing here. I've just talked about two tools, search blueprint and search text, but more likely you're gonna have a lot more tools at your disposal. You might want a tool that is specifically prompted to clarify what the customer is asking, and other tools to give answers, as in previous examples.
+
+An answer could contain not only the response, but citations, sources, and follow-up questions. And this can all be defined in a single model.
+
+Now when we execute a search query, we can send it to the search function, return a list of queries, and gather all the results. Then we can pass those results back into a language model that answers the question, and you can go forward from here.
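The execute-and-gather step might be sketched with `asyncio.gather`. The two coroutines below are stubs standing in for real tool implementations; in production the language model would pick these calls via parallel function calling rather than them being hard-coded:

```python
import asyncio


async def search_blueprints(query: str) -> list[str]:
    await asyncio.sleep(0)  # stand-in for a real async DB call
    return [f"blueprint match for {query!r}"]


async def search_text(query: str) -> list[str]:
    await asyncio.sleep(0)  # stand-in for a real async DB call
    return [f"text match for {query!r}"]


async def route(question: str) -> list[str]:
    # In production the router model returns a list of tool calls;
    # here we hard-code the two calls it might have selected.
    calls = [search_blueprints(question), search_text(question)]
    results = await asyncio.gather(*calls)
    # Flatten and hand back to the model that writes the final answer.
    return [item for sub in results for item in sub]


results = asyncio.run(route("plans for 123 Main Street"))
```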
+
+## The Classic Architecture Pattern
+
+This might harken back to the old-school way of doing things: interfaces that we can experiment with, which define the interactions of the tools with our client and with our backend; then implementations of the individual tools; and lastly, a gateway that puts it all together.
+
+These boundaries will ultimately help you figure out how to split your team and your resources. Each team can experiment with a different aspect of the interface, the implementation, and the gateway. One team could explore the segmentation of the tools and figure out what the right interfaces are. Another can run experiments to improve the implementation of each one, improving the per-tool recall. And the last team, for example, can test how the tools can be connected and put together through the gateway router system.
+
+And obviously we talked about the first two in sessions four and five.
+
+## What's Next
+
+This week we're mostly gonna be talking about how we can think about testing. And again, we're gonna go back to the same concepts of precision and recall. You can imagine creating a simple data set in the beginning that just records, for a certain question, which tools were called.
+
+Once we have a data set that looks like this, we can go back to just doing precision and recall of tool selection.
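Scoring tool selection against such a data set is then a few lines; the expected and called tool sets below are made up for illustration:

```python
def tool_precision_recall(expected: set[str], called: set[str]) -> tuple[float, float]:
    # Precision: of the tools we called, how many should have been called?
    # Recall: of the tools that should have been called, how many were?
    if not called or not expected:
        return 0.0, 0.0
    hits = len(expected & called)
    return hits / len(called), hits / len(expected)


precision, recall = tool_precision_recall(
    expected={"search_blueprint", "search_text"},
    called={"search_blueprint", "clarify_question"},
)
```

Averaging these per-query scores over the whole data set gives you the system-level tool-selection metrics to track as you add tools.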
+
+---
+
+If you want discounts and a 6-day email course on this topic, make sure to subscribe below.
+
+<script async data-uid="010fd9b52b" src="https://fivesixseven.kit.com/010fd9b52b/index.js"></script>
