Skip to content

Commit 366660e

Browse files
jxnlclaude
andcommitted
Expand chapters 4-1 and 4-2 with comprehensive content from transcript
- Chapter 4-1: Added segmentation analysis, 2x2 prioritization matrix, construction case study, Anthropic Clio analysis, automation paradox concept - Chapter 4-2: Added inventory vs capability distinction, prioritization framework, roadmap templates, customer support examples, financial search case - Removed all code snippets and replaced with descriptive explanations - Ensured chapters work cohesively as Parts 1 and 2 without redundancy 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 3146b4d commit 366660e

File tree

2 files changed

+944
-16
lines changed

2 files changed

+944
-16
lines changed

docs/workshops/chapter4-1.md

Lines changed: 377 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -13,21 +13,391 @@ tags:
1313

1414
# Topic Modeling and Analysis: Finding Patterns in User Feedback
1515

16-
## The Problem: Too Much Feedback, Not Enough Insight
16+
## Introduction
1717

1818
So you deployed your RAG system and added feedback collection. Great. Now you've got thousands of queries, ratings, and signals. Your manager asks "What should we improve next?" and you realize you have no idea.
1919

2020
This happened to me. We had tons of data but no systematic way to find patterns. Looking at individual bad ratings wasn't helping - we needed to see the bigger picture.
2121

2222
The solution? Topic modeling and clustering. Instead of reading through feedback one by one, you group similar queries and look for patterns. This lets you find the real problems worth fixing.
2323

24+
**Building on Previous Chapters:**
25+
- **[Chapter 1](chapter1.md)**: Evaluation metrics to measure each segment's performance
26+
- **[Chapter 2](chapter2.md)**: Training data generation from identified patterns
27+
- **[Chapter 3](chapter3-1.md)**: Feedback collection that feeds this analysis
28+
2429
Here's the thing: not all improvements matter equally. Some query types affect 80% of your users. Others might be rare but critical for your biggest customers. You need to know the difference.
2530

26-
Think of it like product management - you segment users and focus on what matters most. Same with RAG queries. A small fix for a common query type beats a perfect solution for something nobody asks.
31+
## Why Segmentation Beats Random Improvements
32+
33+
Let me share an analogy from marketing that really drives this home. Imagine you're selling a product and sales jump 80%. Sounds great, right? But you don't know why. Was it the Super Bowl ad? The new packaging? Pure luck?
34+
35+
Without segmentation, you're flying blind. But if you segment your data, you might discover that 60% of the increase came from 30-45 year old women in the Midwest. Now you know exactly where to double down.
36+
37+
### The Marketing Parallel
38+
39+
This exact approach worked at Stitch Fix. When their sales jumped 80%, they didn't just celebrate—they segmented the data and discovered that 60% of the increase came from 30-45 year old women in the Midwest. This insight was worth millions in targeted marketing spend.
40+
41+
```mermaid
42+
graph TD
43+
A[Total Sales +80%] --> B[Segment Analysis]
44+
B --> C[Midwest Women 30-45: +60%]
45+
B --> D[Urban Men 18-25: +15%]
46+
B --> E[Other Segments: +5%]
47+
48+
C --> F[Target podcasts with<br>this demographic]
49+
D --> G[Maintain current<br>strategy]
50+
E --> H[Monitor only]
51+
52+
style C fill:#90EE90,stroke:#006400,stroke-width:2px
53+
style F fill:#FFD700,stroke:#B8860B,stroke-width:2px
54+
```
55+
56+
Same principle applies to RAG queries. Without segmentation, you see "70% satisfaction" and think you're doing okay. With segmentation, you discover:
57+
- Document search: 85% satisfaction (great!)
58+
- Schedule queries: 35% satisfaction (disaster!)
59+
- Comparison queries: 60% satisfaction (needs work)
60+
61+
Now you know exactly what to fix first.
62+
63+
## The Core Formula for Decision Making
64+
65+
Every improvement decision should be based on this formula:
66+
67+
**Expected Value = Impact × Query Volume % × Probability of Success**
68+
69+
Let's break this down:
70+
- **Impact**: How valuable is solving this? (revenue, user retention, etc.)
71+
- **Query Volume %**: What percentage of total queries fall into this segment?
72+
- **Probability of Success**: How well does your system handle these queries?
73+
74+
### Practical Example: E-commerce Search
75+
76+
| Segment | Impact | Volume % | Success % | Expected Value |
77+
|---------|--------|----------|-----------|----------------|
78+
| Product by SKU | $100/query | 30% | 95% | 28.5 |
79+
| "Affordable shoes" | $50/query | 45% | 40% | 9.0 |
80+
| "Gift ideas under $50" | $75/query | 15% | 20% | 2.25 |
81+
| Technical specs | $25/query | 10% | 85% | 2.13 |
82+
83+
Even though "affordable shoes" has lower individual impact, its high volume and low success rate makes it the #2 priority. This is how you make data-driven decisions.
84+
85+
## Practical Implementation: From Raw Data to Insights
86+
87+
### Step 1: Initial Clustering
88+
89+
Start with embeddings and K-means. Don't overthink this—you're looking for patterns, not perfection.
90+
91+
The process is straightforward:
92+
1. Embed all your queries
93+
2. Use K-means clustering (start with 20 clusters)
94+
3. Group similar queries together
95+
4. Analyze patterns within each cluster
96+
97+
Don't overthink the clustering algorithm—simple K-means works fine. The insights come from manually reviewing the clusters, not from fancy algorithms.
98+
99+
### Step 2: Analyze Each Cluster
100+
101+
For each cluster, you need to understand:
102+
1. What are users actually asking? (sample 10-20 queries)
103+
2. How well are we performing? (average satisfaction)
104+
3. How big is this segment? (percentage of total)
105+
106+
!!! tip "The 10-10 Rule"
107+
For each cluster, manually review:
108+
- 10 queries with positive feedback
109+
- 10 queries with negative feedback
110+
111+
This tells you what's working and what's broken in that segment.
112+
113+
### Step 3: Build a Classification Model
114+
115+
Once you understand your clusters, build a classifier to categorize new queries in real-time:
116+
117+
Build a few-shot classifier using examples from each cluster. Take 3-5 representative queries per cluster and use them to classify new incoming queries. This lets you track segment distributions in real-time without re-clustering everything.
118+
119+
## The 2x2 Prioritization Matrix
120+
121+
Once you have your segments, plot them on this matrix:
122+
123+
```mermaid
124+
graph TD
125+
subgraph "Prioritization Matrix"
126+
A[High Volume<br>High Satisfaction<br>✅ Monitor Only]
127+
B[Low Volume<br>High Satisfaction<br>📢 Promote Features]
128+
C[High Volume<br>Low Satisfaction<br>🚨 DANGER ZONE]
129+
D[Low Volume<br>Low Satisfaction<br>🤔 Cost-Benefit Analysis]
130+
end
131+
132+
style C fill:#FF6B6B,stroke:#C92A2A,stroke-width:3px
133+
style A fill:#51CF66,stroke:#2B8A3E,stroke-width:2px
134+
style B fill:#4DABF7,stroke:#1864AB,stroke-width:2px
135+
style D fill:#FFE066,stroke:#F59F00,stroke-width:2px
136+
```
137+
138+
### What to Do in Each Quadrant
139+
140+
**High Volume + High Satisfaction (Monitor Only)**
141+
- You're doing great here
142+
- Set up alerts if performance drops
143+
- Use as examples of what works
144+
- Consider if you can break this down further
145+
146+
**Low Volume + High Satisfaction (Promote Features)**
147+
- Users don't know you're good at this
148+
- Add UI hints showing these capabilities
149+
- Include in onboarding
150+
- Show example queries below search bar
151+
152+
**High Volume + Low Satisfaction (DANGER ZONE)**
153+
- This is killing your product
154+
- Immediate priority for improvement
155+
- Conduct user research to understand why
156+
- Set sprint goals to fix this
157+
158+
**Low Volume + Low Satisfaction (Cost-Benefit)**
159+
- Maybe you don't need to solve this
160+
- Could be out of scope
161+
- Consider explicitly saying "we don't do that"
162+
- Or find low-effort improvements
163+
164+
## Real-World Case Study: Construction Project Management
165+
166+
Let me share a story that shows why this analysis matters. We built a RAG system for construction project management. The product team was convinced scheduling was the killer feature.
167+
168+
### The Initial Hypothesis
169+
- Product team: "Scheduling is critical"
170+
- Overall metrics: 70% satisfaction (seems okay)
171+
- Decision: Keep improving generally
172+
173+
### What the Data Actually Showed
174+
175+
Query Distribution:
176+
- Document search: 52% of queries (70% satisfaction)
177+
- Scheduling: 8% of queries (25% satisfaction)
178+
- Cost lookup: 15% of queries (82% satisfaction)
179+
- Compliance: 12% of queries (78% satisfaction)
180+
- Other: 13% of queries (65% satisfaction)
181+
182+
But here's the twist—when we looked at user cohorts:
183+
184+
```mermaid
185+
graph LR
186+
A[New Users] -->|Day 1| B[90% Scheduling Queries<br>25% Satisfaction]
187+
B -->|Day 7| C[60% Scheduling<br>40% Document Search]
188+
C -->|Day 30| D[20% Scheduling<br>80% Document Search]
189+
190+
style B fill:#FF6B6B,stroke:#C92A2A
191+
style D fill:#51CF66,stroke:#2B8A3E
192+
```
193+
194+
**The Hidden Pattern**: Users were adapting to our failures! They wanted scheduling but learned it didn't work, so they switched to document search (which worked better).
195+
196+
### The Solution
197+
198+
We fixed scheduling search by:
199+
1. Extracting date metadata from all documents
200+
2. Building a specialized calendar index
201+
3. Adding explicit date filtering capabilities
202+
4. Training the router to detect scheduling queries
203+
204+
Results:
205+
- Scheduling satisfaction: 25% → 78%
206+
- New user retention: +35%
207+
- Document search volume actually increased (users trusted the system more)
208+
209+
!!! warning "User Adaptation Blindness"
210+
Users adapt to your system's limitations. High satisfaction in one area might be masking failures elsewhere. Always look at user journeys, not just aggregate metrics.
211+
212+
## Advanced Segmentation Techniques
213+
214+
### Beyond Simple Clustering
215+
216+
Topic modeling is just the start. Here are advanced techniques that actually move the needle:
217+
218+
#### 1. Multi-Dimensional Segmentation
219+
220+
Don't just cluster by query text. Combine multiple signals:
221+
222+
Don't just cluster by query text. Combine multiple dimensions:
223+
- **Query embeddings**: What they're asking
224+
- **User metadata**: Who's asking (role, account tier)
225+
- **Temporal patterns**: When they ask (hour, day of week)
226+
- **Session context**: What they asked before
227+
228+
This multi-dimensional view reveals patterns invisible in simple clustering. For example, you might find that executives ask comparison queries on Monday mornings while engineers ask debugging queries on Friday afternoons.
229+
230+
#### 2. Conversation Flow Analysis
231+
232+
Look at query sequences, not just individual queries:
233+
234+
Look at query sequences within sessions to identify conversation patterns. Track transitions between query types to understand user journeys.
235+
236+
Common patterns we've found:
237+
- General question → Specific follow-up (good flow)
238+
- Specific question → Rephrase → Rephrase (retrieval failing)
239+
- Question → "Show me more" → Question on different topic (satisfaction signal)
240+
241+
#### 3. Failure Mode Analysis
242+
243+
Group queries by why they failed, not just that they failed:
244+
245+
Common failure modes to track:
246+
- **No results**: Lexical search returned nothing
247+
- **Low similarity**: Best match below 0.5 cosine similarity
248+
- **Wrong intent**: Misclassified query type
249+
- **Missing metadata**: Required filter not available
250+
- **Timeout**: Query took over 10 seconds
251+
- **Hallucination**: Answer not grounded in sources
252+
253+
This tells you exactly what to fix for each segment.
254+
255+
## Building Your Classification Pipeline
256+
257+
### From Exploration to Production
258+
259+
Once you've identified your segments, you need a production pipeline:
260+
261+
### From Exploration to Production
262+
263+
Once you've identified your segments, build a production pipeline that:
264+
1. Classifies incoming queries in real-time
265+
2. Detects required capabilities (comparison, summarization, filtering)
266+
3. Assigns queries to appropriate segments
267+
4. Tracks expected difficulty and historical satisfaction
268+
5. Suggests the best retriever for each segment
269+
270+
Capability detection is simple pattern matching:
271+
- Words like "compare", "versus" → comparison capability
272+
- Words like "summarize", "overview" → summarization capability
273+
- Year patterns (2022, 2023) → temporal filtering
274+
- Question words (how, why, what) → explanation capability
275+
276+
### Monitoring Dashboard Essentials
277+
278+
Track these metrics for each segment:
279+
280+
Essential metrics to track for each segment:
281+
- **Volume percentage**: What % of total queries
282+
- **Satisfaction score**: Average user satisfaction
283+
- **Retrieval quality**: Average cosine similarity
284+
- **Response time**: P50 and P95 latency
285+
- **Trend direction**: Improving or declining
286+
- **User retention**: Do users return after these queries
287+
- **Escalation rate**: How often users contact support
288+
289+
!!! example "Dashboard Implementation"
290+
Your dashboard should show:
291+
- Volume as percentage of total
292+
- Average satisfaction score
293+
- Retrieval quality distribution
294+
- Top 5 failure examples
295+
- Trend over time
296+
- Actionable recommendations
297+
- Alert conditions (performance drops)
298+
299+
## Common Patterns and Anti-Patterns
300+
301+
### Patterns That Work
302+
303+
**1. The Other Category**
304+
Always include an "other" category in your classification. When it grows above 10-15%, it's time to re-cluster.
305+
306+
**2. Cohort-Based Analysis**
307+
Look at segments across user cohorts:
308+
- New vs. returning users
309+
- Free vs. paid tiers
310+
- Different industries/use cases
311+
312+
**3. The Feedback Loop**
313+
Successful improvements change user behavior. After fixing scheduling (from our case study), document search queries actually increased because users trusted the system more.
314+
315+
### The Automation Paradox
316+
317+
I learned this from an operations book years ago, and it applies perfectly to RAG systems: automation saves time, but issues multiply if left unchecked.
318+
319+
Imagine a machine punching holes in metal sheets. If it's miscalibrated by an inch, and you don't check for a week, you've ruined thousands of products. The same principle applies to RAG—small retrieval issues compound into major user experience problems if you're not monitoring.
320+
321+
The solution is high-quality sampling at regular intervals. Check your segments weekly. Monitor that "other" category religiously—when it grows above 10%, it's time to re-cluster. This is your early warning system for concept drift.
322+
323+
Think of the "other" category as your canary in the coal mine. New query patterns emerge here first. Maybe you onboarded a new customer with different needs. Maybe a product update changed how users interact with your system. The "other" category tells you when your current segmentation is becoming stale.
324+
325+
### Anti-Patterns to Avoid
326+
327+
**1. Over-Segmentation**
328+
Having 100 micro-segments isn't actionable. Start with 10-20 and refine from there.
329+
330+
**2. Ignoring Cross-Segment Patterns**
331+
The same capability issue (like date filtering) might affect multiple topic segments.
332+
333+
**3. Static Segmentation**
334+
User behavior evolves. Re-run clustering monthly and track drift in your "other" category.
335+
336+
## Practical Exercises
337+
338+
### Exercise 1: Identify Your Segments
339+
340+
1. Load your query logs
341+
2. Generate embeddings for all queries
342+
3. Cluster into 15-20 groups
343+
4. For each cluster:
344+
- Check the size (% of total)
345+
- Review sample queries
346+
- Calculate satisfaction metrics
347+
- Identify whether it's inventory or capability issue
348+
349+
### Exercise 2: Build Your Classification Model
350+
351+
1. Take 10 examples from each analyzed cluster
352+
2. Create a few-shot classifier with these examples
353+
3. Test on 100 recent queries
354+
4. Validate classifications against manual labels
355+
5. Aim for 80%+ accuracy before deploying
356+
357+
## Real-World Validation: Anthropic's Clio Analysis
358+
359+
Anthropic used their Clio tool—a privacy-preserving analysis system—to analyze millions of Claude conversations. The results were striking: computer science and mathematics usage was dramatically above baseline compared to other fields.
360+
361+
Clio revealed that Natural Sciences and Mathematics showed 15.2% representation in Claude.ai compared to only 9.2% student enrollment in these fields. This over-indexing suggests Claude provides exceptional value for technical tasks.
362+
363+
But here's the strategic question this raises: Should Anthropic double down on computer/math capabilities where they're already strong? Or invest in underperforming areas like humanities and social sciences that have growth potential?
364+
365+
This is exactly the kind of decision your segmentation analysis enables. The data transforms subjective debates ("I think we should focus on X") into objective discussions ("Segment X represents 40% of queries with only 30% satisfaction").
366+
367+
## Comparing Organizations
368+
369+
When you have multiple customers or organizations using your system, compare their patterns. We had a client onboard Home Depot and Walmart on consecutive days. By comparing average Cohere ranker scores between them, we discovered Walmart's data was less rich, leading to worse retrieval.
370+
371+
This organization-level comparison helps identify:
372+
- Data quality issues
373+
- Different use patterns
374+
- Training needs
375+
- Custom requirements per customer
376+
377+
## Integration with Other Chapters
378+
379+
This segmentation analysis feeds directly into:
380+
381+
- **[Chapter 5](../chapter5-1.md)**: Building specialized retrievers for identified segments
382+
- **[Chapter 6](../chapter6-1.md)**: Routing queries to appropriate specialized systems
383+
- **[Chapter 2](../chapter2.md)**: Creating training data for underperforming segments
384+
385+
## Key Takeaways
386+
387+
1. **Segmentation reveals hidden patterns** - Aggregate metrics hide important details
388+
2. **Use the 2x2 matrix** - Volume vs. satisfaction tells you what to prioritize
389+
3. **Users adapt to failures** - Look at journey patterns, not just point-in-time metrics
390+
4. **Topic ≠ Capability** - Segment by both what users ask and what they want done
391+
5. **Monitor the "other" category** - Growing "other" means new patterns emerging
392+
393+
## Next Steps
394+
395+
In [Chapter 4-2](chapter4-2.md), we'll dive into how to turn these segments into a strategic roadmap, distinguishing between inventory and capability issues, and building a systematic improvement plan.
396+
397+
---
27398

28-
I'll show you how to:
399+
--8<--
400+
"snippets/enrollment-button.md"
401+
--8<--
29402

30-
- Segment queries into meaningful groups
31-
- Find performance patterns that actually matter
32-
- Build a roadmap based on data, not guesswork
33-
- Know exactly where to spend your time
403+
---

0 commit comments

Comments
 (0)