# Chapter 0 Slides

## jxnl.co

@jxnlco

## Systematically Improving RAG Applications

**Session 0:** Beyond Implementation to Improvement: A Product Mindset for RAG

Jason Liu

---

## Welcome to the Course

**Instructor:** Jason Liu - AI/ML Consultant & Educator

**Mission:** Dismantle guesswork in AI development and replace it with structured, measurable, and repeatable processes.

**Your Commitment:**
- Stick with the material
- Have conversations with teammates
- Make time to look at your data
- Instrument your systems
- Ask yourself: "What work am I trying to do?"

---

## Who Am I?

**Background:** Computer Vision, Computational Mathematics, Mathematical Physics (University of Waterloo)

**Facebook:** Content Policy, Moderation, Public Risk & Safety
- Built dashboards and RAG applications to identify harmful content
- Computational social science applications

**Stitch Fix:** Computer Vision, Multimodal Retrieval
- Variational autoencoders and GANs for GenAI
- **$50M revenue impact** from recommendation systems
- $400K annual data curation budget
- Hundreds of millions of recommendations/week

---

## Current Focus

**Why Consulting vs Building?**
- Hand injury in 2021-2022 limited typing
- Highest leverage: advising startups and education
- Helping others build while my hands recover

**Client Experience:**
- HubSpot, Zapier, Limitless, and many others
- Personal assistants, construction AI, research tools
- Query understanding, prompt optimization, embedding search
- Fine-tuning, MLOps, and observability

---

## Who Are You?

**Cohort Composition:**
- **30%** Founders and CTOs
- **20%** Senior Engineers
- **50%** Software Engineers, Data Scientists, PMs, Solution Engineers, Consultants

**Companies Represented:**
- OpenAI, Amazon, Microsoft, Google
- Anthropic, NVIDIA, and many others

**Excited to hear about your challenges and experiences!**

---
## Course Structure: 6-Week Journey

### Week 1: Synthetic Data Generation
- Create precision/recall evaluations (see the sketch after this list)
- Start with text chunks → synthetic questions
- Build baseline evaluation suite

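To make Week 1 concrete, here is a minimal sketch of the two metrics, assuming each synthetic question records the ID of the chunk it was generated from. The function and variable names are illustrative, not course code.

```python
def recall_at_k(results: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant chunk IDs that appear in the top-k results."""
    hits = len(set(results[:k]) & relevant)
    return hits / len(relevant) if relevant else 0.0


def precision_at_k(results: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return len(set(results[:k]) & relevant) / k


# Toy usage: a question generated from chunk "c7"; the retriever returned these IDs.
retrieved = ["c3", "c7", "c12", "c9", "c1"]
print(recall_at_k(retrieved, {"c7"}, k=5))     # 1.0 (the source chunk was found)
print(precision_at_k(retrieved, {"c7"}, k=5))  # 0.2
```
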
### Week 2: Fine-Tuning and Few-Shot Examples
- Convert evals to few-shot examples (sketched below)
- Fine-tune models for better performance
- Evaluate rerankers and methodologies
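
As a rough illustration of the Week 2 idea, the sketch below turns eval records that scored well into few-shot examples. The record fields (`question`, `chunk`, `answer`) are assumptions for illustration.

```python
def build_few_shot_prompt(examples: list[dict], query: str) -> str:
    """Prepend known-good (question, context, answer) triples to a new query."""
    parts = []
    for ex in examples:
        parts.append(
            f"Question: {ex['question']}\n"
            f"Context: {ex['chunk']}\n"
            f"Answer: {ex['answer']}\n"
        )
    parts.append(f"Question: {query}\nAnswer:")
    return "\n".join(parts)


# Toy usage with a single invented eval record.
examples = [{
    "question": "What is the refund window?",
    "chunk": "Refunds are accepted within 30 days of purchase.",
    "answer": "30 days.",
}]
print(build_few_shot_prompt(examples, "How long do I have to return an item?"))
```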

### Week 3: Deploy and Collect Feedback
- Deploy the system to real users
- Collect ratings and feedback
- Improve evals with real user data

---

## Course Structure (continued)

### Week 4: Topic Modeling and Segmentation
- Use clustering to identify valuable topics
- Decide what to double down on vs. what to abandon
- Focus resources on economically valuable work

### Week 5: Multimodal RAG Improvements
- Incorporate images, tables, and code search
- Contextual retrieval and summarization
- Target specific query segments

### Week 6: Function Calling and Query Understanding
- Combine all systems with intelligent routing (see the routing sketch below)
- Query → Path selection → Multimodal RAG → Final answer
- Complete end-to-end orchestration

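A toy sketch of the Week 6 routing step. In practice the path would be chosen by an LLM via function calling; the keyword rules below are only a placeholder so the example runs offline, and the path names are assumptions.

```python
from enum import Enum


class Path(Enum):
    DOCUMENT = "document"
    TABLE = "table"
    CODE = "code"
    IMAGE = "image"


def route(query: str) -> Path:
    """Pick a retrieval path for the query (stand-in for an LLM function call)."""
    q = query.lower()
    if "function" in q or "error" in q:
        return Path.CODE
    if "revenue" in q or "table" in q:
        return Path.TABLE
    if "diagram" in q or "screenshot" in q:
        return Path.IMAGE
    return Path.DOCUMENT


print(route("Show me the revenue table for Q3"))  # Path.TABLE
```
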
---

## Learning Format

**Asynchronous Lectures (Fridays)**
- Watch videos on your schedule
- Take notes and prepare questions

**Office Hours (Tuesdays & Thursdays)**
- Multiple time zones supported
- Active learning and discussion
- Question-driven sessions

**Guest Lectures (Wednesdays)**
- Industry experts and practitioners
- Q&A with speakers
- Real-world case studies

**Slack Community**
- Ongoing discussions
- Peer support and collaboration

---

## The Critical Mindset Shift

### ❌ Implementation Mindset
- "We need to implement RAG"
- Obsessing over embedding dimensions
- Success = works in demo
- Big upfront architecture decisions
- Focus on picking the "best" model

### ✅ Product Mindset
- "We need to help users find answers faster"
- Tracking answer relevance and task completion
- Success = users keep coming back
- Architecture that can evolve
- Focus on learning from user behavior

**Launching your RAG system is just the beginning!**

---

## Why Most RAG Implementations Fail

**The Problem:** Treating RAG as a technical project, not a product

**What Happens:**
1. Focus on technical components (embeddings, vector DB, LLM)
2. Consider it "complete" when deployed
3. Works for demos, struggles with real complexity
4. Users lose trust as limitations surface
5. No clear metrics or improvement process
6. Resort to ad-hoc tweaking based on anecdotes

**The Solution:** A product mindset with continuous improvement

---

## The Key Insight: RAG as Recommendation Engine

**Stop thinking:** Retrieval → Augmentation → Generation

**Start thinking:** Recommendation Engine + Language Model

```
User Query → Query Understanding → Multiple Retrieval Paths
                      ↓
       [Document]  [Image]  [Table]  [Code]
                      ↓
              Filtering & Ranking
                      ↓
               Context Assembly
                      ↓
                  Generation
                      ↓
                 User Response
                      ↓
                 Feedback Loop
```
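
Here is the same flow as a minimal Python sketch. Every retriever stub and score below is invented for illustration; the point is the shape of the system, not a real implementation.

```python
# Hypothetical retrieval paths; each returns (text, relevance_score) pairs.
def retrieve_documents(q): return [("doc: onboarding guide", 0.71)]
def retrieve_tables(q):    return [("table: pricing tiers", 0.64)]
def retrieve_code(q):      return [("code: auth_middleware.py", 0.32)]


def answer(query: str, k: int = 2) -> str:
    # Multiple retrieval paths run against the same query
    candidates = retrieve_documents(query) + retrieve_tables(query) + retrieve_code(query)
    # Filtering & ranking: keep the top-k by score
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    # Context assembly: format retrieved items for the prompt
    context = "\n".join(text for text, _ in top)
    # Generation would call an LLM here; we return the prompt for inspection
    return f"Answer using only this context:\n{context}\n\nQ: {query}"


print(answer("How much does the pro plan cost?"))
```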

---

## What This Means

### 1. Generation Quality = Retrieval Quality
- World's best prompt + garbage context = garbage answers
- Focus on getting the right information to the LLM

### 2. Different Questions Need Different Strategies
- Amazon doesn't recommend books the same way it recommends electronics
- Your RAG system shouldn't use the same approach for every query

### 3. Feedback Drives Improvement
- User interactions reveal what works (a logging sketch follows)
- Continuous learning from real usage patterns
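
A minimal sketch of what "learning from usage" might look like at the logging layer; the event fields and file format are assumptions, not a prescribed schema.

```python
import json
import time
from dataclasses import dataclass, asdict


@dataclass
class FeedbackEvent:
    query: str
    retrieved_ids: list[str]  # which chunks were shown to the LLM
    answer: str
    rating: int               # e.g. +1 thumbs up, -1 thumbs down
    ts: float


def log_feedback(event: FeedbackEvent, path: str = "feedback.jsonl") -> None:
    """Append one event per line; these become new eval cases later."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(event)) + "\n")


log_feedback(FeedbackEvent(
    query="How do I reset my password?",
    retrieved_ids=["c12", "c40"],
    answer="Use the 'Forgot password' link on the login page.",
    rating=1,
    ts=time.time(),
))
```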

---

## What Does Success Look Like?

### Feeling of Success
- **Less anxiety** when hearing "just make the AI better"
- **Less overwhelmed** when told to "look at your data"
- **Confidence** in making data-driven decisions

### Tangible Outcomes
- Identify high-impact tasks systematically
- Prioritize and make informed trade-offs
- Choose metrics that correlate with business outcomes
- Drive user satisfaction, retention, and usage

---

## The System Approach

**What is a System?**
- A structured approach to solving problems
- A framework for evaluating technologies
- A decision-making process for prioritization
- A methodology for diagnosing performance
- Standard metrics and benchmarks

**Why Systems Matter:**
- Frees mental energy for innovation
- Replaces guesswork with testing
- Enables quantitative assessments instead of "it feels better"
- Secures resources through data-driven arguments

---

## RAG vs Recommendation Systems

**The Reality:** RAG is a 4-step recommendation system

1. **Multiple Retrieval Indices** (multimodal: images, tables, text)
2. **Filtering** (top-k selection)
3. **Scoring/Ranking** (rerankers, relevance)
4. **Context Assembly** (prepare for generation)

**The Problem:** Engineers focus on generation without knowing whether the right information is being retrieved

**The Solution:** Improve search to improve retrieval to improve generation (a scoring sketch follows)
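
To ground steps 2 and 3, here is an illustrative sketch: a cheap vector-similarity filter down to top-k, followed by a rescoring pass. The "reranker" is a trivial lexical-overlap stand-in for a real cross-encoder, and all data is made up.

```python
import numpy as np


def top_k(query_vec, chunk_vecs, k=3):
    """Step 2: filter by cosine similarity against every chunk embedding."""
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return np.argsort(-sims)[:k]


def rerank(query: str, chunks: list[str]) -> list[str]:
    """Step 3: reorder the survivors with a stand-in lexical overlap score."""
    q_terms = set(query.lower().split())
    return sorted(chunks, key=lambda c: -len(q_terms & set(c.lower().split())))


chunks = ["refunds within 30 days", "shipping takes 5 days", "login help"]
vecs = np.random.default_rng(0).normal(size=(3, 8))  # fake chunk embeddings
idx = top_k(vecs[0], vecs, k=2)                      # fake query vector
print(rerank("how many days for refunds", [chunks[i] for i in idx]))
```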

---

## Experimentation Over Implementation

**Instead of:** "Make the AI better"

**Ask:**
- Why am I looking at this data?
- What's the goal and hypothesis?
- What signals am I looking for?
- Is the juice worth the squeeze?
- How can I use this to improve?

**Success Formula:** Flywheel in place + Consistent effort = Continuous improvement

Like building muscle: track calories and workouts, don't just weigh yourself daily.

---

## Course Commitments

### My Commitment to You
- Be online and answer questions
- Provide extensive office hours support
- Share real-world experience and case studies
- Connect you with industry experts

### Your Commitment
- Engage with the material actively
- Look at your own data and systems
- Participate in discussions and office hours
- Apply learnings to your real projects

**Together, we'll transform your RAG system from demo to production-ready product**

---

## Key Takeaway

> **Successful RAG systems aren't projects that ship once; they're products that improve continuously.**

The difference between success and failure isn't the embedding model or vector database you choose.

It's whether you treat RAG as:
- **❌ Static implementation** that slowly decays
- **✅ Living product** that learns from every interaction

**Let's build systems that get better every week! 🚀**

---

## Next Week

**Week 1: Kickstart the Data Flywheel**

- Synthetic data generation strategies (sketched below)
- Building precision/recall evaluations
- Creating your evaluation foundation
- "Fake it till you make it" with synthetic data
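
As a preview, a hedged sketch of the synthetic-question flywheel: `generate_question` is a placeholder for an LLM call, and the prompt wording and record shape are assumptions rather than course code.

```python
PROMPT = "Write one question a user might ask that this passage answers:\n\n{chunk}"


def generate_question(chunk: str) -> str:
    """Stand-in for an LLM completion over PROMPT.format(chunk=chunk)."""
    return f"What does the documentation say about: {chunk[:40]}...?"


def synthetic_eval_set(chunks: dict[str, str]) -> list[dict]:
    """Each pair records which chunk the question came from, so retrieval
    can be scored with a recall@k function like the Week 1 sketch."""
    return [
        {"question": generate_question(text), "relevant_chunk_id": cid}
        for cid, text in chunks.items()
    ]


chunks = {"c1": "Refunds are accepted within 30 days of purchase."}
print(synthetic_eval_set(chunks))
```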

**Come prepared to look at your data!**