
Commit 0e0a8e7

jeremymanning and claude committed
style(lecture17): lowercase after colons, scale down overflow slides
- Apply lowercase style after colons throughout (parametric/non-parametric knowledge, hallucination/stale data/citations, embedding models, chunking strategies, advanced techniques, limitations, best practices)
- Scale down Beyond basic RAG slide to 85%
- Scale down ChromaDB slide to 70%

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent bcf20e0 commit 0e0a8e7

File tree

3 files changed: 48 additions & 47 deletions

slides/week5/lecture17.html

Lines changed: 24 additions & 24 deletions
@@ -724,14 +724,14 @@ <h1 id="announcements">Announcements</h1>
 </foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="4" data-theme="cdl-theme" lang="C" style="--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
 <h1 id="the-knowledge-problem">The knowledge problem</h1>
 <div class="definition-box" data-title="Two kinds of knowledge">
-<p><strong>Parametric knowledge</strong>: Facts stored <em>inside</em> the model's parameters during training. The model &quot;knows&quot; things because it memorized patterns from its training data.</p>
-<p><strong>Non-parametric knowledge</strong>: Facts stored <em>outside</em> the model in documents, databases, or APIs. The model accesses this knowledge at inference time.</p>
+<p><strong>Parametric knowledge</strong>: facts stored <em>inside</em> the model's parameters during training. The model &quot;knows&quot; things because it memorized patterns from its training data.</p>
+<p><strong>Non-parametric knowledge</strong>: facts stored <em>outside</em> the model in documents, databases, or APIs. The model accesses this knowledge at inference time.</p>
 </div>
 <div class="warning-box" data-title="Why parametric knowledge falls short">
 <ol>
-<li><strong>Hallucination</strong>: The model confidently generates plausible-sounding but incorrect information</li>
-<li><strong>Stale data</strong>: Knowledge is frozen at training time — the model doesn't know about recent events</li>
-<li><strong>No citations</strong>: The model can't tell you <em>where</em> it learned something, making claims hard to verify</li>
+<li><strong>Hallucination</strong>: the model confidently generates plausible-sounding but incorrect information</li>
+<li><strong>Stale data</strong>: knowledge is frozen at training time — the model doesn't know about recent events</li>
+<li><strong>No citations</strong>: the model can't tell you <em>where</em> it learned something, making claims hard to verify</li>
 </ol>
 </div>
 </section>
@@ -793,9 +793,9 @@ <h1 id="embeddings-for-retrieval">Embeddings for retrieval</h1>
 <div class="tip-box" data-title="Which embedding model?">
 <p>For RAG, we use <strong>sentence embedding</strong> models (not word-level) that produce a single vector for an entire passage. All of these are free and open-source:</p>
 <ul>
-<li><strong>Sentence-BERT</strong> (all-MiniLM-L6-v2): Fast, good quality, 384 dimensions</li>
-<li><strong>BGE</strong> (BAAI/bge-small-en-v1.5): State-of-the-art for retrieval</li>
-<li><strong>E5, GTE</strong>: Strong open-source alternatives</li>
+<li><strong>Sentence-BERT</strong> (all-MiniLM-L6-v2): fast, good quality, 384 dimensions</li>
+<li><strong>BGE</strong> (BAAI/bge-small-en-v1.5): state-of-the-art for retrieval</li>
+<li><strong>E5, GTE</strong>: strong open-source alternatives</li>
 </ul>
 </div>
 </section>
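
For reference, the retrieval step these models support looks roughly like the sketch below: it embeds a few passages with all-MiniLM-L6-v2 (the model named on the slide) and ranks them against a query by cosine similarity. The passages and query are illustrative, not from the slides.

```python
# Minimal retrieval sketch using all-MiniLM-L6-v2 (named on the slide).
# The passages and query here are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional vectors

passages = [
    "RAG retrieves supporting documents at inference time.",
    "Parametric knowledge is stored in the model's weights.",
]
query = "Where does a RAG system get its facts?"

# One vector per passage; normalized vectors make the dot product equal
# cosine similarity.
doc_vecs = model.encode(passages, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

scores = doc_vecs @ query_vec
best = int(np.argmax(scores))
print(f"{scores[best]:.3f}  {passages[best]}")
```
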
@@ -806,10 +806,10 @@ <h1 id="document-chunking">Document chunking</h1>
 </div>
 <div class="tip-box" data-title="Chunking strategies">
 <ul>
-<li><strong>Fixed-size</strong>: Split every <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi></mrow><annotation encoding="application/x-tex">n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">n</span></span></span></span> characters/tokens (simple but may cut mid-sentence)</li>
-<li><strong>Sentence-based</strong>: Split on sentence boundaries (preserves meaning better)</li>
-<li><strong>Semantic</strong>: Use topic shifts to determine chunk boundaries (best quality but most complex)</li>
-<li><strong>Recursive</strong>: Split on paragraphs first, then sentences if chunks are still too long</li>
+<li><strong>Fixed-size</strong>: split every <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi></mrow><annotation encoding="application/x-tex">n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">n</span></span></span></span> characters/tokens (simple but may cut mid-sentence)</li>
+<li><strong>Sentence-based</strong>: split on sentence boundaries (preserves meaning better)</li>
+<li><strong>Semantic</strong>: use topic shifts to determine chunk boundaries (best quality but most complex)</li>
+<li><strong>Recursive</strong>: split on paragraphs first, then sentences if chunks are still too long</li>
 </ul>
 </div>
 <div class="warning-box" data-title="Size tradeoffs">
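
The fixed-size strategy in the list above is the simplest to implement. A minimal sketch with overlapping chunks; word-level splitting stands in for real tokenization, and the size values echo the 256-512 token guidance from the best-practices slide.

```python
# Fixed-size chunking with overlap. Whitespace "tokens" approximate real
# tokenization; chunk_size/overlap values are illustrative defaults.
def chunk_text(text: str, chunk_size: int = 256, overlap: int = 32) -> list[str]:
    words = text.split()
    step = chunk_size - overlap  # consecutive chunks share `overlap` words
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, so it remains retrievable from either.
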
@@ -905,7 +905,7 @@ <h1 id="python-generation-with-context-1">Python: generation with context</h1>
 <p>FLAN-T5 runs locally in Colab. For better quality, try <code>google/flan-t5-large</code> (requires GPU).</p>
 </div>
 </section>
-</foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="14" data-class="scale-80" data-theme="cdl-theme" lang="C" class="scale-80" style="--class:scale-80;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
+</foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="14" data-class="scale-70" data-theme="cdl-theme" lang="C" class="scale-70" style="--class:scale-70;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
 <h1 id="end-to-end-rag-with-chromadb">End-to-end RAG with ChromaDB</h1>
 <div class="example-box" data-title="A complete mini RAG system">
 <pre><code class="language-python has-line-numbers" data-start-line="1">
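
The slide's full ChromaDB listing is truncated in this diff, so the sketch below is an illustrative stand-in rather than the slide's actual code: an in-memory client, a collection that embeds documents on add, and a nearest-neighbor query.

```python
# Illustrative ChromaDB stand-in (the slide's own listing is truncated above).
import chromadb

client = chromadb.Client()  # in-memory instance; nothing is persisted
collection = client.create_collection("lecture_demo")

# ChromaDB embeds the documents with its default embedding model on add().
collection.add(
    ids=["doc1", "doc2"],
    documents=[
        "RAG retrieves supporting documents at inference time.",
        "Parametric knowledge is stored in the model's weights.",
    ],
)

# Retrieve the single closest document to the query.
results = collection.query(
    query_texts=["Where does a RAG system get its facts?"],
    n_results=1,
)
print(results["documents"][0][0])
```
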
@@ -1003,19 +1003,19 @@ <h1 id="rag-vs-fine-tuning">RAG vs fine-tuning</h1>
 <p>Many production systems use <em>both</em>: fine-tune the model for the domain's style and behavior, then use RAG for up-to-date factual knowledge.</p>
 </div>
 </section>
-</foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="18" data-theme="cdl-theme" lang="C" style="--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
+</foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="18" data-class="scale-85" data-theme="cdl-theme" lang="C" class="scale-85" style="--class:scale-85;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
 <h1 id="beyond-basic-rag">Beyond basic RAG</h1>
 <div class="note-box" data-title="Advanced techniques">
 <ol>
-<li><strong>Re-ranking</strong>: After initial retrieval, use a cross-encoder model to re-score and re-order results for better precision</li>
-<li><strong>Hybrid search</strong>: Combine vector similarity (semantic) with keyword matching (BM25) for more robust retrieval</li>
-<li><strong>Query expansion</strong>: Rewrite or expand the user's query using an LLM before retrieval to improve recall</li>
+<li><strong>Re-ranking</strong>: after initial retrieval, use a cross-encoder model to re-score and re-order results for better precision</li>
+<li><strong>Hybrid search</strong>: combine vector similarity (semantic) with keyword matching (BM25) for more robust retrieval</li>
+<li><strong>Query expansion</strong>: rewrite or expand the user's query using an LLM before retrieval to improve recall</li>
 </ol>
 </div>
 <div class="warning-box" data-title="Limitations to keep in mind">
 <ol>
-<li><strong>Retrieval quality</strong>: If the retriever fails to find relevant documents, the generator can't produce a good answer — garbage in, garbage out</li>
-<li><strong>Context window limits</strong>: There's a limit to how much retrieved text fits in the prompt, requiring aggressive chunking or summarization</li>
+<li><strong>Retrieval quality</strong>: if the retriever fails to find relevant documents, the generator can't produce a good answer — garbage in, garbage out</li>
+<li><strong>Context window limits</strong>: there's a limit to how much retrieved text fits in the prompt, requiring aggressive chunking or summarization</li>
 <li><strong>Multi-source reasoning</strong>: RAG is great for finding a specific fact, but struggles when answers require synthesizing information across many documents</li>
 </ol>
 </div>
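
Of the advanced techniques on this slide, re-ranking is the most self-contained to demonstrate. A minimal sketch using a cross-encoder from sentence-transformers; the checkpoint name is a common public choice, not one the slides specify.

```python
# Cross-encoder re-ranking sketch; the checkpoint is a common public choice,
# not one named in the slides.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How does RAG reduce hallucination?"
candidates = [
    "RAG grounds answers in retrieved text, so claims can be checked.",
    "Transformers process tokens with self-attention.",
]

# Unlike a bi-encoder, the cross-encoder reads each (query, passage) pair
# jointly: slower, but it yields a more precise relevance score.
scores = reranker.predict([(query, c) for c in candidates])
reranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
for passage, score in reranked:
    print(f"{score:.3f}  {passage}")
```
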
@@ -1024,11 +1024,11 @@ <h1 id="beyond-basic-rag">Beyond basic RAG</h1>
 <h1 id="best-practices">Best practices</h1>
 <div class="tip-box" data-title="Making RAG work well">
 <ol>
-<li><strong>Chunk wisely</strong>: Experiment with chunk sizes (256-512 tokens is a good start). Use overlap between chunks.</li>
-<li><strong>Choose the right embedding model</strong>: Test multiple models on your specific domain. General-purpose models may not capture domain-specific semantics.</li>
-<li><strong>Evaluate retrieval separately</strong>: Before blaming the LLM, check if the retriever is finding the right documents.</li>
-<li><strong>Include metadata</strong>: Store source, date, and section info with chunks so the LLM can cite sources.</li>
-<li><strong>Handle &quot;I don't know&quot;</strong>: Instruct the model to say when the context doesn't contain the answer rather than hallucinating.</li>
+<li><strong>Chunk wisely</strong>: experiment with chunk sizes (256-512 tokens is a good start). Use overlap between chunks.</li>
+<li><strong>Choose the right embedding model</strong>: test multiple models on your specific domain. General-purpose models may not capture domain-specific semantics.</li>
+<li><strong>Evaluate retrieval separately</strong>: before blaming the LLM, check if the retriever is finding the right documents.</li>
+<li><strong>Include metadata</strong>: store source, date, and section info with chunks so the LLM can cite sources.</li>
+<li><strong>Handle &quot;I don't know&quot;</strong>: instruct the model to say when the context doesn't contain the answer rather than hallucinating.</li>
 </ol>
 </div>
 </section>
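
Best practice 5 is usually enforced in the prompt itself. A minimal sketch of such a template; the exact wording is illustrative.

```python
# Prompt template that permits abstention; wording is illustrative.
PROMPT_TEMPLATE = """Answer the question using ONLY the context below.
If the context does not contain the answer, reply exactly: I don't know.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(context: str, question: str) -> str:
    return PROMPT_TEMPLATE.format(context=context, question=question)
```
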

slides/week5/lecture17.md

Lines changed: 24 additions & 23 deletions
@@ -52,17 +52,17 @@ Due: **February 16 at 11:59 PM EST**.

 <div class="definition-box" data-title="Two kinds of knowledge">

-**Parametric knowledge**: Facts stored *inside* the model's parameters during training. The model "knows" things because it memorized patterns from its training data.
+**Parametric knowledge**: facts stored *inside* the model's parameters during training. The model "knows" things because it memorized patterns from its training data.

-**Non-parametric knowledge**: Facts stored *outside* the model in documents, databases, or APIs. The model accesses this knowledge at inference time.
+**Non-parametric knowledge**: facts stored *outside* the model in documents, databases, or APIs. The model accesses this knowledge at inference time.

 </div>

 <div class="warning-box" data-title="Why parametric knowledge falls short">

-1. **Hallucination**: The model confidently generates plausible-sounding but incorrect information
-2. **Stale data**: Knowledge is frozen at training time — the model doesn't know about recent events
-3. **No citations**: The model can't tell you *where* it learned something, making claims hard to verify
+1. **Hallucination**: the model confidently generates plausible-sounding but incorrect information
+2. **Stale data**: knowledge is frozen at training time — the model doesn't know about recent events
+3. **No citations**: the model can't tell you *where* it learned something, making claims hard to verify

 </div>

@@ -112,9 +112,9 @@ You already know that embeddings map text to vectors where **semantic similarity
 <div class="tip-box" data-title="Which embedding model?">

 For RAG, we use **sentence embedding** models (not word-level) that produce a single vector for an entire passage. All of these are free and open-source:
-- **Sentence-BERT** (all-MiniLM-L6-v2): Fast, good quality, 384 dimensions
-- **BGE** (BAAI/bge-small-en-v1.5): State-of-the-art for retrieval
-- **E5, GTE**: Strong open-source alternatives
+- **Sentence-BERT** (all-MiniLM-L6-v2): fast, good quality, 384 dimensions
+- **BGE** (BAAI/bge-small-en-v1.5): state-of-the-art for retrieval
+- **E5, GTE**: strong open-source alternatives

 </div>

@@ -130,10 +130,10 @@ Real documents are long. We can't embed an entire book as a single vector (too m

 <div class="tip-box" data-title="Chunking strategies">

-- **Fixed-size**: Split every $n$ characters/tokens (simple but may cut mid-sentence)
-- **Sentence-based**: Split on sentence boundaries (preserves meaning better)
-- **Semantic**: Use topic shifts to determine chunk boundaries (best quality but most complex)
-- **Recursive**: Split on paragraphs first, then sentences if chunks are still too long
+- **Fixed-size**: split every $n$ characters/tokens (simple but may cut mid-sentence)
+- **Sentence-based**: split on sentence boundaries (preserves meaning better)
+- **Semantic**: use topic shifts to determine chunk boundaries (best quality but most complex)
+- **Recursive**: split on paragraphs first, then sentences if chunks are still too long

 </div>

@@ -238,7 +238,7 @@ FLAN-T5 runs locally in Colab. For better quality, try `google/flan-t5-large` (r
 </div>

 ---
-<!-- _class: scale-80 -->
+<!-- _class: scale-70 -->

 # End-to-end RAG with ChromaDB

@@ -298,21 +298,22 @@ Many production systems use *both*: fine-tune the model for the domain's style a
 </div>

 ---
+<!-- _class: scale-85 -->

 # Beyond basic RAG

 <div class="note-box" data-title="Advanced techniques">

-1. **Re-ranking**: After initial retrieval, use a cross-encoder model to re-score and re-order results for better precision
-2. **Hybrid search**: Combine vector similarity (semantic) with keyword matching (BM25) for more robust retrieval
-3. **Query expansion**: Rewrite or expand the user's query using an LLM before retrieval to improve recall
+1. **Re-ranking**: after initial retrieval, use a cross-encoder model to re-score and re-order results for better precision
+2. **Hybrid search**: combine vector similarity (semantic) with keyword matching (BM25) for more robust retrieval
+3. **Query expansion**: rewrite or expand the user's query using an LLM before retrieval to improve recall

 </div>

 <div class="warning-box" data-title="Limitations to keep in mind">

-1. **Retrieval quality**: If the retriever fails to find relevant documents, the generator can't produce a good answer — garbage in, garbage out
-2. **Context window limits**: There's a limit to how much retrieved text fits in the prompt, requiring aggressive chunking or summarization
+1. **Retrieval quality**: if the retriever fails to find relevant documents, the generator can't produce a good answer — garbage in, garbage out
+2. **Context window limits**: there's a limit to how much retrieved text fits in the prompt, requiring aggressive chunking or summarization
 3. **Multi-source reasoning**: RAG is great for finding a specific fact, but struggles when answers require synthesizing information across many documents

 </div>
@@ -323,11 +324,11 @@ Many production systems use *both*: fine-tune the model for the domain's style a

 <div class="tip-box" data-title="Making RAG work well">

-1. **Chunk wisely**: Experiment with chunk sizes (256-512 tokens is a good start). Use overlap between chunks.
-2. **Choose the right embedding model**: Test multiple models on your specific domain. General-purpose models may not capture domain-specific semantics.
-3. **Evaluate retrieval separately**: Before blaming the LLM, check if the retriever is finding the right documents.
-4. **Include metadata**: Store source, date, and section info with chunks so the LLM can cite sources.
-5. **Handle "I don't know"**: Instruct the model to say when the context doesn't contain the answer rather than hallucinating.
+1. **Chunk wisely**: experiment with chunk sizes (256-512 tokens is a good start). Use overlap between chunks.
+2. **Choose the right embedding model**: test multiple models on your specific domain. General-purpose models may not capture domain-specific semantics.
+3. **Evaluate retrieval separately**: before blaming the LLM, check if the retriever is finding the right documents.
+4. **Include metadata**: store source, date, and section info with chunks so the LLM can cite sources.
+5. **Handle "I don't know"**: instruct the model to say when the context doesn't contain the answer rather than hallucinating.

 </div>

slides/week5/lecture17.pdf

-8.01 KB
Binary file not shown.
