style(lecture17): lowercase after colons, scale down overflow slides
- Apply lowercase style after colons throughout (parametric/non-parametric
knowledge, hallucination/stale data/citations, embedding models,
chunking strategies, advanced techniques, limitations, best practices)
- Scale down Beyond basic RAG slide to 85%
- Scale down ChromaDB slide to 70%
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
<divclass="definition-box" data-title="Two kinds of knowledge">
727
-
<p><strong>Parametric knowledge</strong>: Facts stored <em>inside</em> the model's parameters during training. The model "knows" things because it memorized patterns from its training data.</p>
728
-
<p><strong>Non-parametric knowledge</strong>: Facts stored <em>outside</em> the model in documents, databases, or APIs. The model accesses this knowledge at inference time.</p>
727
+
<p><strong>Parametric knowledge</strong>: facts stored <em>inside</em> the model's parameters during training. The model "knows" things because it memorized patterns from its training data.</p>
728
+
<p><strong>Non-parametric knowledge</strong>: facts stored <em>outside</em> the model in documents, databases, or APIs. The model accesses this knowledge at inference time.</p>
 <p>For RAG, we use <strong>sentence embedding</strong> models (not word-level) that produce a single vector for an entire passage. All of these are free and open-source:</p>
 <ul>
-<li><strong>Sentence-BERT</strong> (all-MiniLM-L6-v2): Fast, good quality, 384 dimensions</li>
-<li><strong>BGE</strong> (BAAI/bge-small-en-v1.5): State-of-the-art for retrieval</li>
+<li><strong>Sentence-BERT</strong> (all-MiniLM-L6-v2): fast, good quality, 384 dimensions</li>
+<li><strong>BGE</strong> (BAAI/bge-small-en-v1.5): state-of-the-art for retrieval</li>
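For context on this hunk, here is a minimal sketch of how the models listed above are typically used via the sentence-transformers library. The snippet is illustrative and not part of this commit; only the all-MiniLM-L6-v2 checkpoint name comes from the slide.

```python
# Illustrative only (not part of this commit): embedding passages with the
# all-MiniLM-L6-v2 checkpoint named in the slide above.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
passages = [
    "Parametric knowledge lives inside the model's weights.",
    "Non-parametric knowledge is retrieved at inference time.",
]
# One vector per passage; normalized so dot product equals cosine similarity.
embeddings = model.encode(passages, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384)
```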
-<li><strong>Fixed-size</strong>: Split every <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi></mrow><annotation encoding="application/x-tex">n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">n</span></span></span></span> characters/tokens (simple but may cut mid-sentence)</li>
-<li><strong>Sentence-based</strong>: Split on sentence boundaries (preserves meaning better)</li>
-<li><strong>Semantic</strong>: Use topic shifts to determine chunk boundaries (best quality but most complex)</li>
-<li><strong>Recursive</strong>: Split on paragraphs first, then sentences if chunks are still too long</li>
+<li><strong>Fixed-size</strong>: split every <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi></mrow><annotation encoding="application/x-tex">n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">n</span></span></span></span> characters/tokens (simple but may cut mid-sentence)</li>
+<li><strong>Sentence-based</strong>: split on sentence boundaries (preserves meaning better)</li>
+<li><strong>Semantic</strong>: use topic shifts to determine chunk boundaries (best quality but most complex)</li>
+<li><strong>Recursive</strong>: split on paragraphs first, then sentences if chunks are still too long</li>
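A quick sketch of the fixed-size strategy with overlap, since it is the simplest of the four to picture. The function and parameter names are illustrative assumptions, not taken from the lecture's own code.

```python
# Hypothetical fixed-size chunker with overlap (names are illustrative).
def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks; the overlap softens
    the mid-sentence cuts noted in the slide above."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step) if text[i:i + size]]

chunks = chunk_text("some long document text " * 200)
```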
@@ -1003,19 +1003,19 @@ <h1 id="rag-vs-fine-tuning">RAG vs fine-tuning</h1>
 <p>Many production systems use <em>both</em>: fine-tune the model for the domain's style and behavior, then use RAG for up-to-date factual knowledge.</p>
-<li><strong>Re-ranking</strong>: After initial retrieval, use a cross-encoder model to re-score and re-order results for better precision</li>
-<li><strong>Hybrid search</strong>: Combine vector similarity (semantic) with keyword matching (BM25) for more robust retrieval</li>
-<li><strong>Query expansion</strong>: Rewrite or expand the user's query using an LLM before retrieval to improve recall</li>
+<li><strong>Re-ranking</strong>: after initial retrieval, use a cross-encoder model to re-score and re-order results for better precision</li>
+<li><strong>Hybrid search</strong>: combine vector similarity (semantic) with keyword matching (BM25) for more robust retrieval</li>
+<li><strong>Query expansion</strong>: rewrite or expand the user's query using an LLM before retrieval to improve recall</li>
 </ol>
 </div>
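To make the re-ranking item above concrete, here is a hedged sketch of the cross-encoder step. The slide does not name a model, so the widely used public ms-marco-MiniLM-L-6-v2 checkpoint is an assumption.

```python
# Sketch of re-ranking with a cross-encoder; the checkpoint is assumed,
# since the slide above does not pin a specific model.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "How does RAG reduce hallucination?"
candidates = [  # e.g. the top-k passages from the initial vector search
    "RAG grounds answers in retrieved documents.",
    "BM25 scores documents by keyword overlap.",
]
# The cross-encoder scores each (query, passage) pair jointly, which is
# slower than bi-encoder retrieval but more precise.
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates),
                                     key=lambda pair: pair[0], reverse=True)]
```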
<divclass="warning-box" data-title="Limitations to keep in mind">
1016
1016
<ol>
1017
-
<li><strong>Retrieval quality</strong>: If the retriever fails to find relevant documents, the generator can't produce a good answer — garbage in, garbage out</li>
1018
-
<li><strong>Context window limits</strong>: There's a limit to how much retrieved text fits in the prompt, requiring aggressive chunking or summarization</li>
1017
+
<li><strong>Retrieval quality</strong>: if the retriever fails to find relevant documents, the generator can't produce a good answer — garbage in, garbage out</li>
1018
+
<li><strong>Context window limits</strong>: there's a limit to how much retrieved text fits in the prompt, requiring aggressive chunking or summarization</li>
1019
1019
<li><strong>Multi-source reasoning</strong>: RAG is great for finding a specific fact, but struggles when answers require synthesizing information across many documents</li>
<divclass="tip-box" data-title="Making RAG work well">
1026
1026
<ol>
1027
-
<li><strong>Chunk wisely</strong>: Experiment with chunk sizes (256-512 tokens is a good start). Use overlap between chunks.</li>
1028
-
<li><strong>Choose the right embedding model</strong>: Test multiple models on your specific domain. General-purpose models may not capture domain-specific semantics.</li>
1029
-
<li><strong>Evaluate retrieval separately</strong>: Before blaming the LLM, check if the retriever is finding the right documents.</li>
1030
-
<li><strong>Include metadata</strong>: Store source, date, and section info with chunks so the LLM can cite sources.</li>
1031
-
<li><strong>Handle "I don't know"</strong>: Instruct the model to say when the context doesn't contain the answer rather than hallucinating.</li>
1027
+
<li><strong>Chunk wisely</strong>: experiment with chunk sizes (256-512 tokens is a good start). Use overlap between chunks.</li>
1028
+
<li><strong>Choose the right embedding model</strong>: test multiple models on your specific domain. General-purpose models may not capture domain-specific semantics.</li>
1029
+
<li><strong>Evaluate retrieval separately</strong>: before blaming the LLM, check if the retriever is finding the right documents.</li>
1030
+
<li><strong>Include metadata</strong>: store source, date, and section info with chunks so the LLM can cite sources.</li>
1031
+
<li><strong>Handle "I don't know"</strong>: instruct the model to say when the context doesn't contain the answer rather than hallucinating.</li>
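The "include metadata" tip pairs naturally with the ChromaDB slide this commit rescales. A hedged sketch follows; the ChromaDB slide's own code is not in this diff, so the collection name and metadata fields here are illustrative.

```python
# Illustrative sketch of storing chunks with metadata in ChromaDB so the
# LLM can cite sources (names and fields are assumptions, not from the slide).
import chromadb

client = chromadb.Client()
collection = client.create_collection("lecture_demo")
collection.add(
    ids=["chunk-0"],
    documents=["RAG combines a retriever with a generator."],
    metadatas=[{"source": "lecture17.md", "section": "intro", "date": "2024-01-01"}],
)
results = collection.query(query_texts=["What is RAG?"], n_results=1)
# Metadata travels with each hit, so the prompt can cite source and section.
print(results["documents"][0], results["metadatas"][0])
```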
<divclass="definition-box"data-title="Two kinds of knowledge">
54
54
55
-
**Parametric knowledge**: Facts stored *inside* the model's parameters during training. The model "knows" things because it memorized patterns from its training data.
55
+
**Parametric knowledge**: facts stored *inside* the model's parameters during training. The model "knows" things because it memorized patterns from its training data.
56
56
57
-
**Non-parametric knowledge**: Facts stored *outside* the model in documents, databases, or APIs. The model accesses this knowledge at inference time.
57
+
**Non-parametric knowledge**: facts stored *outside* the model in documents, databases, or APIs. The model accesses this knowledge at inference time.
 For RAG, we use **sentence embedding** models (not word-level) that produce a single vector for an entire passage. All of these are free and open-source:
-- **Sentence-BERT** (all-MiniLM-L6-v2): Fast, good quality, 384 dimensions
-- **BGE** (BAAI/bge-small-en-v1.5): State-of-the-art for retrieval
-- **E5, GTE**: Strong open-source alternatives
+- **Sentence-BERT** (all-MiniLM-L6-v2): fast, good quality, 384 dimensions
+- **BGE** (BAAI/bge-small-en-v1.5): state-of-the-art for retrieval
+- **E5, GTE**: strong open-source alternatives
 </div>
@@ -130,10 +130,10 @@ Real documents are long. We can't embed an entire book as a single vector (too m
-1. **Re-ranking**: After initial retrieval, use a cross-encoder model to re-score and re-order results for better precision
-2. **Hybrid search**: Combine vector similarity (semantic) with keyword matching (BM25) for more robust retrieval
-3. **Query expansion**: Rewrite or expand the user's query using an LLM before retrieval to improve recall
+1. **Re-ranking**: after initial retrieval, use a cross-encoder model to re-score and re-order results for better precision
+2. **Hybrid search**: combine vector similarity (semantic) with keyword matching (BM25) for more robust retrieval
+3. **Query expansion**: rewrite or expand the user's query using an LLM before retrieval to improve recall
 </div>
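The hybrid-search item in this hunk can be sketched as a simple score mixture. Everything below is an assumption for illustration: the rank_bm25 package, the min-max normalization, and the 0.5/0.5 weights are not specified in the slides.

```python
# Illustrative hybrid-search sketch: normalize BM25 and vector scores,
# then mix them (package choice and weights are assumptions).
from rank_bm25 import BM25Okapi

docs = ["RAG grounds answers in retrieved documents.",
        "BM25 scores documents by keyword overlap."]
bm25 = BM25Okapi([d.lower().split() for d in docs])
keyword_scores = bm25.get_scores("keyword overlap".split())
vector_scores = [0.31, 0.78]  # e.g. cosine similarities from the embedding model

def minmax(xs):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo + 1e-9) for x in xs]

hybrid = [0.5 * k + 0.5 * v
          for k, v in zip(minmax(keyword_scores), minmax(vector_scores))]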
311
312
312
313
<divclass="warning-box"data-title="Limitations to keep in mind">
313
314
314
-
1.**Retrieval quality**: If the retriever fails to find relevant documents, the generator can't produce a good answer — garbage in, garbage out
315
-
2.**Context window limits**: There's a limit to how much retrieved text fits in the prompt, requiring aggressive chunking or summarization
315
+
1.**Retrieval quality**: if the retriever fails to find relevant documents, the generator can't produce a good answer — garbage in, garbage out
316
+
2.**Context window limits**: there's a limit to how much retrieved text fits in the prompt, requiring aggressive chunking or summarization
316
317
3.**Multi-source reasoning**: RAG is great for finding a specific fact, but struggles when answers require synthesizing information across many documents
317
318
318
319
</div>
@@ -323,11 +324,11 @@ Many production systems use *both*: fine-tune the model for the domain's style a
 <div class="tip-box" data-title="Making RAG work well">
-1. **Chunk wisely**: Experiment with chunk sizes (256-512 tokens is a good start). Use overlap between chunks.
-2. **Choose the right embedding model**: Test multiple models on your specific domain. General-purpose models may not capture domain-specific semantics.
-3. **Evaluate retrieval separately**: Before blaming the LLM, check if the retriever is finding the right documents.
-4. **Include metadata**: Store source, date, and section info with chunks so the LLM can cite sources.
-5. **Handle "I don't know"**: Instruct the model to say when the context doesn't contain the answer rather than hallucinating.
+1. **Chunk wisely**: experiment with chunk sizes (256-512 tokens is a good start). Use overlap between chunks.
+2. **Choose the right embedding model**: test multiple models on your specific domain. General-purpose models may not capture domain-specific semantics.
+3. **Evaluate retrieval separately**: before blaming the LLM, check if the retriever is finding the right documents.
+4. **Include metadata**: store source, date, and section info with chunks so the LLM can cite sources.
+5. **Handle "I don't know"**: instruct the model to say when the context doesn't contain the answer rather than hallucinating.
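Tip 5 amounts to a prompt-engineering choice. A minimal sketch follows; the exact wording is illustrative, not quoted from the lecture.

```python
# Hedged sketch of tip 5: a prompt that tells the model to admit when the
# retrieved context lacks the answer (wording is illustrative).
RAG_PROMPT = """Answer the question using ONLY the context below.
If the context does not contain the answer, reply exactly: "I don't know."

Context:
{context}

Question: {question}
Answer:"""

prompt = RAG_PROMPT.format(context="...retrieved chunks...",
                           question="What year was the policy updated?")
```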