
Commit 61b260b (1 parent: d793915)

chore: remove scratch files and unused benchmark docs


42 files changed: +7208 -890 lines

.cursor/hooks.json

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
{
  "version": 1,
  "hooks": {
    "stop": [
      {
        "command": "./hooks/keep-going.sh"
      }
    ]
  }
}

.cursor/hooks/keep-going.sh

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
#!/bin/bash
# Read input from stdin (required by the hooks protocol)
cat > /dev/null

# Return a follow-up message to keep the agent going
echo '{"followup_message": "Keep going until there are no more next steps. If you are done, say \"done\"."}'
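The script's contract is simple: consume whatever the hooks protocol writes to stdin, then emit a JSON object with a `followup_message` field on stdout. The same contract can be sketched in Python; the field name comes from the script above, everything else here is illustrative.

```python
import json


def keep_going_hook(stdin_text: str) -> str:
    """Mirror keep-going.sh: read (and ignore) the hook payload,
    then return a fixed follow-up instruction as a JSON string."""
    _ = stdin_text  # required by the hooks protocol, but unused here
    response = {
        "followup_message": (
            "Keep going until there are no more next steps. "
            'If you are done, say "done".'
        )
    }
    return json.dumps(response)


# The output must be valid JSON, since the hook runner parses it.
print(keep_going_hook(""))
```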

all_providers_test.md

Lines changed: 0 additions & 26 deletions
This file was deleted.

benchmark_results.md

Lines changed: 0 additions & 18 deletions
This file was deleted.

citations.db

16 KB
Binary file not shown.

latest/assignments/README.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
# Coding Assignments

This directory contains hands-on coding assignments for each week of the course. Each assignment reinforces key RAG concepts through practical implementation.

## Assignment Structure

Each assignment includes:

- **Documentation** (`.md` files): Learning goals, setup, requirements, deliverables
- **Working Code** (`.py` files): Tested, runnable implementations

## Quick Start

Run any assignment with:

```bash
cd /path/to/systematically-improving-rag
uv run python latest/assignments/week1/metrics.py
```

## Assignments by Week

| Week | Documentation | Code | Focus Area |
|------|---------------|------|------------|
| 0 | [RAG Metrics Dashboard](week0_assignment.md) | [rag_pipeline.py](week0/rag_pipeline.py) | Logging, dashboards, ChromaDB |
| 1 | [Retrieval Evaluation](week1_assignment.md) | [metrics.py](week1/metrics.py), [evaluation_pipeline.py](week1/evaluation_pipeline.py) | Precision, recall, MRR, NDCG |
| 2 | [Fine-tune Embeddings](week2_assignment.md) | [fine_tuning.py](week2/fine_tuning.py) | Triplet loss, hard negatives |
| 3 | [Streaming RAG](week3_assignment.md) | [streaming.py](week3/streaming.py) | SSE, citations, validation |
| 4 | [Query Clustering](week4_assignment.md) | [clustering.py](week4/clustering.py) | K-means, UMAP, prioritization |
| 5 | [Multimodal Search](week5_assignment.md) | [multimodal.py](week5/multimodal.py) | Tables, images, rich descriptions |
| 6 | [Tool Routing](week6_assignment.md) | [router.py](week6/router.py) | OpenAI tool calling, few-shot |
| 7 | [Production RAG](week7_assignment.md) | [caching.py](week7/caching.py) | Multi-level caching, cost tracking |
| Capstone | [End-to-End System](capstone_assignment.md) | [system.py](capstone/system.py) | Full RAG flywheel |

## Code Overview

### Week 1: Evaluation Metrics

- `metrics.py`: Precision@k, Recall@k, MRR, NDCG implementations
- `evaluation_pipeline.py`: Full evaluation pipeline with ChromaDB
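The four metrics named above have compact definitions. This is a minimal binary-relevance sketch for orientation, not the course's `metrics.py`:

```python
import math


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved ids that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant ids that appear in the top-k."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)


def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant result (0 if none)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG: discounted gain vs. the ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(retrieved[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


retrieved = ["d3", "d1", "d7"]
relevant = {"d1", "d2"}
print(precision_at_k(retrieved, relevant, 3))  # one hit in the top 3
print(mrr(retrieved, relevant))                # first hit at rank 2
```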
### Week 2: Fine-tuning

- `fine_tuning.py`: Hard negative mining, triplet creation, evaluation
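The core idea of hard negative mining: for each (query, positive) pair, pick the *most similar* non-positive passage so the triplet actually teaches the model something. A toy sketch over plain vectors (the real assignment uses learned embeddings; all names here are illustrative):

```python
def cosine(u, v):
    """Cosine similarity between two plain float vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)


def mine_hard_negative(query_vec, positive_id, corpus_vecs):
    """Return the id of the non-positive passage most similar to the query.

    'Hard' negatives look relevant to the current model but are not the
    labeled positive, which makes the triplet loss more informative than
    one built from random negatives."""
    best_id, best_sim = None, float("-inf")
    for doc_id, vec in corpus_vecs.items():
        if doc_id == positive_id:
            continue
        sim = cosine(query_vec, vec)
        if sim > best_sim:
            best_id, best_sim = doc_id, sim
    return best_id


def build_triplet(query_vec, positive_id, corpus_vecs):
    """(anchor, positive, hard negative) triplet for training."""
    return (query_vec, positive_id,
            mine_hard_negative(query_vec, positive_id, corpus_vecs))
```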
### Week 3: Streaming

- `streaming.py`: SSE streaming, citation tracking, response validation

### Week 4: Query Analysis

- `clustering.py`: K-means clustering, UMAP visualization, prioritization matrix

### Week 6: Query Routing

- `router.py`: OpenAI function calling, dynamic example selection

### Week 7: Production

- `caching.py`: Multi-level cache (memory/Redis/semantic), cost tracking
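The multi-level idea is: check the cheapest, most exact layer first, and only fall through to fuzzier layers on a miss. A two-level sketch, with Redis omitted and the "semantic" layer approximated by query normalization (a real semantic layer would use embedding similarity; this is not the course's `caching.py`):

```python
import hashlib


class TwoLevelCache:
    """Layered lookup: exact query string first, then a normalized key.

    Level 1 catches byte-identical repeats; level 2 catches trivial
    variants (case, extra whitespace) of a previously answered query."""

    def __init__(self):
        self.exact = {}     # level 1: exact query string -> answer
        self.semantic = {}  # level 2: normalized-query digest -> answer

    @staticmethod
    def _normalize(query: str) -> str:
        key = " ".join(query.lower().split())
        return hashlib.sha256(key.encode()).hexdigest()

    def get(self, query: str):
        if query in self.exact:          # level 1 hit
            return self.exact[query]
        return self.semantic.get(self._normalize(query))  # level 2 or miss

    def put(self, query: str, answer: str):
        self.exact[query] = answer
        self.semantic[self._normalize(query)] = answer


cache = TwoLevelCache()
cache.put("What is RAG?", "Retrieval-augmented generation.")
print(cache.get("what is  rag?"))  # hits the normalized layer
```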
### Capstone

- `system.py`: Complete RAG system with improvement flywheel

## Recommended Datasets

These public datasets are used across the assignments:

- **SQuAD 2.0**: Question answering on Wikipedia (`rajpurkar/squad_v2`)
- **MS MARCO**: Web search queries and passages (`microsoft/ms_marco`)
- **HotpotQA**: Multi-hop reasoning questions (`hotpot_qa`)
- **Natural Questions**: Real Google search queries (`google-research-datasets/natural_questions`)
- **COCO**: Image captioning dataset (`HuggingFaceM4/COCO`)

## Getting Started

1. Ensure you have the course environment set up (see `latest/README.md`)
2. Start with Week 0 if you're new to RAG evaluation
3. Complete assignments in order; they build on each other
4. Use the weekly notebooks as reference implementations

latest/assignments/capstone/__init__.py

Whitespace-only changes.
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
{
  "baseline": {
    "retrieval_metrics": {
      "precision": 0.5555555555555556,
      "recall": 0.5555555555555556,
      "mrr": 0.5555555555555556,
      "ndcg": 0.5555555555555556,
      "f1": 0.5555555555555556,
      "map": 0.5555555555555556
    },
    "routing_accuracy": 0.0,
    "total_queries": 9
  },
  "improved": {
    "retrieval_metrics": {
      "precision": 1.0,
      "recall": 1.0,
      "mrr": 1.0,
      "ndcg": 1.0,
      "f1": 1.0,
      "map": 1.0
    },
    "routing_accuracy": 1.0,
    "total_queries": 9
  },
  "improvement_pct": {
    "precision": 80.0,
    "recall": 80.0,
    "mrr": 80.0
  }
}
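The `improvement_pct` values are consistent with relative improvement over the baseline: the baseline metrics are 5/9 ≈ 0.556 (5 of 9 queries), and (1.0 − 5/9) / (5/9) = 0.8, i.e. 80%. A sketch of that calculation (field names taken from the JSON above; the function itself is illustrative):

```python
def improvement_pct(baseline: dict, improved: dict) -> dict:
    """Relative improvement over baseline, as a percentage per metric."""
    return {
        metric: round((improved[metric] - baseline[metric])
                      / baseline[metric] * 100, 1)
        for metric in baseline
        if metric in improved and baseline[metric] > 0
    }


baseline = {"precision": 5 / 9, "recall": 5 / 9, "mrr": 5 / 9}
improved = {"precision": 1.0, "recall": 1.0, "mrr": 1.0}
print(improvement_pct(baseline, improved))  # 80.0 for each metric
```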
