
Commit 61b260b (1 parent: d793915)

chore: remove scratch files and unused benchmark docs


42 files changed: +7208 -890 lines

.cursor/hooks.json

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
{
  "version": 1,
  "hooks": {
    "stop": [
      {
        "command": "./hooks/keep-going.sh"
      }
    ]
  }
}

.cursor/hooks/keep-going.sh

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
#!/bin/bash
# Read input from stdin (required by the hooks protocol)
cat > /dev/null

# Return a follow-up message to keep the agent going
echo '{"followup_message": "Keep going until there are no more next steps. If you are done, say \"done\"."}'
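The script's contract is simple: consume whatever the hooks protocol writes to stdin, then emit a JSON object with a `followup_message` field on stdout. The same contract can be sketched in Python; the field name comes from the script above, everything else here is illustrative.

```python
import json


def keep_going_hook(stdin_text: str) -> str:
    """Mirror keep-going.sh: read (and ignore) the hook payload,
    then return a fixed follow-up instruction as a JSON string."""
    _ = stdin_text  # required by the hooks protocol, but unused here
    response = {
        "followup_message": (
            "Keep going until there are no more next steps. "
            'If you are done, say "done".'
        )
    }
    return json.dumps(response)


# The output must be valid JSON, since the hook runner parses it.
print(keep_going_hook(""))
```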

all_providers_test.md

Lines changed: 0 additions & 26 deletions
This file was deleted.

benchmark_results.md

Lines changed: 0 additions & 18 deletions
This file was deleted.

citations.db

16 KB
Binary file not shown.

latest/assignments/README.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
# Coding Assignments

This directory contains hands-on coding assignments for each week of the course. Each assignment reinforces key RAG concepts through practical implementation.

## Assignment Structure

Each assignment includes:

- **Documentation** (`.md` files): Learning goals, setup, requirements, deliverables
- **Working Code** (`.py` files): Tested, runnable implementations

## Quick Start

Run any assignment with:

```bash
cd /path/to/systematically-improving-rag
uv run python latest/assignments/week1/metrics.py
```

## Assignments by Week

| Week | Documentation | Code | Focus Area |
|------|---------------|------|------------|
| 0 | [RAG Metrics Dashboard](week0_assignment.md) | [rag_pipeline.py](week0/rag_pipeline.py) | Logging, dashboards, ChromaDB |
| 1 | [Retrieval Evaluation](week1_assignment.md) | [metrics.py](week1/metrics.py), [evaluation_pipeline.py](week1/evaluation_pipeline.py) | Precision, recall, MRR, NDCG |
| 2 | [Fine-tune Embeddings](week2_assignment.md) | [fine_tuning.py](week2/fine_tuning.py) | Triplet loss, hard negatives |
| 3 | [Streaming RAG](week3_assignment.md) | [streaming.py](week3/streaming.py) | SSE, citations, validation |
| 4 | [Query Clustering](week4_assignment.md) | [clustering.py](week4/clustering.py) | K-means, UMAP, prioritization |
| 5 | [Multimodal Search](week5_assignment.md) | [multimodal.py](week5/multimodal.py) | Tables, images, rich descriptions |
| 6 | [Tool Routing](week6_assignment.md) | [router.py](week6/router.py) | OpenAI tool calling, few-shot |
| 7 | [Production RAG](week7_assignment.md) | [caching.py](week7/caching.py) | Multi-level caching, cost tracking |
| Capstone | [End-to-End System](capstone_assignment.md) | [system.py](capstone/system.py) | Full RAG flywheel |

## Code Overview

### Week 1: Evaluation Metrics

- `metrics.py`: Precision@k, Recall@k, MRR, NDCG implementations
- `evaluation_pipeline.py`: Full evaluation pipeline with ChromaDB
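The four metrics named above have compact definitions. This is a minimal binary-relevance sketch for orientation, not the course's `metrics.py`:

```python
import math


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved ids that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant ids that appear in the top-k."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)


def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant result (0 if none)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG: discounted gain vs. the ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(retrieved[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


retrieved = ["d3", "d1", "d7"]
relevant = {"d1", "d2"}
print(precision_at_k(retrieved, relevant, 3))  # one hit in the top 3
print(mrr(retrieved, relevant))                # first hit at rank 2
```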
### Week 2: Fine-tuning

- `fine_tuning.py`: Hard negative mining, triplet creation, evaluation
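The core idea of hard negative mining: for each (query, positive) pair, pick the *most similar* non-positive passage so the triplet actually teaches the model something. A toy sketch over plain vectors (the real assignment uses learned embeddings; all names here are illustrative):

```python
def cosine(u, v):
    """Cosine similarity between two plain float vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)


def mine_hard_negative(query_vec, positive_id, corpus_vecs):
    """Return the id of the non-positive passage most similar to the query.

    'Hard' negatives look relevant to the current model but are not the
    labeled positive, which makes the triplet loss more informative than
    one built from random negatives."""
    best_id, best_sim = None, float("-inf")
    for doc_id, vec in corpus_vecs.items():
        if doc_id == positive_id:
            continue
        sim = cosine(query_vec, vec)
        if sim > best_sim:
            best_id, best_sim = doc_id, sim
    return best_id


def build_triplet(query_vec, positive_id, corpus_vecs):
    """(anchor, positive, hard negative) triplet for training."""
    return (query_vec, positive_id,
            mine_hard_negative(query_vec, positive_id, corpus_vecs))
```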
### Week 3: Streaming

- `streaming.py`: SSE streaming, citation tracking, response validation

### Week 4: Query Analysis

- `clustering.py`: K-means clustering, UMAP visualization, prioritization matrix

### Week 6: Query Routing

- `router.py`: OpenAI function calling, dynamic example selection

### Week 7: Production

- `caching.py`: Multi-level cache (memory/Redis/semantic), cost tracking
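The multi-level idea is: check the cheapest, most exact layer first, and only fall through to fuzzier layers on a miss. A two-level sketch, with Redis omitted and the "semantic" layer approximated by query normalization (a real semantic layer would use embedding similarity; this is not the course's `caching.py`):

```python
import hashlib


class TwoLevelCache:
    """Layered lookup: exact query string first, then a normalized key.

    Level 1 catches byte-identical repeats; level 2 catches trivial
    variants (case, extra whitespace) of a previously answered query."""

    def __init__(self):
        self.exact = {}     # level 1: exact query string -> answer
        self.semantic = {}  # level 2: normalized-query digest -> answer

    @staticmethod
    def _normalize(query: str) -> str:
        key = " ".join(query.lower().split())
        return hashlib.sha256(key.encode()).hexdigest()

    def get(self, query: str):
        if query in self.exact:          # level 1 hit
            return self.exact[query]
        return self.semantic.get(self._normalize(query))  # level 2 or miss

    def put(self, query: str, answer: str):
        self.exact[query] = answer
        self.semantic[self._normalize(query)] = answer


cache = TwoLevelCache()
cache.put("What is RAG?", "Retrieval-augmented generation.")
print(cache.get("what is  rag?"))  # hits the normalized layer
```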
### Capstone

- `system.py`: Complete RAG system with improvement flywheel

## Recommended Datasets

These public datasets are used across the assignments:

- **SQuAD 2.0**: Question answering on Wikipedia (`rajpurkar/squad_v2`)
- **MS MARCO**: Web search queries and passages (`microsoft/ms_marco`)
- **HotpotQA**: Multi-hop reasoning questions (`hotpot_qa`)
- **Natural Questions**: Real Google search queries (`google-research-datasets/natural_questions`)
- **COCO**: Image captioning dataset (`HuggingFaceM4/COCO`)

## Getting Started

1. Ensure you have the course environment set up (see `latest/README.md`)
2. Start with Week 0 if you're new to RAG evaluation
3. Complete assignments in order; they build on each other
4. Use the weekly notebooks as reference implementations

latest/assignments/capstone/__init__.py

Whitespace-only changes.
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
{
  "baseline": {
    "retrieval_metrics": {
      "precision": 0.5555555555555556,
      "recall": 0.5555555555555556,
      "mrr": 0.5555555555555556,
      "ndcg": 0.5555555555555556,
      "f1": 0.5555555555555556,
      "map": 0.5555555555555556
    },
    "routing_accuracy": 0.0,
    "total_queries": 9
  },
  "improved": {
    "retrieval_metrics": {
      "precision": 1.0,
      "recall": 1.0,
      "mrr": 1.0,
      "ndcg": 1.0,
      "f1": 1.0,
      "map": 1.0
    },
    "routing_accuracy": 1.0,
    "total_queries": 9
  },
  "improvement_pct": {
    "precision": 80.0,
    "recall": 80.0,
    "mrr": 80.0
  }
}
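The `improvement_pct` values are consistent with relative improvement over the baseline: the baseline metrics are 5/9 ≈ 0.556 (5 of 9 queries), and (1.0 − 5/9) / (5/9) = 0.8, i.e. 80%. A sketch of that calculation (field names taken from the JSON above; the function itself is illustrative):

```python
def improvement_pct(baseline: dict, improved: dict) -> dict:
    """Relative improvement over baseline, as a percentage per metric."""
    return {
        metric: round((improved[metric] - baseline[metric])
                      / baseline[metric] * 100, 1)
        for metric in baseline
        if metric in improved and baseline[metric] > 0
    }


baseline = {"precision": 5 / 9, "recall": 5 / 9, "mrr": 5 / 9}
improved = {"precision": 1.0, "recall": 1.0, "mrr": 1.0}
print(improvement_pct(baseline, improved))  # 80.0 for each metric
```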
