Commit f7d3cbd

Merge pull request #16 from sys-intelligence/docs_course_lab_bench
Course Lab Benchmark: Add Instructions for Extending the Benchmark
2 parents 55221f7 + 10f3d07 commit f7d3cbd

2 files changed: 54 additions and 1 deletion

benchmarks/course_lab_bench/README.md

Lines changed: 53 additions & 0 deletions
@@ -85,3 +85,56 @@ The benchmark supports multiple AI agents:

To add your own agent to the benchmark, see [add_agents.md](add_agents.md).

## How to Extend the Benchmark

This section describes how to add new labs to the benchmark. The workflow below uses the existing [MapReduce lab](http://nil.csail.mit.edu/6.5840/2024/labs/lab-mr.html) as an example.

### Step 1: Add a row to the CSV file

Edit `data/benchmark/lab_exam_data_20250529.csv` and add a new row for the lab. The table below lists each column together with the value used for the MapReduce lab:

| Column            | Example value                    | Description                                                |
| ----------------- | -------------------------------- | ---------------------------------------------------------- |
| `instance_id`     | `1`                              | Unique numeric ID for the task                             |
| `course`          | `6.5840: Distributed Systems`    | Course name                                                |
| `year`            | `Spring 2024`                    | Course term/year                                           |
| `index`           | `Lab 1: MapReduce`               | Lab name                                                   |
| `introduction`    | `In this lab you'll build...`    | Goes into markdown: Problem Context → Introduction         |
| `getting_started` | `You need to setup Go...`        | Goes into markdown: Getting Started section                |
| `The code`        | (starter code description)       | Goes into markdown: The Code section                       |
| `description`     | `Your job is to implement...`    | Goes into markdown: Your Task section                      |
| `repo`            | `6.5840-golabs-2024`             | Repository folder name (will be prefixed with `projects/`) |
| `test_method`     | `cd src/main && bash test-mr.sh` | Shell command to run tests                                 |
| `test_results`    | `*** PASSED ALL TESTS`           | Expected test output when solution is correct              |
| `difficluty`      | `moderate/hard`                  | Difficulty: `easy`, `moderate`, `moderate/hard`, or `hard` |
| `link`            | `http://.../lab-mr.html`         | URL to original course lab assignment                      |

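Rows can also be appended from a script. The following is a minimal sketch and not part of the benchmark: `append_lab_row` is a hypothetical helper that reuses whatever header the CSV already has, so it makes no assumption about the complete column list, and it leaves any column you do not supply empty.

```python
import csv
import io


def append_lab_row(csv_path, row):
    """Append one lab row to the CSV, using the file's own header for column order.

    Hypothetical helper for illustration; columns missing from `row` are left empty.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        text = f.read()
    header = next(csv.reader(io.StringIO(text, newline="")), None)
    if not header:
        raise ValueError(f"{csv_path} has no header row")

    unknown = sorted(set(row) - set(header))
    if unknown:
        # Catch misspelled column names before they are written.
        raise KeyError(f"columns not in CSV header: {unknown}")

    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        if not text.endswith("\n"):
            f.write("\n")  # keep the new row from fusing with the last one
        writer = csv.DictWriter(f, fieldnames=header, restval="", lineterminator="\n")
        writer.writerow(row)


# Example call (values taken from the table above):
# append_lab_row(
#     "data/benchmark/lab_exam_data_20250529.csv",
#     {"instance_id": "1", "course": "6.5840: Distributed Systems", "difficluty": "moderate/hard"},
# )
```

The header check matters because the difficulty column is spelled `difficluty` in the CSV; a row keyed by `difficulty` would be rejected here instead of being silently dropped.
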
### Step 2: Run the conversion script

```bash
cd data/benchmark
python3 convert_promblems.py
```

This generates or updates:

- `problems/system_lab_<id>.md`: Markdown file with the task description
- `system_lab_tasks.jsonl`: JSONL file with all tasks (updated in place)

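After the script runs, it can be worth confirming that the JSONL file still parses and that the task count went up. The sketch below assumes only that each non-empty line of `system_lab_tasks.jsonl` is one JSON value; `load_tasks` is a hypothetical helper, and the field names inside each record are not assumed.

```python
import json


def load_tasks(jsonl_path):
    """Parse a JSONL file into a list of records, reporting the first malformed line."""
    tasks = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                tasks.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{jsonl_path}:{line_no}: invalid JSON ({exc.msg})") from exc
    return tasks


# Example call:
# print(len(load_tasks("system_lab_tasks.jsonl")), "tasks")
```
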
### Step 3: Update `install.sh` (if adding a new repository)

If the lab lives in a repository that `install.sh` does not clone yet, add a block like the following:

```bash
if [ -d "6.5840-golabs-2024" ]; then
    echo "==> 6.5840-golabs-2024 already exists, skipping clone."
else
    echo "==> Cloning 6.5840-golabs-2024..."
    git clone git://g.csail.mit.edu/6.5840-golabs-2024
fi
```

### Step 4: Test your addition

```bash
./install.sh
./run.sh <model_name>
```
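
Before a full benchmark run, the new row's `test_method` and `test_results` values can be checked on their own against a repository that contains a known-good solution. The sketch below is an assumption about how such a check could look, not a description of the benchmark's harness: `lab_passes` is a hypothetical helper, and treating `test_results` as a substring of the combined output is a guess.

```python
import subprocess


def lab_passes(repo_dir, test_method, expected, timeout=600):
    """Run `test_method` inside `repo_dir`; return True if `expected` appears in its output."""
    result = subprocess.run(
        test_method,
        shell=True,  # test_method is a shell snippet, e.g. "cd src/main && bash test-mr.sh"
        cwd=repo_dir,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Assumption: a substring match on stdout + stderr is enough to detect success.
    return expected in (result.stdout + result.stderr)


# Example call (paths follow the MapReduce row in the table above):
# lab_passes("projects/6.5840-golabs-2024", "cd src/main && bash test-mr.sh", "*** PASSED ALL TESTS")
```
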

benchmarks/course_lab_bench/data/benchmark/convert_promblems.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ def covert_to_dict():
     id = 0
     # instance_id,course,year,index,part_name,introduction,getting_started,The code,description,task,hint,rules,repo_location,test_method,test_results,difficluty,link
     for row in reader:
-        if id > 25:
+        if id > 100: # Process up to 100 tasks
             break
         id += 1
         unique_id = row['instance_id'] + row['course'] + '_' + row['year'] + '_' + row['index']
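
The counter pattern in this hunk can be replayed in isolation to see how many rows a given cap admits. The sketch below assumes that nothing else in the loop body (not shown in the hunk) modifies `id`; both function names are hypothetical. Because the counter starts at 0 and is incremented after the check, a cap of 100 lets 101 rows through, while `itertools.islice` takes at most the number requested.

```python
from itertools import islice


def rows_seen_by_counter(rows, cap):
    """Replay the counter pattern from the hunk and count the rows that get processed."""
    processed = 0
    counter = 0
    for _ in rows:
        if counter > cap:
            break
        counter += 1
        processed += 1
    return processed


def first_n(rows, n):
    """Alternative: take at most `n` rows."""
    return list(islice(rows, n))


print(rows_seen_by_counter(range(1000), 100))  # 101
print(len(first_n(range(1000), 100)))          # 100
```
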
