Commit 654f380

courselab: update PKU Pintos course details to include link and year
1 parent 6c97b73 commit 654f380

6 files changed, +42 -38 lines changed

benchmarks/courseexam_bench/data/raw/cs162_operating_systems_and_system_programming_summer_2022_final/exam.md

Lines changed: 32 additions & 29 deletions
@@ -67,7 +67,7 @@ Your answer should be either "True" or "False", followed by a brief explanation
 "type": "Freeform",
 "tags": ["operating-systems", "virtual-memory", "page-faults"],
 "reference_materials": ["rust_reference.md"],
-"answer": "False. FIFO and Belady's anomaly",
+"answer": "False. FIFO and Belady's anomaly.",
 "llm_judge_instructions": "Award 1 point for correctly answering False. Award 1 point for a valid explanation. Longer explanations may get no credit."
 }
 ```
@@ -147,7 +147,7 @@ Your answer should be either "True" or "False", followed by a brief explanation
 "type": "Freeform",
 "tags": ["distributed-systems", "idempotency"],
 "reference_materials": ["rust_reference.md"],
-"answer": "False. f(f(x)) != f(x)",
+"answer": "False. f(f(x)) != f(x).",
 "llm_judge_instructions": "Award 1 point for correctly answering False. Award 1 point for a valid explanation. Longer explanations may get no credit."
 }
 ```
@@ -188,7 +188,7 @@ D. LRU

 E. None of the above

-Your answer should list all correct letters (e.g., "A, B"), followed by a brief explanation. Longer explanations may get no credit.
+Your answer should list all correct letters (e.g., "A, B").


 ```json
@@ -198,8 +198,8 @@ Your answer should list all correct letters (e.g., "A, B"), followed by a brief
 "type": "Freeform",
 "tags": ["operating-systems", "virtual-memory", "page-replacement"],
 "reference_materials": ["rust_reference.md"],
-"answer": "A, B, D. LRU approximates MIN, and clock/second-chance approximate LRU.",
-"llm_judge_instructions": "Award 1 point for listing exactly the correct letters that apply. Award 1 point for a valid explanation. Longer explanations may get no credit."
+"answer": "A, B, D",
+"llm_judge_instructions": "Award 2 points if and only if the selected letters exactly match the set of correct answers (no omissions or extras). Otherwise, award 1 point if the selection includes at least one correct letter, and 0 points if it includes none."
 }
 ```

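The new `llm_judge_instructions` text describes a three-tier rule (2, 1, or 0 points) that is meant to be applied by an LLM judge. As a rough illustration only, the same rule can be sketched deterministically. The function name `score_selection` and the letter-parsing regex are hypothetical and are not part of the benchmark; the regex treats any standalone capital letter A through E as a selection, which would misread an English "A" in a longer response.

```python
import re


def score_selection(response: str, correct: str) -> int:
    """Score a multi-select answer with the 2 / 1 / 0 rule from the rubric."""

    def letters(text: str) -> set:
        # Hypothetical parsing rule: standalone capitals A-E are selections.
        return set(re.findall(r"\b[A-E]\b", text))

    chosen, expected = letters(response), letters(correct)
    if chosen == expected:
        return 2  # exact match: no omissions and no extras
    if chosen & expected:
        return 1  # at least one correct letter was selected
    return 0  # no correct letter was selected
```

Under this sketch, a response of `"A, B"` against the reference `"A, B, D"` would earn 1 point, since it omits D but includes correct letters.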
@@ -219,7 +219,7 @@ D. In programmed I/O, the CPU programs an external controller to do I/O while th

 E. None of the above.

-Your answer should list all correct letters (e.g., "A, B"), followed by a brief explanation. Longer explanations may get no credit.
+Your answer should list all correct letters (e.g., "A, B").

 ```json
 {
@@ -228,8 +228,8 @@ Your answer should list all correct letters (e.g., "A, B"), followed by a brief
 "type": "Freeform",
 "tags": ["operating-systems", "io"],
 "reference_materials": ["rust_reference.md"],
-"answer": "E. None of the options are correct. A: Top half and bottom half refer to interrupt-driven I/O; polling is not used. B: Memory-mapped I/O actively involves the CPU; in DMA, an external device writes data to memory independently, then interrupts the CPU when the data is ready. C: Port-mapped I/O does not require involvement of the disk. D: Programmed I/O involves special CPU instructions (eg. in and out), not an external I/O controller.",
-"llm_judge_instructions": "Award 1 point for listing exactly the correct letters that apply. Award 1 point for a valid explanation. Longer explanations may get no credit."
+"answer": "E",
+"llm_judge_instructions": "Award 2 points if and only if the selected letters exactly match the set of correct answers (no omissions or extras). Otherwise, award 1 point if the selection includes at least one correct letter, and 0 points if it includes none."
 }
 ```

@@ -249,7 +249,7 @@ D. Like RAM, storage devices are byte-addressed.

 E. None of the above.

-Your answer should list all correct letters (e.g., "A, B"), followed by a brief explanation. Longer explanations may get no credit.
+Your answer should list all correct letters (e.g., "A, B").

 ```json
 {
@@ -258,8 +258,8 @@ Your answer should list all correct letters (e.g., "A, B"), followed by a brief
 "type": "Freeform",
 "tags": ["operating-systems", "storage"],
 "reference_materials": ["rust_reference.md"],
-"answer": "A. A: Sequential reads on HDDs are usually faster than random reads because we only have to wait for the disk to spin under the head. B: SSD pages wear out when they are erased; HDDs do not wear out as quickly. C: SSDs are generally more expensive. D: Storage devices are addressed in units of sectors/pages/blocks.",
-"llm_judge_instructions": "Award 1 point for listing exactly the correct letters that apply. Award 1 point for a valid explanation. Longer explanations may get no credit."
+"answer": "A",
+"llm_judge_instructions": "Award 2 points if and only if the selected letters exactly match the set of correct answers (no omissions or extras). Otherwise, award 1 point if the selection includes at least one correct letter, and 0 points if it includes none."
 }
 ```

@@ -279,7 +279,7 @@ D. Compulsory misses grow linearly with the size of the cache.

 E. None of the above.

-Your answer should list all correct letters (e.g., "A, B"), followed by a brief explanation. Longer explanations may get no credit.
+Your answer should list all correct letters (e.g., "A, B").

 ```json
 {
@@ -288,8 +288,8 @@ Your answer should list all correct letters (e.g., "A, B"), followed by a brief
 "type": "Freeform",
 "tags": ["operating-systems", "caching"],
 "reference_materials": ["rust_reference.md"],
-"answer": "A, B. A: In a multiprocessor system, actions by one processor can invalidate the cache entries for another processor, possibly resulting in a coherence miss. B: A cache with a higher associativity can reduce the number of conflict misses. C: Making a cache larger should decrease the number of capacity misses. D: Compulsory misses occur when a cache line is loaded for the first time; increasing the size of the cache has no effect on the number of compulsory misses.",
-"llm_judge_instructions": "Award 1 point for listing exactly the correct letters that apply. Award 1 point for a valid explanation. Longer explanations may get no credit."
+"answer": "A, B",
+"llm_judge_instructions": "Award 2 points if and only if the selected letters exactly match the set of correct answers (no omissions or extras). Otherwise, award 1 point if the selection includes at least one correct letter, and 0 points if it includes none."
 }
 ```

@@ -324,7 +324,7 @@ D. The program will compile if we replace `kenobi(s: String)` and `kenobi(s)` wi

 E. None of the above.

-Your answer should list all correct letters (e.g., "A, B"), followed by a brief explanation. Longer explanations may get no credit.
+Your answer should list all correct letters (e.g., "A, B").

 ```json
 {
@@ -333,8 +333,8 @@ Your answer should list all correct letters (e.g., "A, B"), followed by a brief
 "type": "Freeform",
 "tags": ["rust", "ownership"],
 "reference_materials": ["rust_reference.md"],
-"answer": "C, D. A: The second s is a new variable due to the let declaration. It is not a reassignment of the original s. B: This program will result in compile-time, not run-time, errors. C: This solves the issue; s would no longer be used after being moved into kenobi. D: Passing a reference to s also fixes the problem, as ownership of s is not moved into kenobi",
-"llm_judge_instructions": "Award 1 point for listing exactly the correct letters that apply. Award 1 point for a valid explanation. Longer explanations may get no credit."
+"answer": "C, D",
+"llm_judge_instructions": "Award 2 points if and only if the selected letters exactly match the set of correct answers (no omissions or extras). Otherwise, award 1 point if the selection includes at least one correct letter, and 0 points if it includes none."
 }
 ```

@@ -354,7 +354,7 @@ D. A program in a safe state can eventually deadlock.

 E. None of the above.

-Your answer should list all correct letters (e.g., "A, B"), followed by a brief explanation. Longer explanations may get no credit.
+Your answer should list all correct letters (e.g., "A, B").

 ```json
 {
@@ -363,8 +363,8 @@ Your answer should list all correct letters (e.g., "A, B"), followed by a brief
 "type": "Freeform",
 "tags": ["operating-systems", "concurrency", "deadlock"],
 "reference_materials": ["rust_reference.md"],
-"answer": "A, B, D. A: Circular wait is a necessary condition for deadlock. B: Circular wait is a necessary condition for deadlock. C: Priority donation prevents starvation of a high-priority thread; it does not prevent deadlock. D: A safe state means that there is a non-blocking order of threads that allows all threads to complete. Deadlock can still occur starting from a safe state (eg. if the non-blocking ordering is not followed).",
-"llm_judge_instructions": "Award 1 point for listing exactly the correct letters that apply. Award 1 point for a valid explanation. Longer explanations may get no credit."
+"answer": "A, B, D",
+"llm_judge_instructions": "Award 2 points if and only if the selected letters exactly match the set of correct answers (no omissions or extras). Otherwise, award 1 point if the selection includes at least one correct letter, and 0 points if it includes none."
 }
 ```

@@ -384,7 +384,7 @@ D. In FFS, the file number is the index of an inode in the inode array.

 E. None of the above.

-Your answer should list all correct letters (e.g., "A, B"), followed by a brief explanation. Longer explanations may get no credit.
+Your answer should list all correct letters (e.g., "A, B").

 ```json
 {
@@ -393,8 +393,8 @@ Your answer should list all correct letters (e.g., "A, B"), followed by a brief
 "type": "Freeform",
 "tags": ["operating-systems", "file-systems"],
 "reference_materials": ["rust_reference.md"],
-"answer": "A, B, C, D. All statements are correct.",
-"llm_judge_instructions": "Award 1 point for listing exactly the correct letters that apply. Award 1 point for a valid explanation. Longer explanations may get no credit."
+"answer": "A, B, C, D",
+"llm_judge_instructions": "Award 2 points if and only if the selected letters exactly match the set of correct answers (no omissions or extras). Otherwise, award 1 point if the selection includes at least one correct letter, and 0 points if it includes none."
 }
 ```

@@ -414,7 +414,7 @@ D. Putting a directory and its file into common block groups.

 E. None of the above.

-Your answer should list all correct letters (e.g., "A, B"), followed by a brief explanation. Longer explanations may get no credit.
+Your answer should list all correct letters (e.g., "A, B").

 ```json
 {
@@ -423,8 +423,8 @@ Your answer should list all correct letters (e.g., "A, B"), followed by a brief
 "type": "Freeform",
 "tags": ["operating-systems", "file-systems", "ffs"],
 "reference_materials": ["rust_reference.md"],
-"answer": "C, D. C: FFS uses first-free allocation for sequential block placement. D: FFS groups directories and their files in common cylinder groups.",
-"llm_judge_instructions": "Award 1 point for listing exactly the correct letters that apply. Award 1 point for a valid explanation. Longer explanations may get no credit."
+"answer": "C, D",
+"llm_judge_instructions": "Award 2 points if and only if the selected letters exactly match the set of correct answers (no omissions or extras). Otherwise, award 1 point if the selection includes at least one correct letter, and 0 points if it includes none."
 }
 ```

@@ -503,7 +503,7 @@ Please answer in THREE SENTENCES OR LESS. Longer explanations may get no credit
 "type": "Freeform",
 "tags": ["operating-systems", "virtual-memory", "caching", "tlb"],
 "reference_materials": ["rust_reference.md"],
-"answer": "Cache lookups can be overlapped when the page offset in the virtual address contains the cache index. See lecture 15, slide 26.",
+"answer": "Cache lookups can be overlapped when the page offset in the virtual address contains the cache index.",
 "llm_judge_instructions": "Award full points for a fully correct explanation. award partial credit for partially correct explanations"
 }
 ```
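
The reference answer above says the TLB and cache lookups can overlap when the cache index fits inside the page offset. A minimal sketch of that condition follows, assuming power-of-two sizes and a set-associative cache; the function name and parameters are hypothetical and do not come from the exam. The set index and block offset together span `cache_bytes / ways` bytes, so they fit in the page offset exactly when that quantity does not exceed the page size.

```python
def can_overlap_tlb_and_cache(cache_bytes: int, ways: int, page_bytes: int) -> bool:
    """Return True if the set index and block offset bits fit in the page offset.

    Assumes all sizes are powers of two. One way of the cache covers
    num_sets * block_size bytes, which is exactly the address range
    selected by the index and block-offset bits.
    """
    bytes_per_way = cache_bytes // ways
    return bytes_per_way <= page_bytes
```

For example, a 32 KiB 8-way cache with 4 KiB pages satisfies the condition, while a 64 KiB 4-way cache with the same page size does not.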
@@ -598,7 +598,10 @@ give partial credit if you show your work.

 Suppose we have resources A, B, C and threads T1, T2, T3, T4. The total number of each resource as well as the current/max allocations for each thread are as follows:

-Total Allocation: A=5, B=6, C=7
+Total Allocation:
+| A | B | C |
+|---|---|---|
+| 5 | 6 | 7 |

 Current Allocation:
 | Thread | A | B | C |
@@ -734,7 +737,7 @@ Fill in sections [A], [B], and [C]:
 "tags": ["operating-systems", "virtual-memory", "page-tables"],
 "reference_materials": ["rust_reference.md"],
 "answer": "[A]:\nvpn[0] = vaddr >> 22;\nvpn[1] = zero(vaddr >> 12, 10, 32);\n\n[B]:\nptentry = get_word(pt + 4*vpn[i]);\nif ((ptentry & 1) == 0) {\n page_fault_handler(vaddr);\n return 0;\n}\npt = zero(ptentry, 0, 12);\n\n[C]:\nreturn pt + zero(vaddr, 12, 32);",
-"llm_judge_instructions": "Grade each section separately and sum the points"
+"llm_judge_instructions": "A is worth 2 points, B is worth 6 points and C is worth 2 points. Grade each section separately and sum the points"
 }
 ```

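The reference answer in this hunk walks a two-level page table with a 10/10/12 bit split. The following Python sketch mirrors sections [A], [B], and [C] under stated assumptions: `zero(value, lo, hi)` is taken to clear bits `lo` through `hi - 1` (inferred from how the answer uses it), physical memory is modeled as a hypothetical dict from address to 32-bit word in place of `get_word`, and a missing mapping returns `None` instead of calling `page_fault_handler`. The surrounding loop is also an assumption, since the exam skeleton is not shown in the diff.

```python
from typing import Dict, Optional


def zero(value: int, lo: int, hi: int) -> int:
    """Clear bits lo..hi-1 of a 32-bit value (assumed meaning of the helper)."""
    mask = ((1 << (hi - lo)) - 1) << lo
    return value & ~mask & 0xFFFFFFFF


def translate(vaddr: int, root: int, memory: Dict[int, int]) -> Optional[int]:
    """Walk a two-level page table and return the physical address, or None."""
    # [A]: top 10 bits index level 0, the next 10 bits index level 1.
    vpn = [vaddr >> 22, zero(vaddr >> 12, 10, 32)]
    pt = root
    for index in vpn:
        # [B]: each entry is 4 bytes; bit 0 is treated as the valid bit.
        ptentry = memory.get(pt + 4 * index, 0)
        if (ptentry & 1) == 0:
            return None  # stand-in for page_fault_handler(vaddr); return 0
        pt = zero(ptentry, 0, 12)  # drop flag bits, keep the frame address
    # [C]: frame base plus the 12-bit page offset.
    return pt + zero(vaddr, 12, 32)
```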
benchmarks/courselab_bench/data/courses.json

Lines changed: 2 additions & 1 deletion
@@ -36,8 +36,9 @@
 "num_tasks": 1
 },
 {
-"course_id": "pku_pintos",
+"course_id": "pku_pintos_2025",
 "name": "PKU Operating Systems: Pintos Labs",
+"link": "https://pkuflyingpig.gitbook.io/pintos",
 "institution": "Peking University",
 "year": 2024,
 "num_tasks": 4

benchmarks/courselab_bench/data/pku_pintos/task_1_threads/config.json

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 {
-"instance_id": "pku_pintos__threads",
-"course_id": "pku_pintos",
+"instance_id": "pku_pintos_2025__threads",
+"course_id": "pku_pintos_2025",
 "timeout_minutes": 60,
 "tags": ["operating-systems", "c", "threads", "synchronization", "scheduling", "pintos"],
 "artifacts": [

benchmarks/courselab_bench/data/pku_pintos/task_2_userprog/config.json

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 {
-"instance_id": "pku_pintos__userprog",
-"course_id": "pku_pintos",
+"instance_id": "pku_pintos_2025__userprog",
+"course_id": "pku_pintos_2025",
 "timeout_minutes": 90,
 "tags": ["operating-systems", "c", "system-calls", "processes", "user-programs", "pintos"],
 "artifacts": [

benchmarks/courselab_bench/data/pku_pintos/task_3a_vm/config.json

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 {
-"instance_id": "pku_pintos__vm",
-"course_id": "pku_pintos",
+"instance_id": "pku_pintos_2025__vm",
+"course_id": "pku_pintos_2025",
 "timeout_minutes": 120,
 "tags": ["operating-systems", "c", "virtual-memory", "demand-paging", "page-replacement", "pintos"],
 "artifacts": [

benchmarks/courselab_bench/data/pku_pintos/task_3b_mmap/config.json

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 {
-"instance_id": "pku_pintos__mmap",
-"course_id": "pku_pintos",
+"instance_id": "pku_pintos_2025__mmap",
+"course_id": "pku_pintos_2025",
 "timeout_minutes": 120,
 "tags": ["operating-systems", "c", "virtual-memory", "memory-mapped-files", "stack-growth", "pintos"],
 "artifacts": [
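
In all four task configs the renamed `instance_id` keeps the shape `<course_id>__<task>`. That convention is inferred from this diff and is not documented here, so the check below is only a sketch of how the rename could be validated; `instance_matches_course` is a hypothetical helper, not part of the benchmark code.

```python
def instance_matches_course(config: dict) -> bool:
    """Check that instance_id looks like '<course_id>__<task>' (assumed convention)."""
    prefix, separator, task = config["instance_id"].partition("__")
    return separator == "__" and prefix == config["course_id"] and task != ""
```

A config that still carried the old `pku_pintos__mmap` identifier alongside the new `pku_pintos_2025` course ID would fail this check.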
