diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index a00a740c..a348b864 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -18,10 +18,10 @@ jobs: matrix: benchmark: - example_bench + - course_exam_bench # TODO: For now, we comment out other benchmarks as they have no tests # - arteval_bench # - cache_bench - # - course_exam_bench # - course_project_bench steps: diff --git a/benchmarks/course_exam_bench/EVALUATION_RESULTS.md b/benchmarks/course_exam_bench/EVALUATION_RESULTS.md new file mode 100644 index 00000000..5b379201 --- /dev/null +++ b/benchmarks/course_exam_bench/EVALUATION_RESULTS.md @@ -0,0 +1,8 @@ +# Evaluation Results + +| Course | # of questions | Score (gpt-4 1) (score/total) | Score (gpt-4o) (score/total) | Score (o3-mini) (score/total) | Student Score (max/average/median) | +| ---------------------------------------------------------- | -------------- | ----------------------------- | ---------------------------- | ----------------------------- | ---------------------------------- | +| 6.5840 Distributed System Engineering: Spring 2025 Exam I | 11 | 29/65 | 27/65 | 25/65 | 65/ **51.8** /52 | +| 6.5840 Distributed System Engineering: Spring 2024 Exam I | 15 | 54/95 | 55/95 | 42/95 | 95/ **77** /78 | +| 6.5840 Distributed System Engineering: Spring 2024 Exam II | 14 | 24/71 | 24/71 | 36/71 | 72/ **56.6** /57 | +| 6.1810 Fall 2024 MIT 6.1810 Operating System Engineering | 14 | 35/70 | 40/70 | 52/70 | 65/ **49.8** /49 | diff --git a/benchmarks/course_exam_bench/README.md b/benchmarks/course_exam_bench/README.md index b723b775..64b533b2 100644 --- a/benchmarks/course_exam_bench/README.md +++ b/benchmarks/course_exam_bench/README.md @@ -1,91 +1,196 @@ -# Sytsem Course Exam Benchmark +# Course Exam Benchmark -## Introduction +This benchmark evaluates the performance of Large Language Models (LLMs) on system course exams. -This benchmark evaluates the performance of Large Language Models (LLMs) on system course exams. Currently, this benchmark includes 4 exams from MIT, in total 54 questions, covering various topics such as operating system and distributed system. It contains single-choice questions, multiple-choice questions, and short-answer questions. The questions are designed to test the understanding of system concepts and problem-solving skills. +- 69 questions from 5 MIT exams +- Question types: Single-choice, multiple-choice, true/false, and short-answer +- Includes real student performance data for comparison -## Task Details +For current model evaluation results, see [EVALUATION_RESULTS.md](EVALUATION_RESULTS.md). -- **Input**: The questions in the system course exams. It include single-choice questions, multiple-choice questions, and short-answer questions. -- **Output**: The answers to the questions. The output can be in the form of selected options for single-choice and multiple-choice questions, and detailed explanations for short-answer questions. For single-choice and multiple-choice questions, the output should be the selected option(s) (e.g., "A", "B", "C", etc.). For short-answer questions, the output should be a detailed explanation or answer to the question. 
+| Exam | Questions | Topics | +| ------------------------------ | --------- | ------------------- | +| MIT 6.5840 Spring 2025 Exam I | 11 | Distributed Systems | +| MIT 6.5840 Spring 2025 Exam II | 15 | Distributed Systems | +| MIT 6.5840 Spring 2024 Exam I | 15 | Distributed Systems | +| MIT 6.5840 Spring 2024 Exam II | 14 | Distributed Systems | +| MIT 6.1810 Fall 2024 Quiz II | 14 | Operating Systems | -- **Evaluation**: For single-choice and multiple-choice questions, the evaluation is to compare the selected option(s) with the ground truth answers provided in the exam papers. The evaluation is binary: correct or incorrect. For multiple-choice questions, partial credit can be given if some of the selected options are correct. For short-answer questions, the evaluation is based on the correctness and completeness of the answer, which can be subjective and may require human evaluation or a predefined rubric. We use LLM combined with human defined rubric to evaluate the short-answer questions. +## Quick Start -## Eval Results +### 1. Install dependencies -You can see the detailed information of each exam in the table below. +```bash +./install.sh +``` -| Course | # of questions | Score (gpt-4 1) (score/total) | Score (gpt-4o) (score/total) | Score (o3-mini) (score/total) | Student Score (max/average/medium) | -|------------------------------------------------------------------------|----------------|-------------------------------|------------------------------|-------------------------------|-------------------------------------| -| 6.5840 Distributed System Engineering: Spring 2025 Exam I | 11 | 29/65 | 27/65 | 25/65 | 65/ **51.8** /52 | -| 6.5840 Distributed System Engineering: Spring 2024 Exam I | 15 | 54/95 | 55/95 | 42/95 | 95/ **77** /78 | -| 6.5840 Distributed System Engineering: Spring 2024 Exam II | 14 | 24/71 | 24/71 | 36/71 | 72/ **56.6** /57 | -| 6.1810 Fall 2024 MIT 6.1810 Operating System Engineering | 14 | 35/70 | 40/70 | 52/70 | 65/ **49.8** /49 | +This creates a Python virtual environment and installs required packages -## Benchmark Setup +### 2. Configure your LLM endpoint -### Test in Docker +Edit `env.toml` to add your API keys: -To test your benchmark in a Docker container, follow these steps: +```toml +[llm] +AZURE_API_KEY = "your-key-here" +AZURE_API_BASE = "https://your-endpoint.openai.azure.com/" +# or +ANTHROPIC_API_KEY = "your-key-here" +``` -1. Build the Docker image using the provided Dockerfile. You can do this by running the following command in the terminal: +### 3. Run the benchmark - ```sh - docker build -t your_benchmark_image . - ``` +```bash +./run.sh "gpt-4o" +``` + +Or run directly with Python: -2. Once the image is built, you can run it using the following command: +```bash +source .venv/bin/activate +python src/main.py --model_name "gpt-4o" +``` - ```sh - docker run -it --rm your_benchmark_image - # docker run --rm your_benchmark_image - ``` +### 4. Run tests -3. Inside the container, navigate to the appropriate directory and execute the benchmark script to start the testing process. +```bash +./test.sh +``` - ```sh - ./run.sh - ``` +## How it works -### Maunaly Test +1. Load questions: Reads exam questions from `data/benchmark/` +2. For each question: + - Prompts the LLM with the question + - Parses the LLM's JSON response + - Evaluates the answer (exact match for multiple-choice, LLM-as-judge for short-answer) + - Records the score +3. 
Generate summary: Aggregates results by exam and overall -To manually test your benchmark, follow these steps: +## Output files -#### Install Dependencies +After running, you'll find results in `./outputs/course_exam____/`: -To install and configure your benchmark, follow these steps: +### 1. Per-question results (`results.jsonl`) -1. configure `env.toml` to set LLM API endpoint -2. install dependencies +For each question, one JSON object per line: -```bash -./install.sh +```json +{ + "instance_id": 1, + "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", + "question_type": "SingleChoice", + "llm_answer": "C", + "correct_answer": "C", + "points_earned": 5, + "points_possible": 5, + "status": "correct" +} ``` -#### Run +Fields: -To run your benchmark and obtain results for a specific task and model, follow these steps: +- `instance_id`: Question identifier +- `exam_id`: Exam identifier (links to exams_metadata.json) +- `question_type`: Type of question (`SingleChoice`, `MultipleChoice`, `True/False Questions`, `ShortAnswerQuestion`) +- `llm_answer`: LLM's answer +- `correct_answer`: Correct answer +- `points_earned`: Points the LLM earned +- `points_possible`: Maximum points for this question +- `status`: `correct`, `incorrect`, `partial`, or `error` -1. Review the `run.sh` script to understand the expected commands and parameters. -2. Execute the `run.sh` script to start the benchmark. The script will guide you through the process and generate the results. +### 2. Full debugging information (`results_detailed.jsonl`) -```bash -./run.sh "gpt-4o" -``` +Extended format with prompts and LLM explanations (for debugging). -or +### 3. Aggregated statistics (`summary.json`) -```bash -python3 src/main.py --model_name $MODEL_NAME # default output: ./outputs/system_course)bench___${MODEL_NAME}___$(date +"%Y-%m-%d_%H-%M-%S") +Overall performance and breakdown by exam with answered/unanswered/correct/incorrect counts. + +### 4. LLM vs student performance (`comparison.json`) + +Compares LLM performance against real student baseline data. -# or specify the save path -python3 src/main.py --model_name $MODEL_NAME --save_path ./outputs/BAISysEducation___${MODEL_NAME}___$(date +"%Y-%m-%d_%H-%M-%S") +## Data format + +The benchmark data is stored in `data/benchmark/`: + +- `exams_metadata.json`: Exam-level metadata (one entry per exam) +- `questions.jsonl`: Individual questions (one JSON object per line that links to an exam from `exams_metadata.json` via `exam_id`) + +## How to extend the benchmark + +Consider this [MIT 6.824 Distributed Systems quiz](https://pdos.csail.mit.edu/6.824/quizzes/q25-2-sol.pdf). The steps below show how to add this exam to the benchmark. The same process applies to any course exam you want to include. + +### Step 1: Add exam metadata to `exams_metadata.json` + +Create a unique `exam_id` for your exam. Here's the actual entry for the Spring 2024 Exam II: + +```json +{ + "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", + "test_paper_name": "6.5840 Distributed System Engineering: Spring 2024 Exam II", + "course": "Distributed System Engineering", + "year": 2024, + "score_total": 71, + "score_max": 71.0, + "score_avg": 56.61, + "score_median": 57, + "score_standard_deviation": 9.13, + "num_questions": 14 +} +``` + +### Step 2: Add individual questions to `questions.jsonl` + +Append your questions to the file. Each line is a JSON object. 
Here's an example from the exam (a True/False question about FaRM): + +```json +{ + "instance_id": 33, + "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", + "problem_num": 4, + "points": 8, + "problem": "# III FaRM \n\nConsider the following statements about FaRM as described in No compromises: distributed transactions with consistency, availability, and performance. For each statement, circle True or False. \n\n4. [8 points]: \n\nTrue / False : Because FaRM uses primary-backup replication for a region (instead of Paxos), FaRM must reconfigure to remove a failed replica before FaRM can continue to use the region. \n\nTrue / False : FaRM can use short leases (10ms by default) because it has communication and scheduling optimizations to renew leases quickly. \n\nTrue / False : A transaction that modifies only one object will never abort. \n\nTrue / False : Read-only transactions require only the validate step of the Commit phase in Figure 4. ", + "answer": "True,True,False,True", + "explanation": "Answer: True, True, False, True. The first statement is true because FaRM requires a response from all replicas, thus it must reconfigure to remove the failed replica before it can continue with the affected shard. The third statement is false because another transaction may modify the one object causing this transaction's validation phase to fail (because the other transaction will have incremented the object's version number).", + "type": "True/False Questions" +} ``` -### Output Description +Required fields: + +- `instance_id`: Globally unique number (use next available number) +- `exam_id`: Must match the `exam_id` from Step 1 +- `problem_num`: Question number within the exam (1, 2, 3, ...) +- `points`: Points allocated to this question +- `problem`: The question text +- `answer`: Correct answer + - For SingleChoice: `"A"`, `"B"`, etc. + - For MultipleChoice: `"A,B,C"` (comma-separated, no spaces) + - For True/False: `"True,False,True"` (one per sub-question) + - For ShortAnswerQuestion: The model answer text +- `explanation`: Explanation of the correct answer +- `type`: One of `"SingleChoice"`, `"MultipleChoice"`, `"True/False Questions"`, `"ShortAnswerQuestion"` + +> Note: Questions should be sorted by `exam_id` then `instance_id` + +After adding the exam and questions, run `./test.sh` as a sanity check to valid the data format. This will also run in the CI pipeline. + +## Question types and evaluation + +| Type | Answer Format | Evaluation Method | Partial Credit? | +| -------------------- | ------------------- | ----------------- | ---------------------------------- | +| SingleChoice | `"A"` | Exact match | No | +| MultipleChoice | `"A,B,C"` | Subset check | Yes (2 points for partial correct) | +| True/False Questions | `"True,False,True"` | Exact match | No | +| ShortAnswerQuestion | Free text | LLM-as-judge | Yes (scored 0 to max points) | + +For short-answer questions, an LLM evaluates the answer based on accuracy, completeness, logical consistency, and clarity. 
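+To make these scoring rules concrete, here is a minimal sketch of how the subset check and the exact-match cases could be implemented. It is an illustration only; the helper names are hypothetical and the benchmark's real logic lives in `src/`:
+
+```python
+# Illustrative sketch of the scoring rules in the table above.
+# Hypothetical helpers -- not the benchmark's actual code.
+
+def score_multiple_choice(llm_answer: str, correct_answer: str, points: int) -> int:
+    """Subset check: full credit for an exact match, 2 points for a
+    non-empty subset of the correct options, 0 otherwise."""
+    picked = {c.strip().upper() for c in llm_answer.split(",") if c.strip()}
+    correct = {c.strip().upper() for c in correct_answer.split(",") if c.strip()}
+    if picked == correct:
+        return points
+    if picked and picked.issubset(correct):
+        return 2  # partial credit, per the table above
+    return 0
+
+def score_exact_match(llm_answer: str, correct_answer: str, points: int) -> int:
+    """SingleChoice and True/False questions are all-or-nothing."""
+    return points if llm_answer.strip().upper() == correct_answer.strip().upper() else 0
+```
+
+For example, `score_multiple_choice("A,B", "A,B,C", 7)` returns `2`: a correct but incomplete selection earns the partial-credit award.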
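+Similarly, the data-format sanity check that `./test.sh` runs over `questions.jsonl` (Step 2 above) can be pictured along these lines. This is a hypothetical sketch assuming only the required fields listed above, not the actual test code:
+
+```python
+# Hypothetical sketch of a format check over data/benchmark/questions.jsonl.
+import json
+
+REQUIRED_FIELDS = {"instance_id", "exam_id", "problem_num", "points",
+                   "problem", "answer", "explanation", "type"}
+VALID_TYPES = {"SingleChoice", "MultipleChoice",
+               "True/False Questions", "ShortAnswerQuestion"}
+
+def check_questions(path: str = "data/benchmark/questions.jsonl") -> None:
+    seen_ids = set()
+    with open(path, encoding="utf-8") as f:
+        for line_no, line in enumerate(f, start=1):
+            q = json.loads(line)  # every line must be valid JSON
+            missing = REQUIRED_FIELDS - q.keys()
+            assert not missing, f"line {line_no}: missing fields {missing}"
+            assert q["type"] in VALID_TYPES, f"line {line_no}: unknown type {q['type']}"
+            assert q["instance_id"] not in seen_ids, f"line {line_no}: duplicate instance_id"
+            seen_ids.add(q["instance_id"])
+```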
+ +## Training data templates + +See the example files in: -- `result.jsonl`: Detailed output information -- `summary.json`: Summary of model results - - `reference`: Original test scores (ground truth student performance) - - `score`: Test scores - - `score_by_test_paper`: Test score by test paper +- `data/sft/course_exam_sft_example.jsonl`: Format for supervised fine-tuning +- `data/pretrain/course_exam_pretrain_example.jsonl`: Format for pre-training diff --git a/benchmarks/course_exam_bench/data/benchmark/SystemTestPaper.jsonl b/benchmarks/course_exam_bench/data/benchmark/SystemTestPaper.jsonl deleted file mode 100644 index a14c09bc..00000000 --- a/benchmarks/course_exam_bench/data/benchmark/SystemTestPaper.jsonl +++ /dev/null @@ -1,54 +0,0 @@ -{"instance_id":1,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":1,"points":5,"score_total":65,"score_max":65.0,"score_median":52,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# I MapReduce \n\nThe MapReduce paper (MapReduce: Simplified Data Processing on Large Clusters, by Dean and Ghemawat) says in Section 3.1 that the intermediate key space is partitioned among the R reduce tasks using hash(key) mod R. \n\nThea is running the word-count MapReduce job (pseudo-code in the paper\u2019s Section 2.1) on a cluster with 10 worker machines. M is 20 and R is 40. There are no failures, the network is reliable, no machines are slower than expected, and there is no competing work on any of the machines or networks involved. The Map input is divided into 20 pieces of 16 megabytes each. \n\n1. [5 points]: By mistake, the hash(key) function Thea is using with MapReduce always returns 1. What effect will that have on the execution of the word-count job, compared to using a well-behaved hash function? Circle the single best answer. A. the job will produce incorrect final output \n B. 10 times as much total CPU time will be needed for Reduce phase \n C. 10 times as much total wall-clock time will be needed for Reduce phase \n D. 40 times as much total CPU time will be needed for Reduce phase \n E. 40 times as much total wall-clock time will be needed for Reduce phase \n F. the job will never complete ","answer":"C","explanation":"Answer: C. The total amount of computation is unchanged, but it\u2019s all done by one worker rather than divided up in parallel among 10 workers. B is not correct because the total amount of work doesn\u2019t change; the only thing that the hash function changes is which worker does the work. ","type":"SingleChoice"} -{"instance_id":2,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":2,"points":5,"score_total":65,"score_max":65.0,"score_median":52,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# II VMware FT \n\nThe paper The Design of a Practical System for Fault-Tolerant Virtual Machines by Scales et al. describes a method for avoiding split-brain using shared storage. 
Instead of using the shared storage\u2019s test-and-set, Ben implements test-and-set using the kvsrv server from Lab 2 as follows: \n\n```\nfunc test-and-set(clnt *tester.Clnt) bool {\n val, version, err := clnt.Get(\"lock\")\n if err != rpc.OK {\n return false\n }\n if val == \"set\" {\n return false\n }\n if err := clnt.Put(\"lock\", \"set\", version); err == rpc.OK {\n return true\n }\n return false\n}\n```\n\nThe clnt.Put and clnt.Get are RPCs that invoke the server\u2019s Put and Get methods. You can assume that Ben has implemented the Put and Get methods correctly. \n\nThe initial value of \u201clock\u201d is the empty string. \n\nWhen the primary or the backup suspect that the other one has crashed, they invoke test-go-live, each with their own RPC client clnt: \n\n```\nfunc test-go-live() {\n for true {\n if test-and-set(clnt) {\n go-live()\n return\n }\n }\n}\n```\n\nThe network may lose, delay, or duplicate a few messages, but most messages will be delivered. \nThe computers (primary, backup, and kvsrv server) do not fail. \n\n2. [5 points]: What statements about Ben\u2019s implementation are true? (Circle all that apply) \n A. Both the primary and backup may observe test-and-set returning true in test-go-live, and \u201cgo live\u201d, resulting in split brain \n B. The key\/value server may never store \u201cset\u201d for the \u201clock\u201d key \n C. The primary and backup may spin forever in test-go-live, retrying test-and-set, because it may never return true \n D. If all RPCs succeed with no timeouts while running test-go-live, either the primary or the backup will observe true from test-and-set, but not both ","answer":"C","explanation":"Answer: C is true: the first Put may change lock to set, but the reply may be lost; a re-send will return ErrMaybe (since the version won\u2019t match); so neither primary nor backup will ever see rpc.OK from Put. D is true: if there are no timeouts (i.e. no packets are lost) the first Put to arrive at the kvsrv will succeed, and the sender will get rpc.OK. A is false, because Put is conditional and only one can set val to \u201cset\u201d. B is false, because most messages will be delivered and thus eventually a Put will succeed in setting the lock. ","type":"SingleChoice"} -{"instance_id":3,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":3,"points":5,"score_total":65,"score_max":65.0,"score_median":52,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# III Linearizability \n\nAlyssa is experimenting with a linearizable put\/get key\/value storage service. Unlike Lab 2, her key\/value service has no versions; put calls look like put(key, value). \n\nAlyssa has two clients. Client C1 executes this: \n\n```\nt = get(\"x\")\nput(\"x\", t + 1)\n```\n\nAt about the same time, client C2 executes this: \n\n```\nt = get(\"x\")\nput(\"x\", t * 2)\n```\n\nBefore either client starts, the value for key $^ { \\mathrm { 6 6 } } \\mathrm { X } ^ { \\mathrm { 7 } }$ in the storage system is 10. Both clients\u2019 calls complete without error. There is no other activity involving the storage system, and there are no failures. \n\nSuppose the history of the execution, in the style of Lecture 4, with values omitted, looks like this: \n\nC1: |--Rx?--| |--Wx?--|\nC2: |--Rx?--| |--Wx?--|\n\n3. [5 points]: After both clients have finished, what could the resulting value of x be in the storage system? (Circle all that apply) \n A. 10 \n B. 11 \n C. 
20 \n D. 21 \n E. 22 ","answer":"B,C","explanation":"Answer: 11 and 20. Both C1\u2019s read and C2\u2019s read see the initial value of $\\mathbf { X }$ (10), so C1 writes 11 and C2 writes 20. The writes are concurrent, so linearizability allows either write to appear to execute last, and thus provide the final value. ","type":"MultipleChoice"} -{"instance_id":4,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":4,"points":5,"score_total":65,"score_max":65.0,"score_median":52,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# III Linearizability \n\nAlyssa is experimenting with a linearizable put\/get key\/value storage service. Unlike Lab 2, her key\/value service has no versions; put calls look like put(key, value). \n\nAlyssa has two clients. Client C1 executes this: \n\n```\nt = get(\"x\")\nput(\"x\", t + 1)\n```\n\nAt about the same time, client C2 executes this: \n\n```\nt = get(\"x\")\nput(\"x\", t * 2)\n```\n\nBefore either client starts, the value for key $^ { \\mathrm { 6 6 } } \\mathrm { X } ^ { \\mathrm { 7 } }$ in the storage system is 10. Both clients\u2019 calls complete without error. There is no other activity involving the storage system, and there are no failures. \n\nSuppose the history of the execution, in the style of Lecture 4, with values omitted, looks like this: \n\nC1: |--Rx?--| |--Wx?--|\nC2: |--Rx?--| |--Wx?--|\n\nAlyssa resets the value of $^ { 6 6 } \\mathrm { X } ^ { 7 3 }$ to 10, and re-runs the two client programs. This time, the execution history looks like this: \n\nC1: |--Rx?--| |--Wx?--|\nC2: |---Rx?---| |--Wx?--| \n\n4. [5 points]: After both clients have finished, what could the resulting value of x be in the storage system? (Circle all that apply) \n A. 10 \n B. 11 \n C. 20 \n D. 21 \n E. 22 ","answer":"B,C,E","explanation":"Answer: 11, 20, and 22. 22 is possible if C2\u2019s read sees C1\u2019s write. ","type":"MultipleChoice"} -{"instance_id":5,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":5,"points":5,"score_total":65,"score_max":65.0,"score_median":52,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# IV GFS \n\nConsider GFS as described in The Google File System by Ghemawat et al. \n\n5. [5 points]: Which statements about GFS are true? (Circle all that apply) \n A. GFS ensures linearizability of client operations by allowing clients to read from chunk replicas. \n B. The primary server of a chunk ensures that Append operations are executed exactly once. \n C. A chunk server uses 64 Mbytes of disk space for each chunk. \n D. Leases help ensure that each chunk has only one primary. ","answer":"D","explanation":"Answer: D. A is false, because GFS allows reading chunks from backups, which may have not seen the last update to a chunk, violating linearizability. B is false because if an Append fails, the client retries the Append, which the primary executes, causing some Appends to be execute twice. C is false; the paper\u2019s Section 2.5 says that chunks are stored as Linux files and are extended only as needed, with disk space allocated lazily; this means that if only a few bytes of a chunk are written, only that part of the Linux chunk file will consume disk space. 
D is true; Section 3.1 says that the coordinator grants a chunk\u2019s lease to just one of the replicas, and only grants the lease to a different replica if the lease expires. ","type":"SingleChoice"} -{"instance_id":6,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":6,"points":10,"score_total":65,"score_max":65.0,"score_median":52,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# V Raft \n\nRefer to Ongaro and Ousterhout\u2019s In Search of an Understandable Consensus Algorithm (Extended Version). \n\n6. [10 points]: Which statements about Raft are true? (Circle all that apply) \n A. If a follower receives an AppendEntries RPC from the leader and the follower\u2019s term matches the one in the RPC, then the prevLogIndex in the RPC must be equal to or higher than the follower\u2019s lastApplied \n B. Raft is optimized for the case that term switch happen frequently \n C. Raft guarantees that a leader in term $t$ is leader in term $t + 1$ \n D. If a leader sends the command in log index $i$ on the apply channel, the leader must have persisted log index $i$ \n E. If a follower crashes in a term and quickly reboots, it remembers who it voted for before the crash \n F. The leader\u2019s matchIndex for a peer is always equal to or smaller than the leader\u2019s nextIndex for that peer. \n G. A candidate who becomes leader sends out AppendEntries to all followers to suppress further elections \n H. If Raft doesn\u2019t use snapshots, a crashed follower will send all committed log entries on the apply channel after it reboots, even ones that it sent before the crash ","answer":"D,E,F,G,H","explanation":"Answer: D, E, F, G, H are true. \nA is false, because an AppendEntries RPC from the leader may be delayed and arrive after later AppendEntries RPCs that bump up lastApplied; when the follower processes the first RPC, the prevLogIndex may be smaller than its lastApplied. B is false, because the authors believe terms change infrequently and therefore don\u2019t think the fast-backup optimization is necessary. ","type":"MultipleChoice"} -{"instance_id":7,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":7,"points":10,"score_total":65,"score_max":65.0,"score_median":52,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# V Raft \n\nRefer to Ongaro and Ousterhout\u2019s In Search of an Understandable Consensus Algorithm (Extended Version). \n\n[10 points]: Which of the following bugs causes a Raft implementation to violate the safety properties listed in Figure 3? (Circle all that apply) \nA. A deadlock in a follower \nB. A follower who starts an election very quickly \nC. A partitioned leader who on rejoining updates its term to the new leader\u2019s term and sends AppendEntries for commands in its log with the new term \nD. A race condition in the follower\u2019s implementation that causes two followers to send different commands on the apply channel for log index $i$ \nE. A candidate that forgets to vote for itself \nF. A follower who appends a log entry to its log even if the term in the AppendEntries is smaller than its own and who then sends the log entry on the apply channel \nG. A follower that forgets to implement the rollback optimization presented at the end of section 5.3 \nH. 
A leader who always sends only one entry in an AppendEntries RPC to a follower","answer":"A,B,E,G,H","explanation":"Answer: A, B, E, G, and H are examples of what are called \u201cliveness\u201d bugs: these bugs don\u2019t cause wrong behavior but may prevent any progress. A: A deadlock in the follower may cause Raft to not make forward progress at some point (e.g., if the follower is necessary to form a majority). B may prevent a leader from being elected but it doesn\u2019t violate the safety properties. E is another variation of B. G and H may cause Raft to run slowly but that doesn\u2019t violate the safety properties. C, D, F, on the other hand, are \u201csafety\u201d bugs that cause incorrect behavior that violates the safety rules of Raft of Figure 3. ","type":"MultipleChoice"} -{"instance_id":8,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":8,"points":5,"score_total":65,"score_max":65.0,"score_median":52,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# VI Lab 3A-3C \n\nGeorge is implementing Raft as in Lab 3A-3C. Eager to test his implementation, George runs a git pull to get the latest changes from the 6.5840 staff. The latest changes introduce a new test, TestMiniReElection, which tests whether Raft can re-elect a leader after a single network partition. \n\n```\nfunc TestMiniReElection(t *testing.T) {\n servers := 3 \/\/ initialize three servers\n ...\n \/\/ wait for a leader to be elected; get the leader\u2019s index\n leader1 := ts.checkOneLeader()\n \n ts.g.DisconnectAll(leader1) \/\/ disconnect leader1 from other servers\n \n \/\/ wait for a new leader to be elected; get the leader\u2019s index\n leader2 := ts.checkOneLeader() \/\/ ***\n}\n```\n\nts.checkOneLeader() repeatedly polls only the connected servers until one of the connected servers returns that it is a leader. If it cannot find a leader within 5s, it returns a timeout error. \n\nUnfortunately, there is a bug in the Raft test infrastructure. When leader1 is disconnected, leader1 can still send RPCs to the other servers but not receive responses from the other servers. George runs TestMiniReElection, and finds that the test fails at the line marked with \u201c\\*\\*\\*\u201d with the timeout error \u201cexpected one leader, got none\u201d. \n\n8. [5 points]: Assume George\u2019s Raft implementation is completely correct, and that the network is reliable. Briefly explain why the buggy test infrastructure causes George\u2019s implementation to fail the new test. ","answer":"Answer: All the peers will continue to receive leader1\u2019s heartbeat AppendEntries RPCs, which will prevent them from ever starting an election. ","explanation":"Answer: All the peers will continue to receive leader1\u2019s heartbeat AppendEntries RPCs, which will prevent them from ever starting an election. ","type":"ShortAnswerQuestion"} -{"instance_id":9,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":9,"points":5,"score_total":65,"score_max":65.0,"score_median":52,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# VII ZooKeeper \n\nRefer to ZooKeeper: Wait-free coordination for Internet-scale systems, by Hunt, Konar, Junqueira, and Reed, and to Lecture 9. \n\nAlyssa runs a ZooKeeper service with a ZooKeeper leader and multiple followers. 
Alyssa has three ZooKeeper client programs, P1, P2, and P3: \n\n```\nP1:\n s = openSession()\n if create(s, \"\/leader\", \"one\", flags=ephemeral) == true:\n print \"P1 starting as leader\"\n _, version = getData(s, \"\/x\", watch=false)\n setData(s, \"\/x\", \"one\", version)\n _, version = getData(s, \"\/y\", watch=false)\n setData(s, \"\/y\", \"one\", version)\n```\n\n```\nP2:\n s = openSession()\n if create(s, \"\/leader\", \"two\", flags=ephemeral) == true:\n print \"P2 starting as leader\"\n _, version = getData(s, \"\/x\", watch=false)\n setData(s, \"\/x\", \"two\", version)\n _, version = getData(s, \"\/y\", watch=false)\n setData(s, \"\/y\", \"two\", version)\n print \"P2 done\"\n```\n\n```\nP3:\n s = openSession()\n sync(s, \"\/\")\n x = getData(s, \"\/x\", watch=false)\n y = getData(s, \"\/y\", watch=false)\n print x, y\n```\n\nInitially, znode \u201c\/leader\u201d does not exist, znode \u201c\/x\u201d exists and contains the string \u201cempty\u201d, and znode \u201c\/y\u201d exists and also contains the string \u201cempty\u201d. \n\nThe ZooKeeper calls in Alyssa\u2019s code are all synchronous. The ZooKeeper client call create() is exclusive, returning false if the file already exists, and true if it was able to create the file. The programs might end up talking to different ZooKeeper followers. \n\nAlyssa starts P1, waits until she sees it print \u201cP1 starting as leader\u201d, then (on a different computer) starts P2. Just at this point in time, P1\u2019s network connection starts to become slow and unreliable, so that sometimes it delivers packets, sometimes not. Alyssa sees that P2 prints \u201cP2 starting as leader\u201d, and after a little while \u201cP2 done\u201d. P2\u2019s network connection is reliable and fast. \n\nAfter Alyssa sees \u201cP2 done\u201d, she runs P3. \n\n9. [5 points]: What output from P3 could Alyssa see? (Circle all that apply) \n A. one, one \n B. two, two \n C. one, two \n D. two, one ","answer":"B","explanation":"Answer: B. Only two, two. We know P1\u2019s session must have terminated, because Alyssa saw P2 print \u201cP2 starting as leader,\u201d which could only have happened if ZooKeeper deleted P1\u2019s ephemeral \/leader file. So P2 will only start reading and writing data after P1 is guaranteed to have stopped writing (since ZooKeeper terminated its session). So P1 and P2\u2019s activities won\u2019t be intermixed; P2 runs strictly after P1. So both P2\u2019s sets will succeed. P3 starts after P2 finishes, and P3 calls sync(), so P3 will see P2\u2019s writes. ","type":"SingleChoice"} -{"instance_id":10,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":10,"points":5,"score_total":65,"score_max":65.0,"score_median":52,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# VIII Distributed Transactions \n\nAlyssa has a database that supports serializable transactions. Records \u201cx\u201d and \u201cy\u201d both start out containing the value 1. Alyssa starts three transactions at the same time: \n\n```\nT1:\n BEGIN-X\n temp1 = get(\"x\")\n temp2 = get(\"y\")\n put(\"x\", temp1 + temp2)\n END-X\n\nT2:\n BEGIN-X\n temp1 = get(\"y\")\n put(\"x\", temp1 * 2)\n END-X\n\nT3:\n BEGIN-X\n put(\"y\", 3)\n END-X\n```\n\nBEGIN-X marks the start of a transaction, and END-X marks the end. All three transactions commit and finish. There are no aborts, deadlocks, or failures. There is no other activity in the database. 
\n\nWhen Alyssa looks at record \u201cx\u201d in the database after the transactions complete, she sees the value 5. \n\n10. [5 points]: Briefly explain how the value 5 could have resulted from these transactions. ","answer":"Answer: The database system could have executed the transactions one at a time, in the order T2, T3, T1. ","explanation":"Answer: The database system could have executed the transactions one at a time, in the order T2, T3, T1. ","type":"ShortAnswerQuestion"} -{"instance_id":11,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":11,"points":5,"score_total":65,"score_max":65.0,"score_median":52,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# IX Spanner \n\nRefer to Spanner: Google\u2019s Globally-Distributed Database, by Corbett et al. \n\n11. [5 points]: Suppose you only wanted to support read-write transactions (not read-only and not snapshot reads). You want therefore to eliminate all of the Spanner mechanisms that are not needed for read-write transactions. Which of these techniques can be eliminated? (Circle all that apply) \n A. commit wait (Sections 4.1.2, 4.2.1) \n B. safe time (Section 4.1.3) \n C. deadlock avoidance (Section 4.2.1) \n D. assignment of timestamps to read\/write transactions (Section 4.1.2) ","answer":"A,B,D","explanation":"vAnswer: A, B, and D. The time-stamp mechanism is only needed for read-only transactions. Read-write transactions are made serializable and externally consistent by Spanner\u2019s two-phase locking and two-phase commit; the correctness of read-write transactions thus does not rely on time-stamps. ","type":"MultipleChoice"} -{"instance_id":12,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":1,"points":6,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# I MapReduce \n\nHave a look at Figure 3(a) in the paper MapReduce: Simplified Data Processing on Large Clusters by Dean and Ghemawat. The three graphs on the left show the rate of data movement over time for a MapReduce job that sorts a terabyte of data: the rate at which Maps read their input, the rate at which intermediate data is shuffled, and the rate at which Reduces write their output. For these questions you should assume that only this MapReduce job is using the servers and network, and that there are no failures. Many of the numbers below are derived from looking at the graphs, and are thus approximate; your reading of the graphs may be somewhat different from our\u2019s; you should circle the answer that is closest to what you think is correct. \n\n1. [6 points]: Roughly when is the first time at which the sort application\u2019s Reduce() function is called? Circle the best answer. \n - A. 0 seconds\n - B. 50 seconds\n - C. 150 seconds\n - D. 300 seconds ","answer":"C","explanation":"Answer: C. The best answer is 150 seconds. No Reduce function can be called until every Map function has finished; the top graph suggests that the Maps stop running around 150 seconds, and the paper text mentions 200 seconds. 
","type":"SingleChoice"} -{"instance_id":13,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":2,"points":7,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# I MapReduce \n\nHave a look at Figure 3(a) in the paper MapReduce: Simplified Data Processing on Large Clusters by Dean and Ghemawat. The three graphs on the left show the rate of data movement over time for a MapReduce job that sorts a terabyte of data: the rate at which Maps read their input, the rate at which intermediate data is shuffled, and the rate at which Reduces write their output. For these questions you should assume that only this MapReduce job is using the servers and network, and that there are no failures. Many of the numbers below are derived from looking at the graphs, and are thus approximate; your reading of the graphs may be somewhat different from our\u2019s; you should circle the answer that is closest to what you think is correct. \n\n2. [7 points]: Roughly how long does it take a single application Reduce function to sort its share of the data (just the sort, not including either the shuffle or the writing of the output)? Circle the best answer. \n\n- A. 10 seconds\n- B. 75 seconds\n- C. 200 seconds\n- D. 250 seconds\n- E. 650 seconds \n- F. None of the above answers are correct.","answer":"F","explanation":"Answer: F. This question is broken: the application Reduce function does not sort the data. MapReduce\u2019s reduce task framework does the sort, and (for this application) the application Reduce function just returns its argument.","type":"SingleChoice"} -{"instance_id":14,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":3,"points":6,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# I MapReduce \n\nHave a look at Figure 3(a) in the paper MapReduce: Simplified Data Processing on Large Clusters by Dean and Ghemawat. The three graphs on the left show the rate of data movement over time for a MapReduce job that sorts a terabyte of data: the rate at which Maps read their input, the rate at which intermediate data is shuffled, and the rate at which Reduces write their output. For these questions you should assume that only this MapReduce job is using the servers and network, and that there are no failures. Many of the numbers below are derived from looking at the graphs, and are thus approximate; your reading of the graphs may be somewhat different from our\u2019s; you should circle the answer that is closest to what you think is correct. \n\n[6 points]: Why are there two bumps in the Shuffle graph? That is, why does the Shuffle graph go up and then down from time 20 to 200, remain at zero for 100 seconds, and then go up and then down from time 300 to 600? Circle the best answer. \n\nA. There are more Map tasks $( \\mathbf { M } = 1 5 , 0 0 0 )$ than there are machines. \nB. There are more Reduce tasks $\\begin{array} { r } { \\mathrm { R } = 4 0 0 0 \\mathrm { , } } \\end{array}$ ) than there are machines. \nC. There are more Map tasks than there are Reduce tasks. \nD. The aggregate network throughput is smaller than the aggregate disk throughput. \nE. The Map tasks consume more CPU time than the Reduce tasks. ","answer":"B","explanation":"Answer: B. 
The best answer is the second one (more Reduce tasks than machines). Intermediate data can only be moved from Map machines to Reduce machines for Reduce tasks that have been allocated to machines. There are only 1800 machines, so at first only 1800 of the 4000 Reduce tasks are assigned to machines, so only about 1800\/4000ths of the shuffles can happen at first. That\u2019s the first bump. The second bump starts once the first set of Reduce tasks finishes, moving intermediate data to the machines that will run the remaining Reduces.","type":"SingleChoice"} -{"instance_id":15,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":4,"points":7,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# I MapReduce \n\nHave a look at Figure 3(a) in the paper MapReduce: Simplified Data Processing on Large Clusters by Dean and Ghemawat. The three graphs on the left show the rate of data movement over time for a MapReduce job that sorts a terabyte of data: the rate at which Maps read their input, the rate at which intermediate data is shuffled, and the rate at which Reduces write their output. For these questions you should assume that only this MapReduce job is using the servers and network, and that there are no failures. Many of the numbers below are derived from looking at the graphs, and are thus approximate; your reading of the graphs may be somewhat different from our\u2019s; you should circle the answer that is closest to what you think is correct. \n\n[7 points]: Why does the shuffle begin a long time before the Map phase has finished? Circle the best answer. \n\nA. There are more Map tasks $( \\mathbf { M } = 1 5 , 0 0 0 )$ ) than there are machines.\nB. There are more Reduce tasks $\\begin{array} { r } { \\mathrm { { R } } = 4 0 0 0 \\mathrm { { } } } \\end{array}$ ) than there are machines.\nC. There are more Map tasks than there are Reduce tasks.\nD. The aggregate network throughput is smaller than the aggregate disk throughput. \nE. The Map tasks consume more CPU time than the Reduce tasks. ","answer":"A","explanation":"Answer: A. The best answer is the first one (more Map tasks than machines). Shuffles can start as soon as Map functions finish. The system runs 1800 Maps at a time; the first of these finishes a long time before the last of the 15,000 Maps finishes at time 200. ","type":"SingleChoice"} -{"instance_id":16,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":5,"points":6,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# II Linearizability \n\nThese questions concern the material from Lecture 4, Consistency and Linearizability. \n\nYou have a service whose state is a single string, and that exposes two RPC operations to clients: one operation appends the RPC argument to the state, and the other RPC operation returns the current state. The timelines below indicate the start time, end time, argument string, and reply string for each client operation. Ax indicates an append operation with argument $\\mathbf { X }$ , and Ry indicates a read operation to which the server replied y. The vertical bars indicate the start and end times of each operation (the times at which the client sends the request, and receives the reply). 
The service\u2019s state string starts out empty at the beginning of each history. \n\nFor example, \n\nC1:|---Ax---|\n\nC2: |---Ay---|\n\nC3: |--Ryx--| \n\nmeans that client C1 sent an append RPC with \u201cx\u201d as the argument, C2 sent an append RPC with \u201cy\u201d as the argument, and C3 read the state and received the reply \u201cyx\u201d. \n\nConsider this history, in which the reply string sent to C4 has been omitted: \n\nC1:|---Ax---| \nC2: |---Ay---| \nC3: |---Az---| \nC4: |--R?--| \n\n5. [6 points]: Which values could C4\u2019s read yield that are consistent with linearizability? Circle all of the correct answers. \n - A. xzy\n - B. yxz\n - C. yzx\n - D. xy\n - E. xz\n - F. yx\n - G. zy ","answer":"A,B,D,F","explanation":"Answer: A,B,D,F. xzy, yxz, xy, and yx. The result C4 receives can\u2019t start with z (since the Az starts after the Ax finishes); if both x and z appear, x must come first; and it must include both x and y (since Ax and Ay both finish before the C4\u2019s read starts). ","type":"MultipleChoice"} -{"instance_id":17,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":6,"points":7,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# II Linearizability \n\nThese questions concern the material from Lecture 4, Consistency and Linearizability. \n\nYou have a service whose state is a single string, and that exposes two RPC operations to clients: one operation appends the RPC argument to the state, and the other RPC operation returns the current state. The timelines below indicate the start time, end time, argument string, and reply string for each client operation. Ax indicates an append operation with argument $\\mathbf { X }$ , and Ry indicates a read operation to which the server replied y. The vertical bars indicate the start and end times of each operation (the times at which the client sends the request, and receives the reply). The service\u2019s state string starts out empty at the beginning of each history. \n\nFor example, \n\nC1:|---Ax---|\n\nC2: |---Ay---|\n\nC3: |--Ryx--| \n\nmeans that client C1 sent an append RPC with \u201cx\u201d as the argument, C2 sent an append RPC with \u201cy\u201d as the argument, and C3 read the state and received the reply \u201cyx\u201d. \n\nConsider this history, in which the reply string sent to C4 has been omitted: \n\nC1:|---Ax---| \nC2: |---Ay---| \nC3: |---Az---| \nC4: |--R?--| \n\nNow look at this history: \n\nC1: |-------Ax-------| \nC2: |---Ay---| \nC3: |--Ry--| \nC4: |----R?----| \n\n6. [7 points]: Which values could C4\u2019s read yield that are consistent with linearizability? Circle all of the correct answers. \n - A. y \n - B. x \n - C. yx \n - D. xy ","answer":"A,C","explanation":"Answer: A,C. y and yx. The fact that C3 read y, and that C3\u2019s read finished before C4\u2019s read started, means that C4\u2019s result must include y, and, if it includes x, the x must come after y. ","type":"MultipleChoice"} -{"instance_id":18,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":7,"points":6,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# III GFS and Raft \n\nAfter reading the GFS paper (The Google File System by Ghemawat et al.) 
and the Raft paper (Ongaro and Ousterhout\u2019s In Search of an Understandable Consensus Algorithm (Extended Version)), Ben replaces the GFS master with a new coordinator that uses Raft. The Raft-based coordinator provides the same functions as before but replicates the log of operations using 3 Raft peers. All other parts of GFS stay the same. \n\nWhich of the following statements are true? (Circle all that apply) \n\n7. [6 points]: \n\n- A. The coordinator can continue operation in the presence of network partitions without any additional monitoring infrastructure, if one partition with peers is able to achieve a majority. \n- B. The coordinator can continue operation correctly even if one of the 3 peers has failed (and there are no other failures). \n- C. None of the above are true ","answer":"A,B","explanation":"Answer: Both A and B are true; these are properties of Raft. ","type":"MultipleChoice"} -{"instance_id":19,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":8,"points":7,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# III GFS and Raft \n\nAfter reading the GFS paper (The Google File System by Ghemawat et al.) and the Raft paper (Ongaro and Ousterhout\u2019s In Search of an Understandable Consensus Algorithm (Extended Version)), Ben replaces the GFS master with a new coordinator that uses Raft. The Raft-based coordinator provides the same functions as before but replicates the log of operations using 3 Raft peers. All other parts of GFS stay the same. \n\nWhich of the following statements are true? (Circle all that apply) \n\nBen also considers using Raft for chunk replication. He runs many Raft clusters and has the GFS master assign chunks to a specific Raft cluster (i.e., each chunk is assigned to one Raft cluster, consisting of a leader and two followers). GFS clients submit write and append operations for a chunk to the leader of the Raft cluster for that chunk (i.e., Ben\u2019s design doesn\u2019t implement the separate data flow). The leader of the Raft cluster replicates write and append operation using the Raft library. All other parts of GFS (e.g., assigning leases to the leader, client caching locations of chunk servers, reading from the closest server, and so on) stay the same. (You can assume that chunk servers have enough disk space for operations to succeed.) \n\nWhich of the following statements are true? (Circle all that apply) \n\n8. [7 points]: \n\n- A. Unlike the old design, Ben\u2019s design can achieve linearizability for chunk operations. \n- B. Unlike the old design, Ben\u2019s design can continue operation despite the failure of one chunk server. \n- C. By using Raft, Ben\u2019s design allows clients to perform more mutating chunk operations per second than the old design. \n- D. Raft\u2019s snapshots allow a chunk server to catch up in a few seconds if has been down for a long time (assuming the same network as in the GFS paper). \n- E. None of the above are true ","answer":"E","explanation":"Answer: E. None of the above are true. A is false because the client\u2019s cache that maps file names to chunk handles can yield stale results. B is false because the old design can continue despite one failure as well. C is false because Ben\u2019s scheme moves data less efficiently (via the leader, rather than the separate data flow). 
D is false because the snapshot mechanism sends the leader\u2019s entire database of chunks, which will likely take far longer than a few seconds. ","type":"MultipleChoice"} -{"instance_id":20,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":9,"points":6,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# IV Raft \n\nConsider the Raft paper (Ongaro and Ousterhout\u2019s In Search of an Understandable Consensus Algorithm (Extended Version)). Ben wonders what the impact of network behavior is on Raft\u2019s performance. Ben runs a Raft-replicated server that receives many client requests. If the network delivers AppendEntries RPCs in order, Ben\u2019s Raft implementation is fast (i.e., completes many client requests per second). But, if the network delivers AppendEntries frequently out of order, Ben\u2019s Raft implementation performs badly (i.e., completes fewer client requests per second). Using the rules in Figure 2 explain why this is the case. \n\n9. [6 points]: ","answer":"Answer: This question is broken. Figure 2 implies that each AppendEntries should include all as-yet-unacknowledged log entries. So if there are two such RPCs outstanding, the one that was sent second contains a copy of all the log entries in the first. This means that, if the second RPC arrives first, it will be accepted. So it\u2019s not clear why Ben would see any different performance due to out-of-order delivery. ","explanation":"Answer: This question is broken. Figure 2 implies that each AppendEntries should include all as-yet-unacknowledged log entries. So if there are two such RPCs outstanding, the one that was sent second contains a copy of all the log entries in the first. This means that, if the second RPC arrives first, it will be accepted. So it\u2019s not clear why Ben would see any different performance due to out-of-order delivery. ","type":"ShortAnswerQuestion"} -{"instance_id":21,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":10,"points":7,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# V Lab 3A-3C \n\nAlyssa is implementing Raft as in Lab 3A-3C. She implements advancing the commitIndex at the leader (i.e., last bullet of Leaders in Fig 2) as follows: \n\n```go\nfunc (rf *Raft) advanceCommit() {\n start := rf.commitIndex + 1\n if start < rf.log.start() { \/\/ on restart start could be 1\n \tstart = rf.log.start()\n }\n for index := start; index <= rf.log.lastindex(); index++ {\n \tif rf.log.entry(index).Term != rf.currentTerm { \/\/ 5.4\n \t\tcontinue \/\/ ***\n }\n n := 1 \/\/ leader always matches\n for i := 0; i < len(rf.peers); i++ {\n \tif i != rf.me && rf.matchIndex[i] >= index {\n \t\tn += 1\n }\n }\n if n > len(rf.peers)\/2 { \/\/ a majority?\n \tDPrintf(\"%v: Commit %v\\n\", rf.me, index)\n \trf.commitIndex = index\n }\n }\n}\n```\n\nAssume that all omitted parts of Alyssa\u2019s code are correct. \n\nBen argues that the line marked with \u201c\\*\\*\\*\u201d could be replaced by a break statement so that the loop terminates immediately. \n\n10. [7 points]: Explain what could go wrong if one adopted Ben\u2019s proposal; please include a specific sequence of events to illustrate your answer. 
","answer":"Answer: If there\u2019s a term mis-match, the leader won\u2019t be able to commit any further log entries. \nThe paper\u2019s Figure 8e shows an example of such a scenario. ","explanation":"Answer: If there\u2019s a term mis-match, the leader won\u2019t be able to commit any further log entries. \nThe paper\u2019s Figure 8e shows an example of such a scenario. ","type":"ShortAnswerQuestion"} -{"instance_id":22,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":11,"points":6,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# VI More lab 3A-3C \n\nAlyssa is implementing Raft as in Lab 3A-3C. She implements the rule for conversion to follower in her AppendEntries RPC handler as shown below: \n\n```go\nfunc (rf *Raft) convertToFollower(term int) {\n rf.state = Follower\n rf.votedFor = -1\n rf.currentTerm = term\n rf.persist()\n}\n```\n\n```go\nfunc (rf *Raft) AppendEntries(args * AppendEntriesArgs,\n reply *AppendEntriesReply) {\n rf.mu.Lock()\n defer rf.mu.Unlock()\n if args.Term >= rf.currentTerm {\n rf.convertToFollower(args.Term)\n }\n ...\n}\n```\n\nAssume that all omitted parts of Alyssa\u2019s code are correct. \n\n11. [6 points]: Describe a specific sequence of events that would cause Alyssa\u2019s implementation to break the safety guarantees provided by Raft. ","answer":"Answer: The code shown can cause a peer to forget it has cast a vote for the current term. Suppose peer P1 has been elected for this term. The peers that elected it may forget that they voted for P1. Then some other peer P2 may become candidate for this term, and get votes from those forgetful peers, and become a second leader for the same term. This will lead to split brain. ","explanation":"Answer: The code shown can cause a peer to forget it has cast a vote for the current term. Suppose peer P1 has been elected for this term. The peers that elected it may forget that they voted for P1. Then some other peer P2 may become candidate for this term, and get votes from those forgetful peers, and become a second leader for the same term. This will lead to split brain. ","type":"ShortAnswerQuestion"} -{"instance_id":23,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":12,"points":7,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# VII ZooKeeper \n\nRefer to ZooKeeper: Wait-free coordination for Internet-scale systems by Hunt, Konar, Junqueira, and Reed, and to the notes for Lecture 9. \n\nThe code fragments below are simplified versions of how something like GFS or MapReduce might use ZooKeeper to elect a coordinator, and for that coordinator to store state such as the assignments of GFS data to chunkservers. \n\nSuppose server S1 executes the following code to become elected and to then store coordinator state in \/A and \/B. Initially, znode \/coord-lock does not exist, znode \/A starts out containing A0, and znode \/B starts out containing B0. \n\n```\n s = openSession()\n if create(s, \"\/coord-lock\", data=\"S1\", ephemeral=true) == true:\n setData(s, \"\/A\", \"A1\", version=-1)\n setData(s, \"\/B\", \"B1\", version=-1)\n```\n\n\n\n12. [7 points]: Briefly explain why, for coordinator election, it makes sense that \/coord-lock should be an ephemeral znode rather than a regular znode. 
","answer":"Answer: If a server is elected as coordinator, and then fails, ZooKeeper automatically deletes the ephemeral \/coord-lock; now another server can create that file and become coordinator. ","explanation":"Answer: If a server is elected as coordinator, and then fails, ZooKeeper automatically deletes the ephemeral \/coord-lock; now another server can create that file and become coordinator. ","type":"ShortAnswerQuestion"} -{"instance_id":24,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":13,"points":6,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# VII ZooKeeper \n\nRefer to ZooKeeper: Wait-free coordination for Internet-scale systems by Hunt, Konar, Junqueira, and Reed, and to the notes for Lecture 9. \n\nThe code fragments below are simplified versions of how something like GFS or MapReduce might use ZooKeeper to elect a coordinator, and for that coordinator to store state such as the assignments of GFS data to chunkservers. \n\nSuppose server S1 executes the following code to become elected and to then store coordinator state in \/A and \/B. Initially, znode \/coord-lock does not exist, znode \/A starts out containing A0, and znode \/B starts out containing B0. \n\n```\n s = openSession()\n if create(s, \"\/coord-lock\", data=\"S1\", ephemeral=true) == true:\n setData(s, \"\/A\", \"A1\", version=-1)\n setData(s, \"\/B\", \"B1\", version=-1)\n```\n\nS1\u2019s create() finishes and returns true to indicate success. But just after that, and before ZooKeeper has received S1\u2019s setData() requests, ZooKeeper decides that S1 has failed, and ZooKeeper terminates S1\u2019s session. \n\nAfter ZooKeeper terminates S1\u2019s session, server S2 runs this to become coordinator: \n\n```\ns = openSession()\nif create(s, \"\/coord-lock\", data=\"S2\", ephemeral=true) == true:\n setData(s, \"\/A\", \"A2\", version=-1)\n setData(s, \"\/B\", \"B2\", version=-1)\n```\n\nHowever, S1 is actually still alive, and it proceeds to send the two setData() requests, and they arrive at ZooKeeper. \n\nThen client C1 reads \/B and \/A and sees B2 and A2, respectively. \n\nNow a different client, C2, reads \/B, and then reads \/A. Both reads succeed. \n\n13. [6 points]: Given the way ZooKeeper works, what can C2 observe? Circle all of the possible read results. \n \/B \/A \n A. B0 A0 \n B. B0 A1 \n C. B0 A2 \n D. B2 A0 \n E. B2 A1 ","answer":"A,C","explanation":"Answer: B0 A0 and B0 A2 are the only possible results. B0 is possible because, in the absence of other constraints, ZooKeeper can yield stale data to reads. A1 is never possible because ZooKeeper terminated S1\u2019s session before ZooKeeper receive S1\u2019s setData()s, so ZooKeeper ignore those setData()s. B2 A0 is not possible since, once ZooKeeper has revealed a write to a client, the \u201cLinearizable writes\u201d guarantee in Section 2.3 implies that all previous writes have been applied. 
","type":"MultipleChoice"} -{"instance_id":25,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":14,"points":7,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# VIII Grove \n\nIn the ApplyReadonly function in Figure 7, Ben decides to delete the check for s.waitForCommitted() The new code is as as follows: \n\n```\nfunc (s *Server) ApplyReadonly(op) Result {\n\ts.mutex.Lock()\n if s.leaseExpiry > GetTimeRange().latest {\n e := s.epoch\n idx, res := s.stateLogger.LocalRead(op)\n s.mutex.Unlock()\n return res\n } else {\n s.mutex.Unlock()\n return ErrRetry\n }\n}\n```\n\n14. [7 points]: Explain why this modification can result in non-linearizable reads. ","answer":"Answer: If a Grove backup server reveals an update without waiting to ensure it has been committed, then it may reveal an uncommitted write. If the primary then fails, the backup whose database is used to recover may not have recent uncommitted writes. So the write may disappear, and other clients issuing strictly subsequent reads may not see that write. That would not be linearizable. ","explanation":"Answer: If a Grove backup server reveals an update without waiting to ensure it has been committed, then it may reveal an uncommitted write. If the primary then fails, the backup whose database is used to recover may not have recent uncommitted writes. So the write may disappear, and other clients issuing strictly subsequent reads may not see that write. That would not be linearizable. ","type":"ShortAnswerQuestion"} -{"instance_id":26,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":15,"points":6,"score_total":97,"score_max":95.0,"score_median":78,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# IX Distributed Transactions \n\nMouseGPT is designing a distributed transaction system using two-phase commit and two-phase locking, as discussed in Lecture 12 and Chapter 9 of the 6.033 reading. The goal is to provide serializable results. The question arises of what should happen if a participant computer crashes while in the PREPARED state for a transaction. MouseGPT thinks that all-or-nothing atomicity would be satisfied if such a transaction were completely forgotten. So MouseGPT designs the system so that if a participant computer crashes and restarts while it is in the PREPARED state for a transaction that it\u2019s part of, the recovery software on that computer un-does any local modifications the interrupted transaction might have performed and releases its locks, and sends a network message to each other participant and to the TC to tell them to undo any changes made by the transaction and to release its locks. \n\n15. [6 points]: Explain why MouseGPT\u2019s plan would cause the system to produce nonserializable (incorrect) results. ","answer":"Answer: The TC may have decided to commit the transaction, and sent out COMMIT messages to the other participating workers, and they may have committed, and revealed committed results to other transactions. At that point, there is no way to back out of the transaction without violating serializability and atomicity. 
","explanation":"Answer: The TC may have decided to commit the transaction, and sent out COMMIT messages to the other participating workers, and they may have committed, and revealed committed results to other transactions. At that point, there is no way to back out of the transaction without violating serializability and atomicity. ","type":"ShortAnswerQuestion"} -{"instance_id":27,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":1,"points":5,"score_total":71,"score_max":71.0,"score_median":57,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# I Spanner \n\nThe intelligent computer HAL is using Spanner (as described in Spanner: Google\u2019s GloballyDistributed Database by Corbett et al.) to store data. HAL notes that read\/write transactions are being slowed down by Spanner\u2019s commit-wait mechanism (see Section 4.2.1). HAL disables commit-wait in his Spanner installation; as a result, everything works just as described in the paper except that the coordinator leader does not wait until the timestamp $s$ is guaranteed to be in the past. \n\nHAL uses just these three transactions: \n\n```\nT1:\n X=1\n Y=1\nT2:\n X=22\n Y=22\nT3:\n print X, Y\n```\n\nInitially, database records X and Y both have value 0. X and $\\mathrm { Y }$ are in different Spanner shards, managed by different Paxos groups. T1 and T2 are read\/write transactions; T3 is a read-only transaction. \n\nHAL starts T1; waits for Spanner to say that T1 has completed; starts T2, waits for Spanner to say that T2 has completed; then starts T3 and observes T3\u2019s output. \n\n1. [5 points]: Which outputs from T3 are possible? (For each statement, circle True or False.) \n A. True \/ False : 22, 22 \n B. True \/ False : 1, 1 \n C. True \/ False : 1, 22 \n D. True \/ False : 0, 0 ","answer":"A,B,D","explanation":"Answer: A,B,D. 22,22, 1,1, and 0,0 are all possible; 1,22 is not. Omitting commit-wait means that either or both of T1 and T2 might commit with time-stamps later than the time-stamp that T3 chooses, so T3 might see the result of either T1 or T2, or neither. T3 can\u2019t see 1,22 because both T1 and T2 do both their writes at the same timestamp, so T3 will either see both writes of one of the transactions, or neither. ","type":"MultipleChoice"} -{"instance_id":28,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":2,"points":4,"score_total":71,"score_max":71.0,"score_median":57,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# II Chardonnay \n\nConsider the paper Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases, by Eldeeb et al. \n\nA read\/write Chardonnay transaction reads database record A, then reads B, and then writes C. The system is busy with other read\/write transactions at the same time, some of which might also use A, B, and\/or C. \n\n2. [4 points]: In which situation will Chardonnay\u2019s \u201cdry run\u201d mechanism yield the most benefit? (Circle the single best answer.) \n A. A is hot, B is cold. \n B. A is cold, B is hot. \n C. A is cold, B is cold. \n D. A is hot, B is hot. \n\n\u201cCold\u201d means used rarely. \u201cHot\u201d means used by many transactions. ","answer":"A","explanation":"Answer: A. Only the first answer (hot, cold) is correct. 
Chardonnay\u2019s dry run mechanism helps avoid situations in which a transaction holds the lock for a record that other transactions need, while waiting to read a record from the disk. This situation arises when a read\/write transaction uses a hot record followed by a cold record. \n\n---\n","type":"SingleChoice"} -{"instance_id":29,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":3,"points":4,"score_total":71,"score_max":71.0,"score_median":57,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# II Chardonnay \n\nConsider the paper Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases, by Eldeeb et al. \n\nA read\/write Chardonnay transaction reads database record A, then reads B, and then writes C. The system is busy with other read\/write transactions at the same time, some of which might also use A, B, and\/or C. \n\nA system that uses Chardonnay issues just these three transactions: \n\n```\nT1:\n X=1\nT2:\n Y=1\nT3:\n print X, Y\n```\n\nInitially, both database records (X and Y) start out with value 0. X and Y are in different ranges. T1 and T2 are read\/write transactions. T3 is a read-only transaction (described in the paper\u2019s Section 6). T3 does not use the the waiting idea described in the last paragraph of Section 6.2. \n\nOne client starts T1. After T1 completes, another client starts T2. After T2 completes, a third client runs T3. \n\nThis version of Chardonnay has a bug somewhere in its code, causing T3 to print the incorrect output 0,1. \n\n3. [4 points]: Which of the following bugs is the most plausible explanation for T3 printing \n 0,1? Circle the single most correct answer. \n A. The epoch server is stuck: it always returns the same epoch number, and never increases it. \n B. The epoch server is incrementing too quickly: more than once per 10 milliseconds. \n C. The epoch server is working correctly except it gave T2 an epoch that was too small. \n D. The epoch server is working correctly except it gave T2 an epoch that was too large. ","answer":"C","explanation":"Answer: C. The third answer is correct. 0,1 is not a correct output because serializability requires that if T3 observes the results of T2, and T1 finished before T2 started, then T3 is required to also see the results of T1. If the epoch server gives T2 an epoch that\u2019s less than T1\u2019s epoch, and T3 and T1 run in the same epoch, then T3 will see T2\u2019s $\\scriptstyle \\ Y = 1$ but not T1\u2019s $\\scriptstyle \\mathrm { X = 1 }$ . ","type":"SingleChoice"} -{"instance_id":30,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":4,"points":8,"score_total":71,"score_max":71.0,"score_median":57,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# III FaRM \n\nConsider the following statements about FaRM as described in No compromises: distributed transactions with consistency, availability, and performance. For each statement, circle True or False. \n\n4. [8 points]: \n\nTrue \/ False : Because FaRM uses primary-backup replication for a region (instead of Paxos), FaRM must reconfigure to remove a failed replica before FaRM can continue to use the region. \n\nTrue \/ False : FaRM can use short leases (10ms by default) because it has communication and scheduling optimizations to renew leases quickly. 
\n\nTrue \/ False : A transaction that modifies only one object will never abort. \n\nTrue \/ False : Read-only transactions require only the validate step of the Commit phase in Figure 4. ","answer":"True,True,False,True","explanation":"Answer: True, True, False, True. The first statement is true because FaRM requires a response from all replicas, thus it must reconfigure to remove the failed replica before it can continue with the affected shard. The third statement is false because another transaction may modify the one object causing this transaction\u2019s validation phase to fail (because the other transaction will have incremented the object\u2019s version number). ","type":"True\/False Questions"} -{"instance_id":31,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":5,"points":4,"score_total":71,"score_max":71.0,"score_median":57,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# IV Ray \n\nConsider the following Ray program, which creates a sqrt task task for each number in the list mylist. The creation yields a DFut and the caller waits for the tasks to complete by calling get on each future. The code is as follows: \n\n```\n# A call to sqrt_task yields a DFut\n@ray.remote\ndef sqrt_task(n):\n # sqrt is a python function, which returns the square root of its argument\n return sqrt(n)\n\ndef sqrts0(n_list):\n # start tasks and collect futures\n l = [ ] # list holding DFuts\n for i in n_list: # iterate over list of numbers\n \tl.append(sqrt_task(i))\n \n r = [ ]\n for f in l:\n\t\tr.append(get(f)) # collect the result\n\treturn r\n\nprint(sqrts0(mylist)) # invoke sqrts0 with a list of numbers and print result\n```\n\nAssume Ray behaves in the way described in Ownership: a distributed futures system for finegrained tasks by Wang et al., and Ray is running on a cluster of computers. \n\n5. [4 points]: Will the sqrt computations complete in the order that sqrts0 appends to r? (Briefly explain your answer) ","answer":"Answer: No. The sqrt tasks run concurrently with each other, and may finish in an arbitrary order. All that is guaranteed is that the task has finished executing (at least once) by the time get(f) returns. ","explanation":"Answer: No. The sqrt tasks run concurrently with each other, and may finish in an arbitrary order. All that is guaranteed is that the task has finished executing (at least once) by the time get(f) returns. ","type":"ShortAnswerQuestion"} -{"instance_id":32,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":6,"points":4,"score_total":71,"score_max":71.0,"score_median":57,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# IV Ray \n\nConsider the following Ray program, which creates a sqrt task task for each number in the list mylist. The creation yields a DFut and the caller waits for the tasks to complete by calling get on each future. 
The code is as follows: \n\n```\n# A call to sqrt_task yields a DFut\n@ray.remote\ndef sqrt_task(n):\n # sqrt is a python function, which returns the square root of its argument\n return sqrt(n)\n\ndef sqrts0(n_list):\n # start tasks and collect futures\n l = [ ] # list holding DFuts\n for i in n_list: # iterate over list of numbers\n \tl.append(sqrt_task(i))\n \n r = [ ]\n for f in l:\n\t\tr.append(get(f)) # collect the result\n\treturn r\n\nprint(sqrts0(mylist)) # invoke sqrts0 with a list of numbers and print result\n```\n\nAlyssa creates a function sqrts1 whose body is the same as sqrts0, but is declared as a remote task. She then modifies the program to invoke many sqrts1\u2019s, each with a large distinct, nonoverlapping slice of the number list. The code is as follows: \n\n```\n@ray.remote\ndef sqrts1(n_list):\n ...\n # same code as sqrts0\n ...\n return r\n \nf0 = sqrts1(mylist[...])\nf1 = sqrts1(mylist[...])\nf2 = sqrts1(mylist[...])\n...\nprint(get(f0))\nprint(get(f1))\n...\n```\n\n6. [4 points]: Ben is worried that the above program creates so many sqrt tasks tasks that Ray will be bottle-necked by managing the tasks and the futures they yield. Briefly explain why Ray can manage many tasks in parallel for the above program? ","answer":"Answer: The worker machine that invokes sqrts1(...) is the owner of the metadata for the value returned by each sqrts1 call. The many workers that execute sqrts1() each independently own the metadata for their sqrt task\u2019s, resulting in no one machine being required to manage all the sqrt tasks. ","explanation":"Answer: The worker machine that invokes sqrts1(...) is the owner of the metadata for the value returned by each sqrts1 call. The many workers that execute sqrts1() each independently own the metadata for their sqrt task\u2019s, resulting in no one machine being required to manage all the sqrt tasks. ","type":"ShortAnswerQuestion"} -{"instance_id":33,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":7,"points":5,"score_total":71,"score_max":71.0,"score_median":57,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# V Memcache at Facebook \n\nBen Bitdiddle runs a web site. Ben reads the paper Scaling Memcache at Facebook by Nishtala et al., and thinks that the design is too complex. So Ben decides to ignore the paper\u2019s design: he doesn\u2019t use leases, mcrouter, pools, etc. Ben uses only the mechanisms described below. \n\nBen has just a single region, with some web servers, some memcache servers, and a single database server. Ben programs each of his web servers to use the following client code to read and write data: \n\n```\nread(k):\n if v = memcache_get(k) succeeds\n \treturn v\n else\n\t\treturn database_get(k)\n\nwrite(k, v):\n database_put(k, v)\n memcache_put(k, v)\n```\n\nNote that read() does not insert anything into memcache, and note that write() always inserts the new data into memcache, whether it was already cached or not. Ben knows this may be wasteful, since it may cause memcache to cache data that\u2019s never read, but he doesn\u2019t mind. \n\n7. [5 points]: Sadly, Ben sees that read()s sometimes return stale data for a long time after the write() of a newer value has succeeded and returned. Explain how this could happen. 
","answer":"Answer: If there are concurrent writes by different clients to the same key, the calls to database put() may execute in a different order that the calls to memcache put(), so that memcache and the database end up with different values. This condition can persistent for a long time: until the next time a client writes the same key. ","explanation":"Answer: If there are concurrent writes by different clients to the same key, the calls to database put() may execute in a different order that the calls to memcache put(), so that memcache and the database end up with different values. This condition can persistent for a long time: until the next time a client writes the same key. ","type":"ShortAnswerQuestion"} -{"instance_id":34,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":8,"points":4,"score_total":71,"score_max":71.0,"score_median":57,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# VI Lab 4 \n\nBen implements the RPC handlers and the applier in Lab 4 as follows. The RPC handlers for Get, Put, and Append take the following steps: \n\nA. Submit a command to the Raft library via Start. The command includes the client ID, request ID, operation type, and arguments. \nB. Loop to wait until the reply for that command to show up in the reply table, which maps from client IDs to the replies of clients\u2019 latest requests. Each reply contains the request ID and the result to that request. If Raft\u2019s leadership changes during the loop, return ErrWrongLeader. \nC. Return the result stored in the reply table. \n\nThe applier detail is irrelevant to this question and is shown on the next page. \n\n8. [4 points]: Ben observes that Get does not modify the application state. He changes Get\u2019s RPC handler to read the key-value table and return immediately to the client the result. Does this implementation preserve linearizability? (Briefly explain your answer.) ","answer":"Answer: No. Get could return a stale result if Raft the leadership changes. For instance, if a client submits an Append to the old leader and succeeds, and then submits a Get to the new leader, the Get result could miss the appended value if the new leader handles the Get before applying the Append. ","explanation":"Answer: No. Get could return a stale result if Raft the leadership changes. For instance, if a client submits an Append to the old leader and succeeds, and then submits a Get to the new leader, the Get result could miss the appended value if the new leader handles the Get before applying the Append. ","type":"ShortAnswerQuestion"} -{"instance_id":35,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":9,"points":4,"score_total":71,"score_max":71.0,"score_median":57,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"Ben implements the RPC handlers and the applier in Lab 4 as follows. The RPC handlers for Get, Put, and Append take the following steps: \n\nA. Submit a command to the Raft library via Start. The command includes the client ID, request ID, operation type, and arguments. \nB. Loop to wait until the reply for that command to show up in the reply table, which maps from client IDs to the replies of clients\u2019 latest requests. Each reply contains the request ID and the result to that request. If Raft\u2019s leadership changes during the loop, return ErrWrongLeader. \nC. 
Return the result stored in the reply table. \n\nThe applier takes the following steps: \n\nD. Read a command from the apply channel. \nE. De-duplicate the command with the reply table: if the request ID in the reply table for the client is greater than or equal to that in the command, then skip the command. \nF. Apply the command and insert the result to the reply table. \n\n9. [4 points]: Separately from the previous change, Ben modifies his implementation to perform de-duplication early in the RPC handlers. Concretely, he removes step $\\mathbf { E }$ in the applier, and adds an additional step at the start of the RPC handlers (i.e., before step A) as follows: \n If the request ID in the reply table for the client is greater than or equal to that in the RPC arguments, return the result stored in the reply table. \n Does this implementation preserve linearizability? (Briefly explain your answer.) ","answer":"Answer: No. An operation could be applied twice if the client re-sends it before the first RPC is applied. ","explanation":"Answer: No. An operation could be applied twice if the client re-sends it before the first RPC is applied. ","type":"ShortAnswerQuestion"} -{"instance_id":36,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":10,"points":8,"score_total":71,"score_max":71.0,"score_median":57,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# VII AWS Lambda \n\nConsider the guest lecture about the paper On-demand container loading in AWS Lambda by Brooker et al. For each of the following statements, indicate whether it is true or false. \n\n10. [8 points]: \n\nTrue \/ False $\\because$ AWS Lambda is attractive to customers because it allows them to run cloud computations without having to provision a machine. \n\nTrue \/ False : Many containers of AWS Lambda customers don\u2019t contain unique chunks because customers upload the same container multiple times. \n\nTrue \/ False : AWS Lambda may deduplicate popular chunks less than unpopular chunks. \n\nTrue \/ False $\\because$ AWS Lambdas use LRU-K to ensure that if many infrequently-used Lambdas are running at the same time, they don\u2019t evict the chunks of frequently-used Lambdas. ","answer":"True,True,True,True","explanation":"Answer: True, True, True, True. The third option is true because AWS does this to reduce the blast radius of popular chunks (see Section 3.3). ","type":"True\/False Questions"} -{"instance_id":37,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":11,"points":5,"score_total":71,"score_max":71.0,"score_median":57,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# VIII Boki \n\nConsider Figure 6(a) in Boki: Stateful Serverless Computing with Shared Logs by Jia and Witchel. The left column describes how Boki makes the execution of a workflow of serverless functions with database side-effects exactly-once. \n\nAlyssa notices that if Boki reruns a workflow it will append a record to the workflow\u2019s LogBook, even if an append of an earlier failed execution already logged the record. Alyssa proposes to change the pattern of append-read to read-append-read: that is, she modifies Boki to read before an append to see if the append already logged its record; if so, it uses the first value returned by the read and skips the subsequent append and read. 
(If not, Boki executes as before, doing an append followed by read.) \n\nFor example, Alyssa changes write as follows: \n\ndef write(table, key, val): tag $\\mathbf { \\Sigma } = \\mathbf { \\Sigma }$ hashLogTag([ID, STEP]) # first read rec $\\mathbf { \\Sigma } = \\mathbf { \\Sigma }$ logReadNext(tag: tag, minSeqnum: 0) # if no record, then append and read again if rec $\\scriptstyle = =$ None: logAppend([tags: [tag], data: [table, key, val]) rec $\\mathbf { \\Sigma } = \\mathbf { \\Sigma }$ logReadNext(tag: tag, minSeqnum: 0) rawDBWRITE(...) # same call as before $\\mathrm { S T E P } ~ = ~ \\mathrm { S T E P } ~ + ~ 1$ \n\n11. [5 points]: \n\nAlyssa runs one workflow on her modified Boki. The workflow crashes during its execution and then restarts from the beginning and completes. With Alyssa\u2019s modification will write preserve exactly-once semantics? (Briefly explain your answer.) ","answer":"Answer: It will preserve exactly-once semantics. In the case that logReadNext() returns something non-None initially, it will always return that same log record. So even if write() did a logAppend(), the final logReadNext() would have the same value as the logReadNext() that is executed before logAppend(). ","explanation":"Answer: It will preserve exactly-once semantics. In the case that logReadNext() returns something non-None initially, it will always return that same log record. So even if write() did a logAppend(), the final logReadNext() would have the same value as the logReadNext() that is executed before logAppend(). ","type":"ShortAnswerQuestion"} -{"instance_id":38,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":12,"points":6,"score_total":71,"score_max":71.0,"score_median":57,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# IX SUNDR \n\nConsider the straw-man design in the paper Secure Untrusted Data Repository (SUNDR) by Li et al. \n\nUsers A, B, and C share a SUNDR server. The server may be malicious, though the server does not know any of the private keys. User A creates a new file aaa in the SUNDR file system. After that, user B looks for file aaa, but does not see the file. After that, user C creates a new empty file ccc. \n\nThere is no client activity other than what is described here. None of the stronger consistency ideas from the paper\u2019s Section 3.2 are in use. All three users are honest and run correct SUNDR client software. \n\nAll three users now use the ls command to check whether they can see file ccc. All three users\u2019 client SUNDR implementations report that the data they receive from SUNDR passes all validity checks. Nevertheless, a malicious SUNDR server can cause a number of different outcomes. \n\n12. [6 points]: What combinations are possible for which users can see ccc? For each statement, circle True if the SUNDR server could cause the indicated results, and False if not. \n\nTrue \/ False : All three users can see ccc. \n\nTrue \/ False : Only A and B can see ccc, but not C. \n\nTrue \/ False $\\because$ Only A and C can see ccc, but not B. \n\nTrue \/ False $\\because$ Only B and C can see ccc, but not A. \n\nTrue \/ False $\\because$ Only C can see ccc, but not A or B. \n\nTrue \/ False $\\because$ None of the users can see ccc. ","answer":"False,False,True,True,True,False","explanation":"Answer: False,False,True,True,True,False. The correct answers are A and C but not B, B and C but not A, and only C. 
We know that the server has forked A and B from the fact that B cannot see aaa. So A and B have seen different operation histories, and each has appended an operation to the history it saw, and remembered that operation. Thus, when C asks the server for the current history (before C creates ccc), the SUNDR server can show C A\u2019s fork of the history, B\u2019s fork, or perhaps the history as of before A\u2019s creation of aaa. As a result, after C creates ccc, ccc will be visible to A (but not B), to B (but not A), and to C alone, respectively. ","type":"True\/False Questions"} -{"instance_id":39,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":13,"points":5,"score_total":71,"score_max":71.0,"score_median":57,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# X PBFT \n\nConsider the PBFT protocol as described in the paper Practical Byzantine Fault Tolerance by Castro and Liskov. \n\n13. [5 points]: \n\nPBFT chooses the primary for a view deterministically based on the view number. What could go wrong if PBFT were to use Raft\u2019s voting algorithm to select a primary for a view? (Briefly explain your answer.) ","answer":"Answer: Raft\u2019s voting algorithm does not result in a single leader-per-term under byzantine faults. Consider a 7 node system with 2 Byzantine nodes. The nodes that vote for A for term T are A, B, C, D, and allow A to conclude it is leader. The nodes that vote for D for term T are E, F, G, D, and allow D to conclude it is leader. Of these, only D is Byzantine and has equivocated by voting for both A and D. All the other nodes may vote this way while acting non-byzantine. This results in two primaries for a single term and violates the assumptions that the rest of pbft builds on. ","explanation":"Answer: Raft\u2019s voting algorithm does not result in a single leader-per-term under byzantine faults. Consider a 7 node system with 2 Byzantine nodes. The nodes that vote for A for term T are A, B, C, D, and allow A to conclude it is leader. The nodes that vote for D for term T are E, F, G, D, and allow D to conclude it is leader. Of these, only D is Byzantine and has equivocated by voting for both A and D. All the other nodes may vote this way while acting non-byzantine. This results in two primaries for a single term and violates the assumptions that the rest of pbft builds on. ","type":"ShortAnswerQuestion"} -{"instance_id":40,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":14,"points":5,"score_total":71,"score_max":71.0,"score_median":57,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# XI Bitcoin \n\nSection 4 of Nakamoto\u2019s Bitcoin paper explains that the difficulty of mining is determined by the number of required leading zeros in the SHA-256 hash of the block. The paper also says that Bitcoin automatically varies the difficulty of mining (the number of required leading zeros) by observing the recent average rate of new block mining, relative to the target block every ten minutes; if blocks have been generated too quickly, the difficulty is increased; if too slowly, decreased. All honest Bitcoin peers use the same algorithm to determine the difficulty. \n\nBen dreams of being able to buy tickets to the latest Taylor Swift concert. 
To obtain the money required, Ben has been running the Bitcoin peer software on his laptop, but he hasn\u2019t been earning mining rewards very quickly, because his laptop is only the winning miner very infrequently. Hoping to realize his dream faster, Ben modifies his copy of the Bitcoin peer software so that the difficulty determination algorithm always yields a low difficulty, with the result that his peer can mine new blocks very quickly, often before any other Bitcoin miner produces a given new block in the chain. \n\n14. [5 points]: It turns out that Ben won\u2019t actually earn any bitcoins with this scheme. \n Explain why not. ","answer":"Answer: Bitcoin peers that run correct software will check that any proposed new block has a hash with the expected number of leading zeros. Those peers are running the correct difficultydetermining algorithm, so they will reject Ben\u2019s blocks because their hashes have too few leading zeros. ","explanation":"Answer: Bitcoin peers that run correct software will check that any proposed new block has a hash with the expected number of leading zeros. Those peers are running the correct difficultydetermining algorithm, so they will reject Ben\u2019s blocks because their hashes have too few leading zeros. ","type":"ShortAnswerQuestion"} -{"instance_id":41,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":1,"points":5,"score_total":70,"score_max":67.5,"score_median":49,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# I The xv6 file system and lab fs \n\nBen makes a fresh fs.img, boots xv6, and runs the following commands: \n\n```\n$ mkdir a\n$ mkdir a\/b\n```\n\n1. [5 points]: How many inodes will xv6 allocate while executing these two commands? (Circle the one best answer.) \n A. 0 \n B. 1 \n C. 2 \n D. 3 ","answer":"C","explanation":"Answer: C. A directory is implemented using an inode, and one inode is created for directory \u201ca\u201d and one is created for directory \u201cb\u201d. ","type":"SingleChoice"} -{"instance_id":42,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":2,"points":5,"score_total":70,"score_max":67.5,"score_median":49,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# I The xv6 file system and lab fs \n\nBen makes a fresh fs.img, boots xv6, and runs the following commands: \n\n```\n$ mkdir a\n$ mkdir a\/b\n```\n\nAlyssa adds the statement: \n\n```\nprintf(\"write: %d\\n\", b->blockno);\n```\n\nto xv6\u2019s log_write in log.c. She then makes a fresh fs.img, boots xv6, and runs the following command: \n\n```\n$ mkdir a\nwrite: 33\nwrite: 33\nwrite: 45\nwrite: 770\nwrite: 770\nwrite: 33\nwrite: 770\nwrite: 33\nwrite: 46\nwrite: 32\nwrite: 32\n```\n\n2. [5 points]: What does block 770 contain? (Circle the one best answer.) \n A. directory entries \n B. inodes \n C. file data \n D. a bitmap ","answer":"A","explanation":"Answer: A. Block 770 is a data block and data blocks of directories contain directory entries. 
","type":"SingleChoice"} -{"instance_id":43,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":3,"points":5,"score_total":70,"score_max":67.5,"score_median":49,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# I The xv6 file system and lab fs \n\nBen makes a fresh fs.img, boots xv6, and runs the following commands: \n\n```\n$ mkdir a\n$ mkdir a\/b\n```\n\nBen makes a fresh fs.img, boots xv6, and runs a program that makes the following system call: \n\n```\nsymlink(\"b\", \"b\"); \n```\n\nFrom the shell he then runs: \n\n```\n$ cat b\n```\n\n3. [5 points]: What will the result of the cat be? (Circle the one best answer.) \n A. \u201cb\u201d \n B. an error because \u201cb\u201d doesn\u2019t exist \n C. an error because \u201cb\u201d points to itself \n D. nothing because xv6 will panic ","answer":"C","explanation":"Answer: C. When the kernel resolves the symlink \u201cb\u201d in open, it will find the symlink \u201cb\u201d. The fs lab requires your solution detects this cycle and return an error. ","type":"SingleChoice"} -{"instance_id":44,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":4,"points":5,"score_total":70,"score_max":67.5,"score_median":49,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# II EXT3 \n\nRecall the Linux EXT3 journaling file system from Journaling the Linux ext2fs Filesystem and Lecture 15. \nThe paper\u2019s \u201cext2fs\u201d is the same as EXT3. \n\nSuppose that the current compound transaction has just closed (see step 1 on the paper\u2019s page 6) and is starting to commit. \n\n4. [5 points]: How long must new file-system system calls wait until they can start executing? (Circle the one best answer.) \n A. New system calls can start immediately. \n B. New system calls must wait until all system calls in the just-closed transaction have completed. \n C. New system calls must wait until the just-closed transaction has started to write journal blocks to the journal. \n D. New system calls cannot start until the just-closed transaction has finished committing to the journal. \n E. New system calls cannot start until all updated buffers from the just-closed transaction have been synced to their homes on disk. ","answer":"B","explanation":"Answer: B. The delay serves to prevent partial modifications made by system calls in the next transaction from being seen by system calls that are finishing in the first transaction. ","type":"SingleChoice"} -{"instance_id":45,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":5,"points":5,"score_total":70,"score_max":67.5,"score_median":49,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# II EXT3 \n\nRecall the Linux EXT3 journaling file system from Journaling the Linux ext2fs Filesystem and Lecture 15. \nThe paper\u2019s \u201cext2fs\u201d is the same as EXT3. \n\nHatshepsut is building an application on Linux that creates a set of directories, and she would like the set of creations to be atomic with respect to crashes. She\u2019s using the EXT3 file system. She experiments with this application code: \n\n```\nint main() {\n mkdir(\"\/aaa\", 0777);\n mkdir(\"\/zzz\", 0777);\n exit(0);\n}\n```\n\n(The 0777 is needed for Linux, though not for xv6; it does not affect this question.) \n\nHatshepsut runs this program. Both calls to mkdir() return success. 
Hatshepsut causes her computer to crash just after the program exits. Then she re-starts the computer, which runs the EXT3 recovery program. \n\n5. [5 points]: What could Hatshepsut see after recovery? (Circle all that apply.) \n A. She might see neither \/aaa nor \/zzz. \n B. She might see \/aaa but not \/zzz. \n C. She might see \/zzz but not \/aaa. \n D. She might see both \/zzz and \/aaa. \n E. None of the above. ","answer":"A,B,D","explanation":"Answer: A, B, and D. A can occur if the system crashed before the transaction(s) reflecting the mkdir\u2019s finished committing. B can occur if the two mkdirs are in different transactions, and only the first manages to finish committing. C cannot occur because system calls are placed in transactions in order, and the transactions are also replayed in order during recovery. ","type":"MultipleChoice"} -{"instance_id":46,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":6,"points":5,"score_total":70,"score_max":67.5,"score_median":49,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# III VM primitives \n\nBelow is a code fragment illustrating how a user program can implement a large table of square roots with Linux VM primitives while using little physical memory. (The full code presented in lecture is in the appendix of this quiz.) \n\n```\n1 static size_t page_size;\n2 #define MAX_SQRTS (1 << 27) \/\/ Maximum limit on sqrt table entries\n3\n4 static double *sqrts;\n5\n6 \/\/ The page handler catching page faults\n7 static void\n8 handle_sigsegv(int sig, siginfo_t *si, void *ctx)\n9 {\n10 \tuintptr_t fault_addr = (uintptr_t)si->si_addr;\n11 \tdouble *page_base = (double * )align_down(fault_addr, page_size);\n12 \tstatic double *last_page_base = NULL;\n13\n14 \tif (last_page_base && munmap(last_page_base, page_size) == -1) {\n15 \t\tfprintf(stderr, \"Couldn\u2019t munmap(); %s\\n\", strerror(errno));\n16 \texit(EXIT_FAILURE);\n17 }\n18\n19 if (mmap(page_base, page_size, PROT_READ | PROT_WRITE,\n20 \t\t\tMAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {\n21 \tfprintf(stderr, \"Couldn\u2019t mmap(); %s\\n\", strerror(errno));\n22 \texit(EXIT_FAILURE);\n23 }\n24\n25 \tcalculate_sqrts(page_base, page_base - sqrts, page_size \/ sizeof(double));\n26 \tlast_page_base = page_base;\n27 }\n28\n29 \/\/ Simplified version of the test function\n30 static void\n31 test_sqrt_region(void)\n32 {\n33 \tint i, pos;\n34 \tdouble s;\n35\n36 \t\/\/ Find a sufficiently-large unused range of virtual addresses, and\n37 \t\/\/ sets sqrts to the start.\n38 \tsetup_sqrt_region();\n39\n40 \t\/\/ look up some numbers in the sqrt table\n41 \tfor (i = 0; i < 8192; i++) {\n42 \t\ts = sqrts[i];\n43 \t\tprintf(\"sqrt %f\", s);\n44 \t}\n45 }\n```\n\nAssume size of double is 8 bytes and page size is 4096 bytes. \n\n6. [5 points]: Assume the sqrts table occupies 0 pages of physical memory after the return from setup sqrt region. How many pages of physical memory does the sqrts table occupy when test sqrt region returns? (You can ignore physical memory pages used for the page table itself.) (Circle the one best answer.) \n A. 0 \n B. 1 \n C. 1000 \n D. $\\left( \\left( 1 \\ll 2 7 \\right) ^ { \\ast } 8 \\right) \/ 4 0 9 6$ ","answer":"B","explanation":"Answer: B. The page fault handler uses only 1 page. If it maps a new page, it unmaps the old page. 
","type":"SingleChoice"} -{"instance_id":47,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":7,"points":5,"score_total":70,"score_max":67.5,"score_median":49,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# III VM primitives \n\nBelow is a code fragment illustrating how a user program can implement a large table of square roots with Linux VM primitives while using little physical memory. (The full code presented in lecture is in the appendix of this quiz.) \n\n```\n1 static size_t page_size;\n2 #define MAX_SQRTS (1 << 27) \/\/ Maximum limit on sqrt table entries\n3\n4 static double *sqrts;\n5\n6 \/\/ The page handler catching page faults\n7 static void\n8 handle_sigsegv(int sig, siginfo_t *si, void *ctx)\n9 {\n10 \tuintptr_t fault_addr = (uintptr_t)si->si_addr;\n11 \tdouble *page_base = (double * )align_down(fault_addr, page_size);\n12 \tstatic double *last_page_base = NULL;\n13\n14 \tif (last_page_base && munmap(last_page_base, page_size) == -1) {\n15 \t\tfprintf(stderr, \"Couldn\u2019t munmap(); %s\\n\", strerror(errno));\n16 \texit(EXIT_FAILURE);\n17 }\n18\n19 if (mmap(page_base, page_size, PROT_READ | PROT_WRITE,\n20 \t\t\tMAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {\n21 \tfprintf(stderr, \"Couldn\u2019t mmap(); %s\\n\", strerror(errno));\n22 \texit(EXIT_FAILURE);\n23 }\n24\n25 \tcalculate_sqrts(page_base, page_base - sqrts, page_size \/ sizeof(double));\n26 \tlast_page_base = page_base;\n27 }\n28\n29 \/\/ Simplified version of the test function\n30 static void\n31 test_sqrt_region(void)\n32 {\n33 \tint i, pos;\n34 \tdouble s;\n35\n36 \t\/\/ Find a sufficiently-large unused range of virtual addresses, and\n37 \t\/\/ sets sqrts to the start.\n38 \tsetup_sqrt_region();\n39\n40 \t\/\/ look up some numbers in the sqrt table\n41 \tfor (i = 0; i < 8192; i++) {\n42 \t\ts = sqrts[i];\n43 \t\tprintf(\"sqrt %f\", s);\n44 \t}\n45 }\n```\n\nAssume size of double is 8 bytes and page size is 4096 bytes. \n\n[5 points]: How many total page faults will the repeated execution of line 42 cause? (Circle the one best answer.) \nA. 0 \nB. 1 \nC. 2 \nD. 16 \nE. 8192 ","answer":"D","explanation":"Answer: D. The loop goes through the first 8192 entries in the sqrts table. A double is 8 bytes and 512 fit on a single page of 4096 bytes $( 4 0 9 6 \/ 8 = 5 1 2 )$ . Thus, the total number of virtual pages referenced in the loop is $8 1 9 2 \/ 5 1 2 = 1 6$ . The page fault handler will be invoked once for each of the 16 pages. ","type":"SingleChoice"} -{"instance_id":48,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":8,"points":5,"score_total":70,"score_max":67.5,"score_median":49,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# IV L4Linux \n\nConsider The Performance of $\\mu$ -Kernel-Based Systems, by Ha\u00a8rtig et al., along with Lecture 17. \nSuppose that an sh Linux process running under $\\mathrm { L } ^ { 4 }$ Linux performs a fork(). \n\n8. [5 points]: Which of the following are true? (Circle all that apply.) \n A. The L4 kernel\u2019s fork() system call copies the sh process\u2019s memory. \n B. When the Linux kernel server task has finished executing the system call implementation, it executes the x86 equivalent of RISC-V sret to return to the sh process. \n C. 
When the Linux kernel server task returns to the newly created child process, the Linux kernel changes the hardware page table register (equivalent of RISC-V satp) to point to the child process\u2019s page table. \n D. Copy-on-write fork() is not possble for $\\mathrm { L } ^ { 4 }$ Linux because the CPU delivers page faults to the L4 kernel, not to the Linux kernel task. \n E. None of the above. ","answer":"E","explanation":"Answer: E. Not A: fork is a Linux system call, implemented by the Linux kernel server, not by the L4 kernel. Not B: Linux processes communicate with the Linux server via IPC messages, not by system call traps. Not C: The Linux kernel server is not privileged, and cannot modify the page table register; only the L4 kernel can do this. Not D: the L4 kernel forwards page faults to the Linux kernel server. ","type":"MultipleChoice"} -{"instance_id":49,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":9,"points":5,"score_total":70,"score_max":67.5,"score_median":49,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# V RedLeaf \n\nConsider RedLeaf: Isolation and Communication in a Safe Operating System by Narayanan et al. \n\n9. [5 points]: Which of the following are true statements about RedLeaf\u2019s design? (Circle all that apply.) \n A. Because the RedLeaf microkernel and domains run in a single address space, a domain can read any kernel memory by dereferencing a Rust pointer. \n B. User programs can avoid data copies by passing pointers to their private memory to other user programs. \n C. Two domains can have a Rust pointer to an object on the shared heap at the same time. \n D. The rv6 file system can be modified to support memory-mapped files using the same ideas as in the mmap lab without modifying the RedLeaf microkernel. \n E. A divide-by-zero error in the network domain won\u2019t crash the rv6 file system. \n F. None of the above. ","answer":"C,E","explanation":"Answer: C and E. A is false because RedLeaf uses language techniques that disallow domains to dereferences arbitrary addresses. B is false, because RedLeaf explicitly disallows this; only pointers in the shared heap can be passed to other domains. C is true, because RedLeaf follows the Rust borrow rules, which allow two domains to have an immutable reference to the same object. D is false, because RedLeaf doesn\u2019t use page tables but relies on language techniques for isolation; the mmap lab requires the use of page tables. E is true, because RedLeaf is designed to catch errors like these and clean up a domain that experience such an error. ","type":"MultipleChoice"} -{"instance_id":50,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":10,"points":5,"score_total":70,"score_max":67.5,"score_median":49,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# VI Networking lecture\/reading \n\nConsider Eliminating Receive Livelock in an Interrupt-driven Kernel, by Mogul et al., and Lecture 20. \n\nBen implements the paper\u2019s polling design (Section 6.4), in which the NIC interrupt handler just wakes up the polling thread. However, Ben\u2019s implementation leaves NIC interrupts enabled (in contrast to Section 6.4, which specifies that they be disabled until the polling thread is done). \n\nBen\u2019s computer has just one CPU (i.e. just a single core). \n\n10. [5 points]: What will Ben observe as the rate of packet arrivals increases? 
(Circle the one best answer.) A. He won\u2019t see livelock, because the interrupt handler doesn\u2019t process the packets; only the polling thread handles the packets. \n B. He won\u2019t see livelock, because the polling design eliminates the IP input queue, which was the point at which packets were discarded in the old design. \n C. He will see livelock, because at high enough arrival rates the CPU will spend all its time in the polling thread. \n D. He will see livelock, because at high enough arrival rates the CPU will spend all its time in the interrupt handler. \n E. He will see livelock, because the polling thread can only process packets at some finite rate, and the input rate could be higher than that. ","answer":"D","explanation":"Answer: D. \n\nZoe\u2019s xv6 computer has a UART that has no limit on how fast it can transmit bytes. The UART interrupts once per byte transmitted, to indicate that it has finished transmitting the byte. Zoe has a program whose standard output (file descriptor 1) is connected to the xv6 console, which uses the UART; the program sends bytes as fast as it can: \n\n```\nwhile(1){\n char c = \u2019x\u2019;\n write(1, &c, 1);\n}\n```\n\nZoe\u2019s computer has just one CPU (i.e. just a single core). ","type":"SingleChoice"} -{"instance_id":51,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":11,"points":5,"score_total":70,"score_max":67.5,"score_median":49,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# VI Networking lecture\/reading \n\nConsider Eliminating Receive Livelock in an Interrupt-driven Kernel, by Mogul et al., and Lecture 20. \n\nBen implements the paper\u2019s polling design (Section 6.4), in which the NIC interrupt handler just wakes up the polling thread. However, Ben\u2019s implementation leaves NIC interrupts enabled (in contrast to Section 6.4, which specifies that they be disabled until the polling thread is done). \n\nZoe\u2019s computer has just one CPU (i.e. just a single core). \n\n11. [5 points]: Could this program cause interrupt livelock due to the CPU spending all its time in the UART interrupt handler, and thus no time executing Zoe\u2019s program? Explain briefly. ","answer":"Answer: No. The UART interrupts just once for each call to write(). There can\u2019t be more than a brief period of time in which UART interrupts prevent Zoe\u2019s program from running, because the UART driver will soon run out of bytes to transmit. ","explanation":"Answer: No. The UART interrupts just once for each call to write(). There can\u2019t be more than a brief period of time in which UART interrupts prevent Zoe\u2019s program from running, because the UART driver will soon run out of bytes to transmit. ","type":"ShortAnswerQuestion"} -{"instance_id":52,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":12,"points":5,"score_total":70,"score_max":67.5,"score_median":49,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# VII Meltdown \n\nBelow is Listing 2 of the paper Meltdown: reading kernel memory from user space by Lipp et al., written in a C-like notation instead of x86 assembly. \n\n```\n1 char buf[8192]\n2\n3 \/\/ The flush part of Flush+Reload\n4 cflush buf[0]\n5 cflush buf[4096]\n6\n7 \/\/ The core attack from listing 2\n8 r1 = 0x79cbcc0 \/\/ a kernel virtual address\n9 r2 = *r1\n10 r2 = r2 & 1\n11 r2 = r2 * 4096\n12 r3 = buf[r2]\n```\n\n12. 
[5 points]: Which of the following are true statements? (Circle all that apply.) \n A. In Linux as described in the paper, page tables of user programs map all of kernel memory. \n B. Loading the value at kernel address 0x79cbcc0 on line 9 will lead to an exception. \n C. If the attack succeeds, then buf[0] will be in the L1 cache if the low bit of the value at address 0x79cbcc0 is a 0. \n D. One reason why one run of Meltdown might not succeed is because buf[0] maybe evicted from the L1 cache before the attack can measure its presence using Reload. \n E. The Meltdown attack on xv6 wouldn\u2019t be able to dump all of xv6 kernel memory because like KAISER the xv6 kernel and user processes have separate page tables. \n F. None of the above. ","answer":"A,B,C,D,E","explanation":"Answer: A, B, C, D, and E. B is true because eventually the CPU will generate an exception, perhaps after speculating on a few instructions. $\\mathbf { E }$ is true, because xv6 has separate kernel and user page tables, and the user page tables don\u2019t map all of kernel memory. ","type":"MultipleChoice"} -{"instance_id":53,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":13,"points":5,"score_total":70,"score_max":67.5,"score_median":49,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# VIII RCU \n\nBen has a Linux kernel that uses RCU as described in RCU Usage In the Linux Kernel: One Decade Later, by McKenney et al. He modifies udp sendmsg() in the paper\u2019s Figure 6, adding a call to new function() on line 8, so that the code now reads: \n\n```\n1 void udp_sendmsg(sock_t *sock, msg_t *msg)\n2 {\n3 \tip_options_t *opts;\n4 \tchar packet[];\n5 \tcopy_msg(packet, msg);\n6 \trcu_read_lock();\n7 \topts = rcu_dereference(sock->opts);\n8 \tnew_function(); \/\/*** Ben adds this line. ***\n9 \tif (opts != NULL)\n10 \t\tcopy_opts(packet, opts);\n11 \trcu_read_unlock();\n12 \tqueue_packet(packet);\n13 }\n14 void setsockopt(sock_t *sock, int opt, void *arg)\n15 {\n16 \tif (opt == IP_OPTIONS) {\n17 \tip_options_t *old = sock->opts;\n18 \tip_options_t *new = arg;\n19 \trcu_assign_pointer(&sock->opts, new);\n20 \tif (old != NULL)\n21 \t\tcall_rcu(kfree, old);\n22 \treturn;\n23 \t}\n24 }\n```\n\nThis code is otherwise identical to the paper\u2019s Figure 6. \n\nnew_function() performs a context switch (i.e., it calls the Linux equivalent of xv6\u2019s sleep() or yield()). \n\n13. [5 points]: Ben has made a mistake. Explain a scenario in which something goes wrong with the Figure 6 code as a result of Ben\u2019s change. ","answer":"Answer: Use-after-free. If new function() causes a context switch, then call rcu() could call kfree(old), and that memory could be re-allocated for something else and overwritten. But that\u2019s the same memory that opts points to on line 10, which would therefore copy the wrong data. ","explanation":"Answer: Use-after-free. If new function() causes a context switch, then call rcu() could call kfree(old), and that memory could be re-allocated for something else and overwritten. But that\u2019s the same memory that opts points to on line 10, which would therefore copy the wrong data. 
","type":"ShortAnswerQuestion"} -{"instance_id":54,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":14,"points":5,"score_total":70,"score_max":67.5,"score_median":49,"score_avg":49.88,"score_standard_seviation":11.07,"problem":" # VIII RCU \n\nBen has a Linux kernel that uses RCU as described in RCU Usage In the Linux Kernel: One Decade Later, by McKenney et al. He modifies udp sendmsg() in the paper\u2019s Figure 6, adding a call to new function() on line 8, so that the code now reads: \n\n```\n1 void udp_sendmsg(sock_t *sock, msg_t *msg)\n2 {\n3 \tip_options_t *opts;\n4 \tchar packet[];\n5 \tcopy_msg(packet, msg);\n6 \trcu_read_lock();\n7 \topts = rcu_dereference(sock->opts);\n8 \tnew_function(); \/\/*** Ben adds this line. ***\n9 \tif (opts != NULL)\n10 \t\tcopy_opts(packet, opts);\n11 \trcu_read_unlock();\n12 \tqueue_packet(packet);\n13 }\n14 void setsockopt(sock_t *sock, int opt, void *arg)\n15 {\n16 \tif (opt == IP_OPTIONS) {\n17 \tip_options_t *old = sock->opts;\n18 \tip_options_t *new = arg;\n19 \trcu_assign_pointer(&sock->opts, new);\n20 \tif (old != NULL)\n21 \t\tcall_rcu(kfree, old);\n22 \treturn;\n23 \t}\n24 }\n```\n\nThis code is otherwise identical to the paper\u2019s Figure 6. \n\nnew_function() performs a context switch (i.e., it calls the Linux equivalent of xv6\u2019s sleep() or yield()). \n\nNow Ben is working on the code in the RCU paper\u2019s Figure 7. He reasons that the kfree(local table) in retract_table() really belongs inside the critical section, so that the entire sequence is atomic. He moves that line, resulting in this code: \n\n```\n...;\nspin_lock(&table_lock);\nlocal_table = table;\nrcu_assign_pointer(&table, NULL);\nkfree(local_table); \/\/ *** Ben moved this line. ***\nspin_unlock(&table_lock);\n...;\n```\n\n14. [5 points]: What problem is Ben\u2019s change likely to cause? (Circle the one best answer.) \n A. Ben\u2019s change could cause a deadlock. \n B. Ben\u2019s change could allow a context switch to occur just before the kfree() call, which would be illegal. \n C. Ben\u2019s change could cause invoke_syscall() to dereference a pointer to freed memory. \n D. Ben\u2019s change could cause retract_table() to dereference a pointer to freed memory. ","answer":"C","explanation":"Answer: C. Ben\u2019s modified retract table() frees local table before the call to synchronize_rcu(). An execution of invoke_systemcall() might be active at the same time on another CPU, and might read the old value of local table just after it has been freed and re-used for something else. 
","type":"SingleChoice"} diff --git a/benchmarks/course_exam_bench/data/benchmark/SystemTestPaper.xlsx b/benchmarks/course_exam_bench/data/benchmark/SystemTestPaper.xlsx deleted file mode 100644 index 5887aa50..00000000 Binary files a/benchmarks/course_exam_bench/data/benchmark/SystemTestPaper.xlsx and /dev/null differ diff --git a/benchmarks/course_exam_bench/data/benchmark/SystemTestPaper_20250527.jsonl b/benchmarks/course_exam_bench/data/benchmark/SystemTestPaper_20250527.jsonl deleted file mode 100644 index 05e37ec7..00000000 --- a/benchmarks/course_exam_bench/data/benchmark/SystemTestPaper_20250527.jsonl +++ /dev/null @@ -1,69 +0,0 @@ -{"instance_id":1,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":1,"points":8,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# I FaRM\n\nConsider the following statements about FaRM as described in No compromises: distributed transactions with consistency, availability, and performance. For each statement, circle True or False.\n\n1.[8 points]:\n\n1. True \/ False : Short leases are important for FaRM, because FaRM must reconfigure to remove a failed replica of a region before FaRM can continue to use the region.\n2. True \/ False : For small messages RDMA performs $4 \\times$ better than RPC because the CPU is a performance bottleneck for RPCs.\n3. True \/ False : To obtain write locks FaRM uses RDMA writes so that the destination machine\u2019s CPU doesn\u2019t have to do any work on a lock request.\n4. True \/ False $\\because$ For the TATP workload the median latency on the left end of the graph is 9 microseconds (see Figure 7), rather than the 19 microsecond mean commit latency mentioned in Section 6.3, because not all operations update multiple rows.","answer":"True,True,False,True","explanation":"Answer: True,True,False,True 1. True, since FaRM uses primary-backup replication instead of Paxos\/Raft and all replicas of a chunk must be up for FaRM to be able to continue using the chunk. 2. True; see Figure 3. False; in this case FaRM uses RDMA to implement RPC, which runs locking code on the destination machine. 4. True; see description of TATP workload and the explanation for the left-end of the graph.","type":"True\/False Questions"} -{"instance_id":2,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":2,"points":3,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# II Chardonnay\n\nAnswer these questions with reference to Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases by Eldeeb et al.\n\n2. [3 points]: Section 6.1 says that all of a read-write transaction\u2019s writes are tagged with a version containing the same epoch number. For example, if a read-write transaction executes at around the time the Epoch Service changes the epoch from 20 to 21, either all the transaction\u2019s updated records should have a VID prefixed with 20, or all should be prefixed with 21, but not a mix of the two. Explain briefly why having the same epoch number in all the written items\u2019 versions is important for correctness.","answer":"Answer: A read-only (snapshot) transaction must see record versions that, for every readwrite transaction, either reflect all of that read-write transaction\u2019s writes, or none. 
Chardonnay\u2019s strategy is for read-only transactions to read a snapshot as of the end of some epoch. This only works if every read-write transaction\u2019s writes appear all in the same epoch, and are not spread over multiple epochs.","explanation":"Answer: A read-only (snapshot) transaction must see record versions that, for every readwrite transaction, either reflect all of that read-write transaction\u2019s writes, or none. Chardonnay\u2019s strategy is for read-only transactions to read a snapshot as of the end of some epoch. This only works if every read-write transaction\u2019s writes appear all in the same epoch, and are not spread over multiple epochs.","type":"ShortAnswerQuestion"} -{"instance_id":3,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":3,"points":3,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# II Chardonnay\n\nAnswer these questions with reference to Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases by Eldeeb et al.\n\n3. [3 points]: Section 6.1 describes how a transaction chooses a version ID (VID): it is the current epoch, with a counter appended. Suppose, instead, that version IDs were just the current epoch (with nothing appended). Briefly explain how this change could cause snapshot reads to yield incorrect (non-serializable) values.","answer":"Answer: Read-only (snapshot) transactions would not be able to tell which version was the last one in an epoch, and thus might read a mix of versions from early and late in the epoch.","explanation":"Answer: Read-only (snapshot) transactions would not be able to tell which version was the last one in an epoch, and thus might read a mix of versions from early and late in the epoch.","type":"ShortAnswerQuestion"} -{"instance_id":4,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":4,"points":3,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# II Chardonnay\n\nAnswer these questions with reference to Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases by Eldeeb et al.\n\n4. [3 points]: Section 8 says that the locks for a transaction are acquired one at a time using a \u201cchain\u201d technique. It would be faster if the client sent out all the lock requests in parallel. Briefly explain why sending a transaction\u2019s lock request messages in parallel would be a bad idea.","answer":"Answer: The purpose of acquiring the locks one at a time is to ensure that they are acquired in a deadlock-avoiding order. Sending out the requests in parallel would not ensure that, and could lead to deadlock.","explanation":"Answer: The purpose of acquiring the locks one at a time is to ensure that they are acquired in a deadlock-avoiding order. Sending out the requests in parallel would not ensure that, and could lead to deadlock.","type":"ShortAnswerQuestion"} -{"instance_id":5,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":5,"points":8,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# III Grove\n\nConsider \u201cGrove: a separation-logic library for verifying distributed systems\u201d by Sharma et al. 
For each statement, circle True or False.\n\n5.[8 points]:\n\n1. True \/ False : Because any replica in Grove can serve a Get request, Grove achieves higher throughput for Get\u2019s than for Put\u2019s (see Figure 19), but sacrifices linearizability for Get\u2019s.\n2. True \/ False $\\because$ If a response to a Get is lost, the Grove exactlyonce library will resend the Get, which the server executes again and the second response may contain a different value than the first (but lost) response contained.\n3. True \/ False $\\because$ If the Go code for vKV had a bug that caused a backup to lose the effect of a Put after a crash, then Grove\u2019s specification and proof would catch this bug.\n4. True \/ False $\\because$ If the configservice had an infinite loop that caused Reconfigure to never return, then this bug would be caught by Grove\u2019s specification and proof.","answer":"False,True,True,False","explanation":"Answer: False,True,True,False1. False; Grove doesn\u2019t sacrifice linearizability for reads from backups. 2. True; for example, if a Put happened between the first and second Get (but it doesn\u2019t matter for correctness). 3. True; that is a safety property that Grove\u2019s specification and proof capture. 4. False; Grove doesn\u2019t prove liveness properties.","type":"True\/False Questions"} -{"instance_id":6,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":6,"points":5,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# IV DynamoDB Transactions\n\nWith respect to Distributed Transactions at Scale in Amazon DynamoDB by Idziorek et al., imagine that a read-only transaction (TransactGetItems) and a read-write transaction (TransactWriteItems) execute concurrently. The read-write transaction updates multiple items stored on different storage nodes; the read-only transaction reads the same set of items. Serializability requires that the readonly transaction see all of the items as they were before the read-write transaction, or all as they are after the read-write transaction (but not a mix of before and after).\n\n6. [5 points]: For the above scenario, which mechanisms help to ensure this all-before or all-after property? Refer to the design as described in the paper\u2019s Sections 2 and 3 (but not Section 4). Circle all that apply.\n A. The item.timestamp $<$ input.timestamp check in Listing 3. \n B. Multiple time-stamped versions stored for each key. \n C. Log sequence numbers (LSNs). \n D. Two-phase locking. \n E. item.ongoingTransactions","answer":"C,E","explanation":"Answer: C,E. C and E are correct. A is not correct because read-only transactions do not involve the time-stamps. B is not correct because DynamoDB doesn\u2019t store multiple versions of a given record. D is not correct because read-only transactions don\u2019t use two-phase locking.","type":"MultipleChoice"} -{"instance_id":7,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":7,"points":8,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# V AWS Lambda\n\nConsider the paper, and guest lecture about, On-demand container loading in AWS Lambda by Brooker et al. For each of the following statements, indicate whether it is true or false.\n\n7.[8 points]:\n\n1. 
True \/ False : AWS Lambda is attractive to customers because the customer can spawn many lambdas in response to a load spike without having to provision machines in advance.\n2. True \/ False $\\because$ Replication of chunks in the AZ-level cache is important to ensure that chunks are not lost forever when a cache node fails.\n3. True \/ False : Erasure coding of cached chunks helps improve tail latency, because a worker can reconstruct a chunk without having to download all stripes of the chunk.\n4. True \/ False : The convergent encryption scheme described in Section 3.1 helps protect against an attacker who compromises a worker and attempts to read containers of any customer.","answer":"True,False,True,True","explanation":"Answer: True,False,True,True. 1. True; customers can spawn many Lambdas in response to a spike in load. 2. False; AWS Lambda replicates for low latency instead of durability. 3. True; erasure coding allows the client to reconstruct the data with a few stripes without having to wait for all stripes. 4. True; the goal is that a worker can access only the data that it needs to run the function sent to it, but because \u201cany\u201d is ambiguous we accepted False too.","type":"True\/False Questions"} -{"instance_id":8,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":8,"points":5,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# VI Lab 4\n\nBen Bitdiddle is implementing the rsm package from Lab 4. He uses a shared reply table to implement communication between the rsm\u2019s Submit() method and the reader goroutine. The reply table maps a completed operation\u2019s id to its reply from DoOp(). In addition, rsm contains a nextId field that is initialized to 0 and is incremented by 1 each time Submit() is called.\n\nHis code takes the following steps:\n\n- A. The kvraft server calls `rsm.Submit()` with a client request req.\n- B. `Submit()` increments `rsm.nextId`, packs the request into a new operation `Op{id:rsm.nextId, req: req}`, and calls`rsm.rf.Start()` with the Op. `Ifrsm.rf.Start()` indicates that the current peer is the Raft leader, it then waits for the reply table to contain a reply under the key `Op.id` (the one passed to `Start()`).\n- C. When the rsm\u2019s reader goroutine receives an `ApplyMsg` from the `applyCh`, it calls `DoOp()`, then populates the reply table with the result from `DoOp()` under key `Op.id`.\n- D. When `Submit()` sees the `Op.id` key in the reply table, it deletes the `Op.id` key\/value pair from the reply table and returns the corresponding reply.\n\nAssume that all omitted parts of Ben\u2019s design are correct.\n\n8. [5 points]: Ben notices that his implementation can result in incorrect behavior. Explain why.","answer":"Answer: Each peer will, when it is leader, assign operations IDs starting with ID zero. Thus, if there is a change in leader, the new leader may assign IDs to new operations that are the same as IDs being waited for by Submit()s in the old leader, but the operations are different. So clients may receive responses for the wrong operations.","explanation":"Answer: Each peer will, when it is leader, assign operations IDs starting with ID zero. Thus, if there is a change in leader, the new leader may assign IDs to new operations that are the same as IDs being waited for by Submit()s in the old leader, but the operations are different. 
So clients may receive responses for the wrong operations.","type":"ShortAnswerQuestion"} -{"instance_id":9,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":9,"points":8,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# VII Ray\n\nConsider the following Ray program, which creates a `sqrt_task` task for each number in the list `mylist`. The creation yields a DFut and the caller waits for the tasks to complete by calling get on each future. The code is as follows:\n\n```\n# A call to sqrt_task yields a DFut\n@ray.remote\ndef sqrt_task(n):\n # sqrt is a python function, which returns the square root of its argument\n return sqrt(n)\n\n@ray.remote\ndef sum_task(f):\n # sum is a python function, which takes a future and returns the sum\n l = get(f) # collect the list l\n return sum(l) # return the sum of the numbers in list l\n\n# A call to sqrt_list_task yields a shared DFut\n@ray.remote\ndef sqrt_list_task(n_list):\n # start tasks and collect futures\n l = [ ] # list holding DFuts\n for i in n_list: # iterate over list of numbers\n \tl.append(sqrt_task(i))\n\n\tr = [ ]\n\tfor f in l:\n\t\tr.append(get(f)) # collect the result\n\n\treturn r # return a SharedDFut for r\n\n# invoke sqrt_list_task with a large list of numbers, sum, and print result\nf = sqrt_list_task(mylist)\ns = sum_task(f)\nprint(s)\n```\n\nAssume Ray behaves in the way described in Ownership: a distributed futures system for finegrained tasks by Wang et al., and Ray is running on a cluster of computers.\n\nFor each of the following statements, indicate whether it is true or false.\n\n9. [8 points]:\n\n- True \/ False : a Ray worker may start running sum task before sqrt list task has finished\n- True \/ False : the driver that invokes sum task receives the list with square-rooted numbers from the worker that ran sqrt list task.\n- True \/ False : the driver is the owner for each future that sqrt task returns.\n- True \/ False : the driver is the owner for the shared future returned by sqrt list task.","answer":"True,False,False,True","explanation":"Answer: True,False,False,True. 1. True, since remote invocations are asynchronous. 2. False; the worker running sum task will fetch the data from the worker that ran sqrt list task. 3. False; the worker who runs sqrt list task is the owner of these futures; 4. True; the driver starts sqrt list task and is thus the owner.","type":"True\/False Questions"} -{"instance_id":10,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":10,"points":5,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# VIII SUNDR\n\nConsider the straw-man design in the paper Secure Untrusted Data Repository (SUNDR) by Li et al.\n\nIn the straw-man design, a client asks the server to append a \u201cfetch\u201d operation to the history after reading the history. Each client also remembers the most recent operation it has appended to the history. One reason for these mechanisms is to prevent the server from first showing some operations to the client, and then later hiding those operations.\n\nSuppose one modified the SUNDR straw-man protcol in three ways. First, eliminate \u201cfetch\u201d operations, so that the history only includes modify operations. 
Second, have each client remember the last entry in the most recent history the client obtained from the server; call this remembered entry ELAST. Third, each client checks that ELAST is present in the next history it obtains from the server. With these modifications, a client can still detect a situation in which the server first shows an operation to a client, and then omits that operation in a later history shown to the same client.\n\n10. [5 points]: It turns out this modification is a bad idea. Explain briefly how this modification would allow a malicious server to violate fork consistency.","answer":"Answer: Suppose the last two entries in the (correct) history are E1 followed by E2. When client C1 asks for the history, the SUNDR server could return the history just through E1, omitting E2. C1 would then remember E1 as its ELAST. This is a fork (and is allowed by fork consistency) because C1 isn\u2019t seeing the complete history. When C1 next asks the SUNDR server for the history, the SUNDR server could return the complete history, including E2. C1 would accept this history because ELAST (=E1) is present and all other checks pass (e.g. E1\u2019s and E2\u2019s signatures over the preceding history will validate). At this point the fork has been healed (since C1 sees the previously concealed E2); this is a violation of fork consistency. Note that every entry (including whatever is in ELAST) has a signature over the entire preceding history, and the client checks all of these signatures every time it obtains a history from the SUNDR server, so the SUNDR server cannot successfully change anything in the history before a client\u2019s ELAST.","explanation":"Answer: Suppose the last two entries in the (correct) history are E1 followed by E2. When client C1 asks for the history, the SUNDR server could return the history just through E1, omitting E2. C1 would then remember E1 as its ELAST. This is a fork (and is allowed by fork consistency) because C1 isn\u2019t seeing the complete history. When C1 next asks the SUNDR server for the history, the SUNDR server could return the complete history, including E2. C1 would accept this history because ELAST (=E1) is present and all other checks pass (e.g. E1\u2019s and E2\u2019s signatures over the preceding history will validate). At this point the fork has been healed (since C1 sees the previously concealed E2); this is a violation of fork consistency. Note that every entry (including whatever is in ELAST) has a signature over the entire preceding history, and the client checks all of these signatures every time it obtains a history from the SUNDR server, so the SUNDR server cannot successfully change anything in the history before a client\u2019s ELAST.","type":"ShortAnswerQuestion"} -{"instance_id":11,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":11,"points":3,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# IX Bitcoin\n\nBitcoin: A Peer-to-Peer Electronic Cash System, by Nakamoto, mentions in Section 4 that the cryptographic hash of a valid block must start with a certain number of zero bits. Assume that the hash algorithm is SHA-256, which returns a 256-bit hash.\n\n11. [3 points]: You are trying to mine a new block. The required number of zero bits is seven. You set the block\u2019s 32-bit nonce field to a random value, and compute the SHA-256 hash of the block. 
What\u2019s the probability that the first seven bits of the hash are zeros? Circle the one best answer.\n A. 1\/2 \n B. 1\/7 \n C. 1\/128 \n D. 1\/256 \n E. 1\/249 \n F. $1 \/ ( 2 ^ { 3 2 } )$","answer":"C","explanation":"Answer: C (1\/128) is correct.","type":"SingleChoice"} -{"instance_id":12,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":12,"points":3,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# IX Bitcoin\n\nBitcoin: A Peer-to-Peer Electronic Cash System, by Nakamoto, mentions in Section 4 that the cryptographic hash of a valid block must start with a certain number of zero bits. Assume that the hash algorithm is SHA-256, which returns a 256-bit hash.\n\nBen runs a Bitcoin node. A few hours ago Ben\u2019s node learned about block B747, which is a valid block. Ben sees a transaction T27 in B747 that pays some money to a certain public key, signed by the correct private key. Ben would like to steal the money involved in T27. He modifies his copy of block B747 so that the payee\u2019s public key in T27 is Ben\u2019s own public key. He doesn\u2019t change anything else in B747. He modifies his Bitcoin node software to announce the block to other nodes as if it were a valid block.\n\n12. [3 points]: Which of the following will cause other Bitcoin nodes to decide that Ben\u2019s B747 is invalid? Circle all that apply.\n A. The \u201cPrev Hash\u201d field in the next block in the chain doesn\u2019t refer to Ben\u2019s B747. \n B. Other peers will already know about the real B747. \n C. The \u201cPrev Hash\u201d field in Ben\u2019s B747 isn\u2019t valid. \n D. The hash of Ben\u2019s B747 won\u2019t start with enough zeroes. \n E. The signature in T27 in Ben\u2019s B747 isn\u2019t correct.","answer":"D,E","explanation":"Answer: D,E. D and E are correct. A and B are not correct: peers have to at least temporarily accept otherwise-valid blocks with no successor because they might turn out to be the start of a new winning fork. C is not correct because Ben didn\u2019t modify the Prev Hash field, so it continues to refer to the predecessor of the original B747. D is correct because modifying the block will modify its cryptographic hash; the real B747\u2019s hash started with enough zeroes, but a modified B747 is fantastically unlikely to happen also to start with enough zeroes. E is correct because the signature was correct for T27\u2019s original payee public key, so the signature won\u2019t be correct with Ben as the payee.","type":"MultipleChoice"} -{"instance_id":13,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":13,"points":3,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# IX Bitcoin\n\nBitcoin: A Peer-to-Peer Electronic Cash System, by Nakamoto, mentions in Section 4 that the cryptographic hash of a valid block must start with a certain number of zero bits. Assume that the hash algorithm is SHA-256, which returns a 256-bit hash.\n\nNow Ben is designing a new crypto-currency system, identical to Bitcoin, except with a different agreement scheme to resolve forks in the block-chain: instead of the longest fork winning, nodes compute the hash of the last block in each fork, and the fork with the lowest last-block hash value wins. 
Ben reasons that all nodes will compute the same hashes, and thus all nodes will agree about which fork wins.\n\n13. [3 points]: Why is Ben\u2019s fork-resolution idea a disaster? Explain briefly.","answer":"Answer: In real Bitcoin, if an attacker wants to eliminate a transaction that occurs many blocks in the past by creating a fork from before that transaction, the attacker has to sustain a block mining rate faster than the main chain long enough to catch up, which requires compute power believed to be too expensive for most attackers. But with Ben\u2019s scheme, an attacker only needs to mine a single block that happens to have a hash smaller than the corresponding block in the main chain; then all nodes will switch to the attacker\u2019s new short fork. The attacker needs relatively little compute power to mine this single block.","explanation":"Answer: In real Bitcoin, if an attacker wants to eliminate a transaction that occurs many blocks in the past by creating a fork from before that transaction, the attacker has to sustain a block mining rate faster than the main chain long enough to catch up, which requires compute power believed to be too expensive for most attackers. But with Ben\u2019s scheme, an attacker only needs to mine a single block that happens to have a hash smaller than the corresponding block in the main chain; then all nodes will switch to the attacker\u2019s new short fork. The attacker needs relatively little compute power to mine this single block.","type":"ShortAnswerQuestion"} -{"instance_id":14,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":14,"points":6,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# X PBFT\n\nConsider the PBFT protocol as described in the paper Practical Byzantine Fault Tolerance by Castro and Liskov.\n\nFor each of the following statements, indicate whether it is true or false.\n\n14. [6 points]:\n\n- True \/ False $\\because$ Assume all replicas of a PBFT-replicated system are running the same software and the software has a bug. If an attacker exploits this bug in all replicas to control each replica, then the attacker can commit arbitrary operations in the log.\n- True \/ False : If an attacker controls the primary of a PBFT-replicated system, then the attacker can commit arbitrary operations in the log.\n- True \/ False : Honest replicas must include in a VIEW-CHANGE the messages already prepared so that a new primary cannot omit already-committed operations.","answer":"True,False,True","explanation":"Answer: True,False,True. 1. True; PBFT doesn\u2019t guarantee correctness if more than f machines are compromised. 2. False; the attacker must control more than f machines to be able to compromise PBFT; 3. True; see protocol description.","type":"ShortAnswerQuestion"} -{"instance_id":15,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam II","course":"Distributed System Engineering","year":2025,"problem_num":15,"points":2,"score_total":73,"score_max":71.5,"score_median":57.5,"score_avg":55.95,"score_standard_seviation":7.22,"problem":"# X PBFT\n\nConsider the PBFT protocol as described in the paper Practical Byzantine Fault Tolerance by Castro and Liskov.\n\nFor each of the following statements, indicate whether it is true or false.\n\n15. 
[2 points]: If there are 10 machines in our system, and at most 2 machines are malicious, how many REPLY messages does a client need to receive before it knows its operation was executed? (Circle best answer)\nA. 1 \nB. 3 \nC. 4 \nD. 6 \nE. 7","answer":"B","explanation":"Answer: B: $\\mathrm { f } { + } 1$ , where $\\mathrm { f } = 2$ ; as stated in the paper, we only need to guarantee that one honest replica voted REPLY.","type":"SingleChoice"} -{"instance_id":16,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":1,"points":5,"score_total":65,"score_max":65.0,"score_median":52.0,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# I MapReduce \n\nThe MapReduce paper (MapReduce: Simplified Data Processing on Large Clusters, by Dean and Ghemawat) says in Section 3.1 that the intermediate key space is partitioned among the R reduce tasks using hash(key) mod R. \n\nThea is running the word-count MapReduce job (pseudo-code in the paper\u2019s Section 2.1) on a cluster with 10 worker machines. M is 20 and R is 40. There are no failures, the network is reliable, no machines are slower than expected, and there is no competing work on any of the machines or networks involved. The Map input is divided into 20 pieces of 16 megabytes each. \n\n1. [5 points]: By mistake, the hash(key) function Thea is using with MapReduce always returns 1. What effect will that have on the execution of the word-count job, compared to using a well-behaved hash function? Circle the single best answer. A. the job will produce incorrect final output \n B. 10 times as much total CPU time will be needed for Reduce phase \n C. 10 times as much total wall-clock time will be needed for Reduce phase \n D. 40 times as much total CPU time will be needed for Reduce phase \n E. 40 times as much total wall-clock time will be needed for Reduce phase \n F. the job will never complete ","answer":"C","explanation":"Answer: C. The total amount of computation is unchanged, but it\u2019s all done by one worker rather than divided up in parallel among 10 workers. B is not correct because the total amount of work doesn\u2019t change; the only thing that the hash function changes is which worker does the work. ","type":"SingleChoice"} -{"instance_id":17,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":2,"points":5,"score_total":65,"score_max":65.0,"score_median":52.0,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# II VMware FT \n\nThe paper The Design of a Practical System for Fault-Tolerant Virtual Machines by Scales et al. describes a method for avoiding split-brain using shared storage. Instead of using the shared storage\u2019s test-and-set, Ben implements test-and-set using the kvsrv server from Lab 2 as follows: \n\n```\nfunc test-and-set(clnt *tester.Clnt) bool {\n val, version, err := clnt.Get(\"lock\")\n if err != rpc.OK {\n return false\n }\n if val == \"set\" {\n return false\n }\n if err := clnt.Put(\"lock\", \"set\", version); err == rpc.OK {\n return true\n }\n return false\n}\n```\n\nThe clnt.Put and clnt.Get are RPCs that invoke the server\u2019s Put and Get methods. You can assume that Ben has implemented the Put and Get methods correctly. \n\nThe initial value of \u201clock\u201d is the empty string. 
\n\nWhen the primary or the backup suspect that the other one has crashed, they invoke test-go-live, each with their own RPC client clnt: \n\n```\nfunc test-go-live() {\n for true {\n if test-and-set(clnt) {\n go-live()\n return\n }\n }\n}\n```\n\nThe network may lose, delay, or duplicate a few messages, but most messages will be delivered. \nThe computers (primary, backup, and kvsrv server) do not fail. \n\n2. [5 points]: What statements about Ben\u2019s implementation are true? (Circle all that apply) \n A. Both the primary and backup may observe test-and-set returning true in test-go-live, and \u201cgo live\u201d, resulting in split brain \n B. The key\/value server may never store \u201cset\u201d for the \u201clock\u201d key \n C. The primary and backup may spin forever in test-go-live, retrying test-and-set, because it may never return true \n D. If all RPCs succeed with no timeouts while running test-go-live, either the primary or the backup will observe true from test-and-set, but not both ","answer":"C","explanation":"Answer: C is true: the first Put may change lock to set, but the reply may be lost; a re-send will return ErrMaybe (since the version won\u2019t match); so neither primary nor backup will ever see rpc.OK from Put. D is true: if there are no timeouts (i.e. no packets are lost) the first Put to arrive at the kvsrv will succeed, and the sender will get rpc.OK. A is false, because Put is conditional and only one can set val to \u201cset\u201d. B is false, because most messages will be delivered and thus eventually a Put will succeed in setting the lock. ","type":"SingleChoice"} -{"instance_id":18,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":3,"points":5,"score_total":65,"score_max":65.0,"score_median":52.0,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# III Linearizability \n\nAlyssa is experimenting with a linearizable put\/get key\/value storage service. Unlike Lab 2, her key\/value service has no versions; put calls look like put(key, value). \n\nAlyssa has two clients. Client C1 executes this: \n\n```\nt = get(\"x\")\nput(\"x\", t + 1)\n```\n\nAt about the same time, client C2 executes this: \n\n```\nt = get(\"x\")\nput(\"x\", t * 2)\n```\n\nBefore either client starts, the value for key $^ { \\mathrm { 6 6 } } \\mathrm { X } ^ { \\mathrm { 7 } }$ in the storage system is 10. Both clients\u2019 calls complete without error. There is no other activity involving the storage system, and there are no failures. \n\nSuppose the history of the execution, in the style of Lecture 4, with values omitted, looks like this: \n\nC1: |--Rx?--| |--Wx?--|\nC2: |--Rx?--| |--Wx?--|\n\n3. [5 points]: After both clients have finished, what could the resulting value of x be in the storage system? (Circle all that apply) \n A. 10 \n B. 11 \n C. 20 \n D. 21 \n E. 22 ","answer":"B,C","explanation":"Answer: 11 and 20. Both C1\u2019s read and C2\u2019s read see the initial value of $\\mathbf { X }$ (10), so C1 writes 11 and C2 writes 20. The writes are concurrent, so linearizability allows either write to appear to execute last, and thus provide the final value. 
","type":"MultipleChoice"} -{"instance_id":19,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":4,"points":5,"score_total":65,"score_max":65.0,"score_median":52.0,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# III Linearizability \n\nAlyssa is experimenting with a linearizable put\/get key\/value storage service. Unlike Lab 2, her key\/value service has no versions; put calls look like put(key, value). \n\nAlyssa has two clients. Client C1 executes this: \n\n```\nt = get(\"x\")\nput(\"x\", t + 1)\n```\n\nAt about the same time, client C2 executes this: \n\n```\nt = get(\"x\")\nput(\"x\", t * 2)\n```\n\nBefore either client starts, the value for key $^ { \\mathrm { 6 6 } } \\mathrm { X } ^ { \\mathrm { 7 } }$ in the storage system is 10. Both clients\u2019 calls complete without error. There is no other activity involving the storage system, and there are no failures. \n\nSuppose the history of the execution, in the style of Lecture 4, with values omitted, looks like this: \n\nC1: |--Rx?--| |--Wx?--|\nC2: |--Rx?--| |--Wx?--|\n\nAlyssa resets the value of $^ { 6 6 } \\mathrm { X } ^ { 7 3 }$ to 10, and re-runs the two client programs. This time, the execution history looks like this: \n\nC1: |--Rx?--| |--Wx?--|\nC2: |---Rx?---| |--Wx?--| \n\n4. [5 points]: After both clients have finished, what could the resulting value of x be in the storage system? (Circle all that apply) \n A. 10 \n B. 11 \n C. 20 \n D. 21 \n E. 22 ","answer":"B,C,E","explanation":"Answer: 11, 20, and 22. 22 is possible if C2\u2019s read sees C1\u2019s write. ","type":"MultipleChoice"} -{"instance_id":20,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":5,"points":5,"score_total":65,"score_max":65.0,"score_median":52.0,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# IV GFS \n\nConsider GFS as described in The Google File System by Ghemawat et al. \n\n5. [5 points]: Which statements about GFS are true? (Circle all that apply) \n A. GFS ensures linearizability of client operations by allowing clients to read from chunk replicas. \n B. The primary server of a chunk ensures that Append operations are executed exactly once. \n C. A chunk server uses 64 Mbytes of disk space for each chunk. \n D. Leases help ensure that each chunk has only one primary. ","answer":"D","explanation":"Answer: D. A is false, because GFS allows reading chunks from backups, which may have not seen the last update to a chunk, violating linearizability. B is false because if an Append fails, the client retries the Append, which the primary executes, causing some Appends to be execute twice. C is false; the paper\u2019s Section 2.5 says that chunks are stored as Linux files and are extended only as needed, with disk space allocated lazily; this means that if only a few bytes of a chunk are written, only that part of the Linux chunk file will consume disk space. D is true; Section 3.1 says that the coordinator grants a chunk\u2019s lease to just one of the replicas, and only grants the lease to a different replica if the lease expires. 
","type":"SingleChoice"} -{"instance_id":21,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":6,"points":10,"score_total":65,"score_max":65.0,"score_median":52.0,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# V Raft \n\nRefer to Ongaro and Ousterhout\u2019s In Search of an Understandable Consensus Algorithm (Extended Version). \n\n6. [10 points]: Which statements about Raft are true? (Circle all that apply) \n A. If a follower receives an AppendEntries RPC from the leader and the follower\u2019s term matches the one in the RPC, then the prevLogIndex in the RPC must be equal to or higher than the follower\u2019s lastApplied \n B. Raft is optimized for the case that term switch happen frequently \n C. Raft guarantees that a leader in term $t$ is leader in term $t + 1$ \n D. If a leader sends the command in log index $i$ on the apply channel, the leader must have persisted log index $i$ \n E. If a follower crashes in a term and quickly reboots, it remembers who it voted for before the crash \n F. The leader\u2019s matchIndex for a peer is always equal to or smaller than the leader\u2019s nextIndex for that peer. \n G. A candidate who becomes leader sends out AppendEntries to all followers to suppress further elections \n H. If Raft doesn\u2019t use snapshots, a crashed follower will send all committed log entries on the apply channel after it reboots, even ones that it sent before the crash ","answer":"D,E,F,G,H","explanation":"Answer: D, E, F, G, H are true. \nA is false, because an AppendEntries RPC from the leader may be delayed and arrive after later AppendEntries RPCs that bump up lastApplied; when the follower processes the first RPC, the prevLogIndex may be smaller than its lastApplied. B is false, because the authors believe terms change infrequently and therefore don\u2019t think the fast-backup optimization is necessary. ","type":"MultipleChoice"} -{"instance_id":22,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":7,"points":10,"score_total":65,"score_max":65.0,"score_median":52.0,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# V Raft \n\nRefer to Ongaro and Ousterhout\u2019s In Search of an Understandable Consensus Algorithm (Extended Version). \n\n[10 points]: Which of the following bugs causes a Raft implementation to violate the safety properties listed in Figure 3? (Circle all that apply) \nA. A deadlock in a follower \nB. A follower who starts an election very quickly \nC. A partitioned leader who on rejoining updates its term to the new leader\u2019s term and sends AppendEntries for commands in its log with the new term \nD. A race condition in the follower\u2019s implementation that causes two followers to send different commands on the apply channel for log index $i$ \nE. A candidate that forgets to vote for itself \nF. A follower who appends a log entry to its log even if the term in the AppendEntries is smaller than its own and who then sends the log entry on the apply channel \nG. A follower that forgets to implement the rollback optimization presented at the end of section 5.3 \nH. A leader who always sends only one entry in an AppendEntries RPC to a follower","answer":"A,B,E,G,H","explanation":"Answer: A, B, E, G, and H are examples of what are called \u201cliveness\u201d bugs: these bugs don\u2019t cause wrong behavior but may prevent any progress. 
A: A deadlock in the follower may cause Raft to not make forward progress at some point (e.g., if the follower is necessary to form a majority). B may prevent a leader from being elected but it doesn\u2019t violate the safety properties. E is another variation of B. G and H may cause Raft to run slowly but that doesn\u2019t violate the safety properties. C, D, F, on the other hand, are \u201csafety\u201d bugs that cause incorrect behavior that violates the safety rules of Raft of Figure 3. ","type":"MultipleChoice"} -{"instance_id":23,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":8,"points":5,"score_total":65,"score_max":65.0,"score_median":52.0,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# VI Lab 3A-3C \n\nGeorge is implementing Raft as in Lab 3A-3C. Eager to test his implementation, George runs a git pull to get the latest changes from the 6.5840 staff. The latest changes introduce a new test, TestMiniReElection, which tests whether Raft can re-elect a leader after a single network partition. \n\n```\nfunc TestMiniReElection(t *testing.T) {\n servers := 3 \/\/ initialize three servers\n ...\n \/\/ wait for a leader to be elected; get the leader\u2019s index\n leader1 := ts.checkOneLeader()\n \n ts.g.DisconnectAll(leader1) \/\/ disconnect leader1 from other servers\n \n \/\/ wait for a new leader to be elected; get the leader\u2019s index\n leader2 := ts.checkOneLeader() \/\/ ***\n}\n```\n\nts.checkOneLeader() repeatedly polls only the connected servers until one of the connected servers returns that it is a leader. If it cannot find a leader within 5s, it returns a timeout error. \n\nUnfortunately, there is a bug in the Raft test infrastructure. When leader1 is disconnected, leader1 can still send RPCs to the other servers but not receive responses from the other servers. George runs TestMiniReElection, and finds that the test fails at the line marked with \u201c\\*\\*\\*\u201d with the timeout error \u201cexpected one leader, got none\u201d. \n\n8. [5 points]: Assume George\u2019s Raft implementation is completely correct, and that the network is reliable. Briefly explain why the buggy test infrastructure causes George\u2019s implementation to fail the new test. ","answer":"Answer: All the peers will continue to receive leader1\u2019s heartbeat AppendEntries RPCs, which will prevent them from ever starting an election. ","explanation":"Answer: All the peers will continue to receive leader1\u2019s heartbeat AppendEntries RPCs, which will prevent them from ever starting an election. ","type":"ShortAnswerQuestion"} -{"instance_id":24,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":9,"points":5,"score_total":65,"score_max":65.0,"score_median":52.0,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# VII ZooKeeper \n\nRefer to ZooKeeper: Wait-free coordination for Internet-scale systems, by Hunt, Konar, Junqueira, and Reed, and to Lecture 9. \n\nAlyssa runs a ZooKeeper service with a ZooKeeper leader and multiple followers. 
Alyssa has three ZooKeeper client programs, P1, P2, and P3: \n\n```\nP1:\n s = openSession()\n if create(s, \"\/leader\", \"one\", flags=ephemeral) == true:\n print \"P1 starting as leader\"\n _, version = getData(s, \"\/x\", watch=false)\n setData(s, \"\/x\", \"one\", version)\n _, version = getData(s, \"\/y\", watch=false)\n setData(s, \"\/y\", \"one\", version)\n```\n\n```\nP2:\n s = openSession()\n if create(s, \"\/leader\", \"two\", flags=ephemeral) == true:\n print \"P2 starting as leader\"\n _, version = getData(s, \"\/x\", watch=false)\n setData(s, \"\/x\", \"two\", version)\n _, version = getData(s, \"\/y\", watch=false)\n setData(s, \"\/y\", \"two\", version)\n print \"P2 done\"\n```\n\n```\nP3:\n s = openSession()\n sync(s, \"\/\")\n x = getData(s, \"\/x\", watch=false)\n y = getData(s, \"\/y\", watch=false)\n print x, y\n```\n\nInitially, znode \u201c\/leader\u201d does not exist, znode \u201c\/x\u201d exists and contains the string \u201cempty\u201d, and znode \u201c\/y\u201d exists and also contains the string \u201cempty\u201d. \n\nThe ZooKeeper calls in Alyssa\u2019s code are all synchronous. The ZooKeeper client call create() is exclusive, returning false if the file already exists, and true if it was able to create the file. The programs might end up talking to different ZooKeeper followers. \n\nAlyssa starts P1, waits until she sees it print \u201cP1 starting as leader\u201d, then (on a different computer) starts P2. Just at this point in time, P1\u2019s network connection starts to become slow and unreliable, so that sometimes it delivers packets, sometimes not. Alyssa sees that P2 prints \u201cP2 starting as leader\u201d, and after a little while \u201cP2 done\u201d. P2\u2019s network connection is reliable and fast. \n\nAfter Alyssa sees \u201cP2 done\u201d, she runs P3. \n\n9. [5 points]: What output from P3 could Alyssa see? (Circle all that apply) \n A. one, one \n B. two, two \n C. one, two \n D. two, one ","answer":"B","explanation":"Answer: B. Only two, two. We know P1\u2019s session must have terminated, because Alyssa saw P2 print \u201cP2 starting as leader,\u201d which could only have happened if ZooKeeper deleted P1\u2019s ephemeral \/leader file. So P2 will only start reading and writing data after P1 is guaranteed to have stopped writing (since ZooKeeper terminated its session). So P1 and P2\u2019s activities won\u2019t be intermixed; P2 runs strictly after P1. So both P2\u2019s sets will succeed. P3 starts after P2 finishes, and P3 calls sync(), so P3 will see P2\u2019s writes. ","type":"SingleChoice"} -{"instance_id":25,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":10,"points":5,"score_total":65,"score_max":65.0,"score_median":52.0,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# VIII Distributed Transactions \n\nAlyssa has a database that supports serializable transactions. Records \u201cx\u201d and \u201cy\u201d both start out containing the value 1. Alyssa starts three transactions at the same time: \n\n```\nT1:\n BEGIN-X\n temp1 = get(\"x\")\n temp2 = get(\"y\")\n put(\"x\", temp1 + temp2)\n END-X\n\nT2:\n BEGIN-X\n temp1 = get(\"y\")\n put(\"x\", temp1 * 2)\n END-X\n\nT3:\n BEGIN-X\n put(\"y\", 3)\n END-X\n```\n\nBEGIN-X marks the start of a transaction, and END-X marks the end. All three transactions commit and finish. There are no aborts, deadlocks, or failures. There is no other activity in the database. 
\n\nWhen Alyssa looks at record \u201cx\u201d in the database after the transactions complete, she sees the value 5. \n\n10. [5 points]: Briefly explain how the value 5 could have resulted from these transactions. ","answer":"Answer: The database system could have executed the transactions one at a time, in the order T2, T3, T1. ","explanation":"Answer: The database system could have executed the transactions one at a time, in the order T2, T3, T1. ","type":"ShortAnswerQuestion"} -{"instance_id":26,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2025 Exam I","course":"Distributed System Engineering","year":2025,"problem_num":11,"points":5,"score_total":65,"score_max":65.0,"score_median":52.0,"score_avg":51.85,"score_standard_seviation":6.7,"problem":"# IX Spanner \n\nRefer to Spanner: Google\u2019s Globally-Distributed Database, by Corbett et al. \n\n11. [5 points]: Suppose you only wanted to support read-write transactions (not read-only and not snapshot reads). You want therefore to eliminate all of the Spanner mechanisms that are not needed for read-write transactions. Which of these techniques can be eliminated? (Circle all that apply) \n A. commit wait (Sections 4.1.2, 4.2.1) \n B. safe time (Section 4.1.3) \n C. deadlock avoidance (Section 4.2.1) \n D. assignment of timestamps to read\/write transactions (Section 4.1.2) ","answer":"A,B,D","explanation":"vAnswer: A, B, and D. The time-stamp mechanism is only needed for read-only transactions. Read-write transactions are made serializable and externally consistent by Spanner\u2019s two-phase locking and two-phase commit; the correctness of read-write transactions thus does not rely on time-stamps. ","type":"MultipleChoice"} -{"instance_id":27,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":1,"points":6,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# I MapReduce \n\nHave a look at Figure 3(a) in the paper MapReduce: Simplified Data Processing on Large Clusters by Dean and Ghemawat. The three graphs on the left show the rate of data movement over time for a MapReduce job that sorts a terabyte of data: the rate at which Maps read their input, the rate at which intermediate data is shuffled, and the rate at which Reduces write their output. For these questions you should assume that only this MapReduce job is using the servers and network, and that there are no failures. Many of the numbers below are derived from looking at the graphs, and are thus approximate; your reading of the graphs may be somewhat different from our\u2019s; you should circle the answer that is closest to what you think is correct. \n\n1. [6 points]: Roughly when is the first time at which the sort application\u2019s Reduce() function is called? Circle the best answer. \n - A. 0 seconds\n - B. 50 seconds\n - C. 150 seconds\n - D. 300 seconds ","answer":"C","explanation":"Answer: C. The best answer is 150 seconds. No Reduce function can be called until every Map function has finished; the top graph suggests that the Maps stop running around 150 seconds, and the paper text mentions 200 seconds. 
","type":"SingleChoice"} -{"instance_id":28,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":2,"points":7,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# I MapReduce \n\nHave a look at Figure 3(a) in the paper MapReduce: Simplified Data Processing on Large Clusters by Dean and Ghemawat. The three graphs on the left show the rate of data movement over time for a MapReduce job that sorts a terabyte of data: the rate at which Maps read their input, the rate at which intermediate data is shuffled, and the rate at which Reduces write their output. For these questions you should assume that only this MapReduce job is using the servers and network, and that there are no failures. Many of the numbers below are derived from looking at the graphs, and are thus approximate; your reading of the graphs may be somewhat different from our\u2019s; you should circle the answer that is closest to what you think is correct. \n\n2. [7 points]: Roughly how long does it take a single application Reduce function to sort its share of the data (just the sort, not including either the shuffle or the writing of the output)? Circle the best answer. \n\n- A. 10 seconds\n- B. 75 seconds\n- C. 200 seconds\n- D. 250 seconds\n- E. 650 seconds \n- F. None of the above answers are correct.","answer":"F","explanation":"Answer: F. This question is broken: the application Reduce function does not sort the data. MapReduce\u2019s reduce task framework does the sort, and (for this application) the application Reduce function just returns its argument.","type":"SingleChoice"} -{"instance_id":29,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":3,"points":6,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# I MapReduce \n\nHave a look at Figure 3(a) in the paper MapReduce: Simplified Data Processing on Large Clusters by Dean and Ghemawat. The three graphs on the left show the rate of data movement over time for a MapReduce job that sorts a terabyte of data: the rate at which Maps read their input, the rate at which intermediate data is shuffled, and the rate at which Reduces write their output. For these questions you should assume that only this MapReduce job is using the servers and network, and that there are no failures. Many of the numbers below are derived from looking at the graphs, and are thus approximate; your reading of the graphs may be somewhat different from our\u2019s; you should circle the answer that is closest to what you think is correct. \n\n[6 points]: Why are there two bumps in the Shuffle graph? That is, why does the Shuffle graph go up and then down from time 20 to 200, remain at zero for 100 seconds, and then go up and then down from time 300 to 600? Circle the best answer. \n\nA. There are more Map tasks $( \\mathbf { M } = 1 5 , 0 0 0 )$ than there are machines. \nB. There are more Reduce tasks $\\begin{array} { r } { \\mathrm { R } = 4 0 0 0 \\mathrm { , } } \\end{array}$ ) than there are machines. \nC. There are more Map tasks than there are Reduce tasks. \nD. The aggregate network throughput is smaller than the aggregate disk throughput. \nE. The Map tasks consume more CPU time than the Reduce tasks. ","answer":"B","explanation":"Answer: B. 
The best answer is the second one (more Reduce tasks than machines). Intermediate data can only be moved from Map machines to Reduce machines for Reduce tasks that have been allocated to machines. There are only 1800 machines, so at first only 1800 of the 4000 Reduce tasks are assigned to machines, so only about 1800\/4000ths of the shuffles can happen at first. That\u2019s the first bump. The second bump starts once the first set of Reduce tasks finishes, moving intermediate data to the machines that will run the remaining Reduces.","type":"SingleChoice"} -{"instance_id":30,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":4,"points":7,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# I MapReduce \n\nHave a look at Figure 3(a) in the paper MapReduce: Simplified Data Processing on Large Clusters by Dean and Ghemawat. The three graphs on the left show the rate of data movement over time for a MapReduce job that sorts a terabyte of data: the rate at which Maps read their input, the rate at which intermediate data is shuffled, and the rate at which Reduces write their output. For these questions you should assume that only this MapReduce job is using the servers and network, and that there are no failures. Many of the numbers below are derived from looking at the graphs, and are thus approximate; your reading of the graphs may be somewhat different from our\u2019s; you should circle the answer that is closest to what you think is correct. \n\n[7 points]: Why does the shuffle begin a long time before the Map phase has finished? Circle the best answer. \n\nA. There are more Map tasks $( \\mathbf { M } = 1 5 , 0 0 0 )$ ) than there are machines.\nB. There are more Reduce tasks $\\begin{array} { r } { \\mathrm { { R } } = 4 0 0 0 \\mathrm { { } } } \\end{array}$ ) than there are machines.\nC. There are more Map tasks than there are Reduce tasks.\nD. The aggregate network throughput is smaller than the aggregate disk throughput. \nE. The Map tasks consume more CPU time than the Reduce tasks. ","answer":"A","explanation":"Answer: A. The best answer is the first one (more Map tasks than machines). Shuffles can start as soon as Map functions finish. The system runs 1800 Maps at a time; the first of these finishes a long time before the last of the 15,000 Maps finishes at time 200. ","type":"SingleChoice"} -{"instance_id":31,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":5,"points":6,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# II Linearizability \n\nThese questions concern the material from Lecture 4, Consistency and Linearizability. \n\nYou have a service whose state is a single string, and that exposes two RPC operations to clients: one operation appends the RPC argument to the state, and the other RPC operation returns the current state. The timelines below indicate the start time, end time, argument string, and reply string for each client operation. Ax indicates an append operation with argument $\\mathbf { X }$ , and Ry indicates a read operation to which the server replied y. The vertical bars indicate the start and end times of each operation (the times at which the client sends the request, and receives the reply). 
The service\u2019s state string starts out empty at the beginning of each history. \n\nFor example, \n\nC1:|---Ax---|\n\nC2: |---Ay---|\n\nC3: |--Ryx--| \n\nmeans that client C1 sent an append RPC with \u201cx\u201d as the argument, C2 sent an append RPC with \u201cy\u201d as the argument, and C3 read the state and received the reply \u201cyx\u201d. \n\nConsider this history, in which the reply string sent to C4 has been omitted: \n\nC1:|---Ax---| \nC2: |---Ay---| \nC3: |---Az---| \nC4: |--R?--| \n\n5. [6 points]: Which values could C4\u2019s read yield that are consistent with linearizability? Circle all of the correct answers. \n - A. xzy\n - B. yxz\n - C. yzx\n - D. xy\n - E. xz\n - F. yx\n - G. zy ","answer":"A,B,D,F","explanation":"Answer: A,B,D,F. xzy, yxz, xy, and yx. The result C4 receives can\u2019t start with z (since the Az starts after the Ax finishes); if both x and z appear, x must come first; and it must include both x and y (since Ax and Ay both finish before the C4\u2019s read starts). ","type":"MultipleChoice"} -{"instance_id":32,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":6,"points":7,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# II Linearizability \n\nThese questions concern the material from Lecture 4, Consistency and Linearizability. \n\nYou have a service whose state is a single string, and that exposes two RPC operations to clients: one operation appends the RPC argument to the state, and the other RPC operation returns the current state. The timelines below indicate the start time, end time, argument string, and reply string for each client operation. Ax indicates an append operation with argument $\\mathbf { X }$ , and Ry indicates a read operation to which the server replied y. The vertical bars indicate the start and end times of each operation (the times at which the client sends the request, and receives the reply). The service\u2019s state string starts out empty at the beginning of each history. \n\nFor example, \n\nC1:|---Ax---|\n\nC2: |---Ay---|\n\nC3: |--Ryx--| \n\nmeans that client C1 sent an append RPC with \u201cx\u201d as the argument, C2 sent an append RPC with \u201cy\u201d as the argument, and C3 read the state and received the reply \u201cyx\u201d. \n\nConsider this history, in which the reply string sent to C4 has been omitted: \n\nC1:|---Ax---| \nC2: |---Ay---| \nC3: |---Az---| \nC4: |--R?--| \n\nNow look at this history: \n\nC1: |-------Ax-------| \nC2: |---Ay---| \nC3: |--Ry--| \nC4: |----R?----| \n\n6. [7 points]: Which values could C4\u2019s read yield that are consistent with linearizability? Circle all of the correct answers. \n - A. y \n - B. x \n - C. yx \n - D. xy ","answer":"A,C","explanation":"Answer: A,C. y and yx. The fact that C3 read y, and that C3\u2019s read finished before C4\u2019s read started, means that C4\u2019s result must include y, and, if it includes x, the x must come after y. ","type":"MultipleChoice"} -{"instance_id":33,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":7,"points":6,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# III GFS and Raft \n\nAfter reading the GFS paper (The Google File System by Ghemawat et al.) 
and the Raft paper (Ongaro and Ousterhout\u2019s In Search of an Understandable Consensus Algorithm (Extended Version)), Ben replaces the GFS master with a new coordinator that uses Raft. The Raft-based coordinator provides the same functions as before but replicates the log of operations using 3 Raft peers. All other parts of GFS stay the same. \n\nWhich of the following statements are true? (Circle all that apply) \n\n7. [6 points]: \n\n- A. The coordinator can continue operation in the presence of network partitions without any additional monitoring infrastructure, if one partition with peers is able to achieve a majority. \n- B. The coordinator can continue operation correctly even if one of the 3 peers has failed (and there are no other failures). \n- C. None of the above are true ","answer":"A,B","explanation":"Answer: Both A and B are true; these are properties of Raft. ","type":"MultipleChoice"} -{"instance_id":34,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":8,"points":7,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# III GFS and Raft \n\nAfter reading the GFS paper (The Google File System by Ghemawat et al.) and the Raft paper (Ongaro and Ousterhout\u2019s In Search of an Understandable Consensus Algorithm (Extended Version)), Ben replaces the GFS master with a new coordinator that uses Raft. The Raft-based coordinator provides the same functions as before but replicates the log of operations using 3 Raft peers. All other parts of GFS stay the same. \n\nWhich of the following statements are true? (Circle all that apply) \n\nBen also considers using Raft for chunk replication. He runs many Raft clusters and has the GFS master assign chunks to a specific Raft cluster (i.e., each chunk is assigned to one Raft cluster, consisting of a leader and two followers). GFS clients submit write and append operations for a chunk to the leader of the Raft cluster for that chunk (i.e., Ben\u2019s design doesn\u2019t implement the separate data flow). The leader of the Raft cluster replicates write and append operation using the Raft library. All other parts of GFS (e.g., assigning leases to the leader, client caching locations of chunk servers, reading from the closest server, and so on) stay the same. (You can assume that chunk servers have enough disk space for operations to succeed.) \n\nWhich of the following statements are true? (Circle all that apply) \n\n8. [7 points]: \n\n- A. Unlike the old design, Ben\u2019s design can achieve linearizability for chunk operations. \n- B. Unlike the old design, Ben\u2019s design can continue operation despite the failure of one chunk server. \n- C. By using Raft, Ben\u2019s design allows clients to perform more mutating chunk operations per second than the old design. \n- D. Raft\u2019s snapshots allow a chunk server to catch up in a few seconds if has been down for a long time (assuming the same network as in the GFS paper). \n- E. None of the above are true ","answer":"E","explanation":"Answer: E. None of the above are true. A is false because the client\u2019s cache that maps file names to chunk handles can yield stale results. B is false because the old design can continue despite one failure as well. C is false because Ben\u2019s scheme moves data less efficiently (via the leader, rather than the separate data flow). 
D is false because the snapshot mechanism sends the leader\u2019s entire database of chunks, which will likely take far longer than a few seconds. ","type":"MultipleChoice"} -{"instance_id":35,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":9,"points":6,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# IV Raft \n\nConsider the Raft paper (Ongaro and Ousterhout\u2019s In Search of an Understandable Consensus Algorithm (Extended Version)). Ben wonders what the impact of network behavior is on Raft\u2019s performance. Ben runs a Raft-replicated server that receives many client requests. If the network delivers AppendEntries RPCs in order, Ben\u2019s Raft implementation is fast (i.e., completes many client requests per second). But, if the network delivers AppendEntries frequently out of order, Ben\u2019s Raft implementation performs badly (i.e., completes fewer client requests per second). Using the rules in Figure 2 explain why this is the case. \n\n9. [6 points]: ","answer":"Answer: This question is broken. Figure 2 implies that each AppendEntries should include all as-yet-unacknowledged log entries. So if there are two such RPCs outstanding, the one that was sent second contains a copy of all the log entries in the first. This means that, if the second RPC arrives first, it will be accepted. So it\u2019s not clear why Ben would see any different performance due to out-of-order delivery. ","explanation":"Answer: This question is broken. Figure 2 implies that each AppendEntries should include all as-yet-unacknowledged log entries. So if there are two such RPCs outstanding, the one that was sent second contains a copy of all the log entries in the first. This means that, if the second RPC arrives first, it will be accepted. So it\u2019s not clear why Ben would see any different performance due to out-of-order delivery. ","type":"ShortAnswerQuestion"} -{"instance_id":36,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":10,"points":7,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# V Lab 3A-3C \n\nAlyssa is implementing Raft as in Lab 3A-3C. She implements advancing the commitIndex at the leader (i.e., last bullet of Leaders in Fig 2) as follows: \n\n```go\nfunc (rf *Raft) advanceCommit() {\n start := rf.commitIndex + 1\n if start < rf.log.start() { \/\/ on restart start could be 1\n \tstart = rf.log.start()\n }\n for index := start; index <= rf.log.lastindex(); index++ {\n \tif rf.log.entry(index).Term != rf.currentTerm { \/\/ 5.4\n \t\tcontinue \/\/ ***\n }\n n := 1 \/\/ leader always matches\n for i := 0; i < len(rf.peers); i++ {\n \tif i != rf.me && rf.matchIndex[i] >= index {\n \t\tn += 1\n }\n }\n if n > len(rf.peers)\/2 { \/\/ a majority?\n \tDPrintf(\"%v: Commit %v\\n\", rf.me, index)\n \trf.commitIndex = index\n }\n }\n}\n```\n\nAssume that all omitted parts of Alyssa\u2019s code are correct. \n\nBen argues that the line marked with \u201c\\*\\*\\*\u201d could be replaced by a break statement so that the loop terminates immediately. \n\n10. [7 points]: Explain what could go wrong if one adopted Ben\u2019s proposal; please include a specific sequence of events to illustrate your answer. 
","answer":"Answer: If there\u2019s a term mis-match, the leader won\u2019t be able to commit any further log entries. \nThe paper\u2019s Figure 8e shows an example of such a scenario. ","explanation":"Answer: If there\u2019s a term mis-match, the leader won\u2019t be able to commit any further log entries. \nThe paper\u2019s Figure 8e shows an example of such a scenario. ","type":"ShortAnswerQuestion"} -{"instance_id":37,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":11,"points":6,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# VI More lab 3A-3C \n\nAlyssa is implementing Raft as in Lab 3A-3C. She implements the rule for conversion to follower in her AppendEntries RPC handler as shown below: \n\n```go\nfunc (rf *Raft) convertToFollower(term int) {\n rf.state = Follower\n rf.votedFor = -1\n rf.currentTerm = term\n rf.persist()\n}\n```\n\n```go\nfunc (rf *Raft) AppendEntries(args * AppendEntriesArgs,\n reply *AppendEntriesReply) {\n rf.mu.Lock()\n defer rf.mu.Unlock()\n if args.Term >= rf.currentTerm {\n rf.convertToFollower(args.Term)\n }\n ...\n}\n```\n\nAssume that all omitted parts of Alyssa\u2019s code are correct. \n\n11. [6 points]: Describe a specific sequence of events that would cause Alyssa\u2019s implementation to break the safety guarantees provided by Raft. ","answer":"Answer: The code shown can cause a peer to forget it has cast a vote for the current term. Suppose peer P1 has been elected for this term. The peers that elected it may forget that they voted for P1. Then some other peer P2 may become candidate for this term, and get votes from those forgetful peers, and become a second leader for the same term. This will lead to split brain. ","explanation":"Answer: The code shown can cause a peer to forget it has cast a vote for the current term. Suppose peer P1 has been elected for this term. The peers that elected it may forget that they voted for P1. Then some other peer P2 may become candidate for this term, and get votes from those forgetful peers, and become a second leader for the same term. This will lead to split brain. ","type":"ShortAnswerQuestion"} -{"instance_id":38,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":12,"points":7,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# VII ZooKeeper \n\nRefer to ZooKeeper: Wait-free coordination for Internet-scale systems by Hunt, Konar, Junqueira, and Reed, and to the notes for Lecture 9. \n\nThe code fragments below are simplified versions of how something like GFS or MapReduce might use ZooKeeper to elect a coordinator, and for that coordinator to store state such as the assignments of GFS data to chunkservers. \n\nSuppose server S1 executes the following code to become elected and to then store coordinator state in \/A and \/B. Initially, znode \/coord-lock does not exist, znode \/A starts out containing A0, and znode \/B starts out containing B0. \n\n```\n s = openSession()\n if create(s, \"\/coord-lock\", data=\"S1\", ephemeral=true) == true:\n setData(s, \"\/A\", \"A1\", version=-1)\n setData(s, \"\/B\", \"B1\", version=-1)\n```\n\n\n\n12. [7 points]: Briefly explain why, for coordinator election, it makes sense that \/coord-lock should be an ephemeral znode rather than a regular znode. 
","answer":"Answer: If a server is elected as coordinator, and then fails, ZooKeeper automatically deletes the ephemeral \/coord-lock; now another server can create that file and become coordinator. ","explanation":"Answer: If a server is elected as coordinator, and then fails, ZooKeeper automatically deletes the ephemeral \/coord-lock; now another server can create that file and become coordinator. ","type":"ShortAnswerQuestion"} -{"instance_id":39,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":13,"points":6,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# VII ZooKeeper \n\nRefer to ZooKeeper: Wait-free coordination for Internet-scale systems by Hunt, Konar, Junqueira, and Reed, and to the notes for Lecture 9. \n\nThe code fragments below are simplified versions of how something like GFS or MapReduce might use ZooKeeper to elect a coordinator, and for that coordinator to store state such as the assignments of GFS data to chunkservers. \n\nSuppose server S1 executes the following code to become elected and to then store coordinator state in \/A and \/B. Initially, znode \/coord-lock does not exist, znode \/A starts out containing A0, and znode \/B starts out containing B0. \n\n```\n s = openSession()\n if create(s, \"\/coord-lock\", data=\"S1\", ephemeral=true) == true:\n setData(s, \"\/A\", \"A1\", version=-1)\n setData(s, \"\/B\", \"B1\", version=-1)\n```\n\nS1\u2019s create() finishes and returns true to indicate success. But just after that, and before ZooKeeper has received S1\u2019s setData() requests, ZooKeeper decides that S1 has failed, and ZooKeeper terminates S1\u2019s session. \n\nAfter ZooKeeper terminates S1\u2019s session, server S2 runs this to become coordinator: \n\n```\ns = openSession()\nif create(s, \"\/coord-lock\", data=\"S2\", ephemeral=true) == true:\n setData(s, \"\/A\", \"A2\", version=-1)\n setData(s, \"\/B\", \"B2\", version=-1)\n```\n\nHowever, S1 is actually still alive, and it proceeds to send the two setData() requests, and they arrive at ZooKeeper. \n\nThen client C1 reads \/B and \/A and sees B2 and A2, respectively. \n\nNow a different client, C2, reads \/B, and then reads \/A. Both reads succeed. \n\n13. [6 points]: Given the way ZooKeeper works, what can C2 observe? Circle all of the possible read results. \n \/B \/A \n A. B0 A0 \n B. B0 A1 \n C. B0 A2 \n D. B2 A0 \n E. B2 A1 ","answer":"A,C","explanation":"Answer: B0 A0 and B0 A2 are the only possible results. B0 is possible because, in the absence of other constraints, ZooKeeper can yield stale data to reads. A1 is never possible because ZooKeeper terminated S1\u2019s session before ZooKeeper receive S1\u2019s setData()s, so ZooKeeper ignore those setData()s. B2 A0 is not possible since, once ZooKeeper has revealed a write to a client, the \u201cLinearizable writes\u201d guarantee in Section 2.3 implies that all previous writes have been applied. 
","type":"MultipleChoice"} -{"instance_id":40,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":14,"points":7,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# VIII Grove \n\nIn the ApplyReadonly function in Figure 7, Ben decides to delete the check for s.waitForCommitted() The new code is as as follows: \n\n```\nfunc (s *Server) ApplyReadonly(op) Result {\n\ts.mutex.Lock()\n if s.leaseExpiry > GetTimeRange().latest {\n e := s.epoch\n idx, res := s.stateLogger.LocalRead(op)\n s.mutex.Unlock()\n return res\n } else {\n s.mutex.Unlock()\n return ErrRetry\n }\n}\n```\n\n14. [7 points]: Explain why this modification can result in non-linearizable reads. ","answer":"Answer: If a Grove backup server reveals an update without waiting to ensure it has been committed, then it may reveal an uncommitted write. If the primary then fails, the backup whose database is used to recover may not have recent uncommitted writes. So the write may disappear, and other clients issuing strictly subsequent reads may not see that write. That would not be linearizable. ","explanation":"Answer: If a Grove backup server reveals an update without waiting to ensure it has been committed, then it may reveal an uncommitted write. If the primary then fails, the backup whose database is used to recover may not have recent uncommitted writes. So the write may disappear, and other clients issuing strictly subsequent reads may not see that write. That would not be linearizable. ","type":"ShortAnswerQuestion"} -{"instance_id":41,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam I","course":"Distributed System Engineering","year":2024,"problem_num":15,"points":6,"score_total":97,"score_max":95.0,"score_median":78.0,"score_avg":77.0,"score_standard_seviation":9.72,"problem":"# IX Distributed Transactions \n\nMouseGPT is designing a distributed transaction system using two-phase commit and two-phase locking, as discussed in Lecture 12 and Chapter 9 of the 6.033 reading. The goal is to provide serializable results. The question arises of what should happen if a participant computer crashes while in the PREPARED state for a transaction. MouseGPT thinks that all-or-nothing atomicity would be satisfied if such a transaction were completely forgotten. So MouseGPT designs the system so that if a participant computer crashes and restarts while it is in the PREPARED state for a transaction that it\u2019s part of, the recovery software on that computer un-does any local modifications the interrupted transaction might have performed and releases its locks, and sends a network message to each other participant and to the TC to tell them to undo any changes made by the transaction and to release its locks. \n\n15. [6 points]: Explain why MouseGPT\u2019s plan would cause the system to produce nonserializable (incorrect) results. ","answer":"Answer: The TC may have decided to commit the transaction, and sent out COMMIT messages to the other participating workers, and they may have committed, and revealed committed results to other transactions. At that point, there is no way to back out of the transaction without violating serializability and atomicity. 
","explanation":"Answer: The TC may have decided to commit the transaction, and sent out COMMIT messages to the other participating workers, and they may have committed, and revealed committed results to other transactions. At that point, there is no way to back out of the transaction without violating serializability and atomicity. ","type":"ShortAnswerQuestion"} -{"instance_id":42,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":1,"points":5,"score_total":71,"score_max":71.0,"score_median":57.0,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# I Spanner \n\nThe intelligent computer HAL is using Spanner (as described in Spanner: Google\u2019s GloballyDistributed Database by Corbett et al.) to store data. HAL notes that read\/write transactions are being slowed down by Spanner\u2019s commit-wait mechanism (see Section 4.2.1). HAL disables commit-wait in his Spanner installation; as a result, everything works just as described in the paper except that the coordinator leader does not wait until the timestamp $s$ is guaranteed to be in the past. \n\nHAL uses just these three transactions: \n\n```\nT1:\n X=1\n Y=1\nT2:\n X=22\n Y=22\nT3:\n print X, Y\n```\n\nInitially, database records X and Y both have value 0. X and $\\mathrm { Y }$ are in different Spanner shards, managed by different Paxos groups. T1 and T2 are read\/write transactions; T3 is a read-only transaction. \n\nHAL starts T1; waits for Spanner to say that T1 has completed; starts T2, waits for Spanner to say that T2 has completed; then starts T3 and observes T3\u2019s output. \n\n1. [5 points]: Which outputs from T3 are possible? (For each statement, circle True or False.) \n A. True \/ False : 22, 22 \n B. True \/ False : 1, 1 \n C. True \/ False : 1, 22 \n D. True \/ False : 0, 0 ","answer":"A,B,D","explanation":"Answer: A,B,D. 22,22, 1,1, and 0,0 are all possible; 1,22 is not. Omitting commit-wait means that either or both of T1 and T2 might commit with time-stamps later than the time-stamp that T3 chooses, so T3 might see the result of either T1 or T2, or neither. T3 can\u2019t see 1,22 because both T1 and T2 do both their writes at the same timestamp, so T3 will either see both writes of one of the transactions, or neither. ","type":"MultipleChoice"} -{"instance_id":43,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":2,"points":4,"score_total":71,"score_max":71.0,"score_median":57.0,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# II Chardonnay \n\nConsider the paper Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases, by Eldeeb et al. \n\nA read\/write Chardonnay transaction reads database record A, then reads B, and then writes C. The system is busy with other read\/write transactions at the same time, some of which might also use A, B, and\/or C. \n\n2. [4 points]: In which situation will Chardonnay\u2019s \u201cdry run\u201d mechanism yield the most benefit? (Circle the single best answer.) \n A. A is hot, B is cold. \n B. A is cold, B is hot. \n C. A is cold, B is cold. \n D. A is hot, B is hot. \n\n\u201cCold\u201d means used rarely. \u201cHot\u201d means used by many transactions. ","answer":"A","explanation":"Answer: A. Only the first answer (hot, cold) is correct. 
Chardonnay\u2019s dry run mechanism helps avoid situations in which a transaction holds the lock for a record that other transactions need, while waiting to read a record from the disk. This situation arises when a read\/write transaction uses a hot record followed by a cold record. \n\n---\n","type":"SingleChoice"} -{"instance_id":44,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":3,"points":4,"score_total":71,"score_max":71.0,"score_median":57.0,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# II Chardonnay \n\nConsider the paper Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases, by Eldeeb et al. \n\nA read\/write Chardonnay transaction reads database record A, then reads B, and then writes C. The system is busy with other read\/write transactions at the same time, some of which might also use A, B, and\/or C. \n\nA system that uses Chardonnay issues just these three transactions: \n\n```\nT1:\n X=1\nT2:\n Y=1\nT3:\n print X, Y\n```\n\nInitially, both database records (X and Y) start out with value 0. X and Y are in different ranges. T1 and T2 are read\/write transactions. T3 is a read-only transaction (described in the paper\u2019s Section 6). T3 does not use the the waiting idea described in the last paragraph of Section 6.2. \n\nOne client starts T1. After T1 completes, another client starts T2. After T2 completes, a third client runs T3. \n\nThis version of Chardonnay has a bug somewhere in its code, causing T3 to print the incorrect output 0,1. \n\n3. [4 points]: Which of the following bugs is the most plausible explanation for T3 printing \n 0,1? Circle the single most correct answer. \n A. The epoch server is stuck: it always returns the same epoch number, and never increases it. \n B. The epoch server is incrementing too quickly: more than once per 10 milliseconds. \n C. The epoch server is working correctly except it gave T2 an epoch that was too small. \n D. The epoch server is working correctly except it gave T2 an epoch that was too large. ","answer":"C","explanation":"Answer: C. The third answer is correct. 0,1 is not a correct output because serializability requires that if T3 observes the results of T2, and T1 finished before T2 started, then T3 is required to also see the results of T1. If the epoch server gives T2 an epoch that\u2019s less than T1\u2019s epoch, and T3 and T1 run in the same epoch, then T3 will see T2\u2019s $\\scriptstyle \\ Y = 1$ but not T1\u2019s $\\scriptstyle \\mathrm { X = 1 }$ . ","type":"SingleChoice"} -{"instance_id":45,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":4,"points":8,"score_total":71,"score_max":71.0,"score_median":57.0,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# III FaRM \n\nConsider the following statements about FaRM as described in No compromises: distributed transactions with consistency, availability, and performance. For each statement, circle True or False. \n\n4. [8 points]: \n\nTrue \/ False : Because FaRM uses primary-backup replication for a region (instead of Paxos), FaRM must reconfigure to remove a failed replica before FaRM can continue to use the region. \n\nTrue \/ False : FaRM can use short leases (10ms by default) because it has communication and scheduling optimizations to renew leases quickly. 
\n\nTrue \/ False : A transaction that modifies only one object will never abort. \n\nTrue \/ False : Read-only transactions require only the validate step of the Commit phase in Figure 4. ","answer":"True,True,False,True","explanation":"Answer: True, True, False, True. The first statement is true because FaRM requires a response from all replicas, thus it must reconfigure to remove the failed replica before it can continue with the affected shard. The third statement is false because another transaction may modify the one object causing this transaction\u2019s validation phase to fail (because the other transaction will have incremented the object\u2019s version number). ","type":"True\/False Questions"} -{"instance_id":46,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":5,"points":4,"score_total":71,"score_max":71.0,"score_median":57.0,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# IV Ray \n\nConsider the following Ray program, which creates a sqrt task task for each number in the list mylist. The creation yields a DFut and the caller waits for the tasks to complete by calling get on each future. The code is as follows: \n\n```\n# A call to sqrt_task yields a DFut\n@ray.remote\ndef sqrt_task(n):\n # sqrt is a python function, which returns the square root of its argument\n return sqrt(n)\n\ndef sqrts0(n_list):\n # start tasks and collect futures\n l = [ ] # list holding DFuts\n for i in n_list: # iterate over list of numbers\n \tl.append(sqrt_task(i))\n \n r = [ ]\n for f in l:\n\t\tr.append(get(f)) # collect the result\n\treturn r\n\nprint(sqrts0(mylist)) # invoke sqrts0 with a list of numbers and print result\n```\n\nAssume Ray behaves in the way described in Ownership: a distributed futures system for finegrained tasks by Wang et al., and Ray is running on a cluster of computers. \n\n5. [4 points]: Will the sqrt computations complete in the order that sqrts0 appends to r? (Briefly explain your answer) ","answer":"Answer: No. The sqrt tasks run concurrently with each other, and may finish in an arbitrary order. All that is guaranteed is that the task has finished executing (at least once) by the time get(f) returns. ","explanation":"Answer: No. The sqrt tasks run concurrently with each other, and may finish in an arbitrary order. All that is guaranteed is that the task has finished executing (at least once) by the time get(f) returns. ","type":"ShortAnswerQuestion"} -{"instance_id":47,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":6,"points":4,"score_total":71,"score_max":71.0,"score_median":57.0,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# IV Ray \n\nConsider the following Ray program, which creates a sqrt task task for each number in the list mylist. The creation yields a DFut and the caller waits for the tasks to complete by calling get on each future. 
The code is as follows: \n\n```\n# A call to sqrt_task yields a DFut\n@ray.remote\ndef sqrt_task(n):\n # sqrt is a python function, which returns the square root of its argument\n return sqrt(n)\n\ndef sqrts0(n_list):\n # start tasks and collect futures\n l = [ ] # list holding DFuts\n for i in n_list: # iterate over list of numbers\n \tl.append(sqrt_task(i))\n \n r = [ ]\n for f in l:\n\t\tr.append(get(f)) # collect the result\n\treturn r\n\nprint(sqrts0(mylist)) # invoke sqrts0 with a list of numbers and print result\n```\n\nAlyssa creates a function sqrts1 whose body is the same as sqrts0, but is declared as a remote task. She then modifies the program to invoke many sqrts1\u2019s, each with a large distinct, nonoverlapping slice of the number list. The code is as follows: \n\n```\n@ray.remote\ndef sqrts1(n_list):\n ...\n # same code as sqrts0\n ...\n return r\n \nf0 = sqrts1(mylist[...])\nf1 = sqrts1(mylist[...])\nf2 = sqrts1(mylist[...])\n...\nprint(get(f0))\nprint(get(f1))\n...\n```\n\n6. [4 points]: Ben is worried that the above program creates so many sqrt tasks tasks that Ray will be bottle-necked by managing the tasks and the futures they yield. Briefly explain why Ray can manage many tasks in parallel for the above program? ","answer":"Answer: The worker machine that invokes sqrts1(...) is the owner of the metadata for the value returned by each sqrts1 call. The many workers that execute sqrts1() each independently own the metadata for their sqrt task\u2019s, resulting in no one machine being required to manage all the sqrt tasks. ","explanation":"Answer: The worker machine that invokes sqrts1(...) is the owner of the metadata for the value returned by each sqrts1 call. The many workers that execute sqrts1() each independently own the metadata for their sqrt task\u2019s, resulting in no one machine being required to manage all the sqrt tasks. ","type":"ShortAnswerQuestion"} -{"instance_id":48,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":7,"points":5,"score_total":71,"score_max":71.0,"score_median":57.0,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# V Memcache at Facebook \n\nBen Bitdiddle runs a web site. Ben reads the paper Scaling Memcache at Facebook by Nishtala et al., and thinks that the design is too complex. So Ben decides to ignore the paper\u2019s design: he doesn\u2019t use leases, mcrouter, pools, etc. Ben uses only the mechanisms described below. \n\nBen has just a single region, with some web servers, some memcache servers, and a single database server. Ben programs each of his web servers to use the following client code to read and write data: \n\n```\nread(k):\n if v = memcache_get(k) succeeds\n \treturn v\n else\n\t\treturn database_get(k)\n\nwrite(k, v):\n database_put(k, v)\n memcache_put(k, v)\n```\n\nNote that read() does not insert anything into memcache, and note that write() always inserts the new data into memcache, whether it was already cached or not. Ben knows this may be wasteful, since it may cause memcache to cache data that\u2019s never read, but he doesn\u2019t mind. \n\n7. [5 points]: Sadly, Ben sees that read()s sometimes return stale data for a long time after the write() of a newer value has succeeded and returned. Explain how this could happen. 
","answer":"Answer: If there are concurrent writes by different clients to the same key, the calls to database put() may execute in a different order that the calls to memcache put(), so that memcache and the database end up with different values. This condition can persistent for a long time: until the next time a client writes the same key. ","explanation":"Answer: If there are concurrent writes by different clients to the same key, the calls to database put() may execute in a different order that the calls to memcache put(), so that memcache and the database end up with different values. This condition can persistent for a long time: until the next time a client writes the same key. ","type":"ShortAnswerQuestion"} -{"instance_id":49,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":8,"points":4,"score_total":71,"score_max":71.0,"score_median":57.0,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# VI Lab 4 \n\nBen implements the RPC handlers and the applier in Lab 4 as follows. The RPC handlers for Get, Put, and Append take the following steps: \n\nA. Submit a command to the Raft library via Start. The command includes the client ID, request ID, operation type, and arguments. \nB. Loop to wait until the reply for that command to show up in the reply table, which maps from client IDs to the replies of clients\u2019 latest requests. Each reply contains the request ID and the result to that request. If Raft\u2019s leadership changes during the loop, return ErrWrongLeader. \nC. Return the result stored in the reply table. \n\nThe applier detail is irrelevant to this question and is shown on the next page. \n\n8. [4 points]: Ben observes that Get does not modify the application state. He changes Get\u2019s RPC handler to read the key-value table and return immediately to the client the result. Does this implementation preserve linearizability? (Briefly explain your answer.) ","answer":"Answer: No. Get could return a stale result if Raft the leadership changes. For instance, if a client submits an Append to the old leader and succeeds, and then submits a Get to the new leader, the Get result could miss the appended value if the new leader handles the Get before applying the Append. ","explanation":"Answer: No. Get could return a stale result if Raft the leadership changes. For instance, if a client submits an Append to the old leader and succeeds, and then submits a Get to the new leader, the Get result could miss the appended value if the new leader handles the Get before applying the Append. ","type":"ShortAnswerQuestion"} -{"instance_id":50,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":9,"points":4,"score_total":71,"score_max":71.0,"score_median":57.0,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"Ben implements the RPC handlers and the applier in Lab 4 as follows. The RPC handlers for Get, Put, and Append take the following steps: \n\nA. Submit a command to the Raft library via Start. The command includes the client ID, request ID, operation type, and arguments. \nB. Loop to wait until the reply for that command to show up in the reply table, which maps from client IDs to the replies of clients\u2019 latest requests. Each reply contains the request ID and the result to that request. If Raft\u2019s leadership changes during the loop, return ErrWrongLeader. \nC. 
Return the result stored in the reply table. \n\nThe applier takes the following steps: \n\nD. Read a command from the apply channel. \nE. De-duplicate the command with the reply table: if the request ID in the reply table for the client is greater than or equal to that in the command, then skip the command. \nF. Apply the command and insert the result to the reply table. \n\n9. [4 points]: Separately from the previous change, Ben modifies his implementation to perform de-duplication early in the RPC handlers. Concretely, he removes step $\\mathbf { E }$ in the applier, and adds an additional step at the start of the RPC handlers (i.e., before step A) as follows: \n If the request ID in the reply table for the client is greater than or equal to that in the RPC arguments, return the result stored in the reply table. \n Does this implementation preserve linearizability? (Briefly explain your answer.) ","answer":"Answer: No. An operation could be applied twice if the client re-sends it before the first RPC is applied. ","explanation":"Answer: No. An operation could be applied twice if the client re-sends it before the first RPC is applied. ","type":"ShortAnswerQuestion"} -{"instance_id":51,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":10,"points":8,"score_total":71,"score_max":71.0,"score_median":57.0,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# VII AWS Lambda \n\nConsider the guest lecture about the paper On-demand container loading in AWS Lambda by Brooker et al. For each of the following statements, indicate whether it is true or false. \n\n10. [8 points]: \n\nTrue \/ False $\\because$ AWS Lambda is attractive to customers because it allows them to run cloud computations without having to provision a machine. \n\nTrue \/ False : Many containers of AWS Lambda customers don\u2019t contain unique chunks because customers upload the same container multiple times. \n\nTrue \/ False : AWS Lambda may deduplicate popular chunks less than unpopular chunks. \n\nTrue \/ False $\\because$ AWS Lambdas use LRU-K to ensure that if many infrequently-used Lambdas are running at the same time, they don\u2019t evict the chunks of frequently-used Lambdas. ","answer":"True,True,True,True","explanation":"Answer: True, True, True, True. The third option is true because AWS does this to reduce the blast radius of popular chunks (see Section 3.3). ","type":"True\/False Questions"} -{"instance_id":52,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":11,"points":5,"score_total":71,"score_max":71.0,"score_median":57.0,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# VIII Boki \n\nConsider Figure 6(a) in Boki: Stateful Serverless Computing with Shared Logs by Jia and Witchel. The left column describes how Boki makes the execution of a workflow of serverless functions with database side-effects exactly-once. \n\nAlyssa notices that if Boki reruns a workflow it will append a record to the workflow\u2019s LogBook, even if an append of an earlier failed execution already logged the record. Alyssa proposes to change the pattern of append-read to read-append-read: that is, she modifies Boki to read before an append to see if the append already logged its record; if so, it uses the first value returned by the read and skips the subsequent append and read. 
(If not, Boki executes as before, doing an append followed by read.) \n\nFor example, Alyssa changes write as follows: \n\ndef write(table, key, val): tag $\\mathbf { \\Sigma } = \\mathbf { \\Sigma }$ hashLogTag([ID, STEP]) # first read rec $\\mathbf { \\Sigma } = \\mathbf { \\Sigma }$ logReadNext(tag: tag, minSeqnum: 0) # if no record, then append and read again if rec $\\scriptstyle = =$ None: logAppend([tags: [tag], data: [table, key, val]) rec $\\mathbf { \\Sigma } = \\mathbf { \\Sigma }$ logReadNext(tag: tag, minSeqnum: 0) rawDBWRITE(...) # same call as before $\\mathrm { S T E P } ~ = ~ \\mathrm { S T E P } ~ + ~ 1$ \n\n11. [5 points]: \n\nAlyssa runs one workflow on her modified Boki. The workflow crashes during its execution and then restarts from the beginning and completes. With Alyssa\u2019s modification will write preserve exactly-once semantics? (Briefly explain your answer.) ","answer":"Answer: It will preserve exactly-once semantics. In the case that logReadNext() returns something non-None initially, it will always return that same log record. So even if write() did a logAppend(), the final logReadNext() would have the same value as the logReadNext() that is executed before logAppend(). ","explanation":"Answer: It will preserve exactly-once semantics. In the case that logReadNext() returns something non-None initially, it will always return that same log record. So even if write() did a logAppend(), the final logReadNext() would have the same value as the logReadNext() that is executed before logAppend(). ","type":"ShortAnswerQuestion"} -{"instance_id":53,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":12,"points":6,"score_total":71,"score_max":71.0,"score_median":57.0,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# IX SUNDR \n\nConsider the straw-man design in the paper Secure Untrusted Data Repository (SUNDR) by Li et al. \n\nUsers A, B, and C share a SUNDR server. The server may be malicious, though the server does not know any of the private keys. User A creates a new file aaa in the SUNDR file system. After that, user B looks for file aaa, but does not see the file. After that, user C creates a new empty file ccc. \n\nThere is no client activity other than what is described here. None of the stronger consistency ideas from the paper\u2019s Section 3.2 are in use. All three users are honest and run correct SUNDR client software. \n\nAll three users now use the ls command to check whether they can see file ccc. All three users\u2019 client SUNDR implementations report that the data they receive from SUNDR passes all validity checks. Nevertheless, a malicious SUNDR server can cause a number of different outcomes. \n\n12. [6 points]: What combinations are possible for which users can see ccc? For each statement, circle True if the SUNDR server could cause the indicated results, and False if not. \n\nTrue \/ False : All three users can see ccc. \n\nTrue \/ False : Only A and B can see ccc, but not C. \n\nTrue \/ False $\\because$ Only A and C can see ccc, but not B. \n\nTrue \/ False $\\because$ Only B and C can see ccc, but not A. \n\nTrue \/ False $\\because$ Only C can see ccc, but not A or B. \n\nTrue \/ False $\\because$ None of the users can see ccc. ","answer":"False,False,True,True,True,False","explanation":"Answer: False,False,True,True,True,False. The correct answers are A and C but not B, B and C but not A, and only C. 
We know that the server has forked A and B from the fact that B cannot see aaa. So A and B have seen different operation histories, and each has appended an operation to the history it saw, and remembered that operation. Thus, when C asks the server for the current history (before C creates ccc), the SUNDR server can show C A\u2019s fork of the history, B\u2019s fork, or perhaps the history as of before A\u2019s creation of aaa. As a result, after C creates ccc, ccc will be visible to A (but not B), to B (but not A), and to C alone, respectively. ","type":"True\/False Questions"} -{"instance_id":54,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":13,"points":5,"score_total":71,"score_max":71.0,"score_median":57.0,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# X PBFT \n\nConsider the PBFT protocol as described in the paper Practical Byzantine Fault Tolerance by Castro and Liskov. \n\n13. [5 points]: \n\nPBFT chooses the primary for a view deterministically based on the view number. What could go wrong if PBFT were to use Raft\u2019s voting algorithm to select a primary for a view? (Briefly explain your answer.) ","answer":"Answer: Raft\u2019s voting algorithm does not result in a single leader-per-term under byzantine faults. Consider a 7 node system with 2 Byzantine nodes. The nodes that vote for A for term T are A, B, C, D, and allow A to conclude it is leader. The nodes that vote for D for term T are E, F, G, D, and allow D to conclude it is leader. Of these, only D is Byzantine and has equivocated by voting for both A and D. All the other nodes may vote this way while acting non-byzantine. This results in two primaries for a single term and violates the assumptions that the rest of pbft builds on. ","explanation":"Answer: Raft\u2019s voting algorithm does not result in a single leader-per-term under byzantine faults. Consider a 7 node system with 2 Byzantine nodes. The nodes that vote for A for term T are A, B, C, D, and allow A to conclude it is leader. The nodes that vote for D for term T are E, F, G, D, and allow D to conclude it is leader. Of these, only D is Byzantine and has equivocated by voting for both A and D. All the other nodes may vote this way while acting non-byzantine. This results in two primaries for a single term and violates the assumptions that the rest of pbft builds on. ","type":"ShortAnswerQuestion"} -{"instance_id":55,"test_paper_name":"6.5840 Distributed System Engineering: Spring 2024 Exam II","course":"Distributed System Engineering","year":2024,"problem_num":14,"points":5,"score_total":71,"score_max":71.0,"score_median":57.0,"score_avg":56.61,"score_standard_seviation":9.13,"problem":"# XI Bitcoin \n\nSection 4 of Nakamoto\u2019s Bitcoin paper explains that the difficulty of mining is determined by the number of required leading zeros in the SHA-256 hash of the block. The paper also says that Bitcoin automatically varies the difficulty of mining (the number of required leading zeros) by observing the recent average rate of new block mining, relative to the target block every ten minutes; if blocks have been generated too quickly, the difficulty is increased; if too slowly, decreased. All honest Bitcoin peers use the same algorithm to determine the difficulty. \n\nBen dreams of being able to buy tickets to the latest Taylor Swift concert. 
To obtain the money required, Ben has been running the Bitcoin peer software on his laptop, but he hasn\u2019t been earning mining rewards very quickly, because his laptop is only the winning miner very infrequently. Hoping to realize his dream faster, Ben modifies his copy of the Bitcoin peer software so that the difficulty determination algorithm always yields a low difficulty, with the result that his peer can mine new blocks very quickly, often before any other Bitcoin miner produces a given new block in the chain. \n\n14. [5 points]: It turns out that Ben won\u2019t actually earn any bitcoins with this scheme. \n Explain why not. ","answer":"Answer: Bitcoin peers that run correct software will check that any proposed new block has a hash with the expected number of leading zeros. Those peers are running the correct difficultydetermining algorithm, so they will reject Ben\u2019s blocks because their hashes have too few leading zeros. ","explanation":"Answer: Bitcoin peers that run correct software will check that any proposed new block has a hash with the expected number of leading zeros. Those peers are running the correct difficultydetermining algorithm, so they will reject Ben\u2019s blocks because their hashes have too few leading zeros. ","type":"ShortAnswerQuestion"} -{"instance_id":56,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":1,"points":5,"score_total":70,"score_max":67.5,"score_median":49.0,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# I The xv6 file system and lab fs \n\nBen makes a fresh fs.img, boots xv6, and runs the following commands: \n\n```\n$ mkdir a\n$ mkdir a\/b\n```\n\n1. [5 points]: How many inodes will xv6 allocate while executing these two commands? (Circle the one best answer.) \n A. 0 \n B. 1 \n C. 2 \n D. 3 ","answer":"C","explanation":"Answer: C. A directory is implemented using an inode, and one inode is created for directory \u201ca\u201d and one is created for directory \u201cb\u201d. ","type":"SingleChoice"} -{"instance_id":57,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":2,"points":5,"score_total":70,"score_max":67.5,"score_median":49.0,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# I The xv6 file system and lab fs \n\nBen makes a fresh fs.img, boots xv6, and runs the following commands: \n\n```\n$ mkdir a\n$ mkdir a\/b\n```\n\nAlyssa adds the statement: \n\n```\nprintf(\"write: %d\\n\", b->blockno);\n```\n\nto xv6\u2019s log_write in log.c. She then makes a fresh fs.img, boots xv6, and runs the following command: \n\n```\n$ mkdir a\nwrite: 33\nwrite: 33\nwrite: 45\nwrite: 770\nwrite: 770\nwrite: 33\nwrite: 770\nwrite: 33\nwrite: 46\nwrite: 32\nwrite: 32\n```\n\n2. [5 points]: What does block 770 contain? (Circle the one best answer.) \n A. directory entries \n B. inodes \n C. file data \n D. a bitmap ","answer":"A","explanation":"Answer: A. Block 770 is a data block and data blocks of directories contain directory entries. 
","type":"SingleChoice"} -{"instance_id":58,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":3,"points":5,"score_total":70,"score_max":67.5,"score_median":49.0,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# I The xv6 file system and lab fs \n\nBen makes a fresh fs.img, boots xv6, and runs the following commands: \n\n```\n$ mkdir a\n$ mkdir a\/b\n```\n\nBen makes a fresh fs.img, boots xv6, and runs a program that makes the following system call: \n\n```\nsymlink(\"b\", \"b\"); \n```\n\nFrom the shell he then runs: \n\n```\n$ cat b\n```\n\n3. [5 points]: What will the result of the cat be? (Circle the one best answer.) \n A. \u201cb\u201d \n B. an error because \u201cb\u201d doesn\u2019t exist \n C. an error because \u201cb\u201d points to itself \n D. nothing because xv6 will panic ","answer":"C","explanation":"Answer: C. When the kernel resolves the symlink \u201cb\u201d in open, it will find the symlink \u201cb\u201d. The fs lab requires your solution detects this cycle and return an error. ","type":"SingleChoice"} -{"instance_id":59,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":4,"points":5,"score_total":70,"score_max":67.5,"score_median":49.0,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# II EXT3 \n\nRecall the Linux EXT3 journaling file system from Journaling the Linux ext2fs Filesystem and Lecture 15. \nThe paper\u2019s \u201cext2fs\u201d is the same as EXT3. \n\nSuppose that the current compound transaction has just closed (see step 1 on the paper\u2019s page 6) and is starting to commit. \n\n4. [5 points]: How long must new file-system system calls wait until they can start executing? (Circle the one best answer.) \n A. New system calls can start immediately. \n B. New system calls must wait until all system calls in the just-closed transaction have completed. \n C. New system calls must wait until the just-closed transaction has started to write journal blocks to the journal. \n D. New system calls cannot start until the just-closed transaction has finished committing to the journal. \n E. New system calls cannot start until all updated buffers from the just-closed transaction have been synced to their homes on disk. ","answer":"B","explanation":"Answer: B. The delay serves to prevent partial modifications made by system calls in the next transaction from being seen by system calls that are finishing in the first transaction. ","type":"SingleChoice"} -{"instance_id":60,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":5,"points":5,"score_total":70,"score_max":67.5,"score_median":49.0,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# II EXT3 \n\nRecall the Linux EXT3 journaling file system from Journaling the Linux ext2fs Filesystem and Lecture 15. \nThe paper\u2019s \u201cext2fs\u201d is the same as EXT3. \n\nHatshepsut is building an application on Linux that creates a set of directories, and she would like the set of creations to be atomic with respect to crashes. She\u2019s using the EXT3 file system. She experiments with this application code: \n\n```\nint main() {\n mkdir(\"\/aaa\", 0777);\n mkdir(\"\/zzz\", 0777);\n exit(0);\n}\n```\n\n(The 0777 is needed for Linux, though not for xv6; it does not affect this question.) \n\nHatshepsut runs this program. Both calls to mkdir() return success. 
Hatshepsut causes her computer to crash just after the program exits. Then she re-starts the computer, which runs the EXT3 recovery program. \n\n5. [5 points]: What could Hatshepsut see after recovery? (Circle all that apply.) \n A. She might see neither \/aaa nor \/zzz. \n B. She might see \/aaa but not \/zzz. \n C. She might see \/zzz but not \/aaa. \n D. She might see both \/zzz and \/aaa. \n E. None of the above. ","answer":"A,B,D","explanation":"Answer: A, B, and D. A can occur if the system crashed before the transaction(s) reflecting the mkdir\u2019s finished committing. B can occur if the two mkdirs are in different transactions, and only the first manages to finish committing. C cannot occur because system calls are placed in transactions in order, and the transactions are also replayed in order during recovery. ","type":"MultipleChoice"} -{"instance_id":61,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":6,"points":5,"score_total":70,"score_max":67.5,"score_median":49.0,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# III VM primitives \n\nBelow is a code fragment illustrating how a user program can implement a large table of square roots with Linux VM primitives while using little physical memory. (The full code presented in lecture is in the appendix of this quiz.) \n\n```\n1 static size_t page_size;\n2 #define MAX_SQRTS (1 << 27) \/\/ Maximum limit on sqrt table entries\n3\n4 static double *sqrts;\n5\n6 \/\/ The page handler catching page faults\n7 static void\n8 handle_sigsegv(int sig, siginfo_t *si, void *ctx)\n9 {\n10 \tuintptr_t fault_addr = (uintptr_t)si->si_addr;\n11 \tdouble *page_base = (double * )align_down(fault_addr, page_size);\n12 \tstatic double *last_page_base = NULL;\n13\n14 \tif (last_page_base && munmap(last_page_base, page_size) == -1) {\n15 \t\tfprintf(stderr, \"Couldn\u2019t munmap(); %s\\n\", strerror(errno));\n16 \texit(EXIT_FAILURE);\n17 }\n18\n19 if (mmap(page_base, page_size, PROT_READ | PROT_WRITE,\n20 \t\t\tMAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {\n21 \tfprintf(stderr, \"Couldn\u2019t mmap(); %s\\n\", strerror(errno));\n22 \texit(EXIT_FAILURE);\n23 }\n24\n25 \tcalculate_sqrts(page_base, page_base - sqrts, page_size \/ sizeof(double));\n26 \tlast_page_base = page_base;\n27 }\n28\n29 \/\/ Simplified version of the test function\n30 static void\n31 test_sqrt_region(void)\n32 {\n33 \tint i, pos;\n34 \tdouble s;\n35\n36 \t\/\/ Find a sufficiently-large unused range of virtual addresses, and\n37 \t\/\/ sets sqrts to the start.\n38 \tsetup_sqrt_region();\n39\n40 \t\/\/ look up some numbers in the sqrt table\n41 \tfor (i = 0; i < 8192; i++) {\n42 \t\ts = sqrts[i];\n43 \t\tprintf(\"sqrt %f\", s);\n44 \t}\n45 }\n```\n\nAssume size of double is 8 bytes and page size is 4096 bytes. \n\n6. [5 points]: Assume the sqrts table occupies 0 pages of physical memory after the return from setup sqrt region. How many pages of physical memory does the sqrts table occupy when test sqrt region returns? (You can ignore physical memory pages used for the page table itself.) (Circle the one best answer.) \n A. 0 \n B. 1 \n C. 1000 \n D. $\\left( \\left( 1 \\ll 2 7 \\right) ^ { \\ast } 8 \\right) \/ 4 0 9 6$ ","answer":"B","explanation":"Answer: B. The page fault handler uses only 1 page. If it maps a new page, it unmaps the old page. 
","type":"SingleChoice"} -{"instance_id":62,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":7,"points":5,"score_total":70,"score_max":67.5,"score_median":49.0,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# III VM primitives \n\nBelow is a code fragment illustrating how a user program can implement a large table of square roots with Linux VM primitives while using little physical memory. (The full code presented in lecture is in the appendix of this quiz.) \n\n```\n1 static size_t page_size;\n2 #define MAX_SQRTS (1 << 27) \/\/ Maximum limit on sqrt table entries\n3\n4 static double *sqrts;\n5\n6 \/\/ The page handler catching page faults\n7 static void\n8 handle_sigsegv(int sig, siginfo_t *si, void *ctx)\n9 {\n10 \tuintptr_t fault_addr = (uintptr_t)si->si_addr;\n11 \tdouble *page_base = (double * )align_down(fault_addr, page_size);\n12 \tstatic double *last_page_base = NULL;\n13\n14 \tif (last_page_base && munmap(last_page_base, page_size) == -1) {\n15 \t\tfprintf(stderr, \"Couldn\u2019t munmap(); %s\\n\", strerror(errno));\n16 \texit(EXIT_FAILURE);\n17 }\n18\n19 if (mmap(page_base, page_size, PROT_READ | PROT_WRITE,\n20 \t\t\tMAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {\n21 \tfprintf(stderr, \"Couldn\u2019t mmap(); %s\\n\", strerror(errno));\n22 \texit(EXIT_FAILURE);\n23 }\n24\n25 \tcalculate_sqrts(page_base, page_base - sqrts, page_size \/ sizeof(double));\n26 \tlast_page_base = page_base;\n27 }\n28\n29 \/\/ Simplified version of the test function\n30 static void\n31 test_sqrt_region(void)\n32 {\n33 \tint i, pos;\n34 \tdouble s;\n35\n36 \t\/\/ Find a sufficiently-large unused range of virtual addresses, and\n37 \t\/\/ sets sqrts to the start.\n38 \tsetup_sqrt_region();\n39\n40 \t\/\/ look up some numbers in the sqrt table\n41 \tfor (i = 0; i < 8192; i++) {\n42 \t\ts = sqrts[i];\n43 \t\tprintf(\"sqrt %f\", s);\n44 \t}\n45 }\n```\n\nAssume size of double is 8 bytes and page size is 4096 bytes. \n\n[5 points]: How many total page faults will the repeated execution of line 42 cause? (Circle the one best answer.) \nA. 0 \nB. 1 \nC. 2 \nD. 16 \nE. 8192 ","answer":"D","explanation":"Answer: D. The loop goes through the first 8192 entries in the sqrts table. A double is 8 bytes and 512 fit on a single page of 4096 bytes $( 4 0 9 6 \/ 8 = 5 1 2 )$ . Thus, the total number of virtual pages referenced in the loop is $8 1 9 2 \/ 5 1 2 = 1 6$ . The page fault handler will be invoked once for each of the 16 pages. ","type":"SingleChoice"} -{"instance_id":63,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":8,"points":5,"score_total":70,"score_max":67.5,"score_median":49.0,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# IV L4Linux \n\nConsider The Performance of $\\mu$ -Kernel-Based Systems, by Ha\u00a8rtig et al., along with Lecture 17. \nSuppose that an sh Linux process running under $\\mathrm { L } ^ { 4 }$ Linux performs a fork(). \n\n8. [5 points]: Which of the following are true? (Circle all that apply.) \n A. The L4 kernel\u2019s fork() system call copies the sh process\u2019s memory. \n B. When the Linux kernel server task has finished executing the system call implementation, it executes the x86 equivalent of RISC-V sret to return to the sh process. \n C. 
When the Linux kernel server task returns to the newly created child process, the Linux kernel changes the hardware page table register (equivalent of RISC-V satp) to point to the child process\u2019s page table. \n D. Copy-on-write fork() is not possble for $\\mathrm { L } ^ { 4 }$ Linux because the CPU delivers page faults to the L4 kernel, not to the Linux kernel task. \n E. None of the above. ","answer":"E","explanation":"Answer: E. Not A: fork is a Linux system call, implemented by the Linux kernel server, not by the L4 kernel. Not B: Linux processes communicate with the Linux server via IPC messages, not by system call traps. Not C: The Linux kernel server is not privileged, and cannot modify the page table register; only the L4 kernel can do this. Not D: the L4 kernel forwards page faults to the Linux kernel server. ","type":"MultipleChoice"} -{"instance_id":64,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":9,"points":5,"score_total":70,"score_max":67.5,"score_median":49.0,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# V RedLeaf \n\nConsider RedLeaf: Isolation and Communication in a Safe Operating System by Narayanan et al. \n\n9. [5 points]: Which of the following are true statements about RedLeaf\u2019s design? (Circle all that apply.) \n A. Because the RedLeaf microkernel and domains run in a single address space, a domain can read any kernel memory by dereferencing a Rust pointer. \n B. User programs can avoid data copies by passing pointers to their private memory to other user programs. \n C. Two domains can have a Rust pointer to an object on the shared heap at the same time. \n D. The rv6 file system can be modified to support memory-mapped files using the same ideas as in the mmap lab without modifying the RedLeaf microkernel. \n E. A divide-by-zero error in the network domain won\u2019t crash the rv6 file system. \n F. None of the above. ","answer":"C,E","explanation":"Answer: C and E. A is false because RedLeaf uses language techniques that disallow domains to dereferences arbitrary addresses. B is false, because RedLeaf explicitly disallows this; only pointers in the shared heap can be passed to other domains. C is true, because RedLeaf follows the Rust borrow rules, which allow two domains to have an immutable reference to the same object. D is false, because RedLeaf doesn\u2019t use page tables but relies on language techniques for isolation; the mmap lab requires the use of page tables. E is true, because RedLeaf is designed to catch errors like these and clean up a domain that experience such an error. ","type":"MultipleChoice"} -{"instance_id":65,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":10,"points":5,"score_total":70,"score_max":67.5,"score_median":49.0,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# VI Networking lecture\/reading \n\nConsider Eliminating Receive Livelock in an Interrupt-driven Kernel, by Mogul et al., and Lecture 20. \n\nBen implements the paper\u2019s polling design (Section 6.4), in which the NIC interrupt handler just wakes up the polling thread. However, Ben\u2019s implementation leaves NIC interrupts enabled (in contrast to Section 6.4, which specifies that they be disabled until the polling thread is done). \n\nBen\u2019s computer has just one CPU (i.e. just a single core). \n\n10. [5 points]: What will Ben observe as the rate of packet arrivals increases? 
(Circle the one best answer.) A. He won\u2019t see livelock, because the interrupt handler doesn\u2019t process the packets; only the polling thread handles the packets. \n B. He won\u2019t see livelock, because the polling design eliminates the IP input queue, which was the point at which packets were discarded in the old design. \n C. He will see livelock, because at high enough arrival rates the CPU will spend all its time in the polling thread. \n D. He will see livelock, because at high enough arrival rates the CPU will spend all its time in the interrupt handler. \n E. He will see livelock, because the polling thread can only process packets at some finite rate, and the input rate could be higher than that. ","answer":"D","explanation":"Answer: D. \n\nZoe\u2019s xv6 computer has a UART that has no limit on how fast it can transmit bytes. The UART interrupts once per byte transmitted, to indicate that it has finished transmitting the byte. Zoe has a program whose standard output (file descriptor 1) is connected to the xv6 console, which uses the UART; the program sends bytes as fast as it can: \n\n```\nwhile(1){\n char c = \u2019x\u2019;\n write(1, &c, 1);\n}\n```\n\nZoe\u2019s computer has just one CPU (i.e. just a single core). ","type":"SingleChoice"} -{"instance_id":66,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":11,"points":5,"score_total":70,"score_max":67.5,"score_median":49.0,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# VI Networking lecture\/reading \n\nConsider Eliminating Receive Livelock in an Interrupt-driven Kernel, by Mogul et al., and Lecture 20. \n\nBen implements the paper\u2019s polling design (Section 6.4), in which the NIC interrupt handler just wakes up the polling thread. However, Ben\u2019s implementation leaves NIC interrupts enabled (in contrast to Section 6.4, which specifies that they be disabled until the polling thread is done). \n\nZoe\u2019s computer has just one CPU (i.e. just a single core). \n\n11. [5 points]: Could this program cause interrupt livelock due to the CPU spending all its time in the UART interrupt handler, and thus no time executing Zoe\u2019s program? Explain briefly. ","answer":"Answer: No. The UART interrupts just once for each call to write(). There can\u2019t be more than a brief period of time in which UART interrupts prevent Zoe\u2019s program from running, because the UART driver will soon run out of bytes to transmit. ","explanation":"Answer: No. The UART interrupts just once for each call to write(). There can\u2019t be more than a brief period of time in which UART interrupts prevent Zoe\u2019s program from running, because the UART driver will soon run out of bytes to transmit. ","type":"ShortAnswerQuestion"} -{"instance_id":67,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":12,"points":5,"score_total":70,"score_max":67.5,"score_median":49.0,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# VII Meltdown \n\nBelow is Listing 2 of the paper Meltdown: reading kernel memory from user space by Lipp et al., written in a C-like notation instead of x86 assembly. \n\n```\n1 char buf[8192]\n2\n3 \/\/ The flush part of Flush+Reload\n4 cflush buf[0]\n5 cflush buf[4096]\n6\n7 \/\/ The core attack from listing 2\n8 r1 = 0x79cbcc0 \/\/ a kernel virtual address\n9 r2 = *r1\n10 r2 = r2 & 1\n11 r2 = r2 * 4096\n12 r3 = buf[r2]\n```\n\n12. 
[5 points]: Which of the following are true statements? (Circle all that apply.) \n A. In Linux as described in the paper, page tables of user programs map all of kernel memory. \n B. Loading the value at kernel address 0x79cbcc0 on line 9 will lead to an exception. \n C. If the attack succeeds, then buf[0] will be in the L1 cache if the low bit of the value at address 0x79cbcc0 is a 0. \n D. One reason why one run of Meltdown might not succeed is because buf[0] maybe evicted from the L1 cache before the attack can measure its presence using Reload. \n E. The Meltdown attack on xv6 wouldn\u2019t be able to dump all of xv6 kernel memory because like KAISER the xv6 kernel and user processes have separate page tables. \n F. None of the above. ","answer":"A,B,C,D,E","explanation":"Answer: A, B, C, D, and E. B is true because eventually the CPU will generate an exception, perhaps after speculating on a few instructions. $\\mathbf { E }$ is true, because xv6 has separate kernel and user page tables, and the user page tables don\u2019t map all of kernel memory. ","type":"MultipleChoice"} -{"instance_id":68,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":13,"points":5,"score_total":70,"score_max":67.5,"score_median":49.0,"score_avg":49.88,"score_standard_seviation":11.07,"problem":"# VIII RCU \n\nBen has a Linux kernel that uses RCU as described in RCU Usage In the Linux Kernel: One Decade Later, by McKenney et al. He modifies udp sendmsg() in the paper\u2019s Figure 6, adding a call to new function() on line 8, so that the code now reads: \n\n```\n1 void udp_sendmsg(sock_t *sock, msg_t *msg)\n2 {\n3 \tip_options_t *opts;\n4 \tchar packet[];\n5 \tcopy_msg(packet, msg);\n6 \trcu_read_lock();\n7 \topts = rcu_dereference(sock->opts);\n8 \tnew_function(); \/\/*** Ben adds this line. ***\n9 \tif (opts != NULL)\n10 \t\tcopy_opts(packet, opts);\n11 \trcu_read_unlock();\n12 \tqueue_packet(packet);\n13 }\n14 void setsockopt(sock_t *sock, int opt, void *arg)\n15 {\n16 \tif (opt == IP_OPTIONS) {\n17 \tip_options_t *old = sock->opts;\n18 \tip_options_t *new = arg;\n19 \trcu_assign_pointer(&sock->opts, new);\n20 \tif (old != NULL)\n21 \t\tcall_rcu(kfree, old);\n22 \treturn;\n23 \t}\n24 }\n```\n\nThis code is otherwise identical to the paper\u2019s Figure 6. \n\nnew_function() performs a context switch (i.e., it calls the Linux equivalent of xv6\u2019s sleep() or yield()). \n\n13. [5 points]: Ben has made a mistake. Explain a scenario in which something goes wrong with the Figure 6 code as a result of Ben\u2019s change. ","answer":"Answer: Use-after-free. If new function() causes a context switch, then call rcu() could call kfree(old), and that memory could be re-allocated for something else and overwritten. But that\u2019s the same memory that opts points to on line 10, which would therefore copy the wrong data. ","explanation":"Answer: Use-after-free. If new function() causes a context switch, then call rcu() could call kfree(old), and that memory could be re-allocated for something else and overwritten. But that\u2019s the same memory that opts points to on line 10, which would therefore copy the wrong data. 
","type":"ShortAnswerQuestion"} -{"instance_id":69,"test_paper_name":"6.1810 Fall 2024 Quiz II Solutions","course":"Operating System Engineering","year":2024,"problem_num":14,"points":5,"score_total":70,"score_max":67.5,"score_median":49.0,"score_avg":49.88,"score_standard_seviation":11.07,"problem":" # VIII RCU \n\nBen has a Linux kernel that uses RCU as described in RCU Usage In the Linux Kernel: One Decade Later, by McKenney et al. He modifies udp sendmsg() in the paper\u2019s Figure 6, adding a call to new function() on line 8, so that the code now reads: \n\n```\n1 void udp_sendmsg(sock_t *sock, msg_t *msg)\n2 {\n3 \tip_options_t *opts;\n4 \tchar packet[];\n5 \tcopy_msg(packet, msg);\n6 \trcu_read_lock();\n7 \topts = rcu_dereference(sock->opts);\n8 \tnew_function(); \/\/*** Ben adds this line. ***\n9 \tif (opts != NULL)\n10 \t\tcopy_opts(packet, opts);\n11 \trcu_read_unlock();\n12 \tqueue_packet(packet);\n13 }\n14 void setsockopt(sock_t *sock, int opt, void *arg)\n15 {\n16 \tif (opt == IP_OPTIONS) {\n17 \tip_options_t *old = sock->opts;\n18 \tip_options_t *new = arg;\n19 \trcu_assign_pointer(&sock->opts, new);\n20 \tif (old != NULL)\n21 \t\tcall_rcu(kfree, old);\n22 \treturn;\n23 \t}\n24 }\n```\n\nThis code is otherwise identical to the paper\u2019s Figure 6. \n\nnew_function() performs a context switch (i.e., it calls the Linux equivalent of xv6\u2019s sleep() or yield()). \n\nNow Ben is working on the code in the RCU paper\u2019s Figure 7. He reasons that the kfree(local table) in retract_table() really belongs inside the critical section, so that the entire sequence is atomic. He moves that line, resulting in this code: \n\n```\n...;\nspin_lock(&table_lock);\nlocal_table = table;\nrcu_assign_pointer(&table, NULL);\nkfree(local_table); \/\/ *** Ben moved this line. ***\nspin_unlock(&table_lock);\n...;\n```\n\n14. [5 points]: What problem is Ben\u2019s change likely to cause? (Circle the one best answer.) \n A. Ben\u2019s change could cause a deadlock. \n B. Ben\u2019s change could allow a context switch to occur just before the kfree() call, which would be illegal. \n C. Ben\u2019s change could cause invoke_syscall() to dereference a pointer to freed memory. \n D. Ben\u2019s change could cause retract_table() to dereference a pointer to freed memory. ","answer":"C","explanation":"Answer: C. Ben\u2019s modified retract table() frees local table before the call to synchronize_rcu(). An execution of invoke_systemcall() might be active at the same time on another CPU, and might read the old value of local table just after it has been freed and re-used for something else. ","type":"SingleChoice"} diff --git a/benchmarks/course_exam_bench/data/benchmark/SystemTestPaper_20250527.xlsx b/benchmarks/course_exam_bench/data/benchmark/SystemTestPaper_20250527.xlsx deleted file mode 100644 index f79e3dfe..00000000 Binary files a/benchmarks/course_exam_bench/data/benchmark/SystemTestPaper_20250527.xlsx and /dev/null differ diff --git a/benchmarks/course_exam_bench/data/benchmark/example_bench_benchmark_timestamp.jsonl b/benchmarks/course_exam_bench/data/benchmark/example_bench_benchmark_timestamp.jsonl deleted file mode 100644 index c4b375f5..00000000 --- a/benchmarks/course_exam_bench/data/benchmark/example_bench_benchmark_timestamp.jsonl +++ /dev/null @@ -1,3 +0,0 @@ -{"id": "platform_1", "sys_prompt": "You are an expert on Kusto. Help me write kusto query based on the user requirement. DO NOT explain anything, DO give KQL directly. 
You should strictly use the following format:\n##### KQL\n```\ncluster(#CLUSTER_NAME#).database(#DATABASE_NAME#).#TABLE_NAME# | where #COLUMN_NAME# xxx\n```", "user_prompt": "Query the nodes are stuck in Linux MOS under cluster IAD02PrdGPC01", "thinking": "chain of thought", "response": "let targetCluster = \"IAD02PrdGPC01\";\ncluster(\"azuredcm\").database(\"AzureDCMDb\").dcmInventoryMachines \n| where AzureClusterName == targetCluster\n| join kind=inner \n( \n cluster('Oneboot').database('OneDhcp').OneDhcpApi\n | where PreciseTimeStamp > ago(7d)\n | where ApiName has 'SetDeviceOsType' \n | parse ApiParams with \"name=\" MachineName \",osType=\" OSType \",osPath=\" OSPath\n) on MachineName\n| project PreciseTimeStamp, AzureClusterName, AzureNodeId, MachineName, OSType, OSPath\n| summarize arg_max(PreciseTimeStamp, *) by AzureNodeId, MachineName\n| where OSPath == \"LinuxMOS\" ", "metadata": {"scenario": "Example", "subtask": "Example", "data_quality":"high", "XXX": "XXX"}} -{"id": "platform_2", "sys_prompt": "You are an expert on Kusto. Help me write kusto query based on the user requirement. DO NOT explain anything, DO give KQL directly. You should strictly use the following format:\n##### KQL\n```\ncluster(#CLUSTER_NAME#).database(#DATABASE_NAME#).#TABLE_NAME# | where #COLUMN_NAME# xxx\n```", "user_prompt": "Check the capacity of cluster SIN22PrdApp22, which is a non-GPU cluster", "thinking": "chain of thought", "response": "cluster(\"azcore.centralus\").database(\"AzureCP\").MycroftNodeHealthSnapshot\n| where PreciseTimeStamp >= ago(1d)\n| where ClusterName == \"SIN22PrdApp22\"\n| summarize arg_max(PreciseTimeStamp, *) by ClusterName, NodeId\n| project PreciseTimeStamp, ClusterName, NodeId, NsdState, AvailabilityState, ContainerCount\n| summarize NodeCount = count() by NsdState, AvailabilityState, ContainerCount", "metadata": {"scenario": "Example", "subtask": "Example", "data_quality":"high", "XXX": "XXX"}} -{"id": "platform_3", "sys_prompt": "You are an expert on Kusto. Help me write kusto query based on the user requirement. DO NOT explain anything, DO give KQL directly. 
You should strictly use the following format:\n##### KQL\n```\ncluster(#CLUSTER_NAME#).database(#DATABASE_NAME#).#TABLE_NAME# | where #COLUMN_NAME# xxx\n```", "user_prompt": "Check the capacity of cluster IAD02PrdGPC01, which is a GPU cluster", "thinking": "chain of thought", "response": "let cluster = \"IAD02PrdGPC01\";\ncluster(\"azcore.centralus\").database(\"AzureCP\").MycroftNodeSnapshot\n| where PreciseTimeStamp >= ago(1d)\n| where ClusterName == cluster\n| summarize arg_max(PreciseTimeStamp, *) by ClusterName, NodeId\n| where MachinePoolName contains \"MP2\"\n| project PreciseTimeStamp, ClusterName, NodeId, MachinePoolName\n| join (\ncluster(\"azcore.centralus\").database(\"AzureCP\").MycroftNodeHealthSnapshot\n| where PreciseTimeStamp >= ago(1d)\n| where ClusterName == cluster\n| summarize arg_max(PreciseTimeStamp, *) by ClusterName, NodeId\n| project PreciseTimeStamp, ClusterName, NodeId, NsdState, AvailabilityState, ContainerCount\n) on NodeId\n| summarize NodeCount = count() by NsdState, AvailabilityState, ContainerCount", "metadata": {"scenario": "Example", "subtask": "Example", "data_quality":"high", "XXX": "XXX"}} diff --git a/benchmarks/course_exam_bench/data/benchmark/exams_metadata.json b/benchmarks/course_exam_bench/data/benchmark/exams_metadata.json new file mode 100644 index 00000000..c30846ff --- /dev/null +++ b/benchmarks/course_exam_bench/data/benchmark/exams_metadata.json @@ -0,0 +1,64 @@ +{ + "exams": [ + { + "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_i", + "test_paper_name": "6.5840 Distributed System Engineering: Spring 2025 Exam I", + "course": "Distributed System Engineering", + "year": 2025, + "score_total": 65, + "score_max": 65.0, + "score_avg": 51.85, + "score_median": 52, + "score_standard_deviation": 6.7, + "num_questions": 11 + }, + { + "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", + "test_paper_name": "6.5840 Distributed System Engineering: Spring 2025 Exam II", + "course": "Distributed System Engineering", + "year": 2025, + "score_total": 73, + "score_max": 71.5, + "score_avg": 55.95, + "score_median": 57.5, + "score_standard_deviation": 7.22, + "num_questions": 15 + }, + { + "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", + "test_paper_name": "6.5840 Distributed System Engineering: Spring 2024 Exam I", + "course": "Distributed System Engineering", + "year": 2024, + "score_total": 97, + "score_max": 95.0, + "score_avg": 77.0, + "score_median": 78, + "score_standard_deviation": 9.72, + "num_questions": 15 + }, + { + "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", + "test_paper_name": "6.5840 Distributed System Engineering: Spring 2024 Exam II", + "course": "Distributed System Engineering", + "year": 2024, + "score_total": 71, + "score_max": 71.0, + "score_avg": 56.61, + "score_median": 57, + "score_standard_deviation": 9.13, + "num_questions": 14 + }, + { + "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", + "test_paper_name": "6.1810 Fall 2024 Quiz II", + "course": "Operating System Engineering", + "year": 2024, + "score_total": 70, + "score_max": 67.5, + "score_avg": 49.88, + "score_median": 49, + "score_standard_deviation": 11.07, + "num_questions": 14 + } + ] +} \ No newline at end of file diff --git a/benchmarks/course_exam_bench/data/benchmark/questions.jsonl b/benchmarks/course_exam_bench/data/benchmark/questions.jsonl new file mode 100644 index 00000000..2bb8ed5d --- /dev/null +++ 
b/benchmarks/course_exam_bench/data/benchmark/questions.jsonl @@ -0,0 +1,69 @@ +{"instance_id": 1, "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", "problem_num": 1, "points": 5, "problem": "# I The xv6 file system and lab fs \n\nBen makes a fresh fs.img, boots xv6, and runs the following commands: \n\n```\n$ mkdir a\n$ mkdir a/b\n```\n\n1. [5 points]: How many inodes will xv6 allocate while executing these two commands? (Circle the one best answer.) \n A. 0 \n B. 1 \n C. 2 \n D. 3 ", "answer": "C", "explanation": "Answer: C. A directory is implemented using an inode, and one inode is created for directory “a” and one is created for directory “b”. ", "type": "SingleChoice"} +{"instance_id": 2, "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", "problem_num": 2, "points": 5, "problem": "# I The xv6 file system and lab fs \n\nBen makes a fresh fs.img, boots xv6, and runs the following commands: \n\n```\n$ mkdir a\n$ mkdir a/b\n```\n\nAlyssa adds the statement: \n\n```\nprintf(\"write: %d\\n\", b->blockno);\n```\n\nto xv6’s log_write in log.c. She then makes a fresh fs.img, boots xv6, and runs the following command: \n\n```\n$ mkdir a\nwrite: 33\nwrite: 33\nwrite: 45\nwrite: 770\nwrite: 770\nwrite: 33\nwrite: 770\nwrite: 33\nwrite: 46\nwrite: 32\nwrite: 32\n```\n\n2. [5 points]: What does block 770 contain? (Circle the one best answer.) \n A. directory entries \n B. inodes \n C. file data \n D. a bitmap ", "answer": "A", "explanation": "Answer: A. Block 770 is a data block and data blocks of directories contain directory entries. ", "type": "SingleChoice"} +{"instance_id": 3, "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", "problem_num": 3, "points": 5, "problem": "# I The xv6 file system and lab fs \n\nBen makes a fresh fs.img, boots xv6, and runs the following commands: \n\n```\n$ mkdir a\n$ mkdir a/b\n```\n\nBen makes a fresh fs.img, boots xv6, and runs a program that makes the following system call: \n\n```\nsymlink(\"b\", \"b\"); \n```\n\nFrom the shell he then runs: \n\n```\n$ cat b\n```\n\n3. [5 points]: What will the result of the cat be? (Circle the one best answer.) \n A. “b” \n B. an error because “b” doesn’t exist \n C. an error because “b” points to itself \n D. nothing because xv6 will panic ", "answer": "C", "explanation": "Answer: C. When the kernel resolves the symlink “b” in open, it will find the symlink “b”. The fs lab requires your solution detects this cycle and return an error. ", "type": "SingleChoice"} +{"instance_id": 4, "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", "problem_num": 4, "points": 5, "problem": "# II EXT3 \n\nRecall the Linux EXT3 journaling file system from Journaling the Linux ext2fs Filesystem and Lecture 15. \nThe paper’s “ext2fs” is the same as EXT3. \n\nSuppose that the current compound transaction has just closed (see step 1 on the paper’s page 6) and is starting to commit. \n\n4. [5 points]: How long must new file-system system calls wait until they can start executing? (Circle the one best answer.) \n A. New system calls can start immediately. \n B. New system calls must wait until all system calls in the just-closed transaction have completed. \n C. New system calls must wait until the just-closed transaction has started to write journal blocks to the journal. \n D. New system calls cannot start until the just-closed transaction has finished committing to the journal. \n E. 
New system calls cannot start until all updated buffers from the just-closed transaction have been synced to their homes on disk. ", "answer": "B", "explanation": "Answer: B. The delay serves to prevent partial modifications made by system calls in the next transaction from being seen by system calls that are finishing in the first transaction. ", "type": "SingleChoice"} +{"instance_id": 5, "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", "problem_num": 5, "points": 5, "problem": "# II EXT3 \n\nRecall the Linux EXT3 journaling file system from Journaling the Linux ext2fs Filesystem and Lecture 15. \nThe paper’s “ext2fs” is the same as EXT3. \n\nHatshepsut is building an application on Linux that creates a set of directories, and she would like the set of creations to be atomic with respect to crashes. She’s using the EXT3 file system. She experiments with this application code: \n\n```\nint main() {\n mkdir(\"/aaa\", 0777);\n mkdir(\"/zzz\", 0777);\n exit(0);\n}\n```\n\n(The 0777 is needed for Linux, though not for xv6; it does not affect this question.) \n\nHatshepsut runs this program. Both calls to mkdir() return success. Hatshepsut causes her computer to crash just after the program exits. Then she re-starts the computer, which runs the EXT3 recovery program. \n\n5. [5 points]: What could Hatshepsut see after recovery? (Circle all that apply.) \n A. She might see neither /aaa nor /zzz. \n B. She might see /aaa but not /zzz. \n C. She might see /zzz but not /aaa. \n D. She might see both /zzz and /aaa. \n E. None of the above. ", "answer": "A,B,D", "explanation": "Answer: A, B, and D. A can occur if the system crashed before the transaction(s) reflecting the mkdir’s finished committing. B can occur if the two mkdirs are in different transactions, and only the first manages to finish committing. C cannot occur because system calls are placed in transactions in order, and the transactions are also replayed in order during recovery. ", "type": "MultipleChoice"} +{"instance_id": 6, "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", "problem_num": 6, "points": 5, "problem": "# III VM primitives \n\nBelow is a code fragment illustrating how a user program can implement a large table of square roots with Linux VM primitives while using little physical memory. (The full code presented in lecture is in the appendix of this quiz.) 
\n\n```\n1 static size_t page_size;\n2 #define MAX_SQRTS (1 << 27) // Maximum limit on sqrt table entries\n3\n4 static double *sqrts;\n5\n6 // The page handler catching page faults\n7 static void\n8 handle_sigsegv(int sig, siginfo_t *si, void *ctx)\n9 {\n10 \tuintptr_t fault_addr = (uintptr_t)si->si_addr;\n11 \tdouble *page_base = (double * )align_down(fault_addr, page_size);\n12 \tstatic double *last_page_base = NULL;\n13\n14 \tif (last_page_base && munmap(last_page_base, page_size) == -1) {\n15 \t\tfprintf(stderr, \"Couldn’t munmap(); %s\\n\", strerror(errno));\n16 \texit(EXIT_FAILURE);\n17 }\n18\n19 if (mmap(page_base, page_size, PROT_READ | PROT_WRITE,\n20 \t\t\tMAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {\n21 \tfprintf(stderr, \"Couldn’t mmap(); %s\\n\", strerror(errno));\n22 \texit(EXIT_FAILURE);\n23 }\n24\n25 \tcalculate_sqrts(page_base, page_base - sqrts, page_size / sizeof(double));\n26 \tlast_page_base = page_base;\n27 }\n28\n29 // Simplified version of the test function\n30 static void\n31 test_sqrt_region(void)\n32 {\n33 \tint i, pos;\n34 \tdouble s;\n35\n36 \t// Find a sufficiently-large unused range of virtual addresses, and\n37 \t// sets sqrts to the start.\n38 \tsetup_sqrt_region();\n39\n40 \t// look up some numbers in the sqrt table\n41 \tfor (i = 0; i < 8192; i++) {\n42 \t\ts = sqrts[i];\n43 \t\tprintf(\"sqrt %f\", s);\n44 \t}\n45 }\n```\n\nAssume size of double is 8 bytes and page size is 4096 bytes. \n\n6. [5 points]: Assume the sqrts table occupies 0 pages of physical memory after the return from setup sqrt region. How many pages of physical memory does the sqrts table occupy when test sqrt region returns? (You can ignore physical memory pages used for the page table itself.) (Circle the one best answer.) \n A. 0 \n B. 1 \n C. 1000 \n D. $\\left( \\left( 1 \\ll 2 7 \\right) ^ { \\ast } 8 \\right) / 4 0 9 6$ ", "answer": "B", "explanation": "Answer: B. The page fault handler uses only 1 page. If it maps a new page, it unmaps the old page. ", "type": "SingleChoice"} +{"instance_id": 7, "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", "problem_num": 7, "points": 5, "problem": "# III VM primitives \n\nBelow is a code fragment illustrating how a user program can implement a large table of square roots with Linux VM primitives while using little physical memory. (The full code presented in lecture is in the appendix of this quiz.) 
\n\n```\n1 static size_t page_size;\n2 #define MAX_SQRTS (1 << 27) // Maximum limit on sqrt table entries\n3\n4 static double *sqrts;\n5\n6 // The page handler catching page faults\n7 static void\n8 handle_sigsegv(int sig, siginfo_t *si, void *ctx)\n9 {\n10 \tuintptr_t fault_addr = (uintptr_t)si->si_addr;\n11 \tdouble *page_base = (double * )align_down(fault_addr, page_size);\n12 \tstatic double *last_page_base = NULL;\n13\n14 \tif (last_page_base && munmap(last_page_base, page_size) == -1) {\n15 \t\tfprintf(stderr, \"Couldn’t munmap(); %s\\n\", strerror(errno));\n16 \texit(EXIT_FAILURE);\n17 }\n18\n19 if (mmap(page_base, page_size, PROT_READ | PROT_WRITE,\n20 \t\t\tMAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {\n21 \tfprintf(stderr, \"Couldn’t mmap(); %s\\n\", strerror(errno));\n22 \texit(EXIT_FAILURE);\n23 }\n24\n25 \tcalculate_sqrts(page_base, page_base - sqrts, page_size / sizeof(double));\n26 \tlast_page_base = page_base;\n27 }\n28\n29 // Simplified version of the test function\n30 static void\n31 test_sqrt_region(void)\n32 {\n33 \tint i, pos;\n34 \tdouble s;\n35\n36 \t// Find a sufficiently-large unused range of virtual addresses, and\n37 \t// sets sqrts to the start.\n38 \tsetup_sqrt_region();\n39\n40 \t// look up some numbers in the sqrt table\n41 \tfor (i = 0; i < 8192; i++) {\n42 \t\ts = sqrts[i];\n43 \t\tprintf(\"sqrt %f\", s);\n44 \t}\n45 }\n```\n\nAssume size of double is 8 bytes and page size is 4096 bytes. \n\n[5 points]: How many total page faults will the repeated execution of line 42 cause? (Circle the one best answer.) \nA. 0 \nB. 1 \nC. 2 \nD. 16 \nE. 8192 ", "answer": "D", "explanation": "Answer: D. The loop goes through the first 8192 entries in the sqrts table. A double is 8 bytes and 512 fit on a single page of 4096 bytes $( 4 0 9 6 / 8 = 5 1 2 )$ . Thus, the total number of virtual pages referenced in the loop is $8 1 9 2 / 5 1 2 = 1 6$ . The page fault handler will be invoked once for each of the 16 pages. ", "type": "SingleChoice"} +{"instance_id": 8, "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", "problem_num": 8, "points": 5, "problem": "# IV L4Linux \n\nConsider The Performance of $\\mu$ -Kernel-Based Systems, by Ha¨rtig et al., along with Lecture 17. \nSuppose that an sh Linux process running under $\\mathrm { L } ^ { 4 }$ Linux performs a fork(). \n\n8. [5 points]: Which of the following are true? (Circle all that apply.) \n A. The L4 kernel’s fork() system call copies the sh process’s memory. \n B. When the Linux kernel server task has finished executing the system call implementation, it executes the x86 equivalent of RISC-V sret to return to the sh process. \n C. When the Linux kernel server task returns to the newly created child process, the Linux kernel changes the hardware page table register (equivalent of RISC-V satp) to point to the child process’s page table. \n D. Copy-on-write fork() is not possble for $\\mathrm { L } ^ { 4 }$ Linux because the CPU delivers page faults to the L4 kernel, not to the Linux kernel task. \n E. None of the above. ", "answer": "E", "explanation": "Answer: E. Not A: fork is a Linux system call, implemented by the Linux kernel server, not by the L4 kernel. Not B: Linux processes communicate with the Linux server via IPC messages, not by system call traps. Not C: The Linux kernel server is not privileged, and cannot modify the page table register; only the L4 kernel can do this. Not D: the L4 kernel forwards page faults to the Linux kernel server. 
", "type": "MultipleChoice"} +{"instance_id": 9, "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", "problem_num": 9, "points": 5, "problem": "# V RedLeaf \n\nConsider RedLeaf: Isolation and Communication in a Safe Operating System by Narayanan et al. \n\n9. [5 points]: Which of the following are true statements about RedLeaf’s design? (Circle all that apply.) \n A. Because the RedLeaf microkernel and domains run in a single address space, a domain can read any kernel memory by dereferencing a Rust pointer. \n B. User programs can avoid data copies by passing pointers to their private memory to other user programs. \n C. Two domains can have a Rust pointer to an object on the shared heap at the same time. \n D. The rv6 file system can be modified to support memory-mapped files using the same ideas as in the mmap lab without modifying the RedLeaf microkernel. \n E. A divide-by-zero error in the network domain won’t crash the rv6 file system. \n F. None of the above. ", "answer": "C,E", "explanation": "Answer: C and E. A is false because RedLeaf uses language techniques that disallow domains to dereferences arbitrary addresses. B is false, because RedLeaf explicitly disallows this; only pointers in the shared heap can be passed to other domains. C is true, because RedLeaf follows the Rust borrow rules, which allow two domains to have an immutable reference to the same object. D is false, because RedLeaf doesn’t use page tables but relies on language techniques for isolation; the mmap lab requires the use of page tables. E is true, because RedLeaf is designed to catch errors like these and clean up a domain that experience such an error. ", "type": "MultipleChoice"} +{"instance_id": 10, "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", "problem_num": 10, "points": 5, "problem": "# VI Networking lecture/reading \n\nConsider Eliminating Receive Livelock in an Interrupt-driven Kernel, by Mogul et al., and Lecture 20. \n\nBen implements the paper’s polling design (Section 6.4), in which the NIC interrupt handler just wakes up the polling thread. However, Ben’s implementation leaves NIC interrupts enabled (in contrast to Section 6.4, which specifies that they be disabled until the polling thread is done). \n\nBen’s computer has just one CPU (i.e. just a single core). \n\n10. [5 points]: What will Ben observe as the rate of packet arrivals increases? (Circle the one best answer.) A. He won’t see livelock, because the interrupt handler doesn’t process the packets; only the polling thread handles the packets. \n B. He won’t see livelock, because the polling design eliminates the IP input queue, which was the point at which packets were discarded in the old design. \n C. He will see livelock, because at high enough arrival rates the CPU will spend all its time in the polling thread. \n D. He will see livelock, because at high enough arrival rates the CPU will spend all its time in the interrupt handler. \n E. He will see livelock, because the polling thread can only process packets at some finite rate, and the input rate could be higher than that. ", "answer": "D", "explanation": "Answer: D. \n\nZoe’s xv6 computer has a UART that has no limit on how fast it can transmit bytes. The UART interrupts once per byte transmitted, to indicate that it has finished transmitting the byte. 
Zoe has a program whose standard output (file descriptor 1) is connected to the xv6 console, which uses the UART; the program sends bytes as fast as it can: \n\n```\nwhile(1){\n char c = ’x’;\n write(1, &c, 1);\n}\n```\n\nZoe’s computer has just one CPU (i.e. just a single core). ", "type": "SingleChoice"} +{"instance_id": 11, "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", "problem_num": 11, "points": 5, "problem": "# VI Networking lecture/reading \n\nConsider Eliminating Receive Livelock in an Interrupt-driven Kernel, by Mogul et al., and Lecture 20. \n\nBen implements the paper’s polling design (Section 6.4), in which the NIC interrupt handler just wakes up the polling thread. However, Ben’s implementation leaves NIC interrupts enabled (in contrast to Section 6.4, which specifies that they be disabled until the polling thread is done). \n\nZoe’s computer has just one CPU (i.e. just a single core). \n\n11. [5 points]: Could this program cause interrupt livelock due to the CPU spending all its time in the UART interrupt handler, and thus no time executing Zoe’s program? Explain briefly. ", "answer": "Answer: No. The UART interrupts just once for each call to write(). There can’t be more than a brief period of time in which UART interrupts prevent Zoe’s program from running, because the UART driver will soon run out of bytes to transmit. ", "explanation": "Answer: No. The UART interrupts just once for each call to write(). There can’t be more than a brief period of time in which UART interrupts prevent Zoe’s program from running, because the UART driver will soon run out of bytes to transmit. ", "type": "ShortAnswerQuestion"} +{"instance_id": 12, "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", "problem_num": 12, "points": 5, "problem": "# VII Meltdown \n\nBelow is Listing 2 of the paper Meltdown: reading kernel memory from user space by Lipp et al., written in a C-like notation instead of x86 assembly. \n\n```\n1 char buf[8192]\n2\n3 // The flush part of Flush+Reload\n4 cflush buf[0]\n5 cflush buf[4096]\n6\n7 // The core attack from listing 2\n8 r1 = 0x79cbcc0 // a kernel virtual address\n9 r2 = *r1\n10 r2 = r2 & 1\n11 r2 = r2 * 4096\n12 r3 = buf[r2]\n```\n\n12. [5 points]: Which of the following are true statements? (Circle all that apply.) \n A. In Linux as described in the paper, page tables of user programs map all of kernel memory. \n B. Loading the value at kernel address 0x79cbcc0 on line 9 will lead to an exception. \n C. If the attack succeeds, then buf[0] will be in the L1 cache if the low bit of the value at address 0x79cbcc0 is a 0. \n D. One reason why one run of Meltdown might not succeed is because buf[0] maybe evicted from the L1 cache before the attack can measure its presence using Reload. \n E. The Meltdown attack on xv6 wouldn’t be able to dump all of xv6 kernel memory because like KAISER the xv6 kernel and user processes have separate page tables. \n F. None of the above. ", "answer": "A,B,C,D,E", "explanation": "Answer: A, B, C, D, and E. B is true because eventually the CPU will generate an exception, perhaps after speculating on a few instructions. $\\mathbf { E }$ is true, because xv6 has separate kernel and user page tables, and the user page tables don’t map all of kernel memory. 
", "type": "MultipleChoice"} +{"instance_id": 13, "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", "problem_num": 13, "points": 5, "problem": "# VIII RCU \n\nBen has a Linux kernel that uses RCU as described in RCU Usage In the Linux Kernel: One Decade Later, by McKenney et al. He modifies udp sendmsg() in the paper’s Figure 6, adding a call to new function() on line 8, so that the code now reads: \n\n```\n1 void udp_sendmsg(sock_t *sock, msg_t *msg)\n2 {\n3 \tip_options_t *opts;\n4 \tchar packet[];\n5 \tcopy_msg(packet, msg);\n6 \trcu_read_lock();\n7 \topts = rcu_dereference(sock->opts);\n8 \tnew_function(); //*** Ben adds this line. ***\n9 \tif (opts != NULL)\n10 \t\tcopy_opts(packet, opts);\n11 \trcu_read_unlock();\n12 \tqueue_packet(packet);\n13 }\n14 void setsockopt(sock_t *sock, int opt, void *arg)\n15 {\n16 \tif (opt == IP_OPTIONS) {\n17 \tip_options_t *old = sock->opts;\n18 \tip_options_t *new = arg;\n19 \trcu_assign_pointer(&sock->opts, new);\n20 \tif (old != NULL)\n21 \t\tcall_rcu(kfree, old);\n22 \treturn;\n23 \t}\n24 }\n```\n\nThis code is otherwise identical to the paper’s Figure 6. \n\nnew_function() performs a context switch (i.e., it calls the Linux equivalent of xv6’s sleep() or yield()). \n\n13. [5 points]: Ben has made a mistake. Explain a scenario in which something goes wrong with the Figure 6 code as a result of Ben’s change. ", "answer": "Answer: Use-after-free. If new function() causes a context switch, then call rcu() could call kfree(old), and that memory could be re-allocated for something else and overwritten. But that’s the same memory that opts points to on line 10, which would therefore copy the wrong data. ", "explanation": "Answer: Use-after-free. If new function() causes a context switch, then call rcu() could call kfree(old), and that memory could be re-allocated for something else and overwritten. But that’s the same memory that opts points to on line 10, which would therefore copy the wrong data. ", "type": "ShortAnswerQuestion"} +{"instance_id": 14, "exam_id": "6_1810_operating_system_engineering_fall_2024_quiz_ii", "problem_num": 14, "points": 5, "problem": " # VIII RCU \n\nBen has a Linux kernel that uses RCU as described in RCU Usage In the Linux Kernel: One Decade Later, by McKenney et al. He modifies udp sendmsg() in the paper’s Figure 6, adding a call to new function() on line 8, so that the code now reads: \n\n```\n1 void udp_sendmsg(sock_t *sock, msg_t *msg)\n2 {\n3 \tip_options_t *opts;\n4 \tchar packet[];\n5 \tcopy_msg(packet, msg);\n6 \trcu_read_lock();\n7 \topts = rcu_dereference(sock->opts);\n8 \tnew_function(); //*** Ben adds this line. ***\n9 \tif (opts != NULL)\n10 \t\tcopy_opts(packet, opts);\n11 \trcu_read_unlock();\n12 \tqueue_packet(packet);\n13 }\n14 void setsockopt(sock_t *sock, int opt, void *arg)\n15 {\n16 \tif (opt == IP_OPTIONS) {\n17 \tip_options_t *old = sock->opts;\n18 \tip_options_t *new = arg;\n19 \trcu_assign_pointer(&sock->opts, new);\n20 \tif (old != NULL)\n21 \t\tcall_rcu(kfree, old);\n22 \treturn;\n23 \t}\n24 }\n```\n\nThis code is otherwise identical to the paper’s Figure 6. \n\nnew_function() performs a context switch (i.e., it calls the Linux equivalent of xv6’s sleep() or yield()). \n\nNow Ben is working on the code in the RCU paper’s Figure 7. He reasons that the kfree(local table) in retract_table() really belongs inside the critical section, so that the entire sequence is atomic. 
He moves that line, resulting in this code: \n\n```\n...;\nspin_lock(&table_lock);\nlocal_table = table;\nrcu_assign_pointer(&table, NULL);\nkfree(local_table); // *** Ben moved this line. ***\nspin_unlock(&table_lock);\n...;\n```\n\n14. [5 points]: What problem is Ben’s change likely to cause? (Circle the one best answer.) \n A. Ben’s change could cause a deadlock. \n B. Ben’s change could allow a context switch to occur just before the kfree() call, which would be illegal. \n C. Ben’s change could cause invoke_syscall() to dereference a pointer to freed memory. \n D. Ben’s change could cause retract_table() to dereference a pointer to freed memory. ", "answer": "C", "explanation": "Answer: C. Ben’s modified retract table() frees local table before the call to synchronize_rcu(). An execution of invoke_systemcall() might be active at the same time on another CPU, and might read the old value of local table just after it has been freed and re-used for something else. ", "type": "SingleChoice"} +{"instance_id": 15, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 1, "points": 6, "problem": "# I MapReduce \n\nHave a look at Figure 3(a) in the paper MapReduce: Simplified Data Processing on Large Clusters by Dean and Ghemawat. The three graphs on the left show the rate of data movement over time for a MapReduce job that sorts a terabyte of data: the rate at which Maps read their input, the rate at which intermediate data is shuffled, and the rate at which Reduces write their output. For these questions you should assume that only this MapReduce job is using the servers and network, and that there are no failures. Many of the numbers below are derived from looking at the graphs, and are thus approximate; your reading of the graphs may be somewhat different from our’s; you should circle the answer that is closest to what you think is correct. \n\n1. [6 points]: Roughly when is the first time at which the sort application’s Reduce() function is called? Circle the best answer. \n - A. 0 seconds\n - B. 50 seconds\n - C. 150 seconds\n - D. 300 seconds ", "answer": "C", "explanation": "Answer: C. The best answer is 150 seconds. No Reduce function can be called until every Map function has finished; the top graph suggests that the Maps stop running around 150 seconds, and the paper text mentions 200 seconds. ", "type": "SingleChoice"} +{"instance_id": 16, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 2, "points": 7, "problem": "# I MapReduce \n\nHave a look at Figure 3(a) in the paper MapReduce: Simplified Data Processing on Large Clusters by Dean and Ghemawat. The three graphs on the left show the rate of data movement over time for a MapReduce job that sorts a terabyte of data: the rate at which Maps read their input, the rate at which intermediate data is shuffled, and the rate at which Reduces write their output. For these questions you should assume that only this MapReduce job is using the servers and network, and that there are no failures. Many of the numbers below are derived from looking at the graphs, and are thus approximate; your reading of the graphs may be somewhat different from our’s; you should circle the answer that is closest to what you think is correct. \n\n2. [7 points]: Roughly how long does it take a single application Reduce function to sort its share of the data (just the sort, not including either the shuffle or the writing of the output)? Circle the best answer. \n\n- A. 10 seconds\n- B. 
75 seconds\n- C. 200 seconds\n- D. 250 seconds\n- E. 650 seconds \n- F. None of the above answers are correct.", "answer": "F", "explanation": "Answer: F. This question is broken: the application Reduce function does not sort the data. MapReduce’s reduce task framework does the sort, and (for this application) the application Reduce function just returns its argument.", "type": "SingleChoice"} +{"instance_id": 17, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 3, "points": 6, "problem": "# I MapReduce \n\nHave a look at Figure 3(a) in the paper MapReduce: Simplified Data Processing on Large Clusters by Dean and Ghemawat. The three graphs on the left show the rate of data movement over time for a MapReduce job that sorts a terabyte of data: the rate at which Maps read their input, the rate at which intermediate data is shuffled, and the rate at which Reduces write their output. For these questions you should assume that only this MapReduce job is using the servers and network, and that there are no failures. Many of the numbers below are derived from looking at the graphs, and are thus approximate; your reading of the graphs may be somewhat different from our’s; you should circle the answer that is closest to what you think is correct. \n\n[6 points]: Why are there two bumps in the Shuffle graph? That is, why does the Shuffle graph go up and then down from time 20 to 200, remain at zero for 100 seconds, and then go up and then down from time 300 to 600? Circle the best answer. \n\nA. There are more Map tasks $( \\mathbf { M } = 1 5 , 0 0 0 )$ than there are machines. \nB. There are more Reduce tasks $\\begin{array} { r } { \\mathrm { R } = 4 0 0 0 \\mathrm { , } } \\end{array}$ ) than there are machines. \nC. There are more Map tasks than there are Reduce tasks. \nD. The aggregate network throughput is smaller than the aggregate disk throughput. \nE. The Map tasks consume more CPU time than the Reduce tasks. ", "answer": "B", "explanation": "Answer: B. The best answer is the second one (more Reduce tasks than machines). Intermediate data can only be moved from Map machines to Reduce machines for Reduce tasks that have been allocated to machines. There are only 1800 machines, so at first only 1800 of the 4000 Reduce tasks are assigned to machines, so only about 1800/4000ths of the shuffles can happen at first. That’s the first bump. The second bump starts once the first set of Reduce tasks finishes, moving intermediate data to the machines that will run the remaining Reduces.", "type": "SingleChoice"} +{"instance_id": 18, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 4, "points": 7, "problem": "# I MapReduce \n\nHave a look at Figure 3(a) in the paper MapReduce: Simplified Data Processing on Large Clusters by Dean and Ghemawat. The three graphs on the left show the rate of data movement over time for a MapReduce job that sorts a terabyte of data: the rate at which Maps read their input, the rate at which intermediate data is shuffled, and the rate at which Reduces write their output. For these questions you should assume that only this MapReduce job is using the servers and network, and that there are no failures. Many of the numbers below are derived from looking at the graphs, and are thus approximate; your reading of the graphs may be somewhat different from our’s; you should circle the answer that is closest to what you think is correct. 
\n\n[7 points]: Why does the shuffle begin a long time before the Map phase has finished? Circle the best answer. \n\nA. There are more Map tasks $( \\mathbf { M } = 1 5 , 0 0 0 )$ ) than there are machines.\nB. There are more Reduce tasks $\\begin{array} { r } { \\mathrm { { R } } = 4 0 0 0 \\mathrm { { } } } \\end{array}$ ) than there are machines.\nC. There are more Map tasks than there are Reduce tasks.\nD. The aggregate network throughput is smaller than the aggregate disk throughput. \nE. The Map tasks consume more CPU time than the Reduce tasks. ", "answer": "A", "explanation": "Answer: A. The best answer is the first one (more Map tasks than machines). Shuffles can start as soon as Map functions finish. The system runs 1800 Maps at a time; the first of these finishes a long time before the last of the 15,000 Maps finishes at time 200. ", "type": "SingleChoice"} +{"instance_id": 19, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 5, "points": 6, "problem": "# II Linearizability \n\nThese questions concern the material from Lecture 4, Consistency and Linearizability. \n\nYou have a service whose state is a single string, and that exposes two RPC operations to clients: one operation appends the RPC argument to the state, and the other RPC operation returns the current state. The timelines below indicate the start time, end time, argument string, and reply string for each client operation. Ax indicates an append operation with argument $\\mathbf { X }$ , and Ry indicates a read operation to which the server replied y. The vertical bars indicate the start and end times of each operation (the times at which the client sends the request, and receives the reply). The service’s state string starts out empty at the beginning of each history. \n\nFor example, \n\nC1:|---Ax---|\n\nC2: |---Ay---|\n\nC3: |--Ryx--| \n\nmeans that client C1 sent an append RPC with “x” as the argument, C2 sent an append RPC with “y” as the argument, and C3 read the state and received the reply “yx”. \n\nConsider this history, in which the reply string sent to C4 has been omitted: \n\nC1:|---Ax---| \nC2: |---Ay---| \nC3: |---Az---| \nC4: |--R?--| \n\n5. [6 points]: Which values could C4’s read yield that are consistent with linearizability? Circle all of the correct answers. \n - A. xzy\n - B. yxz\n - C. yzx\n - D. xy\n - E. xz\n - F. yx\n - G. zy ", "answer": "A,B,D,F", "explanation": "Answer: A,B,D,F. xzy, yxz, xy, and yx. The result C4 receives can’t start with z (since the Az starts after the Ax finishes); if both x and z appear, x must come first; and it must include both x and y (since Ax and Ay both finish before the C4’s read starts). ", "type": "MultipleChoice"} +{"instance_id": 20, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 6, "points": 7, "problem": "# II Linearizability \n\nThese questions concern the material from Lecture 4, Consistency and Linearizability. \n\nYou have a service whose state is a single string, and that exposes two RPC operations to clients: one operation appends the RPC argument to the state, and the other RPC operation returns the current state. The timelines below indicate the start time, end time, argument string, and reply string for each client operation. Ax indicates an append operation with argument $\\mathbf { X }$ , and Ry indicates a read operation to which the server replied y. 
The vertical bars indicate the start and end times of each operation (the times at which the client sends the request, and receives the reply). The service’s state string starts out empty at the beginning of each history. \n\nFor example, \n\nC1:|---Ax---|\n\nC2: |---Ay---|\n\nC3: |--Ryx--| \n\nmeans that client C1 sent an append RPC with “x” as the argument, C2 sent an append RPC with “y” as the argument, and C3 read the state and received the reply “yx”. \n\nConsider this history, in which the reply string sent to C4 has been omitted: \n\nC1:|---Ax---| \nC2: |---Ay---| \nC3: |---Az---| \nC4: |--R?--| \n\nNow look at this history: \n\nC1: |-------Ax-------| \nC2: |---Ay---| \nC3: |--Ry--| \nC4: |----R?----| \n\n6. [7 points]: Which values could C4’s read yield that are consistent with linearizability? Circle all of the correct answers. \n - A. y \n - B. x \n - C. yx \n - D. xy ", "answer": "A,C", "explanation": "Answer: A,C. y and yx. The fact that C3 read y, and that C3’s read finished before C4’s read started, means that C4’s result must include y, and, if it includes x, the x must come after y. ", "type": "MultipleChoice"} +{"instance_id": 21, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 7, "points": 6, "problem": "# III GFS and Raft \n\nAfter reading the GFS paper (The Google File System by Ghemawat et al.) and the Raft paper (Ongaro and Ousterhout’s In Search of an Understandable Consensus Algorithm (Extended Version)), Ben replaces the GFS master with a new coordinator that uses Raft. The Raft-based coordinator provides the same functions as before but replicates the log of operations using 3 Raft peers. All other parts of GFS stay the same. \n\nWhich of the following statements are true? (Circle all that apply) \n\n7. [6 points]: \n\n- A. The coordinator can continue operation in the presence of network partitions without any additional monitoring infrastructure, if one partition with peers is able to achieve a majority. \n- B. The coordinator can continue operation correctly even if one of the 3 peers has failed (and there are no other failures). \n- C. None of the above are true ", "answer": "A,B", "explanation": "Answer: Both A and B are true; these are properties of Raft. ", "type": "MultipleChoice"} +{"instance_id": 22, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 8, "points": 7, "problem": "# III GFS and Raft \n\nAfter reading the GFS paper (The Google File System by Ghemawat et al.) and the Raft paper (Ongaro and Ousterhout’s In Search of an Understandable Consensus Algorithm (Extended Version)), Ben replaces the GFS master with a new coordinator that uses Raft. The Raft-based coordinator provides the same functions as before but replicates the log of operations using 3 Raft peers. All other parts of GFS stay the same. \n\nWhich of the following statements are true? (Circle all that apply) \n\nBen also considers using Raft for chunk replication. He runs many Raft clusters and has the GFS master assign chunks to a specific Raft cluster (i.e., each chunk is assigned to one Raft cluster, consisting of a leader and two followers). GFS clients submit write and append operations for a chunk to the leader of the Raft cluster for that chunk (i.e., Ben’s design doesn’t implement the separate data flow). The leader of the Raft cluster replicates write and append operation using the Raft library. 
All other parts of GFS (e.g., assigning leases to the leader, client caching locations of chunk servers, reading from the closest server, and so on) stay the same. (You can assume that chunk servers have enough disk space for operations to succeed.) \n\nWhich of the following statements are true? (Circle all that apply) \n\n8. [7 points]: \n\n- A. Unlike the old design, Ben’s design can achieve linearizability for chunk operations. \n- B. Unlike the old design, Ben’s design can continue operation despite the failure of one chunk server. \n- C. By using Raft, Ben’s design allows clients to perform more mutating chunk operations per second than the old design. \n- D. Raft’s snapshots allow a chunk server to catch up in a few seconds if has been down for a long time (assuming the same network as in the GFS paper). \n- E. None of the above are true ", "answer": "E", "explanation": "Answer: E. None of the above are true. A is false because the client’s cache that maps file names to chunk handles can yield stale results. B is false because the old design can continue despite one failure as well. C is false because Ben’s scheme moves data less efficiently (via the leader, rather than the separate data flow). D is false because the snapshot mechanism sends the leader’s entire database of chunks, which will likely take far longer than a few seconds. ", "type": "MultipleChoice"} +{"instance_id": 23, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 9, "points": 6, "problem": "# IV Raft \n\nConsider the Raft paper (Ongaro and Ousterhout’s In Search of an Understandable Consensus Algorithm (Extended Version)). Ben wonders what the impact of network behavior is on Raft’s performance. Ben runs a Raft-replicated server that receives many client requests. If the network delivers AppendEntries RPCs in order, Ben’s Raft implementation is fast (i.e., completes many client requests per second). But, if the network delivers AppendEntries frequently out of order, Ben’s Raft implementation performs badly (i.e., completes fewer client requests per second). Using the rules in Figure 2 explain why this is the case. \n\n9. [6 points]: ", "answer": "Answer: This question is broken. Figure 2 implies that each AppendEntries should include all as-yet-unacknowledged log entries. So if there are two such RPCs outstanding, the one that was sent second contains a copy of all the log entries in the first. This means that, if the second RPC arrives first, it will be accepted. So it’s not clear why Ben would see any different performance due to out-of-order delivery. ", "explanation": "Answer: This question is broken. Figure 2 implies that each AppendEntries should include all as-yet-unacknowledged log entries. So if there are two such RPCs outstanding, the one that was sent second contains a copy of all the log entries in the first. This means that, if the second RPC arrives first, it will be accepted. So it’s not clear why Ben would see any different performance due to out-of-order delivery. ", "type": "ShortAnswerQuestion"} +{"instance_id": 24, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 10, "points": 7, "problem": "# V Lab 3A-3C \n\nAlyssa is implementing Raft as in Lab 3A-3C. 
She implements advancing the commitIndex at the leader (i.e., last bullet of Leaders in Fig 2) as follows: \n\n```go\nfunc (rf *Raft) advanceCommit() {\n start := rf.commitIndex + 1\n if start < rf.log.start() { // on restart start could be 1\n \tstart = rf.log.start()\n }\n for index := start; index <= rf.log.lastindex(); index++ {\n \tif rf.log.entry(index).Term != rf.currentTerm { // 5.4\n \t\tcontinue // ***\n }\n n := 1 // leader always matches\n for i := 0; i < len(rf.peers); i++ {\n \tif i != rf.me && rf.matchIndex[i] >= index {\n \t\tn += 1\n }\n }\n if n > len(rf.peers)/2 { // a majority?\n \tDPrintf(\"%v: Commit %v\\n\", rf.me, index)\n \trf.commitIndex = index\n }\n }\n}\n```\n\nAssume that all omitted parts of Alyssa’s code are correct. \n\nBen argues that the line marked with “\\*\\*\\*” could be replaced by a break statement so that the loop terminates immediately. \n\n10. [7 points]: Explain what could go wrong if one adopted Ben’s proposal; please include a specific sequence of events to illustrate your answer. ", "answer": "Answer: If there’s a term mis-match, the leader won’t be able to commit any further log entries. \nThe paper’s Figure 8e shows an example of such a scenario. ", "explanation": "Answer: If there’s a term mis-match, the leader won’t be able to commit any further log entries. \nThe paper’s Figure 8e shows an example of such a scenario. ", "type": "ShortAnswerQuestion"} +{"instance_id": 25, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 11, "points": 6, "problem": "# VI More lab 3A-3C \n\nAlyssa is implementing Raft as in Lab 3A-3C. She implements the rule for conversion to follower in her AppendEntries RPC handler as shown below: \n\n```go\nfunc (rf *Raft) convertToFollower(term int) {\n rf.state = Follower\n rf.votedFor = -1\n rf.currentTerm = term\n rf.persist()\n}\n```\n\n```go\nfunc (rf *Raft) AppendEntries(args * AppendEntriesArgs,\n reply *AppendEntriesReply) {\n rf.mu.Lock()\n defer rf.mu.Unlock()\n if args.Term >= rf.currentTerm {\n rf.convertToFollower(args.Term)\n }\n ...\n}\n```\n\nAssume that all omitted parts of Alyssa’s code are correct. \n\n11. [6 points]: Describe a specific sequence of events that would cause Alyssa’s implementation to break the safety guarantees provided by Raft. ", "answer": "Answer: The code shown can cause a peer to forget it has cast a vote for the current term. Suppose peer P1 has been elected for this term. The peers that elected it may forget that they voted for P1. Then some other peer P2 may become candidate for this term, and get votes from those forgetful peers, and become a second leader for the same term. This will lead to split brain. ", "explanation": "Answer: The code shown can cause a peer to forget it has cast a vote for the current term. Suppose peer P1 has been elected for this term. The peers that elected it may forget that they voted for P1. Then some other peer P2 may become candidate for this term, and get votes from those forgetful peers, and become a second leader for the same term. This will lead to split brain. ", "type": "ShortAnswerQuestion"} +{"instance_id": 26, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 12, "points": 7, "problem": "# VII ZooKeeper \n\nRefer to ZooKeeper: Wait-free coordination for Internet-scale systems by Hunt, Konar, Junqueira, and Reed, and to the notes for Lecture 9. 
\n\nThe code fragments below are simplified versions of how something like GFS or MapReduce might use ZooKeeper to elect a coordinator, and for that coordinator to store state such as the assignments of GFS data to chunkservers. \n\nSuppose server S1 executes the following code to become elected and to then store coordinator state in /A and /B. Initially, znode /coord-lock does not exist, znode /A starts out containing A0, and znode /B starts out containing B0. \n\n```\n s = openSession()\n if create(s, \"/coord-lock\", data=\"S1\", ephemeral=true) == true:\n setData(s, \"/A\", \"A1\", version=-1)\n setData(s, \"/B\", \"B1\", version=-1)\n```\n\n\n\n12. [7 points]: Briefly explain why, for coordinator election, it makes sense that /coord-lock should be an ephemeral znode rather than a regular znode. ", "answer": "Answer: If a server is elected as coordinator, and then fails, ZooKeeper automatically deletes the ephemeral /coord-lock; now another server can create that file and become coordinator. ", "explanation": "Answer: If a server is elected as coordinator, and then fails, ZooKeeper automatically deletes the ephemeral /coord-lock; now another server can create that file and become coordinator. ", "type": "ShortAnswerQuestion"} +{"instance_id": 27, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 13, "points": 6, "problem": "# VII ZooKeeper \n\nRefer to ZooKeeper: Wait-free coordination for Internet-scale systems by Hunt, Konar, Junqueira, and Reed, and to the notes for Lecture 9. \n\nThe code fragments below are simplified versions of how something like GFS or MapReduce might use ZooKeeper to elect a coordinator, and for that coordinator to store state such as the assignments of GFS data to chunkservers. \n\nSuppose server S1 executes the following code to become elected and to then store coordinator state in /A and /B. Initially, znode /coord-lock does not exist, znode /A starts out containing A0, and znode /B starts out containing B0. \n\n```\n s = openSession()\n if create(s, \"/coord-lock\", data=\"S1\", ephemeral=true) == true:\n setData(s, \"/A\", \"A1\", version=-1)\n setData(s, \"/B\", \"B1\", version=-1)\n```\n\nS1’s create() finishes and returns true to indicate success. But just after that, and before ZooKeeper has received S1’s setData() requests, ZooKeeper decides that S1 has failed, and ZooKeeper terminates S1’s session. \n\nAfter ZooKeeper terminates S1’s session, server S2 runs this to become coordinator: \n\n```\ns = openSession()\nif create(s, \"/coord-lock\", data=\"S2\", ephemeral=true) == true:\n setData(s, \"/A\", \"A2\", version=-1)\n setData(s, \"/B\", \"B2\", version=-1)\n```\n\nHowever, S1 is actually still alive, and it proceeds to send the two setData() requests, and they arrive at ZooKeeper. \n\nThen client C1 reads /B and /A and sees B2 and A2, respectively. \n\nNow a different client, C2, reads /B, and then reads /A. Both reads succeed. \n\n13. [6 points]: Given the way ZooKeeper works, what can C2 observe? Circle all of the possible read results. \n /B /A \n A. B0 A0 \n B. B0 A1 \n C. B0 A2 \n D. B2 A0 \n E. B2 A1 ", "answer": "A,C", "explanation": "Answer: B0 A0 and B0 A2 are the only possible results. B0 is possible because, in the absence of other constraints, ZooKeeper can yield stale data to reads. A1 is never possible because ZooKeeper terminated S1’s session before ZooKeeper receive S1’s setData()s, so ZooKeeper ignore those setData()s. 
B2 A0 is not possible since, once ZooKeeper has revealed a write to a client, the “Linearizable writes” guarantee in Section 2.3 implies that all previous writes have been applied. ", "type": "MultipleChoice"} +{"instance_id": 28, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 14, "points": 7, "problem": "# VIII Grove \n\nIn the ApplyReadonly function in Figure 7, Ben decides to delete the check for s.waitForCommitted() The new code is as as follows: \n\n```\nfunc (s *Server) ApplyReadonly(op) Result {\n\ts.mutex.Lock()\n if s.leaseExpiry > GetTimeRange().latest {\n e := s.epoch\n idx, res := s.stateLogger.LocalRead(op)\n s.mutex.Unlock()\n return res\n } else {\n s.mutex.Unlock()\n return ErrRetry\n }\n}\n```\n\n14. [7 points]: Explain why this modification can result in non-linearizable reads. ", "answer": "Answer: If a Grove backup server reveals an update without waiting to ensure it has been committed, then it may reveal an uncommitted write. If the primary then fails, the backup whose database is used to recover may not have recent uncommitted writes. So the write may disappear, and other clients issuing strictly subsequent reads may not see that write. That would not be linearizable. ", "explanation": "Answer: If a Grove backup server reveals an update without waiting to ensure it has been committed, then it may reveal an uncommitted write. If the primary then fails, the backup whose database is used to recover may not have recent uncommitted writes. So the write may disappear, and other clients issuing strictly subsequent reads may not see that write. That would not be linearizable. ", "type": "ShortAnswerQuestion"} +{"instance_id": 29, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_i", "problem_num": 15, "points": 6, "problem": "# IX Distributed Transactions \n\nMouseGPT is designing a distributed transaction system using two-phase commit and two-phase locking, as discussed in Lecture 12 and Chapter 9 of the 6.033 reading. The goal is to provide serializable results. The question arises of what should happen if a participant computer crashes while in the PREPARED state for a transaction. MouseGPT thinks that all-or-nothing atomicity would be satisfied if such a transaction were completely forgotten. So MouseGPT designs the system so that if a participant computer crashes and restarts while it is in the PREPARED state for a transaction that it’s part of, the recovery software on that computer un-does any local modifications the interrupted transaction might have performed and releases its locks, and sends a network message to each other participant and to the TC to tell them to undo any changes made by the transaction and to release its locks. \n\n15. [6 points]: Explain why MouseGPT’s plan would cause the system to produce nonserializable (incorrect) results. ", "answer": "Answer: The TC may have decided to commit the transaction, and sent out COMMIT messages to the other participating workers, and they may have committed, and revealed committed results to other transactions. At that point, there is no way to back out of the transaction without violating serializability and atomicity. ", "explanation": "Answer: The TC may have decided to commit the transaction, and sent out COMMIT messages to the other participating workers, and they may have committed, and revealed committed results to other transactions. At that point, there is no way to back out of the transaction without violating serializability and atomicity. 
", "type": "ShortAnswerQuestion"} +{"instance_id": 30, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", "problem_num": 1, "points": 5, "problem": "# I Spanner \n\nThe intelligent computer HAL is using Spanner (as described in Spanner: Google’s GloballyDistributed Database by Corbett et al.) to store data. HAL notes that read/write transactions are being slowed down by Spanner’s commit-wait mechanism (see Section 4.2.1). HAL disables commit-wait in his Spanner installation; as a result, everything works just as described in the paper except that the coordinator leader does not wait until the timestamp $s$ is guaranteed to be in the past. \n\nHAL uses just these three transactions: \n\n```\nT1:\n X=1\n Y=1\nT2:\n X=22\n Y=22\nT3:\n print X, Y\n```\n\nInitially, database records X and Y both have value 0. X and $\\mathrm { Y }$ are in different Spanner shards, managed by different Paxos groups. T1 and T2 are read/write transactions; T3 is a read-only transaction. \n\nHAL starts T1; waits for Spanner to say that T1 has completed; starts T2, waits for Spanner to say that T2 has completed; then starts T3 and observes T3’s output. \n\n1. [5 points]: Which outputs from T3 are possible? (For each statement, circle True or False.) \n A. True / False : 22, 22 \n B. True / False : 1, 1 \n C. True / False : 1, 22 \n D. True / False : 0, 0 ", "answer": "A,B,D", "explanation": "Answer: A,B,D. 22,22, 1,1, and 0,0 are all possible; 1,22 is not. Omitting commit-wait means that either or both of T1 and T2 might commit with time-stamps later than the time-stamp that T3 chooses, so T3 might see the result of either T1 or T2, or neither. T3 can’t see 1,22 because both T1 and T2 do both their writes at the same timestamp, so T3 will either see both writes of one of the transactions, or neither. ", "type": "MultipleChoice"} +{"instance_id": 31, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", "problem_num": 2, "points": 4, "problem": "# II Chardonnay \n\nConsider the paper Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases, by Eldeeb et al. \n\nA read/write Chardonnay transaction reads database record A, then reads B, and then writes C. The system is busy with other read/write transactions at the same time, some of which might also use A, B, and/or C. \n\n2. [4 points]: In which situation will Chardonnay’s “dry run” mechanism yield the most benefit? (Circle the single best answer.) \n A. A is hot, B is cold. \n B. A is cold, B is hot. \n C. A is cold, B is cold. \n D. A is hot, B is hot. \n\n“Cold” means used rarely. “Hot” means used by many transactions. ", "answer": "A", "explanation": "Answer: A. Only the first answer (hot, cold) is correct. Chardonnay’s dry run mechanism helps avoid situations in which a transaction holds the lock for a record that other transactions need, while waiting to read a record from the disk. This situation arises when a read/write transaction uses a hot record followed by a cold record. \n\n---\n", "type": "SingleChoice"} +{"instance_id": 32, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", "problem_num": 3, "points": 4, "problem": "# II Chardonnay \n\nConsider the paper Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases, by Eldeeb et al. \n\nA read/write Chardonnay transaction reads database record A, then reads B, and then writes C. The system is busy with other read/write transactions at the same time, some of which might also use A, B, and/or C. 
\n\nA system that uses Chardonnay issues just these three transactions: \n\n```\nT1:\n X=1\nT2:\n Y=1\nT3:\n print X, Y\n```\n\nInitially, both database records (X and Y) start out with value 0. X and Y are in different ranges. T1 and T2 are read/write transactions. T3 is a read-only transaction (described in the paper’s Section 6). T3 does not use the waiting idea described in the last paragraph of Section 6.2. \n\nOne client starts T1. After T1 completes, another client starts T2. After T2 completes, a third client runs T3. \n\nThis version of Chardonnay has a bug somewhere in its code, causing T3 to print the incorrect output 0,1. \n\n3. [4 points]: Which of the following bugs is the most plausible explanation for T3 printing \n 0,1? Circle the single most correct answer. \n A. The epoch server is stuck: it always returns the same epoch number, and never increases it. \n B. The epoch server is incrementing too quickly: more than once per 10 milliseconds. \n C. The epoch server is working correctly except it gave T2 an epoch that was too small. \n D. The epoch server is working correctly except it gave T2 an epoch that was too large. ", "answer": "C", "explanation": "Answer: C. The third answer is correct. 0,1 is not a correct output because serializability requires that if T3 observes the results of T2, and T1 finished before T2 started, then T3 is required to also see the results of T1. If the epoch server gives T2 an epoch that’s less than T1’s epoch, and T3 and T1 run in the same epoch, then T3 will see T2’s Y=1 but not T1’s X=1. ", "type": "SingleChoice"} +{"instance_id": 33, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", "problem_num": 4, "points": 8, "problem": "# III FaRM \n\nConsider the following statements about FaRM as described in No compromises: distributed transactions with consistency, availability, and performance. For each statement, circle True or False. \n\n4. [8 points]: \n\nTrue / False : Because FaRM uses primary-backup replication for a region (instead of Paxos), FaRM must reconfigure to remove a failed replica before FaRM can continue to use the region. \n\nTrue / False : FaRM can use short leases (10ms by default) because it has communication and scheduling optimizations to renew leases quickly. \n\nTrue / False : A transaction that modifies only one object will never abort. \n\nTrue / False : Read-only transactions require only the validate step of the Commit phase in Figure 4. ", "answer": "True,True,False,True", "explanation": "Answer: True, True, False, True. The first statement is true because FaRM requires a response from all replicas, thus it must reconfigure to remove the failed replica before it can continue with the affected shard. The third statement is false because another transaction may modify the one object causing this transaction’s validation phase to fail (because the other transaction will have incremented the object’s version number). ", "type": "True/False Questions"} +{"instance_id": 34, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", "problem_num": 5, "points": 4, "problem": "# IV Ray \n\nConsider the following Ray program, which creates a sqrt task task for each number in the list mylist. The creation yields a DFut and the caller waits for the tasks to complete by calling get on each future. 
The code is as follows: \n\n```\n# A call to sqrt_task yields a DFut\n@ray.remote\ndef sqrt_task(n):\n # sqrt is a python function, which returns the square root of its argument\n return sqrt(n)\n\ndef sqrts0(n_list):\n # start tasks and collect futures\n l = [ ] # list holding DFuts\n for i in n_list: # iterate over list of numbers\n \tl.append(sqrt_task(i))\n \n r = [ ]\n for f in l:\n\t\tr.append(get(f)) # collect the result\n\treturn r\n\nprint(sqrts0(mylist)) # invoke sqrts0 with a list of numbers and print result\n```\n\nAssume Ray behaves in the way described in Ownership: a distributed futures system for finegrained tasks by Wang et al., and Ray is running on a cluster of computers. \n\n5. [4 points]: Will the sqrt computations complete in the order that sqrts0 appends to r? (Briefly explain your answer) ", "answer": "Answer: No. The sqrt tasks run concurrently with each other, and may finish in an arbitrary order. All that is guaranteed is that the task has finished executing (at least once) by the time get(f) returns. ", "explanation": "Answer: No. The sqrt tasks run concurrently with each other, and may finish in an arbitrary order. All that is guaranteed is that the task has finished executing (at least once) by the time get(f) returns. ", "type": "ShortAnswerQuestion"} +{"instance_id": 35, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", "problem_num": 6, "points": 4, "problem": "# IV Ray \n\nConsider the following Ray program, which creates a sqrt task task for each number in the list mylist. The creation yields a DFut and the caller waits for the tasks to complete by calling get on each future. The code is as follows: \n\n```\n# A call to sqrt_task yields a DFut\n@ray.remote\ndef sqrt_task(n):\n # sqrt is a python function, which returns the square root of its argument\n return sqrt(n)\n\ndef sqrts0(n_list):\n # start tasks and collect futures\n l = [ ] # list holding DFuts\n for i in n_list: # iterate over list of numbers\n \tl.append(sqrt_task(i))\n \n r = [ ]\n for f in l:\n\t\tr.append(get(f)) # collect the result\n\treturn r\n\nprint(sqrts0(mylist)) # invoke sqrts0 with a list of numbers and print result\n```\n\nAlyssa creates a function sqrts1 whose body is the same as sqrts0, but is declared as a remote task. She then modifies the program to invoke many sqrts1’s, each with a large distinct, nonoverlapping slice of the number list. The code is as follows: \n\n```\n@ray.remote\ndef sqrts1(n_list):\n ...\n # same code as sqrts0\n ...\n return r\n \nf0 = sqrts1(mylist[...])\nf1 = sqrts1(mylist[...])\nf2 = sqrts1(mylist[...])\n...\nprint(get(f0))\nprint(get(f1))\n...\n```\n\n6. [4 points]: Ben is worried that the above program creates so many sqrt tasks tasks that Ray will be bottle-necked by managing the tasks and the futures they yield. Briefly explain why Ray can manage many tasks in parallel for the above program? ", "answer": "Answer: The worker machine that invokes sqrts1(...) is the owner of the metadata for the value returned by each sqrts1 call. The many workers that execute sqrts1() each independently own the metadata for their sqrt task’s, resulting in no one machine being required to manage all the sqrt tasks. ", "explanation": "Answer: The worker machine that invokes sqrts1(...) is the owner of the metadata for the value returned by each sqrts1 call. The many workers that execute sqrts1() each independently own the metadata for their sqrt task’s, resulting in no one machine being required to manage all the sqrt tasks. 
", "type": "ShortAnswerQuestion"} +{"instance_id": 36, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", "problem_num": 7, "points": 5, "problem": "# V Memcache at Facebook \n\nBen Bitdiddle runs a web site. Ben reads the paper Scaling Memcache at Facebook by Nishtala et al., and thinks that the design is too complex. So Ben decides to ignore the paper’s design: he doesn’t use leases, mcrouter, pools, etc. Ben uses only the mechanisms described below. \n\nBen has just a single region, with some web servers, some memcache servers, and a single database server. Ben programs each of his web servers to use the following client code to read and write data: \n\n```\nread(k):\n if v = memcache_get(k) succeeds\n \treturn v\n else\n\t\treturn database_get(k)\n\nwrite(k, v):\n database_put(k, v)\n memcache_put(k, v)\n```\n\nNote that read() does not insert anything into memcache, and note that write() always inserts the new data into memcache, whether it was already cached or not. Ben knows this may be wasteful, since it may cause memcache to cache data that’s never read, but he doesn’t mind. \n\n7. [5 points]: Sadly, Ben sees that read()s sometimes return stale data for a long time after the write() of a newer value has succeeded and returned. Explain how this could happen. ", "answer": "Answer: If there are concurrent writes by different clients to the same key, the calls to database put() may execute in a different order that the calls to memcache put(), so that memcache and the database end up with different values. This condition can persistent for a long time: until the next time a client writes the same key. ", "explanation": "Answer: If there are concurrent writes by different clients to the same key, the calls to database put() may execute in a different order that the calls to memcache put(), so that memcache and the database end up with different values. This condition can persistent for a long time: until the next time a client writes the same key. ", "type": "ShortAnswerQuestion"} +{"instance_id": 37, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", "problem_num": 8, "points": 4, "problem": "# VI Lab 4 \n\nBen implements the RPC handlers and the applier in Lab 4 as follows. The RPC handlers for Get, Put, and Append take the following steps: \n\nA. Submit a command to the Raft library via Start. The command includes the client ID, request ID, operation type, and arguments. \nB. Loop to wait until the reply for that command to show up in the reply table, which maps from client IDs to the replies of clients’ latest requests. Each reply contains the request ID and the result to that request. If Raft’s leadership changes during the loop, return ErrWrongLeader. \nC. Return the result stored in the reply table. \n\nThe applier detail is irrelevant to this question and is shown on the next page. \n\n8. [4 points]: Ben observes that Get does not modify the application state. He changes Get’s RPC handler to read the key-value table and return immediately to the client the result. Does this implementation preserve linearizability? (Briefly explain your answer.) ", "answer": "Answer: No. Get could return a stale result if Raft the leadership changes. For instance, if a client submits an Append to the old leader and succeeds, and then submits a Get to the new leader, the Get result could miss the appended value if the new leader handles the Get before applying the Append. ", "explanation": "Answer: No. 
Get could return a stale result if Raft’s leadership changes. For instance, if a client submits an Append to the old leader and succeeds, and then submits a Get to the new leader, the Get result could miss the appended value if the new leader handles the Get before applying the Append. ", "type": "ShortAnswerQuestion"} +{"instance_id": 38, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", "problem_num": 9, "points": 4, "problem": "Ben implements the RPC handlers and the applier in Lab 4 as follows. The RPC handlers for Get, Put, and Append take the following steps: \n\nA. Submit a command to the Raft library via Start. The command includes the client ID, request ID, operation type, and arguments. \nB. Loop to wait until the reply for that command shows up in the reply table, which maps from client IDs to the replies of clients’ latest requests. Each reply contains the request ID and the result to that request. If Raft’s leadership changes during the loop, return ErrWrongLeader. \nC. Return the result stored in the reply table. \n\nThe applier takes the following steps: \n\nD. Read a command from the apply channel. \nE. De-duplicate the command with the reply table: if the request ID in the reply table for the client is greater than or equal to that in the command, then skip the command. \nF. Apply the command and insert the result to the reply table. \n\n9. [4 points]: Separately from the previous change, Ben modifies his implementation to perform de-duplication early in the RPC handlers. Concretely, he removes step E in the applier, and adds an additional step at the start of the RPC handlers (i.e., before step A) as follows: \n If the request ID in the reply table for the client is greater than or equal to that in the RPC arguments, return the result stored in the reply table. \n Does this implementation preserve linearizability? (Briefly explain your answer.) ", "answer": "Answer: No. An operation could be applied twice if the client re-sends it before the first RPC is applied. ", "explanation": "Answer: No. An operation could be applied twice if the client re-sends it before the first RPC is applied. ", "type": "ShortAnswerQuestion"} +{"instance_id": 39, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", "problem_num": 10, "points": 8, "problem": "# VII AWS Lambda \n\nConsider the guest lecture about the paper On-demand container loading in AWS Lambda by Brooker et al. For each of the following statements, indicate whether it is true or false. \n\n10. [8 points]: \n\nTrue / False : AWS Lambda is attractive to customers because it allows them to run cloud computations without having to provision a machine. \n\nTrue / False : Many containers of AWS Lambda customers don’t contain unique chunks because customers upload the same container multiple times. \n\nTrue / False : AWS Lambda may deduplicate popular chunks less than unpopular chunks. \n\nTrue / False : AWS Lambdas use LRU-K to ensure that if many infrequently-used Lambdas are running at the same time, they don’t evict the chunks of frequently-used Lambdas. ", "answer": "True,True,True,True", "explanation": "Answer: True, True, True, True. The third option is true because AWS does this to reduce the blast radius of popular chunks (see Section 3.3). 
", "type": "True/False Questions"} +{"instance_id": 40, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", "problem_num": 11, "points": 5, "problem": "# VIII Boki \n\nConsider Figure 6(a) in Boki: Stateful Serverless Computing with Shared Logs by Jia and Witchel. The left column describes how Boki makes the execution of a workflow of serverless functions with database side-effects exactly-once. \n\nAlyssa notices that if Boki reruns a workflow it will append a record to the workflow’s LogBook, even if an append of an earlier failed execution already logged the record. Alyssa proposes to change the pattern of append-read to read-append-read: that is, she modifies Boki to read before an append to see if the append already logged its record; if so, it uses the first value returned by the read and skips the subsequent append and read. (If not, Boki executes as before, doing an append followed by read.) \n\nFor example, Alyssa changes write as follows: \n\ndef write(table, key, val): tag $\\mathbf { \\Sigma } = \\mathbf { \\Sigma }$ hashLogTag([ID, STEP]) # first read rec $\\mathbf { \\Sigma } = \\mathbf { \\Sigma }$ logReadNext(tag: tag, minSeqnum: 0) # if no record, then append and read again if rec $\\scriptstyle = =$ None: logAppend([tags: [tag], data: [table, key, val]) rec $\\mathbf { \\Sigma } = \\mathbf { \\Sigma }$ logReadNext(tag: tag, minSeqnum: 0) rawDBWRITE(...) # same call as before $\\mathrm { S T E P } ~ = ~ \\mathrm { S T E P } ~ + ~ 1$ \n\n11. [5 points]: \n\nAlyssa runs one workflow on her modified Boki. The workflow crashes during its execution and then restarts from the beginning and completes. With Alyssa’s modification will write preserve exactly-once semantics? (Briefly explain your answer.) ", "answer": "Answer: It will preserve exactly-once semantics. In the case that logReadNext() returns something non-None initially, it will always return that same log record. So even if write() did a logAppend(), the final logReadNext() would have the same value as the logReadNext() that is executed before logAppend(). ", "explanation": "Answer: It will preserve exactly-once semantics. In the case that logReadNext() returns something non-None initially, it will always return that same log record. So even if write() did a logAppend(), the final logReadNext() would have the same value as the logReadNext() that is executed before logAppend(). ", "type": "ShortAnswerQuestion"} +{"instance_id": 41, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", "problem_num": 12, "points": 6, "problem": "# IX SUNDR \n\nConsider the straw-man design in the paper Secure Untrusted Data Repository (SUNDR) by Li et al. \n\nUsers A, B, and C share a SUNDR server. The server may be malicious, though the server does not know any of the private keys. User A creates a new file aaa in the SUNDR file system. After that, user B looks for file aaa, but does not see the file. After that, user C creates a new empty file ccc. \n\nThere is no client activity other than what is described here. None of the stronger consistency ideas from the paper’s Section 3.2 are in use. All three users are honest and run correct SUNDR client software. \n\nAll three users now use the ls command to check whether they can see file ccc. All three users’ client SUNDR implementations report that the data they receive from SUNDR passes all validity checks. Nevertheless, a malicious SUNDR server can cause a number of different outcomes. \n\n12. 
[6 points]: What combinations are possible for which users can see ccc? For each statement, circle True if the SUNDR server could cause the indicated results, and False if not. \n\nTrue / False : All three users can see ccc. \n\nTrue / False : Only A and B can see ccc, but not C. \n\nTrue / False $\\because$ Only A and C can see ccc, but not B. \n\nTrue / False $\\because$ Only B and C can see ccc, but not A. \n\nTrue / False $\\because$ Only C can see ccc, but not A or B. \n\nTrue / False $\\because$ None of the users can see ccc. ", "answer": "False,False,True,True,True,False", "explanation": "Answer: False,False,True,True,True,False. The correct answers are A and C but not B, B and C but not A, and only C. We know that the server has forked A and B from the fact that B cannot see aaa. So A and B have seen different operation histories, and each has appended an operation to the history it saw, and remembered that operation. Thus, when C asks the server for the current history (before C creates ccc), the SUNDR server can show C A’s fork of the history, B’s fork, or perhaps the history as of before A’s creation of aaa. As a result, after C creates ccc, ccc will be visible to A (but not B), to B (but not A), and to C alone, respectively. ", "type": "True/False Questions"} +{"instance_id": 42, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", "problem_num": 13, "points": 5, "problem": "# X PBFT \n\nConsider the PBFT protocol as described in the paper Practical Byzantine Fault Tolerance by Castro and Liskov. \n\n13. [5 points]: \n\nPBFT chooses the primary for a view deterministically based on the view number. What could go wrong if PBFT were to use Raft’s voting algorithm to select a primary for a view? (Briefly explain your answer.) ", "answer": "Answer: Raft’s voting algorithm does not result in a single leader-per-term under byzantine faults. Consider a 7 node system with 2 Byzantine nodes. The nodes that vote for A for term T are A, B, C, D, and allow A to conclude it is leader. The nodes that vote for D for term T are E, F, G, D, and allow D to conclude it is leader. Of these, only D is Byzantine and has equivocated by voting for both A and D. All the other nodes may vote this way while acting non-byzantine. This results in two primaries for a single term and violates the assumptions that the rest of pbft builds on. ", "explanation": "Answer: Raft’s voting algorithm does not result in a single leader-per-term under byzantine faults. Consider a 7 node system with 2 Byzantine nodes. The nodes that vote for A for term T are A, B, C, D, and allow A to conclude it is leader. The nodes that vote for D for term T are E, F, G, D, and allow D to conclude it is leader. Of these, only D is Byzantine and has equivocated by voting for both A and D. All the other nodes may vote this way while acting non-byzantine. This results in two primaries for a single term and violates the assumptions that the rest of pbft builds on. ", "type": "ShortAnswerQuestion"} +{"instance_id": 43, "exam_id": "6_5840_distributed_system_engineering_spring_2024_exam_ii", "problem_num": 14, "points": 5, "problem": "# XI Bitcoin \n\nSection 4 of Nakamoto’s Bitcoin paper explains that the difficulty of mining is determined by the number of required leading zeros in the SHA-256 hash of the block. 
The paper also says that Bitcoin automatically varies the difficulty of mining (the number of required leading zeros) by observing the recent average rate of new block mining, relative to the target block every ten minutes; if blocks have been generated too quickly, the difficulty is increased; if too slowly, decreased. All honest Bitcoin peers use the same algorithm to determine the difficulty. \n\nBen dreams of being able to buy tickets to the latest Taylor Swift concert. To obtain the money required, Ben has been running the Bitcoin peer software on his laptop, but he hasn’t been earning mining rewards very quickly, because his laptop is only the winning miner very infrequently. Hoping to realize his dream faster, Ben modifies his copy of the Bitcoin peer software so that the difficulty determination algorithm always yields a low difficulty, with the result that his peer can mine new blocks very quickly, often before any other Bitcoin miner produces a given new block in the chain. \n\n14. [5 points]: It turns out that Ben won’t actually earn any bitcoins with this scheme. \n Explain why not. ", "answer": "Answer: Bitcoin peers that run correct software will check that any proposed new block has a hash with the expected number of leading zeros. Those peers are running the correct difficulty-determining algorithm, so they will reject Ben’s blocks because their hashes have too few leading zeros. ", "explanation": "Answer: Bitcoin peers that run correct software will check that any proposed new block has a hash with the expected number of leading zeros. Those peers are running the correct difficulty-determining algorithm, so they will reject Ben’s blocks because their hashes have too few leading zeros. ", "type": "ShortAnswerQuestion"} +{"instance_id": 44, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_i", "problem_num": 1, "points": 5, "problem": "# I MapReduce \n\nThe MapReduce paper (MapReduce: Simplified Data Processing on Large Clusters, by Dean and Ghemawat) says in Section 3.1 that the intermediate key space is partitioned among the R reduce tasks using hash(key) mod R. \n\nThea is running the word-count MapReduce job (pseudo-code in the paper’s Section 2.1) on a cluster with 10 worker machines. M is 20 and R is 40. There are no failures, the network is reliable, no machines are slower than expected, and there is no competing work on any of the machines or networks involved. The Map input is divided into 20 pieces of 16 megabytes each. \n\n1. [5 points]: By mistake, the hash(key) function Thea is using with MapReduce always returns 1. What effect will that have on the execution of the word-count job, compared to using a well-behaved hash function? Circle the single best answer. \n A. the job will produce incorrect final output \n B. 10 times as much total CPU time will be needed for Reduce phase \n C. 10 times as much total wall-clock time will be needed for Reduce phase \n D. 40 times as much total CPU time will be needed for Reduce phase \n E. 40 times as much total wall-clock time will be needed for Reduce phase \n F. the job will never complete ", "answer": "C", "explanation": "Answer: C. The total amount of computation is unchanged, but it’s all done by one worker rather than divided up in parallel among 10 workers. B is not correct because the total amount of work doesn’t change; the only thing that the hash function changes is which worker does the work. 
", "type": "SingleChoice"} +{"instance_id": 45, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_i", "problem_num": 2, "points": 5, "problem": "# II VMware FT \n\nThe paper The Design of a Practical System for Fault-Tolerant Virtual Machines by Scales et al. describes a method for avoiding split-brain using shared storage. Instead of using the shared storage’s test-and-set, Ben implements test-and-set using the kvsrv server from Lab 2 as follows: \n\n```\nfunc test-and-set(clnt *tester.Clnt) bool {\n val, version, err := clnt.Get(\"lock\")\n if err != rpc.OK {\n return false\n }\n if val == \"set\" {\n return false\n }\n if err := clnt.Put(\"lock\", \"set\", version); err == rpc.OK {\n return true\n }\n return false\n}\n```\n\nThe clnt.Put and clnt.Get are RPCs that invoke the server’s Put and Get methods. You can assume that Ben has implemented the Put and Get methods correctly. \n\nThe initial value of “lock” is the empty string. \n\nWhen the primary or the backup suspect that the other one has crashed, they invoke test-go-live, each with their own RPC client clnt: \n\n```\nfunc test-go-live() {\n for true {\n if test-and-set(clnt) {\n go-live()\n return\n }\n }\n}\n```\n\nThe network may lose, delay, or duplicate a few messages, but most messages will be delivered. \nThe computers (primary, backup, and kvsrv server) do not fail. \n\n2. [5 points]: What statements about Ben’s implementation are true? (Circle all that apply) \n A. Both the primary and backup may observe test-and-set returning true in test-go-live, and “go live”, resulting in split brain \n B. The key/value server may never store “set” for the “lock” key \n C. The primary and backup may spin forever in test-go-live, retrying test-and-set, because it may never return true \n D. If all RPCs succeed with no timeouts while running test-go-live, either the primary or the backup will observe true from test-and-set, but not both ", "answer": "C", "explanation": "Answer: C is true: the first Put may change lock to set, but the reply may be lost; a re-send will return ErrMaybe (since the version won’t match); so neither primary nor backup will ever see rpc.OK from Put. D is true: if there are no timeouts (i.e. no packets are lost) the first Put to arrive at the kvsrv will succeed, and the sender will get rpc.OK. A is false, because Put is conditional and only one can set val to “set”. B is false, because most messages will be delivered and thus eventually a Put will succeed in setting the lock. ", "type": "SingleChoice"} +{"instance_id": 46, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_i", "problem_num": 3, "points": 5, "problem": "# III Linearizability \n\nAlyssa is experimenting with a linearizable put/get key/value storage service. Unlike Lab 2, her key/value service has no versions; put calls look like put(key, value). \n\nAlyssa has two clients. Client C1 executes this: \n\n```\nt = get(\"x\")\nput(\"x\", t + 1)\n```\n\nAt about the same time, client C2 executes this: \n\n```\nt = get(\"x\")\nput(\"x\", t * 2)\n```\n\nBefore either client starts, the value for key $^ { \\mathrm { 6 6 } } \\mathrm { X } ^ { \\mathrm { 7 } }$ in the storage system is 10. Both clients’ calls complete without error. There is no other activity involving the storage system, and there are no failures. \n\nSuppose the history of the execution, in the style of Lecture 4, with values omitted, looks like this: \n\nC1: |--Rx?--| |--Wx?--|\nC2: |--Rx?--| |--Wx?--|\n\n3. 
[5 points]: After both clients have finished, what could the resulting value of x be in the storage system? (Circle all that apply) \n A. 10 \n B. 11 \n C. 20 \n D. 21 \n E. 22 ", "answer": "B,C", "explanation": "Answer: 11 and 20. Both C1’s read and C2’s read see the initial value of x (10), so C1 writes 11 and C2 writes 20. The writes are concurrent, so linearizability allows either write to appear to execute last, and thus provide the final value. ", "type": "MultipleChoice"} +{"instance_id": 47, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_i", "problem_num": 4, "points": 5, "problem": "# III Linearizability \n\nAlyssa is experimenting with a linearizable put/get key/value storage service. Unlike Lab 2, her key/value service has no versions; put calls look like put(key, value). \n\nAlyssa has two clients. Client C1 executes this: \n\n```\nt = get(\"x\")\nput(\"x\", t + 1)\n```\n\nAt about the same time, client C2 executes this: \n\n```\nt = get(\"x\")\nput(\"x\", t * 2)\n```\n\nBefore either client starts, the value for key “x” in the storage system is 10. Both clients’ calls complete without error. There is no other activity involving the storage system, and there are no failures. \n\nSuppose the history of the execution, in the style of Lecture 4, with values omitted, looks like this: \n\nC1: |--Rx?--| |--Wx?--|\nC2: |--Rx?--| |--Wx?--|\n\nAlyssa resets the value of “x” to 10, and re-runs the two client programs. This time, the execution history looks like this: \n\nC1: |--Rx?--| |--Wx?--|\nC2: |---Rx?---| |--Wx?--| \n\n4. [5 points]: After both clients have finished, what could the resulting value of x be in the storage system? (Circle all that apply) \n A. 10 \n B. 11 \n C. 20 \n D. 21 \n E. 22 ", "answer": "B,C,E", "explanation": "Answer: 11, 20, and 22. 22 is possible if C2’s read sees C1’s write. ", "type": "MultipleChoice"} +{"instance_id": 48, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_i", "problem_num": 5, "points": 5, "problem": "# IV GFS \n\nConsider GFS as described in The Google File System by Ghemawat et al. \n\n5. [5 points]: Which statements about GFS are true? (Circle all that apply) \n A. GFS ensures linearizability of client operations by allowing clients to read from chunk replicas. \n B. The primary server of a chunk ensures that Append operations are executed exactly once. \n C. A chunk server uses 64 Mbytes of disk space for each chunk. \n D. Leases help ensure that each chunk has only one primary. ", "answer": "D", "explanation": "Answer: D. A is false, because GFS allows reading chunks from backups, which may not have seen the last update to a chunk, violating linearizability. B is false because if an Append fails, the client retries the Append, which the primary executes, causing some Appends to be executed twice. C is false; the paper’s Section 2.5 says that chunks are stored as Linux files and are extended only as needed, with disk space allocated lazily; this means that if only a few bytes of a chunk are written, only that part of the Linux chunk file will consume disk space. D is true; Section 3.1 says that the coordinator grants a chunk’s lease to just one of the replicas, and only grants the lease to a different replica if the lease expires. 
", "type": "SingleChoice"} +{"instance_id": 49, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_i", "problem_num": 6, "points": 10, "problem": "# V Raft \n\nRefer to Ongaro and Ousterhout’s In Search of an Understandable Consensus Algorithm (Extended Version). \n\n6. [10 points]: Which statements about Raft are true? (Circle all that apply) \n A. If a follower receives an AppendEntries RPC from the leader and the follower’s term matches the one in the RPC, then the prevLogIndex in the RPC must be equal to or higher than the follower’s lastApplied \n B. Raft is optimized for the case that term switch happen frequently \n C. Raft guarantees that a leader in term $t$ is leader in term $t + 1$ \n D. If a leader sends the command in log index $i$ on the apply channel, the leader must have persisted log index $i$ \n E. If a follower crashes in a term and quickly reboots, it remembers who it voted for before the crash \n F. The leader’s matchIndex for a peer is always equal to or smaller than the leader’s nextIndex for that peer. \n G. A candidate who becomes leader sends out AppendEntries to all followers to suppress further elections \n H. If Raft doesn’t use snapshots, a crashed follower will send all committed log entries on the apply channel after it reboots, even ones that it sent before the crash ", "answer": "D,E,F,G,H", "explanation": "Answer: D, E, F, G, H are true. \nA is false, because an AppendEntries RPC from the leader may be delayed and arrive after later AppendEntries RPCs that bump up lastApplied; when the follower processes the first RPC, the prevLogIndex may be smaller than its lastApplied. B is false, because the authors believe terms change infrequently and therefore don’t think the fast-backup optimization is necessary. ", "type": "MultipleChoice"} +{"instance_id": 50, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_i", "problem_num": 7, "points": 10, "problem": "# V Raft \n\nRefer to Ongaro and Ousterhout’s In Search of an Understandable Consensus Algorithm (Extended Version). \n\n[10 points]: Which of the following bugs causes a Raft implementation to violate the safety properties listed in Figure 3? (Circle all that apply) \nA. A deadlock in a follower \nB. A follower who starts an election very quickly \nC. A partitioned leader who on rejoining updates its term to the new leader’s term and sends AppendEntries for commands in its log with the new term \nD. A race condition in the follower’s implementation that causes two followers to send different commands on the apply channel for log index $i$ \nE. A candidate that forgets to vote for itself \nF. A follower who appends a log entry to its log even if the term in the AppendEntries is smaller than its own and who then sends the log entry on the apply channel \nG. A follower that forgets to implement the rollback optimization presented at the end of section 5.3 \nH. A leader who always sends only one entry in an AppendEntries RPC to a follower", "answer": "A,B,E,G,H", "explanation": "Answer: A, B, E, G, and H are examples of what are called “liveness” bugs: these bugs don’t cause wrong behavior but may prevent any progress. A: A deadlock in the follower may cause Raft to not make forward progress at some point (e.g., if the follower is necessary to form a majority). B may prevent a leader from being elected but it doesn’t violate the safety properties. E is another variation of B. G and H may cause Raft to run slowly but that doesn’t violate the safety properties. 
C, D, F, on the other hand, are “safety” bugs that cause incorrect behavior that violates the safety rules of Raft of Figure 3. ", "type": "MultipleChoice"} +{"instance_id": 51, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_i", "problem_num": 8, "points": 5, "problem": "# VI Lab 3A-3C \n\nGeorge is implementing Raft as in Lab 3A-3C. Eager to test his implementation, George runs a git pull to get the latest changes from the 6.5840 staff. The latest changes introduce a new test, TestMiniReElection, which tests whether Raft can re-elect a leader after a single network partition. \n\n```\nfunc TestMiniReElection(t *testing.T) {\n servers := 3 // initialize three servers\n ...\n // wait for a leader to be elected; get the leader’s index\n leader1 := ts.checkOneLeader()\n \n ts.g.DisconnectAll(leader1) // disconnect leader1 from other servers\n \n // wait for a new leader to be elected; get the leader’s index\n leader2 := ts.checkOneLeader() // ***\n}\n```\n\nts.checkOneLeader() repeatedly polls only the connected servers until one of the connected servers returns that it is a leader. If it cannot find a leader within 5s, it returns a timeout error. \n\nUnfortunately, there is a bug in the Raft test infrastructure. When leader1 is disconnected, leader1 can still send RPCs to the other servers but not receive responses from the other servers. George runs TestMiniReElection, and finds that the test fails at the line marked with “\\*\\*\\*” with the timeout error “expected one leader, got none”. \n\n8. [5 points]: Assume George’s Raft implementation is completely correct, and that the network is reliable. Briefly explain why the buggy test infrastructure causes George’s implementation to fail the new test. ", "answer": "Answer: All the peers will continue to receive leader1’s heartbeat AppendEntries RPCs, which will prevent them from ever starting an election. ", "explanation": "Answer: All the peers will continue to receive leader1’s heartbeat AppendEntries RPCs, which will prevent them from ever starting an election. ", "type": "ShortAnswerQuestion"} +{"instance_id": 52, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_i", "problem_num": 9, "points": 5, "problem": "# VII ZooKeeper \n\nRefer to ZooKeeper: Wait-free coordination for Internet-scale systems, by Hunt, Konar, Junqueira, and Reed, and to Lecture 9. \n\nAlyssa runs a ZooKeeper service with a ZooKeeper leader and multiple followers. Alyssa has three ZooKeeper client programs, P1, P2, and P3: \n\n```\nP1:\n s = openSession()\n if create(s, \"/leader\", \"one\", flags=ephemeral) == true:\n print \"P1 starting as leader\"\n _, version = getData(s, \"/x\", watch=false)\n setData(s, \"/x\", \"one\", version)\n _, version = getData(s, \"/y\", watch=false)\n setData(s, \"/y\", \"one\", version)\n```\n\n```\nP2:\n s = openSession()\n if create(s, \"/leader\", \"two\", flags=ephemeral) == true:\n print \"P2 starting as leader\"\n _, version = getData(s, \"/x\", watch=false)\n setData(s, \"/x\", \"two\", version)\n _, version = getData(s, \"/y\", watch=false)\n setData(s, \"/y\", \"two\", version)\n print \"P2 done\"\n```\n\n```\nP3:\n s = openSession()\n sync(s, \"/\")\n x = getData(s, \"/x\", watch=false)\n y = getData(s, \"/y\", watch=false)\n print x, y\n```\n\nInitially, znode “/leader” does not exist, znode “/x” exists and contains the string “empty”, and znode “/y” exists and also contains the string “empty”. \n\nThe ZooKeeper calls in Alyssa’s code are all synchronous. 
The ZooKeeper client call create() is exclusive, returning false if the file already exists, and true if it was able to create the file. The programs might end up talking to different ZooKeeper followers. \n\nAlyssa starts P1, waits until she sees it print “P1 starting as leader”, then (on a different computer) starts P2. Just at this point in time, P1’s network connection starts to become slow and unreliable, so that sometimes it delivers packets, sometimes not. Alyssa sees that P2 prints “P2 starting as leader”, and after a little while “P2 done”. P2’s network connection is reliable and fast. \n\nAfter Alyssa sees “P2 done”, she runs P3. \n\n9. [5 points]: What output from P3 could Alyssa see? (Circle all that apply) \n A. one, one \n B. two, two \n C. one, two \n D. two, one ", "answer": "B", "explanation": "Answer: B. Only two, two. We know P1’s session must have terminated, because Alyssa saw P2 print “P2 starting as leader,” which could only have happened if ZooKeeper deleted P1’s ephemeral /leader file. So P2 will only start reading and writing data after P1 is guaranteed to have stopped writing (since ZooKeeper terminated its session). So P1 and P2’s activities won’t be intermixed; P2 runs strictly after P1. So both P2’s sets will succeed. P3 starts after P2 finishes, and P3 calls sync(), so P3 will see P2’s writes. ", "type": "SingleChoice"} +{"instance_id": 53, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_i", "problem_num": 10, "points": 5, "problem": "# VIII Distributed Transactions \n\nAlyssa has a database that supports serializable transactions. Records “x” and “y” both start out containing the value 1. Alyssa starts three transactions at the same time: \n\n```\nT1:\n BEGIN-X\n temp1 = get(\"x\")\n temp2 = get(\"y\")\n put(\"x\", temp1 + temp2)\n END-X\n\nT2:\n BEGIN-X\n temp1 = get(\"y\")\n put(\"x\", temp1 * 2)\n END-X\n\nT3:\n BEGIN-X\n put(\"y\", 3)\n END-X\n```\n\nBEGIN-X marks the start of a transaction, and END-X marks the end. All three transactions commit and finish. There are no aborts, deadlocks, or failures. There is no other activity in the database. \n\nWhen Alyssa looks at record “x” in the database after the transactions complete, she sees the value 5. \n\n10. [5 points]: Briefly explain how the value 5 could have resulted from these transactions. ", "answer": "Answer: The database system could have executed the transactions one at a time, in the order T2, T3, T1. ", "explanation": "Answer: The database system could have executed the transactions one at a time, in the order T2, T3, T1. ", "type": "ShortAnswerQuestion"} +{"instance_id": 54, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_i", "problem_num": 11, "points": 5, "problem": "# IX Spanner \n\nRefer to Spanner: Google’s Globally-Distributed Database, by Corbett et al. \n\n11. [5 points]: Suppose you only wanted to support read-write transactions (not read-only and not snapshot reads). You want therefore to eliminate all of the Spanner mechanisms that are not needed for read-write transactions. Which of these techniques can be eliminated? (Circle all that apply) \n A. commit wait (Sections 4.1.2, 4.2.1) \n B. safe time (Section 4.1.3) \n C. deadlock avoidance (Section 4.2.1) \n D. assignment of timestamps to read/write transactions (Section 4.1.2) ", "answer": "A,B,D", "explanation": "vAnswer: A, B, and D. The time-stamp mechanism is only needed for read-only transactions. 
Read-write transactions are made serializable and externally consistent by Spanner’s two-phase locking and two-phase commit; the correctness of read-write transactions thus does not rely on time-stamps. ", "type": "MultipleChoice"} +{"instance_id": 55, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 1, "points": 8, "problem": "# I FaRM\n\nConsider the following statements about FaRM as described in No compromises: distributed transactions with consistency, availability, and performance. For each statement, circle True or False.\n\n1.[8 points]:\n\n1. True / False : Short leases are important for FaRM, because FaRM must reconfigure to remove a failed replica of a region before FaRM can continue to use the region.\n2. True / False : For small messages RDMA performs $4 \\times$ better than RPC because the CPU is a performance bottleneck for RPCs.\n3. True / False : To obtain write locks FaRM uses RDMA writes so that the destination machine’s CPU doesn’t have to do any work on a lock request.\n4. True / False $\\because$ For the TATP workload the median latency on the left end of the graph is 9 microseconds (see Figure 7), rather than the 19 microsecond mean commit latency mentioned in Section 6.3, because not all operations update multiple rows.", "answer": "True,True,False,True", "explanation": "Answer: True,True,False,True 1. True, since FaRM uses primary-backup replication instead of Paxos/Raft and all replicas of a chunk must be up for FaRM to be able to continue using the chunk. 2. True; see Figure 3. False; in this case FaRM uses RDMA to implement RPC, which runs locking code on the destination machine. 4. True; see description of TATP workload and the explanation for the left-end of the graph.", "type": "True/False Questions"} +{"instance_id": 56, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 2, "points": 3, "problem": "# II Chardonnay\n\nAnswer these questions with reference to Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases by Eldeeb et al.\n\n2. [3 points]: Section 6.1 says that all of a read-write transaction’s writes are tagged with a version containing the same epoch number. For example, if a read-write transaction executes at around the time the Epoch Service changes the epoch from 20 to 21, either all the transaction’s updated records should have a VID prefixed with 20, or all should be prefixed with 21, but not a mix of the two. Explain briefly why having the same epoch number in all the written items’ versions is important for correctness.", "answer": "Answer: A read-only (snapshot) transaction must see record versions that, for every readwrite transaction, either reflect all of that read-write transaction’s writes, or none. Chardonnay’s strategy is for read-only transactions to read a snapshot as of the end of some epoch. This only works if every read-write transaction’s writes appear all in the same epoch, and are not spread over multiple epochs.", "explanation": "Answer: A read-only (snapshot) transaction must see record versions that, for every readwrite transaction, either reflect all of that read-write transaction’s writes, or none. Chardonnay’s strategy is for read-only transactions to read a snapshot as of the end of some epoch. 
This only works if every read-write transaction’s writes appear all in the same epoch, and are not spread over multiple epochs.", "type": "ShortAnswerQuestion"} +{"instance_id": 57, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 3, "points": 3, "problem": "# II Chardonnay\n\nAnswer these questions with reference to Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases by Eldeeb et al.\n\n3. [3 points]: Section 6.1 describes how a transaction chooses a version ID (VID): it is the current epoch, with a counter appended. Suppose, instead, that version IDs were just the current epoch (with nothing appended). Briefly explain how this change could cause snapshot reads to yield incorrect (non-serializable) values.", "answer": "Answer: Read-only (snapshot) transactions would not be able to tell which version was the last one in an epoch, and thus might read a mix of versions from early and late in the epoch.", "explanation": "Answer: Read-only (snapshot) transactions would not be able to tell which version was the last one in an epoch, and thus might read a mix of versions from early and late in the epoch.", "type": "ShortAnswerQuestion"} +{"instance_id": 58, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 4, "points": 3, "problem": "# II Chardonnay\n\nAnswer these questions with reference to Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases by Eldeeb et al.\n\n4. [3 points]: Section 8 says that the locks for a transaction are acquired one at a time using a “chain” technique. It would be faster if the client sent out all the lock requests in parallel. Briefly explain why sending a transaction’s lock request messages in parallel would be a bad idea.", "answer": "Answer: The purpose of acquiring the locks one at a time is to ensure that they are acquired in a deadlock-avoiding order. Sending out the requests in parallel would not ensure that, and could lead to deadlock.", "explanation": "Answer: The purpose of acquiring the locks one at a time is to ensure that they are acquired in a deadlock-avoiding order. Sending out the requests in parallel would not ensure that, and could lead to deadlock.", "type": "ShortAnswerQuestion"} +{"instance_id": 59, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 5, "points": 8, "problem": "# III Grove\n\nConsider “Grove: a separation-logic library for verifying distributed systems” by Sharma et al. For each statement, circle True or False.\n\n5.[8 points]:\n\n1. True / False : Because any replica in Grove can serve a Get request, Grove achieves higher throughput for Get’s than for Put’s (see Figure 19), but sacrifices linearizability for Get’s.\n2. True / False $\\because$ If a response to a Get is lost, the Grove exactlyonce library will resend the Get, which the server executes again and the second response may contain a different value than the first (but lost) response contained.\n3. True / False $\\because$ If the Go code for vKV had a bug that caused a backup to lose the effect of a Put after a crash, then Grove’s specification and proof would catch this bug.\n4. True / False $\\because$ If the configservice had an infinite loop that caused Reconfigure to never return, then this bug would be caught by Grove’s specification and proof.", "answer": "False,True,True,False", "explanation": "Answer: False,True,True,False1. False; Grove doesn’t sacrifice linearizability for reads from backups. 2. 
True; for example, if a Put happened between the first and second Get (but it doesn’t matter for correctness). 3. True; that is a safety property that Grove’s specification and proof capture. 4. False; Grove doesn’t prove liveness properties.", "type": "True/False Questions"} +{"instance_id": 60, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 6, "points": 5, "problem": "# IV DynamoDB Transactions\n\nWith respect to Distributed Transactions at Scale in Amazon DynamoDB by Idziorek et al., imagine that a read-only transaction (TransactGetItems) and a read-write transaction (TransactWriteItems) execute concurrently. The read-write transaction updates multiple items stored on different storage nodes; the read-only transaction reads the same set of items. Serializability requires that the readonly transaction see all of the items as they were before the read-write transaction, or all as they are after the read-write transaction (but not a mix of before and after).\n\n6. [5 points]: For the above scenario, which mechanisms help to ensure this all-before or all-after property? Refer to the design as described in the paper’s Sections 2 and 3 (but not Section 4). Circle all that apply.\n A. The item.timestamp $<$ input.timestamp check in Listing 3. \n B. Multiple time-stamped versions stored for each key. \n C. Log sequence numbers (LSNs). \n D. Two-phase locking. \n E. item.ongoingTransactions", "answer": "C,E", "explanation": "Answer: C,E. C and E are correct. A is not correct because read-only transactions do not involve the time-stamps. B is not correct because DynamoDB doesn’t store multiple versions of a given record. D is not correct because read-only transactions don’t use two-phase locking.", "type": "MultipleChoice"} +{"instance_id": 61, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 7, "points": 8, "problem": "# V AWS Lambda\n\nConsider the paper, and guest lecture about, On-demand container loading in AWS Lambda by Brooker et al. For each of the following statements, indicate whether it is true or false.\n\n7.[8 points]:\n\n1. True / False : AWS Lambda is attractive to customers because the customer can spawn many lambdas in response to a load spike without having to provision machines in advance.\n2. True / False $\\because$ Replication of chunks in the AZ-level cache is important to ensure that chunks are not lost forever when a cache node fails.\n3. True / False : Erasure coding of cached chunks helps improve tail latency, because a worker can reconstruct a chunk without having to download all stripes of the chunk.\n4. True / False : The convergent encryption scheme described in Section 3.1 helps protect against an attacker who compromises a worker and attempts to read containers of any customer.", "answer": "True,False,True,True", "explanation": "Answer: True,False,True,True. 1. True; customers can spawn many Lambdas in response to a spike in load. 2. False; AWS Lambda replicates for low latency instead of durability. 3. True; erasure coding allows the client to reconstruct the data with a few stripes without having to wait for all stripes. 4. 
True; the goal is that a worker can access only the data that it needs to run the function sent to it, but because “any” is ambiguous we accepted False too.", "type": "True/False Questions"} +{"instance_id": 62, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 8, "points": 5, "problem": "# VI Lab 4\n\nBen Bitdiddle is implementing the rsm package from Lab 4. He uses a shared reply table to implement communication between the rsm’s Submit() method and the reader goroutine. The reply table maps a completed operation’s id to its reply from DoOp(). In addition, rsm contains a nextId field that is initialized to 0 and is incremented by 1 each time Submit() is called.\n\nHis code takes the following steps:\n\n- A. The kvraft server calls `rsm.Submit()` with a client request req.\n- B. `Submit()` increments `rsm.nextId`, packs the request into a new operation `Op{id:rsm.nextId, req: req}`, and calls`rsm.rf.Start()` with the Op. `Ifrsm.rf.Start()` indicates that the current peer is the Raft leader, it then waits for the reply table to contain a reply under the key `Op.id` (the one passed to `Start()`).\n- C. When the rsm’s reader goroutine receives an `ApplyMsg` from the `applyCh`, it calls `DoOp()`, then populates the reply table with the result from `DoOp()` under key `Op.id`.\n- D. When `Submit()` sees the `Op.id` key in the reply table, it deletes the `Op.id` key/value pair from the reply table and returns the corresponding reply.\n\nAssume that all omitted parts of Ben’s design are correct.\n\n8. [5 points]: Ben notices that his implementation can result in incorrect behavior. Explain why.", "answer": "Answer: Each peer will, when it is leader, assign operations IDs starting with ID zero. Thus, if there is a change in leader, the new leader may assign IDs to new operations that are the same as IDs being waited for by Submit()s in the old leader, but the operations are different. So clients may receive responses for the wrong operations.", "explanation": "Answer: Each peer will, when it is leader, assign operations IDs starting with ID zero. Thus, if there is a change in leader, the new leader may assign IDs to new operations that are the same as IDs being waited for by Submit()s in the old leader, but the operations are different. So clients may receive responses for the wrong operations.", "type": "ShortAnswerQuestion"} +{"instance_id": 63, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 9, "points": 8, "problem": "# VII Ray\n\nConsider the following Ray program, which creates a `sqrt_task` task for each number in the list `mylist`. The creation yields a DFut and the caller waits for the tasks to complete by calling get on each future. 
The code is as follows:\n\n```\n# A call to sqrt_task yields a DFut\n@ray.remote\ndef sqrt_task(n):\n # sqrt is a python function, which returns the square root of its argument\n return sqrt(n)\n\n@ray.remote\ndef sum_task(f):\n # sum is a python function, which takes a future and returns the sum\n l = get(f) # collect the list l\n return sum(l) # return the sum of the numbers in list l\n\n# A call to sqrt_list_task yields a shared DFut\n@ray.remote\ndef sqrt_list_task(n_list):\n # start tasks and collect futures\n l = [ ] # list holding DFuts\n for i in n_list: # iterate over list of numbers\n \tl.append(sqrt_task(i))\n\n\tr = [ ]\n\tfor f in l:\n\t\tr.append(get(f)) # collect the result\n\n\treturn r # return a SharedDFut for r\n\n# invoke sqrt_list_task with a large list of numbers, sum, and print result\nf = sqrt_list_task(mylist)\ns = sum_task(f)\nprint(s)\n```\n\nAssume Ray behaves in the way described in Ownership: a distributed futures system for finegrained tasks by Wang et al., and Ray is running on a cluster of computers.\n\nFor each of the following statements, indicate whether it is true or false.\n\n9. [8 points]:\n\n- True / False : a Ray worker may start running sum task before sqrt list task has finished\n- True / False : the driver that invokes sum task receives the list with square-rooted numbers from the worker that ran sqrt list task.\n- True / False : the driver is the owner for each future that sqrt task returns.\n- True / False : the driver is the owner for the shared future returned by sqrt list task.", "answer": "True,False,False,True", "explanation": "Answer: True,False,False,True. 1. True, since remote invocations are asynchronous. 2. False; the worker running sum task will fetch the data from the worker that ran sqrt list task. 3. False; the worker who runs sqrt list task is the owner of these futures; 4. True; the driver starts sqrt list task and is thus the owner.", "type": "True/False Questions"} +{"instance_id": 64, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 10, "points": 5, "problem": "# VIII SUNDR\n\nConsider the straw-man design in the paper Secure Untrusted Data Repository (SUNDR) by Li et al.\n\nIn the straw-man design, a client asks the server to append a “fetch” operation to the history after reading the history. Each client also remembers the most recent operation it has appended to the history. One reason for these mechanisms is to prevent the server from first showing some operations to the client, and then later hiding those operations.\n\nSuppose one modified the SUNDR straw-man protcol in three ways. First, eliminate “fetch” operations, so that the history only includes modify operations. Second, have each client remember the last entry in the most recent history the client obtained from the server; call this remembered entry ELAST. Third, each client checks that ELAST is present in the next history it obtains from the server. With these modifications, a client can still detect a situation in which the server first shows an operation to a client, and then omits that operation in a later history shown to the same client.\n\n10. [5 points]: It turns out this modification is a bad idea. Explain briefly how this modification would allow a malicious server to violate fork consistency.", "answer": "Answer: Suppose the last two entries in the (correct) history are E1 followed by E2. When client C1 asks for the history, the SUNDR server could return the history just through E1, omitting E2. 
C1 would then remember E1 as its ELAST. This is a fork (and is allowed by fork consistency) because C1 isn’t seeing the complete history. When C1 next asks the SUNDR server for the history, the SUNDR server could return the complete history, including E2. C1 would accept this history because ELAST (=E1) is present and all other checks pass (e.g. E1’s and E2’s signatures over the preceding history will validate). At this point the fork has been healed (since C1 sees the previously concealed E2); this is a violation of fork consistency. Note that every entry (including whatever is in ELAST) has a signature over the entire preceding history, and the client checks all of these signatures every time it obtains a history from the SUNDR server, so the SUNDR server cannot successfully change anything in the history before a client’s ELAST.", "explanation": "Answer: Suppose the last two entries in the (correct) history are E1 followed by E2. When client C1 asks for the history, the SUNDR server could return the history just through E1, omitting E2. C1 would then remember E1 as its ELAST. This is a fork (and is allowed by fork consistency) because C1 isn’t seeing the complete history. When C1 next asks the SUNDR server for the history, the SUNDR server could return the complete history, including E2. C1 would accept this history because ELAST (=E1) is present and all other checks pass (e.g. E1’s and E2’s signatures over the preceding history will validate). At this point the fork has been healed (since C1 sees the previously concealed E2); this is a violation of fork consistency. Note that every entry (including whatever is in ELAST) has a signature over the entire preceding history, and the client checks all of these signatures every time it obtains a history from the SUNDR server, so the SUNDR server cannot successfully change anything in the history before a client’s ELAST.", "type": "ShortAnswerQuestion"} +{"instance_id": 65, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 11, "points": 3, "problem": "# IX Bitcoin\n\nBitcoin: A Peer-to-Peer Electronic Cash System, by Nakamoto, mentions in Section 4 that the cryptographic hash of a valid block must start with a certain number of zero bits. Assume that the hash algorithm is SHA-256, which returns a 256-bit hash.\n\n11. [3 points]: You are trying to mine a new block. The required number of zero bits is seven. You set the block’s 32-bit nonce field to a random value, and compute the SHA-256 hash of the block. What’s the probability that the first seven bits of the hash are zeros? Circle the one best answer.\n A. 1/2 \n B. 1/7 \n C. 1/128 \n D. 1/256 \n E. 1/249 \n F. $1 / ( 2 ^ { 3 2 } )$", "answer": "C", "explanation": "Answer: C (1/128) is correct.", "type": "SingleChoice"} +{"instance_id": 66, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 12, "points": 3, "problem": "# IX Bitcoin\n\nBitcoin: A Peer-to-Peer Electronic Cash System, by Nakamoto, mentions in Section 4 that the cryptographic hash of a valid block must start with a certain number of zero bits. Assume that the hash algorithm is SHA-256, which returns a 256-bit hash.\n\nBen runs a Bitcoin node. A few hours ago Ben’s node learned about block B747, which is a valid block. Ben sees a transaction T27 in B747 that pays some money to a certain public key, signed by the correct private key. Ben would like to steal the money involved in T27. 
He modifies his copy of block B747 so that the payee’s public key in T27 is Ben’s own public key. He doesn’t change anything else in B747. He modifies his Bitcoin node software to announce the block to other nodes as if it were a valid block.\n\n12. [3 points]: Which of the following will cause other Bitcoin nodes to decide that Ben’s B747 is invalid? Circle all that apply.\n A. The “Prev Hash” field in the next block in the chain doesn’t refer to Ben’s B747. \n B. Other peers will already know about the real B747. \n C. The “Prev Hash” field in Ben’s B747 isn’t valid. \n D. The hash of Ben’s B747 won’t start with enough zeroes. \n E. The signature in T27 in Ben’s B747 isn’t correct.", "answer": "D,E", "explanation": "Answer: D,E. D and E are correct. A and B are not correct: peers have to at least temporarily accept otherwise-valid blocks with no successor because they might turn out to be the start of a new winning fork. C is not correct because Ben didn’t modify the Prev Hash field, so it continues to refer to the predecessor of the original B747. D is correct because modifying the block will modify its cryptographic hash; the real B747’s hash started with enough zeroes, but a modified B747 is fantastically unlikely to happen also to start with enough zeroes. E is correct because the signature was correct for T27’s original payee public key, so the signature won’t be correct with Ben as the payee.", "type": "MultipleChoice"} +{"instance_id": 67, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 13, "points": 3, "problem": "# IX Bitcoin\n\nBitcoin: A Peer-to-Peer Electronic Cash System, by Nakamoto, mentions in Section 4 that the cryptographic hash of a valid block must start with a certain number of zero bits. Assume that the hash algorithm is SHA-256, which returns a 256-bit hash.\n\nNow Ben is designing a new crypto-currency system, identical to Bitcoin, except with a different agreement scheme to resolve forks in the block-chain: instead of the longest fork winning, nodes compute the hash of the last block in each fork, and the fork with the lowest last-block hash value wins. Ben reasons that all nodes will compute the same hashes, and thus all nodes will agree about which fork wins.\n\n13. [3 points]: Why is Ben’s fork-resolution idea a disaster? Explain briefly.", "answer": "Answer: In real Bitcoin, if an attacker wants to eliminate a transaction that occurs many blocks in the past by creating a fork from before that transaction, the attacker has to sustain a block mining rate faster than the main chain long enough to catch up, which requires compute power believed to be too expensive for most attackers. But with Ben’s scheme, an attacker only needs to mine a single block that happens to have a hash smaller than the corresponding block in the main chain; then all nodes will switch to the attacker’s new short fork. The attacker needs relatively little compute power to mine this single block.", "explanation": "Answer: In real Bitcoin, if an attacker wants to eliminate a transaction that occurs many blocks in the past by creating a fork from before that transaction, the attacker has to sustain a block mining rate faster than the main chain long enough to catch up, which requires compute power believed to be too expensive for most attackers. 
But with Ben’s scheme, an attacker only needs to mine a single block that happens to have a hash smaller than the corresponding block in the main chain; then all nodes will switch to the attacker’s new short fork. The attacker needs relatively little compute power to mine this single block.", "type": "ShortAnswerQuestion"} +{"instance_id": 68, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 14, "points": 6, "problem": "# X PBFT\n\nConsider the PBFT protocol as described in the paper Practical Byzantine Fault Tolerance by Castro and Liskov.\n\nFor each of the following statements, indicate whether it is true or false.\n\n14. [6 points]:\n\n- True / False $\\because$ Assume all replicas of a PBFT-replicated system are running the same software and the software has a bug. If an attacker exploits this bug in all replicas to control each replica, then the attacker can commit arbitrary operations in the log.\n- True / False : If an attacker controls the primary of a PBFT-replicated system, then the attacker can commit arbitrary operations in the log.\n- True / False : Honest replicas must include in a VIEW-CHANGE the messages already prepared so that a new primary cannot omit already-committed operations.", "answer": "True,False,True", "explanation": "Answer: True,False,True. 1. True; PBFT doesn’t guarantee correctness if more than f machines are compromised. 2. False; the attacker must control more than f machines to be able to compromise PBFT; 3. True; see protocol description.", "type": "ShortAnswerQuestion"} +{"instance_id": 69, "exam_id": "6_5840_distributed_system_engineering_spring_2025_exam_ii", "problem_num": 15, "points": 2, "problem": "# X PBFT\n\nConsider the PBFT protocol as described in the paper Practical Byzantine Fault Tolerance by Castro and Liskov.\n\nFor each of the following statements, indicate whether it is true or false.\n\n15. [2 points]: If there are 10 machines in our system, and at most 2 machines are malicious, how many REPLY messages does a client need to receive before it knows its operation was executed? (Circle best answer)\nA. 1 \nB. 3 \nC. 4 \nD. 6 \nE. 7", "answer": "B", "explanation": "Answer: B: $\\mathrm { f } { + } 1$ , where $\\mathrm { f } = 2$ ; as stated in the paper, we only need to guarantee that one honest replica voted REPLY.", "type": "SingleChoice"} diff --git a/benchmarks/course_exam_bench/data/pretrain/course_exam_pretrain_example.jsonl b/benchmarks/course_exam_bench/data/pretrain/course_exam_pretrain_example.jsonl new file mode 100644 index 00000000..e37c1a6d --- /dev/null +++ b/benchmarks/course_exam_bench/data/pretrain/course_exam_pretrain_example.jsonl @@ -0,0 +1,11 @@ +{ + "text": "MapReduce: Simplified Data Processing on Large Clusters\n\nMapReduce is a programming model and implementation for processing and generating large data sets. The intermediate key space is partitioned among R reduce tasks using hash(key) mod R. This ensures that all values for a given key are processed by the same reduce task.\n\nWhen implementing MapReduce, it's crucial to ensure proper load balancing. If the hash function is poorly designed (e.g., always returns the same value), all intermediate data will be sent to a single reduce worker, eliminating parallelism and increasing wall-clock time significantly. 
While the total CPU time remains unchanged, the lack of parallel execution means one worker must handle all the work sequentially.\n\nThe MapReduce framework automatically handles the distribution of data across workers, task scheduling, and failure recovery, making it easier to process large datasets across distributed systems.", + "metadata": { + "source": "MIT 6.5840 Distributed System Engineering Course Materials", + "course": "Distributed System Engineering", + "topic": "MapReduce", + "year": 2025, + "content_type": "educational_text", + "related_paper": "MapReduce: Simplified Data Processing on Large Clusters by Dean and Ghemawat" + } +} \ No newline at end of file diff --git a/benchmarks/course_exam_bench/data/pretrain/example_bench_pretrain_timestamp.jsonl b/benchmarks/course_exam_bench/data/pretrain/example_bench_pretrain_timestamp.jsonl deleted file mode 100644 index c44eafe2..00000000 --- a/benchmarks/course_exam_bench/data/pretrain/example_bench_pretrain_timestamp.jsonl +++ /dev/null @@ -1 +0,0 @@ -{"text": "text of one doc", "metadata": {"scenarios": "XXX", "subtask": "XXXX", "description": "xx", "link": "XXX", "XXX": "XXX"}} \ No newline at end of file diff --git a/benchmarks/course_exam_bench/data/sft/course_exam_sft_example.jsonl b/benchmarks/course_exam_bench/data/sft/course_exam_sft_example.jsonl new file mode 100644 index 00000000..e0c6b540 --- /dev/null +++ b/benchmarks/course_exam_bench/data/sft/course_exam_sft_example.jsonl @@ -0,0 +1,14 @@ +{ + "sys_prompt": "You are a university student who has completed the Distributed System Engineering course. You are now answering a final exam question. This is a SingleChoice problem. Please return your response in JSON format with 'answer' and 'explanation' fields.", + "user_prompt": "Below is the problem description:\n# I MapReduce \n\nThe MapReduce paper (MapReduce: Simplified Data Processing on Large Clusters, by Dean and Ghemawat) says in Section 3.1 that the intermediate key space is partitioned among the R reduce tasks using hash(key) mod R. \n\nThea is running the word-count MapReduce job (pseudo-code in the paper\u2019s Section 2.1) on a cluster with 10 worker machines. M is 20 and R is 40. There are no failures, the network is reliable, no machines are slower than expected, and there is no competing work on any of the machines or networks involved. The Map input is divided into 20 pieces of 16 megabytes each. \n\n1. [5 points]: By mistake, the hash(key) function Thea is using with MapReduce always returns 1. What effect will that have on the execution of the word-count job, compared to using a well-behaved hash function? Circle the single best answer. A. the job will produce incorrect final output \n B. 10 times as much total CPU time will be needed for Reduce phase \n C. 10 times as much total wall-clock time will be needed for Reduce phase \n D. 40 times as much total CPU time will be needed for Reduce phase \n E. 40 times as much total wall-clock time will be needed for Reduce phase \n F. the job will never complete ", + "response": "{\"answer\": \"C\", \"explanation\": \"Answer: C. The total amount of computation is unchanged, but it\\u2019s all done by one worker rather than divided up in parallel among 10 workers. B is not correct because the total amount of work doesn\\u2019t change; the only thing that the hash function changes is which worker does the work. 
\"}", + "metadata": { + "course": "Distributed System Engineering", + "exam": "6.5840 Distributed System Engineering: Spring 2025 Exam I", + "question_type": "SingleChoice", + "points": 5, + "instance_id": 1, + "topic": "MapReduce", + "data_quality": "high" + } +} \ No newline at end of file diff --git a/benchmarks/course_exam_bench/data/sft/example_bench_sft_timestamp.jsonl b/benchmarks/course_exam_bench/data/sft/example_bench_sft_timestamp.jsonl deleted file mode 100644 index 26cadb96..00000000 --- a/benchmarks/course_exam_bench/data/sft/example_bench_sft_timestamp.jsonl +++ /dev/null @@ -1,2 +0,0 @@ -{"sys_prompt": "You are XXX", "user_prompt": "what", "thinking": "chain of thought", "response": "XXX", "metadata": {"scenario": "XX", "subtask": "XXX", "data_quality":"high", "XXX": "XXX"}} - \ No newline at end of file diff --git a/benchmarks/course_exam_bench/install.sh b/benchmarks/course_exam_bench/install.sh index 8bd15145..c10b4886 100755 --- a/benchmarks/course_exam_bench/install.sh +++ b/benchmarks/course_exam_bench/install.sh @@ -2,39 +2,17 @@ set -e # Exit immediately on error. - -if command -v sudo >/dev/null 2>&1; then - sudo apt update - sudo apt install openjdk-17-jdk -y -else - apt update - apt install -y openjdk-17-jdk -fi - - -# Verify Java installation -readlink -f $(which java) - -# Set JAVA_HOME environment variable -export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64 -export PATH=$JAVA_HOME/bin:$PATH - -# Verify JAVA_HOME -echo $JAVA_HOME -java -version - -# if .venv does not exist, create it +# Create virtual environment if it doesn't exist if [ -d ".venv" ]; then echo "==> .venv already exists, skipping creation." else echo "==> Creating .venv directory..." - python3 -m venv .venv source .venv/bin/activate + pip install --upgrade pip pip install -r requirements.txt - pip install pytest - pip install pytest-cov + pip install pytest pytest-cov deactivate fi -echo "==> CourseExamBench environment is set up successfully." \ No newline at end of file +echo "==> CourseExamBench environment is set up successfully." diff --git a/benchmarks/course_exam_bench/requirements.txt b/benchmarks/course_exam_bench/requirements.txt index e790a1c5..2e44ba8d 100644 --- a/benchmarks/course_exam_bench/requirements.txt +++ b/benchmarks/course_exam_bench/requirements.txt @@ -1,5 +1,4 @@ -pyspark==3.5.5 -loguru==0.7.3 pandas==2.2.3 +loguru==0.7.3 litellm==1.77.5 -sentence_transformers \ No newline at end of file +sentence_transformers diff --git a/benchmarks/course_exam_bench/run.sh b/benchmarks/course_exam_bench/run.sh index c7e2667b..e6472d45 100755 --- a/benchmarks/course_exam_bench/run.sh +++ b/benchmarks/course_exam_bench/run.sh @@ -20,10 +20,5 @@ NEW_MODEL_NAME="${MODEL_NAME//\//_}" source .venv/bin/activate echo "==> Start to run CourseExamBench with model: ${MODEL_NAME}" -# Note that if you benchmark has multiple tasks, you need to add --task -# in your code to enable task selection. 
-python src/main.py \ - --model_name "${MODEL_NAME}" - # --save_path "./outputs/systemcourseexam__${NEW_MODEL_NAME}__$(date +"%Y-%m-%d_%H-%M-%S")" \ - +python src/main.py --model_name "${MODEL_NAME}" deactivate diff --git a/benchmarks/course_exam_bench/src/main.py b/benchmarks/course_exam_bench/src/main.py index 855d077c..487e8289 100644 --- a/benchmarks/course_exam_bench/src/main.py +++ b/benchmarks/course_exam_bench/src/main.py @@ -1,4 +1,4 @@ -"""This script evaluates a course exam benchmark using a specified LLM model.""" +"""Evaluate LLM performance on course exam benchmark.""" import argparse import json @@ -6,304 +6,367 @@ import sys from datetime import datetime -sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../../../'))) +import pandas as pd + +sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "../../../"))) from loguru import logger -from pyspark.sql import SparkSession -from pyspark.sql import functions as F -from pyspark.sql.types import FloatType, IntegerType, LongType, StringType, StructField, StructType +from sdk.evaluator import ExamEvaluator +from sdk.executor import SimpleExecutor from sdk.utils import set_llm_endpoint_from_config -set_llm_endpoint_from_config('env.toml') +FORMAT_INSTRUCTIONS = { + "SingleChoice": """ +This is a Single-choice problem. + +Please return your response in the following JSON format: +```json +{"answer": "A", "explanation": "Your explanation here."} +``` +""", + "MultipleChoice": """ +This is a MultipleChoice problem. + +Please return your response in the following JSON format: +```json +{"answer": "A,B,C", "explanation": "Your explanation here."} +``` + +answer is capital letters separated by commas, without spaces +""", + "True/False Questions": """ +This is a True/False problem. + +Please return your response in the following JSON format: +```json +{"answer": "True,False,True", "explanation": "Your explanation here."} +``` + +answer is each item corresponds to a sub-question +""", + "ShortAnswerQuestion": """ +This is a ShortAnswerQuestion problem. + +Please return your response in the following JSON format: +```json +{"answer": "Your answer here.", "explanation": "Your explanation here."} +``` +""", +} + + +def load_benchmark_data(data_dir): + """Load benchmark data from exam metadata and questions files. 
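+
+    Each line of questions.jsonl is one JSON object with instance_id, exam_id,
+    problem_num, points, problem, answer, explanation, and type fields.
+    exams_metadata.json holds per-exam scoring statistics (score_total, score_max,
+    score_avg, score_median), which are joined onto each question below.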
-from sdk.evaluator import ExamEvaluator # noqa: E402 -from sdk.executor import SimpleExecutor # noqa: E402 + Args: + data_dir: Directory containing exams_metadata.json and questions.jsonl + Returns: + Tuple of (questions list, exams metadata dict) + """ + metadata_file = os.path.join(data_dir, "exams_metadata.json") + questions_file = os.path.join(data_dir, "questions.jsonl") + + with open(metadata_file, encoding="utf-8") as f: + metadata = json.load(f) + exams_dict = {exam["exam_id"]: exam for exam in metadata["exams"]} + + # Load questions and join with exam metadata + questions = [] + with open(questions_file, encoding="utf-8") as f: + for line in f: + question = json.loads(line) + exam = exams_dict[question["exam_id"]] + + # Merge question with exam metadata + groundtruth = { + "instance_id": question["instance_id"], + "exam_id": question["exam_id"], + "test_paper_name": exam["test_paper_name"], + "course": exam["course"], + "year": exam["year"], + "problem_num": question["problem_num"], + "points": question["points"], + "score_total": exam["score_total"], + "score_max": exam["score_max"], + "score_avg": exam["score_avg"], + "score_median": exam["score_median"], + "problem": question["problem"], + "answer": question["answer"], + "explanation": question["explanation"], + "type": question["type"], + } + questions.append(groundtruth) -# Spark is used for aggregating results. -def create_sparksession() -> SparkSession: - """Create a Spark session for processing data.""" - return SparkSession.builder.getOrCreate() + return questions, exams_dict -def problem_format(problem_type: str) -> str: - """Generate the format description based on the problem type. +def process_question(groundtruth, model_name, agent_name, exam_id): + """Process a single question: prompt LLM and evaluate response. Args: - problem_type (str): The type of the problem (e.g., 'SingleChoice', 'MultipleChoice', 'True/False Questions', 'ShortAnswerQuestion'). + groundtruth: Question data with correct answer + model_name: Name of the LLM model + agent_name: Type of agent to use + exam_id: Exam identifier Returns: - str: The format description for the problem type. + Tuple of (minimal_result, detailed_result) """ - FORMAT_DESCRIPTION_SINGLE_CHOCIE = """ - This is a Single-choice problem. - - Please return your response in the following JSON format: - ```json - {"answer": "A", "explanation": "Your explanation here."} - ``` - """ - FORMAT_DESCRIPTION_MULTIPLE_CHOICE = """ - This is a MultipleChoice problem. + format_instruction = FORMAT_INSTRUCTIONS.get( + groundtruth["type"], FORMAT_INSTRUCTIONS["ShortAnswerQuestion"] + ) + system_prompt = ( + f"You are a university student who has completed the {groundtruth['course']} course. " + f"You are now answering a final exam question." + format_instruction + ) + user_prompt = f"Below is the problem description:\n{groundtruth['problem']}" - Please return your response in the following JSON format: - ```json - {"answer": "A,B,C", "explanation": "Your explanation here."} - ``` + try: + if agent_name == "llm": + executor = SimpleExecutor(model_name, system_prompt) + else: + raise ValueError(f"Unknown agent name: {agent_name}") - answer is capital letters separated by commas, without Spaces - """ + response_text = executor.run(user_prompt, lang="json") + response = json.loads(response_text) + llm_answer = str(response.get("answer", "")) + llm_explanation = response.get("explanation", "") - FORMAT_DESCRIPTION_TRUE_FALSE_QUESTIONS = """ - This is a True/False problem. 
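+        # The executor is asked for JSON output (lang="json"); a well-formed reply
+        # looks like {"answer": "C", "explanation": "..."} per FORMAT_INSTRUCTIONS.
+        # If the model returns something json.loads() cannot parse, the except
+        # branch below records the question with status "error" and zero points.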
+ logger.info(f'Question {groundtruth["instance_id"]}: Answer={llm_answer}') - Please return your response in the following JSON format: - ```json - {"answer": "True,False,True", "explanation": "Your explanation here."} - ``` + evaluator = ExamEvaluator() + metrics = evaluator.eval( + llm_answer=llm_answer, groundtruth=groundtruth, model_name=model_name + ) + points_earned = int(metrics["llm_score"]) + points_possible = groundtruth["points"] + if points_earned == points_possible: + status = "correct" + elif points_earned > 0: + status = "partial" + else: + status = "incorrect" + + minimal_result = { + "instance_id": groundtruth["instance_id"], + "exam_id": exam_id, + "question_type": groundtruth["type"], + "llm_answer": llm_answer, + "correct_answer": groundtruth["answer"], + "points_earned": points_earned, + "points_possible": points_possible, + "status": status, + } + detailed_result = { + **minimal_result, + "problem": groundtruth["problem"], + "llm_explanation": llm_explanation, + "correct_explanation": groundtruth["explanation"], + "llmjudger_explanation": metrics["llmjudger_explanation"], + "system_prompt": system_prompt, + "user_prompt": user_prompt, + } + except Exception as e: + logger.error(f"Error processing question {groundtruth['instance_id']}: {e}") + minimal_result = { + "instance_id": groundtruth["instance_id"], + "exam_id": exam_id, + "question_type": groundtruth["type"], + "llm_answer": None, + "correct_answer": groundtruth["answer"], + "points_earned": 0, + "points_possible": groundtruth["points"], + "status": "error", + "error": str(e), + } + detailed_result = { + **minimal_result, + "problem": groundtruth["problem"], + "llm_explanation": None, + "correct_explanation": groundtruth["explanation"], + "system_prompt": system_prompt, + "user_prompt": user_prompt, + } + return minimal_result, detailed_result + + +def compute_summary(results_df, exams_metadata): + """Compute summary statistics using pandas. - answer is each item corresponds to a sub-question - """ - FORMAT_DESCRIPTION_SHORT_ANSWER_QUESTION = """ - This is a ShortAnswerQuestion problem. 
+ Args: + results_df: DataFrame with evaluation results + exams_metadata: Dictionary mapping exam_id to exam metadata - Please return your response in the following JSON format: - ```json - {"answer": "Youe answer here.", "explanation": "Your explanation here."} - ``` + Returns: + Tuple of (summary dict, comparison dict) """ + total_questions = len(results_df) + answered = len(results_df[results_df["status"] != "error"]) + unanswered = total_questions - answered + correct = len(results_df[results_df["status"] == "correct"]) + incorrect = len(results_df[results_df["status"].isin(["incorrect", "partial"])]) + points_earned = int(results_df["points_earned"].sum()) + points_possible = int(results_df["points_possible"].sum()) + summary = { + "overall": { + "total_questions": total_questions, + "answered": answered, + "unanswered": unanswered, + "correct": correct, + "incorrect": incorrect, + "points_earned": points_earned, + "points_possible": points_possible, + "accuracy": round(correct / answered, 3) if answered > 0 else 0, + "score_percentage": ( + round(points_earned / points_possible, 3) if points_possible > 0 else 0 + ), + } + } + # By exam summary + by_exam = [] + for exam_id in results_df["exam_id"].unique(): + exam_results = results_df[results_df["exam_id"] == exam_id] + exam_meta = exams_metadata.get(exam_id, {}) + exam_answered = len(exam_results[exam_results["status"] != "error"]) + exam_correct = len(exam_results[exam_results["status"] == "correct"]) + exam_points_earned = int(exam_results["points_earned"].sum()) + exam_points_possible = int(exam_results["points_possible"].sum()) + by_exam.append( + { + "exam_id": exam_id, + "exam_name": exam_meta.get("test_paper_name", exam_id), + "total_questions": len(exam_results), + "answered": exam_answered, + "correct": exam_correct, + "incorrect": exam_answered - exam_correct, + "points_earned": exam_points_earned, + "points_possible": exam_points_possible, + "accuracy": ( + round(exam_correct / exam_answered, 3) if exam_answered > 0 else 0 + ), + "score_percentage": ( + round(exam_points_earned / exam_points_possible, 3) + if exam_points_possible > 0 + else 0 + ), + } + ) + summary["by_exam"] = by_exam + # Comparison with student performance + comparison = {"exams": []} + for exam_id in results_df["exam_id"].unique(): + exam_results = results_df[results_df["exam_id"] == exam_id] + exam_meta = exams_metadata.get(exam_id, {}) + + if not exam_meta: + continue + + exam_points_earned = int(exam_results["points_earned"].sum()) + exam_points_possible = int(exam_results["points_possible"].sum()) + comparison["exams"].append( + { + "exam_id": exam_id, + "exam_name": exam_meta.get("test_paper_name", exam_id), + "llm_performance": { + "points_earned": exam_points_earned, + "points_possible": exam_points_possible, + "percentage": ( + round(exam_points_earned / exam_points_possible, 3) + if exam_points_possible > 0 + else 0 + ), + }, + "student_baseline": { + "average_score": exam_meta.get("score_avg", 0), + "max_score": exam_meta.get("score_max", 0), + "median_score": exam_meta.get("score_median", 0), + "total_points": exam_meta.get("score_total", 0), + "average_percentage": ( + round( + exam_meta.get("score_avg", 0) + / exam_meta.get("score_total", 1), + 3, + ) + if exam_meta.get("score_total", 0) > 0 + else 0 + ), + }, + } + ) + return summary, comparison - if problem_type == 'SingleChoice': - format_description = FORMAT_DESCRIPTION_SINGLE_CHOCIE - elif problem_type == 'MultipleChoice': - format_description = FORMAT_DESCRIPTION_MULTIPLE_CHOICE - 
elif problem_type == 'True/False Questions': - format_description = FORMAT_DESCRIPTION_TRUE_FALSE_QUESTIONS - elif problem_type == 'ShortAnswerQuestion': - format_description = FORMAT_DESCRIPTION_SHORT_ANSWER_QUESTION - else: - raise ValueError(f'Unknown problem type: {problem_type}') - - return format_description - - -def main(input_file, output_dir, model_name, agent_name): - """Main function for running the course exam benchmark.""" - results = [] - with open(input_file, encoding='utf-8') as f: - data = [json.loads(line) for line in f] - - for groundtruth in data: - try: - logger.info(f"============ {groundtruth['instance_id']} ============") - problem_type = groundtruth['type'] - format_description = problem_format(problem_type) +def main(data_dir, output_dir, model_name, agent_name): + """Run the course exam benchmark. - system_prompt = ( - f"You are a university student who has completed the {groundtruth['course']} course. You are now answering a final exam question." - + format_description + Args: + data_dir: Directory containing benchmark data files + output_dir: Directory to save results + model_name: Name of the LLM model + agent_name: Type of agent to use + """ + logger.info("Loading benchmark data...") + questions, exams_metadata = load_benchmark_data(data_dir) + logger.info(f"Loaded {len(questions)} questions from {len(exams_metadata)} exams") + minimal_results = [] + detailed_results = [] + + # Streaming + results_file = os.path.join(output_dir, "results.jsonl") + detailed_file = os.path.join(output_dir, "results_detailed.jsonl") + + with open(results_file, "w", encoding="utf-8") as f_minimal, open( + detailed_file, "w", encoding="utf-8" + ) as f_detailed: + for groundtruth in questions: + logger.info(f"========== Question {groundtruth['instance_id']} ==========") + minimal_result, detailed_result = process_question( + groundtruth, model_name, agent_name, groundtruth["exam_id"] ) - user_prompt = 'Below is the problem description:\n' + groundtruth['problem'] - - if agent_name == 'llm': - executor = SimpleExecutor(model_name, system_prompt) - else: - # You can add more agents here - raise ValueError(f'Unknown agent name: {agent_name}') - response = executor.run(user_prompt, lang='json') - response = json.loads(response) - - answer = str(response.get('answer', '')) - explanation = response.get('explanation', '') - logger.info(f'Model Answer: {answer}') - logger.info(f'Model Explanation: {explanation}') - - evaluator = ExamEvaluator() - offline_metrics = evaluator.eval(llm_answer=answer, groundtruth=groundtruth, model_name=model_name) - - result = { - 'id': groundtruth['instance_id'], - 'test_paper_name': groundtruth['test_paper_name'], - 'course': groundtruth['course'], - 'year': groundtruth['year'], - 'problem_num': groundtruth['problem_num'], - 'points': groundtruth['points'], - 'score_total': groundtruth['score_total'], - 'score_max': float(groundtruth['score_max']), - 'score_avg': float(groundtruth['score_avg']), - 'score_median': float(groundtruth['score_median']), - 'problem': groundtruth['problem'], - 'type': groundtruth['type'], - 'answer': groundtruth['answer'], - 'llm_answer': answer, - 'explanation': groundtruth['explanation'], - 'llm_explanation': explanation, - 'llm_score': int(offline_metrics['llm_score']), - 'llmjudger_explanation': offline_metrics['llmjudger_explanation'], - 'llmjudger_system_prompt': offline_metrics['llmjudger_system_prompt'], - 'system_prompt': system_prompt, - 'user_prompt': user_prompt, - } - results.append(result) - - logger.info('Evaluation 
Result:') - logger.info(result) - - except Exception as e: - logger.error(f"Error processing instance {groundtruth['instance_id']}: {e}") - result = { - 'id': groundtruth['instance_id'], - 'test_paper_name': groundtruth['test_paper_name'], - 'course': groundtruth['course'], - 'year': groundtruth['year'], - 'problem_num': groundtruth['problem_num'], - 'points': groundtruth['points'], - 'score_total': groundtruth['score_total'], - 'score_max': float(groundtruth['score_max']), - 'score_avg': float(groundtruth['score_avg']), - 'score_median': float(groundtruth['score_median']), - 'problem': groundtruth['problem'], - 'type': groundtruth['type'], - 'answer': groundtruth['answer'], - 'llm_answer': None, - 'explanation': groundtruth['explanation'], - 'llm_explanation': None, - 'llm_score': 0, - 'llmjudger_explanation': None, - 'llmjudger_system_prompt': None, - 'system_prompt': system_prompt, - 'user_prompt': user_prompt, - 'error': str(e), - } - results.append(result) - - with open(os.path.join(output_dir, 'result.jsonl'), 'a+', encoding='utf-8') as output_file: - output_file.write(json.dumps(result)) - output_file.write('\n') - - spark = create_sparksession() - - data_schema = StructType( - [ - StructField('answer', StringType(), True), - StructField('cosine_similarity', FloatType(), True), - StructField('course', StringType(), True), - StructField('embeddings_similarity', FloatType(), True), - StructField('exact_match', FloatType(), True), - StructField('explanation', StringType(), True), - StructField('id', LongType(), True), - StructField('jaccard_similarity', FloatType(), True), - StructField('llm_answer', StringType(), True), - StructField('llm_score', IntegerType(), True), - StructField('llm_explanation', StringType(), True), - StructField('llmjudger_explanation', StringType(), True), - StructField('llmjudger_system_prompt', StringType(), True), - StructField('points', LongType(), True), - StructField('score_total', LongType(), True), - StructField('score_max', FloatType(), True), - StructField('score_avg', FloatType(), True), - StructField('score_median', FloatType(), True), - StructField('problem', StringType(), True), - StructField('problem_num', LongType(), True), - StructField('system_prompt', StringType(), True), - StructField('test_paper_name', StringType(), True), - StructField('type', StringType(), True), - StructField('user_prompt', StringType(), True), - StructField('year', LongType(), True), - ] - ) - score_data = ( - spark.createDataFrame(results, schema=data_schema) - .groupBy(F.lit(1)) - .agg( - F.count('id').alias('question_count'), - F.sum('points').alias('full_score'), - F.sum('llm_score').alias('llm_score'), - ) - .drop('1') - .toPandas() - .to_dict(orient='records')[0] - ) - ref_data = ( - spark.read.json(input_file) - .groupBy('test_paper_name') - .agg( - F.max('score_avg').alias('score_avg'), - F.max('score_total').alias('score_total'), - F.count('instance_id').alias('question_count'), - ) - .groupBy(F.lit(1)) - .agg( - F.sum('question_count').alias('question_count'), - F.sum('score_total').alias('full_score'), - F.sum('score_avg').alias('avg_score'), - ) - .drop('1') - .toPandas() - .to_dict(orient='records')[0] - ) + minimal_results.append(minimal_result) + detailed_results.append(detailed_result) + f_minimal.write(json.dumps(minimal_result, ensure_ascii=False) + "\n") + f_detailed.write(json.dumps(detailed_result, ensure_ascii=False) + "\n") - score_by_test_paper_data = ( - spark.createDataFrame(results, schema=data_schema) - .groupBy('test_paper_name') - .agg( - 
F.count('id').alias('question_count'), - F.sum('llm_score').alias('llm_score'), - F.sum('points').alias('full_score'), - F.max('score_total').alias('reference_score_total'), - F.max('score_max').alias('reference_score_max'), - F.max('score_avg').alias('reference_score_avg'), - F.max('score_median').alias('reference_score_median'), - ) - .toPandas() - .to_dict(orient='records') - ) - summary_data = { - 'reference': ref_data, - 'score': score_data, - 'score_by_test_paper': score_by_test_paper_data, - 'final_score': score_data['llm_score'] / float(ref_data['full_score']), - } + results_df = pd.DataFrame(minimal_results) + summary, comparison = compute_summary(results_df, exams_metadata) - with open(os.path.join(output_dir, 'avg_score.json'), 'w', encoding='utf-8') as summary_file: - json.dump(summary_data, summary_file, indent=4) + summary_file = os.path.join(output_dir, "summary.json") + with open(summary_file, "w", encoding="utf-8") as f: + json.dump(summary, f, indent=2, ensure_ascii=False) + comparison_file = os.path.join(output_dir, "comparison.json") + with open(comparison_file, "w", encoding="utf-8") as f: + json.dump(comparison, f, indent=2, ensure_ascii=False) - logger.info('************ Final average score ************') +if __name__ == "__main__": + # Load LLM endpoint configuration + set_llm_endpoint_from_config("env.toml") -if __name__ == '__main__': - parser = argparse.ArgumentParser(description='example benchmark') + parser = argparse.ArgumentParser(description="Course Exam Benchmark") parser.add_argument( - '-i', - '--input_file', - help='Benchmark input file', - default='./data/benchmark/SystemTestPaper.jsonl', + "-d", + "--data_dir", + help="Directory containing exams_metadata.json and questions.jsonl", + default="./data/benchmark", ) - parser.add_argument('-o', '--save_path', help='Result save path', default=None) - parser.add_argument('-a', '--agent', help='Agent Name', default='llm') parser.add_argument( - '-m', - '--model_name', - help='Model Name', + "-o", "--output_dir", help="Output directory for results", default=None ) - # Note that if your benchmark has multiple tasks, you need to add --task - # in your code to enable task selection. 
-    parser.add_argument('-t', '--task', help='specify task in scenarios', default=None)
-
+    parser.add_argument("-a", "--agent", help="Agent type", default="llm")
+    parser.add_argument("-m", "--model_name", help="Model name", required=True)
     args = parser.parse_args()
+    if args.output_dir is None:
+        model_name_safe = args.model_name.replace("/", "_")
+        timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
+        output_dir = os.path.join(
+            "./outputs", f"course_exam__{model_name_safe}__{args.agent}__{timestamp}"
+        )
+    else:
+        output_dir = args.output_dir
+    output_dir = os.path.abspath(os.path.expanduser(output_dir))
+    os.makedirs(output_dir, exist_ok=True)
-    model_name = args.model_name
-    input_file = args.input_file
-    save_path = args.save_path
-
-    if save_path is None:
-        str_model_name = model_name.replace('/', '_')
-        timestamp = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
-        save_path = os.path.join('./outputs', f'systemcourseexam__{str_model_name}__{args.agent}__{timestamp}')
-
-    save_path = os.path.abspath(os.path.expanduser(save_path))
-    os.makedirs(save_path, exist_ok=True)
-
-    main(input_file, save_path, model_name, agent_name=args.agent)
+    main(args.data_dir, output_dir, args.model_name, args.agent)
diff --git a/benchmarks/course_exam_bench/test.sh b/benchmarks/course_exam_bench/test.sh
index 00820da4..a35574b0 100755
--- a/benchmarks/course_exam_bench/test.sh
+++ b/benchmarks/course_exam_bench/test.sh
@@ -2,9 +2,9 @@
 set -e # Exit immediately on error.
-source envexamplebench/bin/activate
+source .venv/bin/activate
 pytest --version
-pytest
+pytest tests/ -v
 deactivate
-echo "==> ExampleBench test is done successfully."
+echo "==> CourseExamBench tests completed successfully."
diff --git a/benchmarks/course_exam_bench/tests/test_benchmark.py b/benchmarks/course_exam_bench/tests/test_benchmark.py
new file mode 100644
index 00000000..2d45ab44
--- /dev/null
+++ b/benchmarks/course_exam_bench/tests/test_benchmark.py
@@ -0,0 +1,282 @@
+"""Tests for the course exam benchmark."""
+
+import json
+import os
+import tempfile
+import unittest
+from pathlib import Path
+from unittest.mock import Mock, patch
+import sys
+
+sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
+sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent))
+
+
+class TestBenchmarkData(unittest.TestCase):
+    def setUp(self):
+        self.benchmark_dir = Path(__file__).parent.parent / "data" / "benchmark"
+        self.metadata_path = self.benchmark_dir / "exams_metadata.json"
+        self.questions_path = self.benchmark_dir / "questions.jsonl"
+
+    def test_required_files_exist(self):
+        self.assertTrue(
+            self.metadata_path.exists(),
+            f"Exam metadata file not found: {self.metadata_path}",
+        )
+        self.assertTrue(
+            self.questions_path.exists(),
+            f"Questions file not found: {self.questions_path}",
+        )
+
+    def test_metadata_schema(self):
+        with open(self.metadata_path, encoding="utf-8") as f:
+            metadata = json.load(f)
+        self.assertIn("exams", metadata, "Metadata must have 'exams' key")
+        self.assertIsInstance(metadata["exams"], list, "'exams' must be a list")
+        self.assertGreater(len(metadata["exams"]), 0, "Must have at least one exam")
+        required_fields = [
+            "exam_id",
+            "test_paper_name",
+            "course",
+            "year",
+            "score_total",
+            "score_max",
+            "score_avg",
+            "score_median",
+            "score_standard_deviation",
+            "num_questions",
+        ]
+
+        for i, exam in enumerate(metadata["exams"]):
+            for field in required_fields:
+                self.assertIn(field, exam, f'Exam {i}: missing field "{field}"')
+
+            # Check field types
+            self.assertIsInstance(exam["exam_id"], str)
+            self.assertIsInstance(exam["test_paper_name"], str)
+            self.assertIsInstance(exam["course"], str)
+            self.assertIsInstance(exam["year"], int)
+            self.assertIsInstance(exam["num_questions"], int)
+            # Check value constraints
+            self.assertGreater(exam["num_questions"], 0)
+            # No further constraints are enforced on the score fields for now
+
+    def test_questions_schema(self):
+        required_fields = [
+            "instance_id",
+            "exam_id",
+            "problem_num",
+            "points",
+            "problem",
+            "answer",
+            "explanation",
+            "type",
+        ]
+        valid_types = [
+            "SingleChoice",
+            "MultipleChoice",
+            "True/False Questions",
+            "ShortAnswerQuestion",
+        ]
+        instance_ids = set()
+        question_count = 0
+        with open(self.questions_path, encoding="utf-8") as f:
+            for line_num, line in enumerate(f, 1):
+                question = json.loads(line)
+                question_count += 1
+                for field in required_fields:
+                    self.assertIn(
+                        field, question, f'Line {line_num}: missing field "{field}"'
+                    )
+                self.assertIsInstance(question["instance_id"], int)
+                self.assertIsInstance(question["exam_id"], str)
+                self.assertIsInstance(question["problem_num"], int)
+                self.assertIsInstance(question["points"], int)
+                self.assertIsInstance(question["problem"], str)
+                self.assertIsInstance(question["answer"], str)
+                self.assertIsInstance(question["explanation"], str)
+                self.assertIsInstance(question["type"], str)
+                self.assertNotIn(
+                    question["instance_id"],
+                    instance_ids,
+                    f'Line {line_num}: duplicate instance_id {question["instance_id"]}',
+                )
+                instance_ids.add(question["instance_id"])
+                self.assertIn(
+                    question["type"],
+                    valid_types,
+                    f'Line {line_num}: invalid type "{question["type"]}"',
+                )
+                self.assertGreater(len(question["problem"]), 0)
+                self.assertGreater(len(question["answer"]), 0)
+        self.assertGreater(question_count, 0, "Must have at least one question")
+
+    def test_data_integrity(self):
+        """Test that metadata and questions are consistent."""
+        with open(self.metadata_path, encoding="utf-8") as f:
+            metadata = json.load(f)
+        exams_dict = {exam["exam_id"]: exam for exam in metadata["exams"]}
+        # Load questions and count per exam
+        question_counts = {}
+        with open(self.questions_path, encoding="utf-8") as f:
+            for line in f:
+                question = json.loads(line)
+                exam_id = question["exam_id"]
+                # Check that exam_id exists in metadata
+                self.assertIn(
+                    exam_id,
+                    exams_dict,
+                    f"Question references non-existent exam_id: {exam_id}",
+                )
+                question_counts[exam_id] = question_counts.get(exam_id, 0) + 1
+        # Verify question counts match metadata
+        for exam_id, exam in exams_dict.items():
+            actual_count = question_counts.get(exam_id, 0)
+            expected_count = exam["num_questions"]
+            self.assertEqual(
+                actual_count,
+                expected_count,
+                f"Exam {exam_id}: metadata says {expected_count} questions, but found {actual_count}",
+            )
+
+    def test_questions_sorted(self):
+        """Test that questions are sorted by exam_id then instance_id."""
+        questions = []
+        with open(self.questions_path, encoding="utf-8") as f:
+            for line in f:
+                questions.append(json.loads(line))
+        for i in range(len(questions) - 1):
+            curr = questions[i]
+            next_q = questions[i + 1]
+            # If same exam, instance_id should be increasing or equal
+            if curr["exam_id"] == next_q["exam_id"]:
+                self.assertLessEqual(
+                    curr["instance_id"],
+                    next_q["instance_id"],
+                    f'Questions not sorted within exam {curr["exam_id"]}: '
+                    f'instance_id {curr["instance_id"]} comes before {next_q["instance_id"]}',
+                )
+
+
+class TestBenchmarkOutput(unittest.TestCase):
+    def test_output_format(self):
"""Test that benchmark generates expected output files with correct format.""" + import main + from sdk.evaluator import ExamEvaluator + from sdk.executor import SimpleExecutor + + with tempfile.TemporaryDirectory() as temp_dir: + # Mock the LLM executor to avoid actual API calls + with patch.object(SimpleExecutor, "__init__", return_value=None): + with patch.object( + SimpleExecutor, + "run", + return_value='{"answer": "A", "explanation": "Test"}', + ): + # Mock the evaluator + with patch.object(ExamEvaluator, "__init__", return_value=None): + with patch.object( + ExamEvaluator, + "eval", + return_value={ + "llm_score": 5, + "llmjudger_explanation": None, + "llmjudger_system_prompt": None, + }, + ): + data_dir = str( + Path(__file__).parent.parent / "data" / "benchmark" + ) + main.main(data_dir, temp_dir, "test-model", "llm") + expected_files = [ + "results.jsonl", + "results_detailed.jsonl", + "summary.json", + "comparison.json", + ] + for filename in expected_files: + filepath = Path(temp_dir) / filename + self.assertTrue( + filepath.exists(), + f"Expected output file not found: {filename}", + ) + + # results.jsonl format + results_file = Path(temp_dir) / "results.jsonl" + with open(results_file, encoding="utf-8") as f: + for line_num, line in enumerate(f, 1): + result = json.loads(line) + required_fields = [ + "instance_id", + "exam_id", + "question_type", + "llm_answer", + "correct_answer", + "points_earned", + "points_possible", + "status", + ] + for field in required_fields: + self.assertIn( + field, result, f'Line {line_num}: missing field "{field}"' + ) + # status is valid + valid_statuses = ["correct", "incorrect", "partial", "error"] + self.assertIn(result["status"], valid_statuses) + # results_detailed.jsonl format + detailed_file = Path(temp_dir) / "results_detailed.jsonl" + with open(detailed_file, encoding="utf-8") as f: + for line_num, line in enumerate(f, 1): + result = json.loads(line) + # has the additional fields + additional_fields = [ + "problem", + "llm_explanation", + "correct_explanation", + "system_prompt", + "user_prompt", + ] + for field in additional_fields: + self.assertIn( + field, + result, + f'Detailed line {line_num}: missing field "{field}"', + ) + # summary.json format + summary_file = Path(temp_dir) / "summary.json" + with open(summary_file, encoding="utf-8") as f: + summary = json.load(f) + self.assertIn("overall", summary) + overall_fields = [ + "total_questions", + "answered", + "unanswered", + "correct", + "incorrect", + "points_earned", + "points_possible", + "accuracy", + "score_percentage", + ] + for field in overall_fields: + self.assertIn(field, summary["overall"]) + + self.assertIn("by_exam", summary) + self.assertIsInstance(summary["by_exam"], list) + + # comparison.json format + comparison_file = Path(temp_dir) / "comparison.json" + with open(comparison_file, encoding="utf-8") as f: + comparison = json.load(f) + self.assertIn("exams", comparison) + self.assertIsInstance(comparison["exams"], list) + if len(comparison["exams"]) > 0: + exam = comparison["exams"][0] + self.assertIn("exam_id", exam) + self.assertIn("exam_name", exam) + self.assertIn("llm_performance", exam) + self.assertIn("student_baseline", exam) + + +if __name__ == "__main__": + unittest.main()