- Use [course_lab_bench](https://github.com/systemintelligence/system_intelligence_benchmark/tree/main/benchmarks/course_exam_bench) if your benchmark involves **environment setup, system understanding/implementation, performance analysis, or debugging tasks**, and each task may require a different running environment. These tasks typically require an LLM to autonomously call tools (such as the File Editor, Bash, etc.), navigate a large codebase, and run experiments or tests, much as a human developer would. To support this, we provide several advanced agents (e.g., Claude Code, MiniSWEAgent) in this example, along with guidance for [integrating new agents](https://github.com/systemintelligence/system_intelligence_benchmark/blob/main/benchmarks/course_lab_bench/add_agents.md).
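
The [add_agents.md](https://github.com/systemintelligence/system_intelligence_benchmark/blob/main/benchmarks/course_lab_bench/add_agents.md) guide defines the actual integration interface. As a rough, hypothetical sketch of the general pattern only (the class names, method signature, and launch command below are illustrative assumptions, not the repository's API), a new agent is typically wrapped in a small adapter that the harness can launch against a task's working directory:

```python
# Hypothetical agent-adapter sketch -- see add_agents.md for the real
# interface; all names and the launch command here are assumptions.
import subprocess
from abc import ABC, abstractmethod


class AgentAdapter(ABC):
    """Wraps an external agent (e.g., Claude Code, MiniSWEAgent) so the
    benchmark harness can run it inside a task's environment."""

    @abstractmethod
    def run(self, task_dir: str, timeout_s: int = 3600) -> str:
        """Run the agent against the files in task_dir and return its
        final output (e.g., an answer, a patch, or a report)."""


class SubprocessAgentAdapter(AgentAdapter):
    """Launches an agent CLI as a subprocess with shell access to the
    task directory and captures whatever it prints to stdout."""

    def __init__(self, command: list[str]):
        # Placeholder command; substitute the agent's real CLI invocation.
        self.command = command

    def run(self, task_dir: str, timeout_s: int = 3600) -> str:
        result = subprocess.run(
            self.command + [task_dir],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return result.stdout
```

An adapter like this keeps the harness agnostic to how each agent is implemented: adding a new agent means supplying one class that knows how to launch it, rather than changing the benchmark's task or evaluation code.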