
Commit 73fc32c

Merge pull request #13 from sys-intelligence/fix
Fix the github org name change
2 parents: b839fb2 + f88182d

2 files changed (+6, -6 lines)


README.md

Lines changed: 3 additions & 3 deletions
@@ -43,7 +43,7 @@ System Intelligence Benchmark currently includes the following example benchmark
1. Clone the repository:

```bash
-git clone https://github.com/systemintelligence/system_intelligence_benchmark.git
+git clone https://github.com/sys-intelligence/system_intelligence_benchmark.git
cd system_intelligence_benchmark
```

@@ -95,8 +95,8 @@ We welcome community contributions to enrich existing benchmarks (e.g., by addin

### Contribute to Existing Benchmarks
The easiest way to contribute is to add more tasks to existing benchmarks. Currently, the following two are highly recommended. You can simply follow the provided guidelines to submit your data—once that’s done, you’re all set.
-- **SystemExam**: If you are a professor teaching one or more courses, we highly recommend contributing **more exam problems** to SystemExam (see [this doc](https://github.com/systemintelligence/system_intelligence_benchmark/tree/main/benchmarks/course_exam_bench#how-to-extend-the-benchmark) for step-by-step guidance).
-- **SystemArtifact**: If you are a researcher submitting artifacts, or an AE chair involved in artifact evaluation, we highly recommend contributing **more system artifacts** to SystemArtifact (see [this doc](https://github.com/systemintelligence/system_intelligence_benchmark/blob/main/benchmarks/arteval_bench/README.md) for step-by-step guidance).
+- **SystemExam**: If you are a professor teaching one or more courses, we highly recommend contributing **more exam problems** to SystemExam (see [this doc](https://github.com/sys-intelligence/system_intelligence_benchmark/tree/main/benchmarks/course_exam_bench#how-to-extend-the-benchmark) for step-by-step guidance).
+- **SystemArtifact**: If you are a researcher submitting artifacts, or an AE chair involved in artifact evaluation, we highly recommend contributing **more system artifacts** to SystemArtifact (see [this doc](https://github.com/sys-intelligence/system_intelligence_benchmark/blob/main/benchmarks/arteval_bench/README.md) for step-by-step guidance).

In addition, you can also help review the existing benchmarks to propose improvement ideas or directly enhance them—for example, by adding more advanced evaluators or incorporating improved metrics.

doc/creating_benchmark.md

Lines changed: 3 additions & 3 deletions
@@ -14,9 +14,9 @@ Before creating a custom benchmark, ensure you have:

Choose an example benchmark that **is similar to** your setting as a starting point.

-If your tasks involve exam-style questions, consider starting from [course_exam_bench](https://github.com/systemintelligence/system_intelligence_benchmark/tree/main/benchmarks/course_exam_bench). If your benchmark focuses on algorithm design or optimization tasks, you might use [algo_cache_bench](https://github.com/systemintelligence/system_intelligence_benchmark/tree/main/benchmarks/algo_cache_bench) as a template. These tasks can often be handled by a minimal agent (an LLM call plus a response parser).
+If your tasks involve exam-style questions, consider starting from [course_exam_bench](https://github.com/sys-intelligence/system_intelligence_benchmark/tree/main/benchmarks/course_exam_bench). If your benchmark focuses on algorithm design or optimization tasks, you might use [algo_cache_bench](https://github.com/sys-intelligence/system_intelligence_benchmark/tree/main/benchmarks/algo_cache_bench) as a template. These tasks can often be handled by a minimal agent (an LLM call plus a response parser).

-Use [course_lab_bench](https://github.com/systemintelligence/system_intelligence_benchmark/tree/main/benchmarks/course_exam_bench), if your benchmark is related to **environment setup, system understanding/implementation, performance analysis, or debugging tasks**, and each task may need different runing environments. These tasks typically require an LLM to autonomously call tools (such as the File Editor, Bash, etc.), navigate a large codebase, and run experiments or tests—similar to what a human developer would do. To support this, we provide several advanced agents (e.g., Claude Code, MiniSWEAgent) in this example, along with guidance for [integrating new agents](https://github.com/systemintelligence/system_intelligence_benchmark/blob/main/benchmarks/course_lab_bench/add_agents.md).
+Use [course_lab_bench](https://github.com/sys-intelligence/system_intelligence_benchmark/tree/main/benchmarks/course_exam_bench), if your benchmark is related to **environment setup, system understanding/implementation, performance analysis, or debugging tasks**, and each task may need different runing environments. These tasks typically require an LLM to autonomously call tools (such as the File Editor, Bash, etc.), navigate a large codebase, and run experiments or tests—similar to what a human developer would do. To support this, we provide several advanced agents (e.g., Claude Code, MiniSWEAgent) in this example, along with guidance for [integrating new agents](https://github.com/sys-intelligence/system_intelligence_benchmark/blob/main/benchmarks/course_lab_bench/add_agents.md).

1. Navigate to the benchmarks directory:
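
(Aside, not part of this commit.) The first changed paragraph in the hunk above notes that exam-style and algorithm-design tasks "can often be handled by a minimal agent (an LLM call plus a response parser)". The sketch below is purely illustrative and is not code from this repository; the `call_llm` helper and the `Answer: ...` reply convention are assumptions made for the example.

```python
import re

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to whatever LLM backend you use
    and return the raw text completion."""
    raise NotImplementedError

def minimal_agent(question: str) -> str:
    """A minimal agent in the sense described above: one LLM call plus a response parser."""
    raw = call_llm(
        "Answer the following question. End your reply with a line of the form "
        "'Answer: <your final answer>'.\n\n" + question
    )
    # Take the last 'Answer: ...' line; fall back to the whole reply if it is missing.
    matches = re.findall(r"Answer:\s*(.+)", raw)
    return matches[-1].strip() if matches else raw.strip()
```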

@@ -72,7 +72,7 @@ Create your evaluation dataset in a structured format:
- `user_prompt`: User query/task description
- `response`: Expected/ground truth response

-3. **NOTES:** for more complex scenarios, you can use **any custom formats**. See [course_exam_bench](https://github.com/systemintelligence/system_intelligence_benchmark/blob/main/benchmarks/course_exam_bench/data/benchmark/questions.jsonl) and [course_lab_bench](https://github.com/systemintelligence/system_intelligence_benchmark/blob/main/benchmarks/course_lab_bench/data/benchmark/env_setup_examples.jsonl) for examples.
+3. **NOTES:** for more complex scenarios, you can use **any custom formats**. See [course_exam_bench](https://github.com/sys-intelligence/system_intelligence_benchmark/blob/main/benchmarks/course_exam_bench/data/benchmark/questions.jsonl) and [course_lab_bench](https://github.com/sys-intelligence/system_intelligence_benchmark/blob/main/benchmarks/course_lab_bench/data/benchmark/env_setup_examples.jsonl) for examples.

## Step 3: Select or Implement Your Executor and Evaluator
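
(Aside, not part of this commit.) The hunk above refers to a simple dataset format with one task per line and the `user_prompt` and `response` fields listed there. A minimal sketch of producing such a JSONL record is shown below; the file name and example values are invented for illustration and do not come from this repository.

```python
import json

# Hypothetical task record using the two fields named above.
record = {
    "user_prompt": "Explain the difference between a mutex and a spinlock.",
    "response": "A mutex puts a waiting thread to sleep; a spinlock busy-waits on the lock.",
}

# JSONL: one JSON object per line, appended to the benchmark's dataset file.
with open("my_benchmark_tasks.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```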
