Integrate SysMoBench benchmark via Git Subtree #10

Qian-Cheng-nju · 2025-11-14T12:39:18Z

Summary

Add the SysMoBench benchmark, which evaluates AI models' ability to generate correct TLA+ formal specifications for real-world concurrent and distributed systems.


xuafeng

Thanks a lot for your contribution to this project; the integration is great. I left a few suggestions.

benchmarks/sysmobench/src/main.py

xuafeng · 2025-11-14T23:36:34Z

README.md

 ### Contribute to existing Benchmarks
 The easiest way to contribute is to add more tasks to existing benchmarks. For example, you can add more questions to the course exam benchmark or more projects to the course project benchmark. You can add more system algorithm design problems into algorithm design benchmark. Please follow the existing format and structure for adding new tasks. You can also improve the existing benchmarks by adding more advanced evaluators with improved metrics.

 ### Creating New Benchmarks


As discussed on Slack, could you please add a section called "### Porting your benchmark", using SysMoBench as an example? Thanks a lot.

Qian-Cheng-nju · 2025-11-17

I’ve written a draft of the benchmark porting guide (see commit 72b6916). I would greatly appreciate your review when you have time, and please let me know if anything should be improved. Thank you very much!

xuafeng · 2025-11-14T23:39:41Z

Resolves #7


Qian-Cheng-nju added 15 commits November 7, 2025 18:07

Init: Initialize SysMoBench benchmark integration

87db1e9

feat: Add gitigore

69f4cb5

feat: Add prototype for phase 1&2

843f031

doc: update README.md

ca7e72e

feat: Add test

60d30e0

featr: Add install.sh

ff96313

fix: Use tla_specification instead of generated_text to adapt upstream changes

5ef835f

Squashed 'benchmarks/sysmobench/sysmobench_core/' content from commit 119a763

0490016

git-subtree-dir: benchmarks/sysmobench/sysmobench_core
git-subtree-split: 119a763435c4b7266a27cc8f6741239d0f4e4347

Merge commit '04900168e10834f3aa5eef4d13b318e1efcdac24' as 'benchmarks/sysmobench/sysmobench_core'

c5dfbb1
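For reference, `git subtree` in squash mode records two trailers in the squash commit message: `git-subtree-dir` (the prefix the upstream code lives under) and `git-subtree-split` (the full upstream commit that was imported). As far as I understand, later subtree operations search for these trailers to locate the previous import. The sketch below reads them back from a message like the squash commit above; `parse_subtree_trailers` is a hypothetical helper, not part of git or of this PR.

```python
import re

# Matches the trailer lines git subtree writes into a squash commit message,
# e.g. "git-subtree-dir: <prefix>" and "git-subtree-split: <full sha>".
_TRAILER = re.compile(r"^git-subtree-(dir|split):[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def parse_subtree_trailers(message: str) -> dict[str, str]:
    """Return {'dir': ..., 'split': ...} for the trailers found in `message`."""
    return {key: value for key, value in _TRAILER.findall(message)}


squash_message = """Squashed 'benchmarks/sysmobench/sysmobench_core/' content from commit 119a763

git-subtree-dir: benchmarks/sysmobench/sysmobench_core
git-subtree-split: 119a763435c4b7266a27cc8f6741239d0f4e4347
"""

trailers = parse_subtree_trailers(squash_message)
print(trailers["dir"])        # benchmarks/sysmobench/sysmobench_core
print(trailers["split"][:7])  # 119a763
```

The short hash in the squash title (`119a763`) is simply the abbreviated form of the `git-subtree-split` value.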

fix: Add gpt-4o config and fix cross-device link issue in setup_tools

a68e171

- Add gpt-4o model configuration to models.yaml
- Fix setup_tools.py to use shutil.move instead of os.rename

This resolves 'Invalid cross-device link' error when /tmp is on different filesystem
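Background on the cross-device fix: `os.rename` only works within a single filesystem and fails with `errno.EXDEV` ("Invalid cross-device link") otherwise, while `shutil.move` tries a rename first and falls back to copying and deleting. The sketch below spells out a simplified version of that fallback (it ignores symlinks, which `shutil.move` handles specially). `move_into_place` is a hypothetical helper name, not the actual code in `setup_tools.py`; in practice calling `shutil.move(src, dst)` directly, as the commit does, is enough.

```python
import errno
import os
import shutil


def move_into_place(src: str, dst: str) -> str:
    """Move src to dst even when they live on different filesystems."""
    try:
        # Fast path: a plain rename, which only succeeds within one filesystem.
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # Unrelated failure (permissions, missing file, ...).
        # EXDEV ("Invalid cross-device link"): copy the data and metadata to
        # the destination filesystem, then remove the original.
        if os.path.isdir(src):
            shutil.copytree(src, dst)
            shutil.rmtree(src)
        else:
            shutil.copy2(src, dst)
            os.unlink(src)
    return dst
```

For example, extracting a downloaded tool into `/tmp` and then moving it into the repository triggers the EXDEV branch whenever `/tmp` is a separate mount (such as tmpfs).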

fix: Convert GenerationOutput to GenerationResult for evaluators

984336a

docs: Update README and install script for Git Subtree integration

68025ca

feat: Add docker file

25c6af8

fix: Add env.toml

97c1c3c

feat: Add tasks

76eb459

Qian-Cheng-nju changed the title ~~Sysmobench~~ Integrate SysMoBench benchmark via Git Subtree Nov 14, 2025

Merge branch 'main' into sysmobench

321320a

xuafeng requested review from tareknaser and tianyin and removed request for tareknaser November 14, 2025 22:26

xuafeng requested changes Nov 14, 2025


Qian-Cheng-nju and others added 3 commits November 15, 2025 11:24

fix: Separate the executor and evaluator for further extensibility

061b7ad
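To illustrate the kind of split this commit describes, here is a minimal sketch in which an executor produces a `GenerationResult` and independent evaluators score it, so either side can be swapped without touching the other. Only the names `GenerationResult` and `tla_specification` appear in this PR; the `Executor`/`Evaluator` protocols, `run_task`, the `task_id` field, and the toy evaluator are hypothetical and do not reflect SysMoBench's actual classes or checks.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class GenerationResult:
    # Hypothetical fields; the real class definition is not shown in the PR.
    task_id: str
    tla_specification: str


class Executor(Protocol):
    def run(self, task_id: str) -> GenerationResult: ...


class Evaluator(Protocol):
    def evaluate(self, result: GenerationResult) -> bool: ...


class CannedExecutor:
    """Stand-in for a model-backed executor: returns a fixed TLA+ module."""

    def run(self, task_id: str) -> GenerationResult:
        spec = "---- MODULE Counter ----\nVARIABLE x\n===="
        return GenerationResult(task_id, spec)


class ModuleHeaderEvaluator:
    """Toy check: first line declares a MODULE, last line closes it with ====."""

    def evaluate(self, result: GenerationResult) -> bool:
        lines = result.tla_specification.strip().splitlines()
        return bool(lines) and "MODULE" in lines[0] and lines[-1].startswith("====")


def run_task(task_id: str, executor: Executor, evaluators: list[Evaluator]) -> dict[str, bool]:
    """Generate once, then let every evaluator score the same result."""
    result = executor.run(task_id)
    return {type(e).__name__: e.evaluate(result) for e in evaluators}
```

Here `run_task("counter", CannedExecutor(), [ModuleHeaderEvaluator()])` returns `{"ModuleHeaderEvaluator": True}`. Adding a new check means adding another evaluator class, and switching models means providing another executor; neither change touches the other side.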

doc: Add porting benchmark guide

72b6916

doc: small updates

1a7f1b1

xuafeng merged commit faa1ff3 into main Nov 17, 2025
1 check passed

xuafeng deleted the sysmobench branch November 17, 2025 18:10

Couen pushed a commit to Couen/system-intelligence-benchmark that referenced this pull request Jan 22, 2026

Merge pull request sys-intelligence#10 from systemintelligence/sysmobench

6dbce76

Integrate SysMoBench benchmark via Git Subtree

tareknaser pushed a commit that referenced this pull request Feb 5, 2026

Merge pull request #10 from systemintelligence/sysmobench

9280757

Integrate SysMoBench benchmark via Git Subtree
