Integrate SysMoBench benchmark via Git Subtree #10
… 119a763
git-subtree-dir: benchmarks/sysmobench/sysmobench_core
git-subtree-split: 119a763435c4b7266a27cc8f6741239d0f4e4347
…s/sysmobench/sysmobench_core'
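The subtree import commit records `git-subtree-dir` and `git-subtree-split` trailers. `git subtree` searches for these trailers on later `pull`/`split` runs to locate the previously imported upstream commit. The following is a minimal sketch of reading them back. The helper name `parse_subtree_trailers` is hypothetical and is not part of git or this repository.

```python
import re

# Trailer keys that `git subtree` writes into its squash/rejoin commit messages.
_TRAILER = re.compile(r"^git-subtree-(dir|split|mainline):\s*(\S+)\s*$", re.MULTILINE)


def parse_subtree_trailers(message: str) -> dict:
    """Hypothetical helper: map trailer keys ("dir", "split", ...) to their values."""
    return {key: value for key, value in _TRAILER.findall(message)}


message = (
    "Squashed 'benchmarks/sysmobench/sysmobench_core/' content\n"
    "\n"
    "git-subtree-dir: benchmarks/sysmobench/sysmobench_core\n"
    "git-subtree-split: 119a763435c4b7266a27cc8f6741239d0f4e4347\n"
)
trailers = parse_subtree_trailers(message)
```

The first line of `message` is only illustrative. The two trailer lines are copied from the commit above, and the abbreviated hash `119a763` is a prefix of the full `git-subtree-split` value.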
- Add gpt-4o model configuration to models.yaml
- Fix setup_tools.py to use shutil.move instead of os.rename; this resolves the 'Invalid cross-device link' error when /tmp is on a different filesystem
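`os.rename()` works only within a single filesystem. If the source lives on another mount (for example, a tmpfs `/tmp`), it fails with `EXDEV` ("Invalid cross-device link"). `shutil.move()` tries a rename first and falls back to copy-then-delete. The following is a minimal sketch of the pattern, not the actual `setup_tools.py` code; the function name `move_extracted_tool` is hypothetical.

```python
import os
import shutil


def move_extracted_tool(src: str, dest: str) -> str:
    """Hypothetical helper: move a downloaded/extracted file or directory into place.

    Unlike os.rename(), shutil.move() copies and then deletes the source when
    src and dest are on different filesystems, avoiding EXDEV errors.
    """
    # Ensure the destination's parent directory exists before moving.
    os.makedirs(os.path.dirname(os.path.abspath(dest)), exist_ok=True)
    # shutil.move returns the final destination path.
    return shutil.move(src, dest)
```

Callers need no special handling for the cross-device case, because `shutil.move` detects it internally.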
xuafeng left a comment
Thanks a lot for your great contribution to this project. The integration is great. I left a few suggestions.
### Contribute to existing Benchmarks
The easiest way to contribute is to add more tasks to existing benchmarks. For example, you can add more questions to the course exam benchmark or more projects to the course project benchmark. Likewise, you can add more system algorithm design problems to the algorithm design benchmark. Please follow the existing format and structure when adding new tasks. You can also improve the existing benchmarks by adding more advanced evaluators with improved metrics.
### Creating New Benchmarks
As discussed on Slack, could you please add a section called "### Porting your benchmark", using SysMoBench as an example? Thanks a lot.
I’ve written a draft of the benchmark porting guide (see commit 72b6916). When you have time, I would greatly appreciate your review; please let me know if anything should be improved. Thank you very much!
Resolves #7
…ench Integrate SysMoBench benchmark via Git Subtree
Summary
Add the SysMoBench benchmark, which evaluates AI models' ability to generate correct TLA+ formal specifications for real-world concurrent and distributed systems.