[Draft] Add math benchmarks #1570
base: master
Thanks @hallerite, great work! But the docstrings need to be polished; please refer to:
https://github.com/camel-ai/camel/blob/master/CONTRIBUTING.md#guideline-for-writing-docstrings
class GSM8KBenchmark(MathBenchmark):
    """Benchmark for evaluating ChatAgents on the GSM8K dataset from Hugging Face Hub."""
Suggested change:
-     """Benchmark for evaluating ChatAgents on the GSM8K dataset from Hugging Face Hub."""
+     r"""Benchmark for evaluating ChatAgents on the GSM8K dataset from Hugging Face Hub."""
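For reference, here is a sketch of a fuller docstring in the style the linked guideline appears to ask for: a raw `r"""` docstring, an `Args:` section with the type in parentheses, and `(default: ...)` for optional arguments. The `MathBenchmark` stub and the constructor parameters (`data_dir`, `save_to`, `processes`) are hypothetical placeholders, not the PR's actual signature.

```python
from pathlib import Path


class MathBenchmark:
    r"""Hypothetical stand-in for the PR's base class, for illustration only."""

    def __init__(self, name: str, data_dir: str, save_to: str) -> None:
        self.name = name
        self.data_dir = Path(data_dir)
        self.save_to = save_to


class GSM8KBenchmark(MathBenchmark):
    r"""Benchmark for evaluating ChatAgents on the GSM8K dataset from
    Hugging Face Hub.

    Args:
        data_dir (str): The directory where the dataset is cached.
        save_to (str): The file path where evaluation results are saved.
        processes (int, optional): The number of worker processes to use.
            (default: :obj:`1`)
    """

    def __init__(self, data_dir: str, save_to: str, processes: int = 1) -> None:
        # Hypothetical parameters, shown only to illustrate the docstring layout.
        super().__init__("gsm8k", data_dir, save_to)
        self.processes = processes
```

A raw docstring also keeps any backslashes (for example in LaTeX-style math) literal instead of treating them as escape sequences.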
This is an example of how to improve the docstring.
Could we also add an example script under the `examples/` directory?
Thanks @hallerite and @apokryphosx! I left some comments below. Please also remember to run `pre-commit run --all-files` locally before pushing the code; there are currently some errors.
for config in self.DATASET_CONFIGS:
    dataset = load_dataset(
        self.DATASET_REPO,
        config,
        cache_dir=str(self.data_dir),
        download_mode="force_redownload" if force_download else "reuse_dataset_if_exists"
    )
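The loop above can be factored so the cache policy is explicit and every config's dataset is kept, keyed by config name. This is a minimal sketch, assuming the loader is `datasets.load_dataset` (whose `cache_dir` and `download_mode` keyword arguments do exist); the helper names `resolve_download_mode` and `download_all_configs` are hypothetical, and the loader is passed in so the logic can be exercised offline.

```python
from pathlib import Path
from typing import Any, Callable, Dict, List

# String values accepted by the `download_mode` argument of
# `datasets.load_dataset` (they mirror members of `datasets.DownloadMode`).
FORCE_REDOWNLOAD = "force_redownload"
REUSE_DATASET_IF_EXISTS = "reuse_dataset_if_exists"


def resolve_download_mode(force_download: bool) -> str:
    """Pick the cache policy: re-fetch everything, or reuse the local copy."""
    return FORCE_REDOWNLOAD if force_download else REUSE_DATASET_IF_EXISTS


def download_all_configs(
    load_fn: Callable[..., Any],
    repo: str,
    configs: List[str],
    data_dir: Path,
    force_download: bool = False,
) -> Dict[str, Any]:
    """Load every config of `repo` into `data_dir`, keyed by config name.

    `load_fn` is expected to be `datasets.load_dataset` (hypothetical wiring);
    it is injected so the loop can be tested without network access.
    """
    mode = resolve_download_mode(force_download)
    return {
        config: load_fn(repo, config, cache_dir=str(data_dir), download_mode=mode)
        for config in configs
    }
```

In real use this would be called as `download_all_configs(load_dataset, self.DATASET_REPO, self.DATASET_CONFIGS, self.data_dir, force_download)`.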
Why not use `def download(self) -> "MATHBenchmark"`?
Because that functionality already comes with the Hugging Face `datasets` library.
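One way to accommodate both points is a thin `download()` method that returns the benchmark itself (so calls can be chained) while leaving all caching to the Hugging Face `datasets` library. The sketch below works under that assumption; the repo id, the config names, and the injected `load_fn` are placeholders, not the PR's code.

```python
from typing import Any, Callable, Dict, Optional


class MATHBenchmark:
    """Hypothetical sketch; not the PR's implementation."""

    DATASET_REPO = "example/math"  # placeholder repo id
    DATASET_CONFIGS = ["algebra", "geometry"]  # placeholder config names

    def __init__(self, load_fn: Callable[..., Any]) -> None:
        # `load_fn` stands in for `datasets.load_dataset`, which already
        # handles caching, so no separate download logic is reimplemented.
        self._load_fn = load_fn
        self._data: Optional[Dict[str, Any]] = None

    def download(self) -> "MATHBenchmark":
        """Fetch (or reuse cached) data and return self for chaining."""
        self._data = {
            config: self._load_fn(self.DATASET_REPO, config)
            for config in self.DATASET_CONFIGS
        }
        return self

    @property
    def data(self) -> Dict[str, Any]:
        """The loaded datasets; raises if `download()` was never called."""
        if self._data is None:
            raise RuntimeError("Call download() first.")
        return self._data
```

Returning `self` keeps the method signature the reviewer suggested, while the actual download and cache handling stays inside `datasets`.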
Thanks for the comments @Wendong-Fan. I will implement the changes later today.
Description
This PR introduces a base class for math benchmarks and provides implementations for:
Motivation and Context
This PR addresses and closes #1510.
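To illustrate the kind of shared base class the description refers to, here is a minimal sketch in which subclasses only define how a final answer is extracted and the base class computes accuracy. All class and method names are hypothetical. The GSM8K part relies on the fact that GSM8K reference solutions end with a line of the form `#### <answer>`; it assumes model outputs are prompted to use the same marker.

```python
import re
from abc import ABC, abstractmethod
from typing import List, Optional


class MathBenchmarkSketch(ABC):
    """Hypothetical base class: subclasses only define answer extraction."""

    @abstractmethod
    def extract_answer(self, text: str) -> Optional[str]:
        """Return the final answer contained in `text`, or None."""

    def accuracy(self, predictions: List[str], references: List[str]) -> float:
        """Fraction of predictions whose extracted answer matches the reference."""
        if len(predictions) != len(references):
            raise ValueError("predictions and references differ in length")
        if not references:
            return 0.0
        correct = 0
        for pred, ref in zip(predictions, references):
            answer = self.extract_answer(pred)
            # A prediction without a parsable answer always counts as wrong.
            if answer is not None and answer == self.extract_answer(ref):
                correct += 1
        return correct / len(references)


class GSM8KSketch(MathBenchmarkSketch):
    # GSM8K reference solutions end with "#### <answer>".
    _FINAL = re.compile(r"####\s*(-?\d[\d,]*(?:\.\d+)?)")

    def extract_answer(self, text: str) -> Optional[str]:
        match = self._FINAL.search(text)
        if match is None:
            return None
        # Drop thousands separators so "1,000" and "1000" compare equal.
        return match.group(1).replace(",", "")
```

Keeping scoring in the base class means a new dataset such as MATH only needs its own `extract_answer` (for example, parsing a boxed answer) rather than a separate evaluation loop.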
Types of Changes
What types of changes does your code introduce? Put an `x` in all the boxes that apply:
Implemented Tasks ✅
Checklist 📝
Please go over all the following points and put an `x` in the boxes that apply. If you're unsure about any, feel free to ask!
Draft Status 🚧
Current Progress:
Next Steps: