Serbian LLM Benchmark Task #340
- ARC_EASY
- ARC_CHALLENGE
- BOOLQ
- HELLASWAG
- OPENBOOK
- PIQA
- OZ_EVAL
- WINOGRANDE
1. Change and tidy ```serbian_eval_prompt```. 2. ```ruff check``` fix.
Hi! Thanks for the PR. Only a few nits, but it should be good :)
1. Add ```hf_revision``` because ```trust_dataset=True```. 2. Move log to ```__main__```.
Fixed
Mhh, this should not happen. Are you sure you are running the correct versions?
Absolutely. Try checking at least one of those files manually in, e.g.:
Let's just wait for the quality check and see if we can merge.
…-benchmark [FIX #2]: Fix ```OzEval``` aligning.
LGTM!
cc @hynky1999 for viz
* Serbian LLM benchmark:
  - ARC_EASY
  - ARC_CHALLENGE
  - BOOLQ
  - HELLASWAG
  - OPENBOOK
  - PIQA
  - OZ_EVAL
  - WINOGRANDE
Serbian LLM Benchmark Task Configuration and Prompt Functions
Summary:
This pull request introduces task configurations and prompt functions for evaluating LLMs on various Serbian datasets. The module includes tasks for:
- ARC_EASY
- ARC_CHALLENGE
- BOOLQ
- HELLASWAG
- OPENBOOK
- PIQA
- OZ_EVAL
- WINOGRANDE

The tasks are defined using the `LightevalTaskConfig` class, and prompt generation is streamlined through a reusable `serbian_eval_prompt` function.

Changes:
Task Configurations:
- Tasks are defined using `LightevalTaskConfig`.
- `HFSubsets` added for dataset subset management, improving code maintainability and clarity.
- `create_task_config` function allows dynamic task creation with dependency injection for flexibility in dataset and metric selection.

Prompt Functions:
- `serbian_eval_prompt` function creates a structured multiple-choice prompt in Serbian.

Logging:
- `hello_message` banner is printed upon task initialization, listing all available tasks.
- Logging uses `hlog_warn`.

Key Features:
- `HFSubsets` Enum improves the readability and maintainability of the dataset subset references.
- `create_task_config` function simplifies task creation, promoting cleaner and more maintainable code.

Future Enhancements:
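
For readers skimming the description, here is a rough sketch of the pattern listed under Task Configurations and Prompt Functions above: an enum of subset names, a factory that takes the prompt function and metrics as arguments, and a multiple-choice prompt builder. It is not the code from this PR. `SketchTaskConfig` is a hypothetical stand-in for `LightevalTaskConfig`, and its field names, the subset values, the row keys `query` and `choices`, and the Serbian prompt wording are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Sequence, Tuple


class HFSubsets(Enum):
    """Dataset subset names kept in one place (values are illustrative)."""
    ARC_EASY = "arc_easy"
    BOOLQ = "boolq"


@dataclass(frozen=True)
class SketchTaskConfig:
    """Hypothetical stand-in for LightevalTaskConfig; fields are assumptions."""
    name: str
    hf_subset: str
    prompt_function: Callable[[dict], str]
    metrics: Tuple[str, ...]


def sketch_eval_prompt(line: dict) -> str:
    """Build a lettered multiple-choice prompt from a row.

    Assumes the row has a 'query' string and a 'choices' list.
    """
    letters = "ABCDEFGH"
    options = "\n".join(
        f"{letters[i]}. {choice}" for i, choice in enumerate(line["choices"])
    )
    return f"Pitanje: {line['query']}\n{options}\nOdgovor:"


def create_task_config(
    task_name: str,
    subset: HFSubsets,
    prompt_function: Callable[[dict], str] = sketch_eval_prompt,
    metrics: Sequence[str] = ("accuracy",),
) -> SketchTaskConfig:
    """Factory: the prompt function and metrics are injected, not hard-coded."""
    return SketchTaskConfig(
        name=task_name,
        hf_subset=subset.value,
        prompt_function=prompt_function,
        metrics=tuple(metrics),
    )


# Usage: one call per task; only the name and subset change.
arc_easy = create_task_config("serbian_evals:arc_easy", HFSubsets.ARC_EASY)
example = {"query": "Koliko je 2 + 2?", "choices": ["3", "4"]}
print(arc_easy.prompt_function(example))
```

Passing the prompt function and metrics in as parameters is what makes the factory reusable: a task with a different answer format can supply its own callable without touching the factory.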