
Serbian LLM Benchmark Task #340


Merged: 16 commits, Oct 14, 2024


@DeanChugall (Contributor)

Serbian LLM Benchmark Task Configuration and Prompt Functions

Summary:

This pull request introduces task configurations and prompt functions for evaluating LLMs on several Serbian datasets. The module includes tasks for:

ARC (easy and challenge),
BoolQ,
Hellaswag,
OpenBookQA,
PIQA,
Winogrande,
and the custom OZ Eval dataset.

The tasks are defined using the LightevalTaskConfig class, and prompt generation is streamlined through a reusable serbian_eval_prompt function.
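A minimal sketch of what a reusable Serbian multiple-choice prompt function might look like. It is not the PR's serbian_eval_prompt: the row keys (query, choices, gold), the PromptDoc class (a hypothetical stand-in for the document object a prompt function returns), the template wording and the letter-based answer choices are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PromptDoc:
    """Hypothetical stand-in for the document object a prompt function returns."""

    task_name: str
    query: str
    choices: list[str]
    gold_index: int


LETTERS = "ABCDEFGH"


def serbian_eval_prompt_sketch(line: dict, task_name: str = "") -> PromptDoc:
    """Build a Serbian multiple-choice prompt from one dataset row.

    Assumes (hypothetically) that the row has 'query' (question text),
    'choices' (list of answer strings) and 'gold' (index of the correct one).
    """
    question = line["query"]
    choices = line["choices"]
    # One lettered line per choice: "A. ...", "B. ...", ...
    options = "\n".join(f"{LETTERS[i]}. {c}" for i, c in enumerate(choices))
    # "Pitanje" = question, "Odgovor" = answer
    query = f"Pitanje: {question}\n{options}\nOdgovor:"
    return PromptDoc(
        task_name=task_name,
        query=query,
        # Assumed design: score the answer letters rather than the full answer text
        choices=[f" {LETTERS[i]}" for i in range(len(choices))],
        gold_index=int(line["gold"]),
    )


row = {"query": "Koji je glavni grad Srbije?", "choices": ["Novi Sad", "Beograd", "Niš"], "gold": 1}
print(serbian_eval_prompt_sketch(row, task_name="demo_task").query)
```

Because every task feeds rows through the same function, adding a dataset only requires mapping its columns onto the assumed keys.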

Changes:

  1. Task Configurations:

    • Configurations for ARC (Easy and Challenge), BoolQ, Hellaswag, OpenBookQA, PIQA, Winogrande, and OZ Eval tasks using LightevalTaskConfig.
    • Enum class HFSubsets added for dataset subset management, improving code maintainability and clarity.
    • The create_task_config function enables dynamic task creation, using dependency injection so the dataset and metrics can be chosen per task.
  2. Prompt Functions:

    • The serbian_eval_prompt function creates a structured multiple-choice prompt in Serbian.
    • It builds the query and answer choices dynamically from each dataset row, so the same function serves every configured task.
  3. Logging:

    • A hello_message banner is printed upon task initialization, listing all available tasks.
    • Task names are dynamically generated and printed using hlog_warn.
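The task-configuration pattern described above (an HFSubsets Enum plus a create_task_config factory) can be sketched as follows. TaskConfigSketch is a hypothetical stand-in, not LightevalTaskConfig's real signature; the Enum member names follow the PR's task list, but their values, the dataset repo name and the metric string are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class HFSubsets(Enum):
    """Dataset subsets; member names follow the PR, values are placeholders."""

    ARC_EASY = "arc_easy"
    ARC_CHALLENGE = "arc_challenge"
    BOOLQ = "boolq"
    HELLASWAG = "hellaswag"
    OPENBOOK = "openbook"
    PIQA = "piqa"
    OZ_EVAL = "oz_eval"
    WINOGRANDE = "winogrande"


@dataclass
class TaskConfigSketch:
    """Hypothetical stand-in for LightevalTaskConfig (not its real signature)."""

    name: str
    prompt_function: Callable[[dict, str], object]
    hf_repo: str
    hf_subset: str
    metrics: list[str] = field(default_factory=list)


def create_task_config(
    task_name: str,
    prompt_function: Callable[[dict, str], object],
    hf_repo: str,
    hf_subset: HFSubsets,
    metrics: list[str],
) -> TaskConfigSketch:
    """Build one task config; prompt function, dataset and metrics are injected."""
    return TaskConfigSketch(
        name=task_name,
        prompt_function=prompt_function,
        hf_repo=hf_repo,
        hf_subset=hf_subset.value,
        metrics=list(metrics),
    )


def placeholder_prompt(line: dict, task_name: str = "") -> dict:
    """Placeholder prompt function; a real one would build a Serbian prompt."""
    return {"task_name": task_name, "query": line.get("query", "")}


# Hypothetical dataset repo; replace with the actual Hugging Face repo.
DATASET_REPO = "your-org/serbian-eval-datasets"

# One config per subset, in Enum definition order.
task_configs = [
    create_task_config(
        task_name=subset.value,
        prompt_function=placeholder_prompt,
        hf_repo=DATASET_REPO,
        hf_subset=subset,
        metrics=["accuracy"],  # placeholder metric name
    )
    for subset in HFSubsets
]

print([cfg.name for cfg in task_configs])
```

Referencing subsets through the Enum rather than raw strings means a typo in a subset name fails at attribute lookup instead of at dataset download time.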

Key Features:

  • Modular Design: Task configurations are modular, reusable, and easily extendable to accommodate new datasets and tasks.
  • Improved Readability: The HFSubsets Enum makes dataset subset references more readable and maintainable.
  • Enhanced Flexibility: The create_task_config function simplifies task creation, promoting cleaner and more maintainable code.
  • Clear Logging: A welcome message and a list of available tasks are logged on task initialization, which eases debugging and interaction.
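A minimal sketch of the welcome banner and task listing. The banner layout and the build_hello_message / log_available_tasks names are hypothetical, and Python's standard logging module stands in for lighteval's hlog_warn.

```python
import logging

logger = logging.getLogger("serbian_eval_sketch")


def build_hello_message(task_names: list[str]) -> str:
    """Return a banner that lists every available task (layout is hypothetical)."""
    rule = "=" * 40
    lines = [rule, "Serbian LLM benchmark: available tasks", rule]
    lines += [f"  - {name}" for name in task_names]
    return "\n".join(lines)


def log_available_tasks(task_names: list[str]) -> str:
    """Log the banner at WARNING level and return it."""
    message = build_hello_message(task_names)
    # Stand-in for lighteval's hlog_warn: stdlib logging at WARNING level.
    logger.warning(message)
    return message


log_available_tasks(["arc_easy", "arc_challenge", "boolq"])
```

Logging at WARNING level keeps the banner visible under default logging settings, which is presumably why a warn-level helper is used for it.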

Future Enhancements:

  • Additional prompt functions can be added for different task types.
  • Unit tests should be written to ensure the integrity of prompt generation and task configuration.
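One way to start on those unit tests is an invariant check that any prompt function's output should satisfy. stub_prompt is a hypothetical placeholder to be swapped for the real serbian_eval_prompt, and prompt_integrity_errors only assumes the output exposes query, choices and gold_index attributes and that rows use the hypothetical keys query, choices and gold.

```python
import unittest
from dataclasses import dataclass


@dataclass
class StubDoc:
    """Hypothetical output shape: query text, scored choices, gold index."""

    query: str
    choices: list[str]
    gold_index: int


def stub_prompt(line: dict, task_name: str = "") -> StubDoc:
    """Placeholder prompt function; swap in the real one to test it."""
    letters = [chr(65 + i) for i in range(len(line["choices"]))]
    options = "\n".join(f"{l}. {c}" for l, c in zip(letters, line["choices"]))
    query = f"Pitanje: {line['query']}\n{options}\nOdgovor:"
    return StubDoc(query, [f" {l}" for l in letters], line["gold"])


def prompt_integrity_errors(doc, line: dict) -> list[str]:
    """Return the violated invariants; an empty list means the doc looks sane."""
    errors = []
    if not doc.query.strip():
        errors.append("empty query")
    if len(doc.choices) != len(line["choices"]):
        errors.append("choice count mismatch")
    if not 0 <= doc.gold_index < len(doc.choices):
        errors.append("gold_index out of range")
    for choice_text in line["choices"]:
        if choice_text not in doc.query:
            errors.append(f"choice missing from query: {choice_text!r}")
    return errors


class TestPromptIntegrity(unittest.TestCase):
    SAMPLE = {"query": "Koji je glavni grad Srbije?", "choices": ["Novi Sad", "Beograd", "Niš"], "gold": 1}

    def test_sample_row_is_well_formed(self):
        doc = stub_prompt(self.SAMPLE)
        self.assertEqual(prompt_integrity_errors(doc, self.SAMPLE), [])

    def test_bad_gold_index_is_reported(self):
        bad = dict(self.SAMPLE, gold=7)
        self.assertIn("gold_index out of range", prompt_integrity_errors(stub_prompt(bad), bad))
```

Run it with python -m unittest or pytest; the same checks can be looped over every configured task once the real prompt function is plugged in.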

- ARC_EASY
- ARC_CHALLENGE
- BOOLQ
- HELLASWAG
- OPENBOOK
- PIQA
- OZ_EVAL
- WINOGRANDE
1. Change and tidy ```serbian_eval_prompt```.
2. Fix ```ruff check``` issues.
@NathanHB (Member) left a comment

Hi! Thanks for the PR. Only a few nits, but it should be good :)

@DeanChugall (Contributor Author)

Fixed ```ruff format --check .``` for CI.

@DeanChugall (Contributor Author) commented Oct 7, 2024

It would be great if we used ```pre-commit run```, but when I run it, some files fail the checks, and I don't want to modify those files.
The files affected by ```pre-commit run``` are shown in the screenshots below.

[Screenshots from 2024-10-07 (10-52-25, 10-57-55, 10-58-10) showing the affected files, etc.]

@NathanHB (Member) commented Oct 7, 2024

Mhh, this should not happen. Are you sure you are running the correct versions?

@DeanChugall (Contributor Author)

> Mhh this should not happen, are you sure you are running the correct versions?

Absolutely. Try checking at least one of those files manually, e.g. evaluation-task-request.md:

https://raw.githubusercontent.com/huggingface/lighteval/refs/heads/main/.github/ISSUE_TEMPLATE/evaluation-task-request.md

@NathanHB (Member) commented Oct 8, 2024

Let's just wait for the quality check and see if we can merge.

@clefourrier (Member) left a comment

LGTM!
cc @hynky1999 for viz

@clefourrier clefourrier merged commit 635e581 into huggingface:main Oct 14, 2024
2 checks passed
hynky1999 pushed a commit that referenced this pull request May 22, 2025
* Serbian LLM benchmark:
- ARC_EASY
- ARC_CHALLENGE
- BOOLQ
- HELLASWAG
- OPENBOOK
- PIQA
- OZ_EVAL
- WINOGRANDE
3 participants