Serbian LLM Benchmark Task #340

DeanChugall · 2024-10-03T15:34:56Z

Serbian LLM Benchmark Task Configuration and Prompt Functions

Summary:

This pull request introduces task configurations and prompt functions for evaluating LLMs on several Serbian datasets. The module includes tasks for:

- ARC (Easy and Challenge)
- BoolQ
- Hellaswag
- OpenBookQA
- PIQA
- Winogrande
- the custom OZ Eval dataset

The tasks are defined using the LightevalTaskConfig class, and prompt generation is streamlined through a reusable serbian_eval_prompt function.

Changes:

Task Configurations:
- Configurations for ARC (Easy and Challenge), BoolQ, Hellaswag, OpenBookQA, PIQA, Winogrande, and OZ Eval tasks using LightevalTaskConfig.
- Enum class HFSubsets added for dataset subset management, improving code maintainability and clarity.
- create_task_config function allows dynamic task creation with dependency injection for flexibility in dataset and metric selection.
Prompt Functions:
- The serbian_eval_prompt function creates a structured multiple-choice prompt in Serbian.
- The function supports dynamic query and choice generation with configurable tasks.
Logging:
- A hello_message banner is printed upon task initialization, listing all available tasks.
- Task names are dynamically generated and printed using hlog_warn.
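
The pieces listed above could fit together roughly as follows. This is a minimal, self-contained sketch and not the code from `community_tasks/serbian_eval.py`: `TaskConfig` and `Doc` are simplified stand-ins for lighteval's `LightevalTaskConfig` and `Doc`, and the subset values, repository id, task name, metric name, dataset column names, and Serbian prompt wording are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class HFSubsets(Enum):
    """Dataset subset names kept in one place (values are hypothetical)."""

    ARC_EASY = "arc_easy_serbian"
    BOOLQ = "boolq_serbian"


@dataclass
class Doc:
    """Stand-in for lighteval's Doc: one formatted evaluation example."""

    task_name: str
    query: str
    choices: list[str]
    gold_index: int


@dataclass
class TaskConfig:
    """Stand-in for LightevalTaskConfig with only the fields used here."""

    name: str
    prompt_function: Callable[[dict, str], Doc]
    hf_repo: str
    hf_subset: str
    metric: list[str]


def serbian_eval_prompt(line: dict, task_name: str) -> Doc:
    """Build a multiple-choice prompt in Serbian from one dataset row.

    The column names ("query", "choices", "answer") are assumptions.
    """
    query = "Pitanje: " + line["query"] + "\n\nPonuđeni odgovori:\n"
    for index, choice in enumerate(line["choices"]):
        query += f"{index}. {choice}\n"
    query += "\nOdgovor:"
    return Doc(
        task_name=task_name,
        query=query,
        choices=[str(i) for i in range(len(line["choices"]))],
        gold_index=int(line["answer"]),
    )


def create_task_config(
    task_name: str,
    prompt_function: Callable[[dict, str], Doc],
    hf_repo: str,
    hf_subset: HFSubsets,
    metric: list[str],
) -> TaskConfig:
    """Create a task; dataset, prompt and metric are injected by the caller."""
    return TaskConfig(
        name=task_name,
        prompt_function=prompt_function,
        hf_repo=hf_repo,
        hf_subset=hf_subset.value,
        metric=metric,
    )


arc_easy = create_task_config(
    task_name="serbian_evals:arc_easy",  # hypothetical task name
    prompt_function=serbian_eval_prompt,
    hf_repo="example-org/serbian-evals",  # hypothetical repository id
    hf_subset=HFSubsets.ARC_EASY,
    metric=["loglikelihood_acc"],  # placeholder metric name
)
```

Because the prompt function, subset, and metric are passed in as arguments, adding another dataset under this sketch only needs a new `HFSubsets` member and one more `create_task_config` call.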

Key Features:

Modular Design: Task configurations are modular, reusable, and easily extendable to accommodate new datasets and tasks.
Improved Readability: Introduction of the HFSubsets Enum improves the readability and maintainability of the dataset subset references.
Enhanced Flexibility: create_task_config function simplifies task creation, promoting cleaner and more maintainable code.
Clear Logging: Logging includes a friendly welcome message and a list of available tasks for easier debugging and interaction.

Future Enhancements:

Additional prompt functions can be added for different task types.
Unit tests should be written to ensure the integrity of prompt generation and task configuration.
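
A first unit test for prompt generation might look like the sketch below. It is only an illustration: `build_query` is a small hypothetical helper standing in for the query-building part of a prompt function, and the Serbian wording is an assumption rather than the text used in this PR.

```python
import unittest


def build_query(question: str, choices: list[str]) -> str:
    """Hypothetical stand-in for the query-building part of a prompt function."""
    lines = [f"Pitanje: {question}", ""]
    lines += [f"{i}. {choice}" for i, choice in enumerate(choices)]
    lines += ["", "Odgovor:"]
    return "\n".join(lines)


class BuildQueryTest(unittest.TestCase):
    def test_every_choice_is_numbered(self):
        query = build_query("Koje je boje nebo?", ["plave", "zelene"])
        self.assertIn("0. plave", query)
        self.assertIn("1. zelene", query)

    def test_query_ends_with_answer_cue(self):
        query = build_query("Koje je boje nebo?", ["plave", "zelene"])
        self.assertTrue(query.endswith("Odgovor:"))


def run_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BuildQueryTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` runs both cases; the same class would also be picked up by a regular test runner.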

- ARC_EASY - ARC_CHALLENGE - BOOLQ - HELLASWAG - OPENBOOK - PIQA - OZ_EVAL - WINOGRANDE

1. Change and tidy ```serbian_eval_prompt```. 2. ```ruff check``` fix.

NathanHB

Hi ! Thanks for the PR. Only a few nits but should be good :)

community_tasks/serbian_eval.py

1. Add ```hf_revision``` because ```trust_dataset=True```. 2. Move log to ```__main__```.
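
For context, the two requested changes could look roughly like this. The sketch is self-contained and uses plain dicts in place of the real task configurations; the revision string and task names are placeholders, and `print` stands in for the logger used in the PR. Pinning a revision presumably matters here because `trust_dataset=True` allows dataset code to run, so the dataset should not be able to change underneath the benchmark.

```python
# Pin the dataset to one commit. Placeholder value, not a real hash.
HF_REVISION = "<commit-sha-of-the-dataset-repo>"

# Plain dicts stand in for the real task configuration objects.
TASKS_TABLE = [
    {"name": "serbian_evals:arc_easy", "trust_dataset": True, "hf_revision": HF_REVISION},
    {"name": "serbian_evals:boolq", "trust_dataset": True, "hf_revision": HF_REVISION},
]


def available_task_names(tasks: list[dict]) -> list[str]:
    """Return the task names so they can be listed on demand."""
    return [task["name"] for task in tasks]


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported
    # as a custom task module, so importing stays free of log output.
    print("Available tasks:")
    for name in available_task_names(TASKS_TABLE):
        print(f"  - {name}")
```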

DeanChugall · 2024-10-07T08:47:45Z

Fixed ```ruff format --check .``` for CI.

DeanChugall · 2024-10-07T08:55:17Z

It would be great to use ```pre-commit run```, but when I run it, some existing files do not satisfy the checks, and I don't want to mess with those files.
The files affected by ```pre-commit run``` are shown in the image below.

etc ...

NathanHB · 2024-10-07T11:52:01Z

Mhh this should not happen, are you sure you are running the correct versions ?

DeanChugall · 2024-10-07T11:58:17Z

> Mhh this should not happen, are you sure you are running the correct versions ?

Absolutely. Try checking at least one of those files manually, e.g. evaluation-task-request.md:

https://raw.githubusercontent.com/huggingface/lighteval/refs/heads/main/.github/ISSUE_TEMPLATE/evaluation-task-request.md

NathanHB · 2024-10-08T09:59:56Z

let's just wait for the quality check and see if we can merge.

clefourrier

LGTM!
cc @hynky1999 for viz

DeanChugall added 2 commits October 3, 2024 17:30

Serbian LLM benchmark:

db9e2c1

- ARC_EASY - ARC_CHALLENGE - BOOLQ - HELLASWAG - OPENBOOK - PIQA - OZ_EVAL - WINOGRANDE

Minor changes:

530ea64

1. Change and tidy ```serbian_eval_prompt```. 2. ```ruff check``` fix.

NathanHB reviewed Oct 4, 2024

DeanChugall and others added 5 commits October 4, 2024 18:42

Merge branch 'huggingface:main' into serbian_evals

586679a

[FIX huggingface#340]: Only a few nits:

30de1e3

1. Add ```hf_revision``` because ```trust_dataset=True```. 2. Move log to ```__main__```.

[DEV]: Add MMLU Serbian task and refactor prompt.

6d08483

[DEV]: Change HF_REVISION to new version.

bad0378

[FIX]: Fixed ruff format --check .

3e01f80

DeanChugall and others added 5 commits October 7, 2024 16:39

[DEV huggingface#340]: Add serbian task group and rearrange task to match groups.

f45ae71

Merge branch 'huggingface:main' into serbian_evals

9c7f589

[FIX huggingface#340]: Again pre-commit formatting.

40a1760

Merge commit '9c7f58924f2d6dd3d84222d3b425037934ca7918' into serbian_evals

5de71c2

[FIX huggingface#340]: Again pre-commit formatting.

83a18ac

DeanChugall and others added 4 commits October 10, 2024 13:45

Merge branch 'huggingface:main' into serbian_evals

c1cfb0c

[FIX #2]: Fix OzEval aligning.

8fcf76d

Merge pull request #3 from textintellect/2-eval-aligning-with-oz-eval-benchmark

cb58c50

[FIX #2]: Fix ```OzEval``` aligning.

Merge branch 'huggingface:main' into serbian_evals

96bafc8

clefourrier approved these changes Oct 14, 2024

clefourrier merged commit 635e581 into huggingface:main Oct 14, 2024
2 checks passed

hynky1999 pushed a commit that referenced this pull request May 22, 2025

Serbian LLM Benchmark Task (#340)

9a6337e

* Serbian LLM benchmark: - ARC_EASY - ARC_CHALLENGE - BOOLQ - HELLASWAG - OPENBOOK - PIQA - OZ_EVAL - WINOGRANDE
