Serbian LLM Benchmark Task #340

Open: wants to merge 5 commits into base: main

DeanChugall

Serbian LLM Benchmark Task Configuration and Prompt Functions

Summary:

This pull request introduces task configurations and prompt functions for evaluating LLM models on various Serbian datasets. The module includes tasks for:

ARC (easy and challenge),
BoolQ,
Hellaswag,
OpenBookQA,
PIQA,
Winogrande,
custom OZ Eval dataset.

The tasks are defined using the LightevalTaskConfig class, and prompt generation is streamlined through a reusable serbian_eval_prompt function.

Changes:

  1. Task Configurations:

    • Configurations for ARC (Easy and Challenge), BoolQ, Hellaswag, OpenBookQA, PIQA, Winogrande, and OZ Eval tasks using LightevalTaskConfig.
    • Enum class HFSubsets added for dataset subset management, improving code maintainability and clarity.
    • create_task_config function allows dynamic task creation with dependency injection for flexibility in dataset and metric selection.
  2. Prompt Functions:

    • The serbian_eval_prompt function creates a structured multiple-choice prompt in Serbian.
    • The function supports dynamic query and choice generation with configurable tasks.
  3. Logging:

    • A hello_message banner is printed upon task initialization, listing all available tasks.
    • Task names are dynamically generated and printed using hlog_warn.
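
As a rough illustration of the prompt function described in item 2, here is a minimal sketch of a multiple-choice prompt builder in Serbian. It is not the PR's actual `serbian_eval_prompt`: the row fields (`query`, `choices`, `answer`), the `DocSketch` stand-in for the returned document object, and the exact Serbian wording are all assumptions made for this example.

```python
from dataclasses import dataclass
from string import ascii_uppercase
from typing import List


@dataclass
class DocSketch:
    """Hypothetical stand-in for the document object a prompt function returns."""

    task_name: str
    query: str
    choices: List[str]
    gold_index: int


def serbian_prompt_sketch(line: dict, task_name: str = "") -> DocSketch:
    """Build a multiple-choice prompt in Serbian from one dataset row.

    Assumes hypothetical row fields: "query" (str), "choices" (list of str)
    and "answer" (integer index of the correct choice).
    """
    question = line["query"]
    choices = line["choices"]
    labels = ascii_uppercase[: len(choices)]  # "A", "B", "C", ...

    # "Pitanje" = question, "Ponuđeni odgovori" = offered answers, "Odgovor" = answer
    query = f"Pitanje: {question}\n\nPonuđeni odgovori:\n"
    for label, choice in zip(labels, choices):
        query += f"{label}. {choice}\n"
    query += "\nOdgovor:"

    return DocSketch(
        task_name=task_name,
        query=query,
        # The model is scored on the answer letter, not the full choice text.
        choices=[f" {label}" for label in labels],
        gold_index=int(line["answer"]),
    )


row = {
    "query": "Koji je glavni grad Srbije?",
    "choices": ["Beograd", "Novi Sad", "Niš"],
    "answer": 0,
}
doc = serbian_prompt_sketch(row, task_name="serbian_evals:demo")
print(doc.query)
```

Scoring on the answer letter keeps the candidate continuations short and uniform across datasets; whether the PR scores letters or full answer texts is not stated in the description.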

Key Features:

  • Modular Design: Task configurations are modular, reusable, and easily extendable to accommodate new datasets and tasks.
  • Improved Readability: Introduction of the HFSubsets Enum improves the readability and maintainability of the dataset subset references.
  • Enhanced Flexibility: create_task_config function simplifies task creation, promoting cleaner and more maintainable code.
  • Clear Logging: Logging includes a friendly welcome message and a list of available tasks for easier debugging and interaction.
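
The Enum-plus-factory pattern named above can be sketched as follows. This is a hedged illustration only: `TaskConfigSketch` is a simplified stand-in for `LightevalTaskConfig` (whose real fields are not reproduced here), and the subset values, repository name, metric string, and task-name prefix are placeholders invented for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class HFSubsetsSketch(Enum):
    """Hypothetical subset names; the real values live in the PR's module."""

    ARC_EASY = "arc_easy_serbian"
    ARC_CHALLENGE = "arc_challenge_serbian"
    BOOLQ = "boolq_serbian"
    OZ_EVAL = "oz_eval_serbian"


@dataclass(frozen=True)
class TaskConfigSketch:
    """Simplified stand-in for LightevalTaskConfig (assumed fields only)."""

    name: str
    hf_repo: str
    hf_subset: str
    prompt_function: Callable[[dict, str], object]
    metric: Tuple[str, ...]


def create_task_config_sketch(
    task_name: str,
    prompt_function: Callable[[dict, str], object],
    hf_repo: str,
    hf_subset: str,
    metric: Tuple[str, ...] = ("placeholder_accuracy",),
) -> TaskConfigSketch:
    """Factory: the dataset, prompt function and metric are all injected."""
    return TaskConfigSketch(
        name=task_name,
        hf_repo=hf_repo,
        hf_subset=hf_subset,
        prompt_function=prompt_function,
        metric=metric,
    )


def placeholder_prompt(line: dict, task_name: str = "") -> dict:
    """Placeholder prompt function; returns the row unchanged."""
    return line


# One task per Enum member, so adding a dataset means adding one Enum entry.
TASKS: List[TaskConfigSketch] = [
    create_task_config_sketch(
        task_name=f"serbian_evals:{subset.name.lower()}",
        prompt_function=placeholder_prompt,
        hf_repo="example-org/serbian-benchmark",  # placeholder repository name
        hf_subset=subset.value,
    )
    for subset in HFSubsetsSketch
]

# Task names are derived from the configs rather than hard-coded.
for task in TASKS:
    print(task.name)
```

Deriving the printed task list from `TASKS` keeps the startup banner in sync with the configured tasks without a second hand-maintained list.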

Future Enhancements:

  • Additional prompt functions can be added for different task types.
  • Unit tests should be written to ensure the integrity of prompt generation and task configuration.

- ARC_EASY
- ARC_CHALLENGE
- BOOLQ
- HELLASWAG
- OPENBOOK
- PIQA
- OZ_EVAL
- WINOGRANDE
1. Change and tidy `serbian_eval_prompt`.
2. `ruff check` fix.

NathanHB (Member) left a comment

Hi! Thanks for the PR. Only a few nits, but it should be good :)

community_tasks/serbian_eval.py (resolved)
community_tasks/serbian_eval.py (resolved)
community_tasks/serbian_eval.py (outdated, resolved)