Add multilingual benchmarking #267

Open
victorigualada wants to merge 18 commits into allenporter:main from victorigualada:add-multilingual-benchmarking
Contributor

@victorigualada victorigualada commented Mar 2, 2026

Description

This PR adds multilingual benchmark support, along with a first multilingual benchmark for devstral-2512.
A few of the decisions are debatable, but I think it's a simple first approach.

  1. I created a separate assist-<language> directory for each dataset language so the existing structure doesn't change. A follow-up PR can move to a tree structure:
|- assist
  |- en
  |- es
  |- fr
|- assist-mini
  |- en
  |- es
  |- fr
...
  2. The current leaderboard stays as-is with English-only results. For each dataset (assist, assist-mini, questions, and automations), a new section is added with a table that has models as rows and languages as columns. We can rethink this layout later, but as groundwork I think it's good enough.
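
To make the two points above concrete, here is a minimal sketch of how per-language directories could be mapped to a (dataset, language) pair and how results could be pivoted into a models-by-languages table. This is not the code in this PR: `split_dataset_dir`, `language_table`, the `<dataset>-<language>` naming for every dataset, and the `en` default are all assumptions made for illustration, and the scores in the usage note are placeholders rather than real results.

```python
from collections import defaultdict

# Hypothetical names for illustration only; the PR's actual code may differ.
DEFAULT_LANGUAGE = "en"  # assumption: un-suffixed directories are English
BASE_DATASETS = ("assist", "assist-mini", "questions", "automations")


def split_dataset_dir(name: str) -> tuple[str, str]:
    """Split a directory name such as 'assist-mini-es' into (dataset, language).

    Longer base names are tried first so that 'assist-mini' is not
    misread as dataset 'assist' with language 'mini'.
    """
    for base in sorted(BASE_DATASETS, key=len, reverse=True):
        if name == base:
            return base, DEFAULT_LANGUAGE
        if name.startswith(base + "-"):
            return base, name[len(base) + 1:]
    raise ValueError(f"unknown dataset directory: {name}")


def language_table(results: list[tuple[str, str, float]]) -> str:
    """Render (model, language, score) rows as a Markdown table.

    Models become rows and languages become columns; a missing
    model/language combination is shown as 'n/a'.
    """
    scores: dict[str, dict[str, float]] = defaultdict(dict)
    for model, language, score in results:
        scores[model][language] = score
    languages = sorted({lang for row in scores.values() for lang in row})
    lines = [
        "| Model | " + " | ".join(languages) + " |",
        "| --- | " + " | ".join("---" for _ in languages) + " |",
    ]
    for model in sorted(scores):
        cells = [
            f"{scores[model][lang]:.1%}" if lang in scores[model] else "n/a"
            for lang in languages
        ]
        lines.append(f"| {model} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

For example, `split_dataset_dir("assist-es")` yields `("assist", "es")`, and calling `language_table` with placeholder rows like `("model-a", "en", 0.5)` produces one row per model with a column per language. Checking the longest base name first is the one design choice worth noting, since `assist` is a prefix of `assist-mini`.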

Let me know what you think and whether we can improve it.

Owner

@allenporter allenporter left a comment

Generally seems fine, though I saw that there are two CI failures:
(1) there is a codespell error
(2) I think some snapshots are failing

expect_changes:
  light.kitchen_light:
    state: "on"
    # Helligkeitszustand ignorieren, Integration stellt ihn wieder her
Owner

Not sure if it's super useful to translate the comments too :)

@allenporter
Owner

There are still some test failures. I think there may be something wrong with the linting configuration in the pre-commit that caused this? May need to manually fix.

@victorigualada
Contributor Author

> There are still some test failures. I think there may be something wrong with the linting configuration in the pre-commit that caused this? May need to manually fix.

Yes, that's exactly the issue, and it drove me mad last week. The different linting approaches in pre-commit and CI are causing this. I'm wondering whether they should be aligned so this doesn't happen again in the future.
