Conversation
Rename fleurs_score.py to audio_score.py with a generalized docstring and remove covost2_score.py (which was byte-identical apart from docstring). Both fleurs and covost2 SCORE_MODULE now point to the shared module. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: naymaraq <dkaramyan@nvidia.com>
📝 Walkthrough

This PR unifies audio dataset preparation for the CoVoST2 and FLEURS benchmarks by removing task-specific CLI flags and always generating both ASR and speech-translation outputs in a single preparation run. Configuration constants move from parent modules to subtask-level modules, benchmark groups are registered with shared scoring logic, and both prepare scripts are refactored with dedicated helper functions.

Changes: Audio Benchmark Group Unification
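Since the walkthrough mentions benchmark groups registered with shared scoring logic, here is a minimal sketch of what that registration presumably looks like in each dataset's `__init__.py` — the `SCORE_MODULE` constant comes from the commit message above, but treating it as a dotted module path is an assumption:

```python
# nemo_skills/dataset/covost2/__init__.py (sketch, not the verbatim file)
# Both covost2 and fleurs point group scoring at the same shared module,
# so the aggregation logic touched by this PR lives in exactly one place.
SCORE_MODULE = "nemo_skills.dataset.fleurs.audio_score"
```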
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (4 passed)
Actionable comments posted: 7
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 91617366-5404-439c-8001-aeba9664776c
📒 Files selected for processing (11)
- docs/evaluation/speech-audio.md
- nemo_skills/dataset/covost2/__init__.py
- nemo_skills/dataset/covost2/asr/__init__.py
- nemo_skills/dataset/covost2/prepare.py
- nemo_skills/dataset/covost2/st/__init__.py
- nemo_skills/dataset/fleurs/__init__.py
- nemo_skills/dataset/fleurs/asr/__init__.py
- nemo_skills/dataset/fleurs/audio_score.py
- nemo_skills/dataset/fleurs/prepare.py
- nemo_skills/dataset/fleurs/st/__init__.py
- nemo_skills/pipeline/summarize_results.py
```
<data_dir>/covost2/
  asr/test.jsonl  # one record per (lang, audio) for transcription
  st/test.jsonl   # one record per (src→tgt, audio) for translation
  audio/<lang>/<split>/...wav
```
Add language identifiers to fenced output blocks.
Lines 638 and 676 use unlabeled fenced blocks, which triggers MD040. Mark them as text.
📝 Suggested fix

````diff
-```
+```text
 <data_dir>/covost2/
   asr/test.jsonl  # one record per (lang, audio) for transcription
   st/test.jsonl   # one record per (src→tgt, audio) for translation
   audio/<lang>/<split>/...wav
@@
-```
+```text
 <data_dir>/fleurs/
   asr/test.jsonl  # one record per (locale, audio) for transcription
   st/test.jsonl   # one record per (src→tgt, audio) for translation
   audio/<locale>/...wav
````
Also applies to: 676-681
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 638-638: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/evaluation/speech-audio.md` around lines 638 - 643, The two unlabeled
fenced blocks that start with the directory listings "<data_dir>/covost2/" and
"<data_dir>/fleurs/" should be marked with a language identifier to satisfy
MD040; update each opening triple-backtick for the blocks containing those
listings to use "```text" so the blocks are labeled as text (look for the blocks
that contain "asr/test.jsonl", "st/test.jsonl", and "audio/<lang>..." /
"audio/<locale>...").
```python
_build_record(
    expected_answer=item["translation"],
    instruction=get_st_instruction(tgt_lang),
    container_audio_path=cpath,
    duration=duration,
    subset_for_metrics=tag,
    task_type="AST",
    extra_fields={
```
CoVoST2 ST manifest entries are still labeled AST.
Line 259 uses task_type="AST" in the ST path. This can conflict with the new covost2.st subtask convention and downstream grouping.
🔧 Suggested fix

```diff
-    task_type="AST",
+    task_type="ST",
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nemo_skills/dataset/covost2/prepare.py` around lines 253 - 260, The manifest
entries for CoVoST2 are being labeled with task_type="AST"; locate the
_build_record(...) call that currently passes task_type="AST" (the call building
ST records in prepare.py) and change the task_type value to "ST" so CoVoST2
entries follow the ST convention used by downstream grouping and the covost2.st
subtask.
```python
first_benchmark = next(iter(benchmarks.values()))
eval_modes = list(first_benchmark.keys())
```
Aggregate evaluation modes from all sub-benchmarks, not just the first one.
Current logic can drop valid eval modes that exist only in later benchmarks, causing incomplete group scores.
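To see the failure mode, consider a hypothetical `benchmarks` mapping (names and modes invented for illustration) where one eval mode exists only in the second sub-benchmark:

```python
# Hypothetical input shape; real values come from the summarize_results pipeline.
benchmarks = {
    "fleurs.asr": {"greedy": {"num_entries": 10}},
    "fleurs.st": {"greedy": {"num_entries": 5}, "pass@4": {"num_entries": 5}},
}

# Current logic: only the first sub-benchmark's modes survive.
eval_modes = list(next(iter(benchmarks.values())).keys())
assert eval_modes == ["greedy"]  # "pass@4" is silently dropped

# Union over all sub-benchmarks keeps every mode.
eval_modes = sorted({mode for data in benchmarks.values() for mode in data})
assert eval_modes == ["greedy", "pass@4"]
```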
💡 Suggested fix

```diff
-    first_benchmark = next(iter(benchmarks.values()))
-    eval_modes = list(first_benchmark.keys())
+    eval_modes = sorted({mode for benchmark_data in benchmarks.values() for mode in benchmark_data.keys()})
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nemo_skills/dataset/fleurs/audio_score.py` around lines 32 - 34, The current
code takes eval_modes from only first_benchmark (first_benchmark =
next(iter(benchmarks.values())); eval_modes = list(first_benchmark.keys())),
which can omit modes present only in other benchmarks; change the logic to
aggregate keys across all benchmarks by iterating over benchmarks.values(),
collecting the union of all sub-benchmark keys (e.g., via a set) and then using
that combined list for eval_modes so group scoring covers every eval mode
defined anywhere in benchmarks.
```python
num_entries = metrics.get("num_entries", 0)
if num_entries == 0:
    continue

total_entries += num_entries
total_gen_seconds += metrics.get("gen_seconds", 0)
weighted_success += metrics.get("success_rate", 0.0) * num_entries
weighted_tokens += metrics.get("avg_tokens", 0.0) * num_entries
weighted_no_answer += metrics.get("no_answer", 0.0) * num_entries
```
Fail fast on missing required metric keys instead of defaulting to zero.
Using defaults here can silently corrupt aggregate values when input schema is incomplete.
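A toy illustration of that corruption (values invented):

```python
metrics = {"num_entries": 100}  # "success_rate" accidentally missing

# With .get(): this benchmark contributes 0.0 success over 100 entries,
# silently dragging the group average down.
weighted = metrics.get("success_rate", 0.0) * metrics["num_entries"]
assert weighted == 0.0

# With direct access the schema gap surfaces immediately as a KeyError.
weighted = metrics["success_rate"] * metrics["num_entries"]  # raises KeyError
```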
💡 Suggested fix

```diff
-num_entries = metrics.get("num_entries", 0)
+num_entries = metrics["num_entries"]
 if num_entries == 0:
     continue
 total_entries += num_entries
-total_gen_seconds += metrics.get("gen_seconds", 0)
-weighted_success += metrics.get("success_rate", 0.0) * num_entries
-weighted_tokens += metrics.get("avg_tokens", 0.0) * num_entries
-weighted_no_answer += metrics.get("no_answer", 0.0) * num_entries
+total_gen_seconds += metrics["gen_seconds"]
+weighted_success += metrics["success_rate"] * num_entries
+weighted_tokens += metrics["avg_tokens"] * num_entries
+weighted_no_answer += metrics["no_answer"] * num_entries
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nemo_skills/dataset/fleurs/audio_score.py` around lines 51 - 59, The
aggregation block currently uses metrics.get(...) which silently defaults
missing keys to zero and corrupts totals; change all uses of metrics.get to
direct key access (metrics["num_entries"], metrics["gen_seconds"],
metrics["success_rate"], metrics["avg_tokens"], metrics["no_answer"]) so a
KeyError surfaces immediately for missing required keys, and keep the existing
conditional on num_entries (num_entries = metrics["num_entries"]; if num_entries
== 0: continue) while still updating total_entries, total_gen_seconds,
weighted_success, weighted_tokens, and weighted_no_answer with the directly
accessed values.
```python
for src_locale, tgt_locale in pairs:
    locale_audio_dir = audio_dir / src_locale
    if not no_audio:
        locale_audio_dir.mkdir(parents=True, exist_ok=True)
    target_by_id = get_target_index(tgt_locale)
    tag = f"{src_locale}->{tgt_locale}"
    for source_row in tqdm(load_fleurs(src_locale, split, local_dir=local_dir), desc=tag):
        target_row = target_by_id.get(source_row["id"])
```
Cache source rows per locale in ST collection.
Line 243 reloads full locale data per (src_locale, tgt_locale) pair. For en_us, this repeats many times and significantly increases prep time.
♻️ Suggested fix

```diff
 def _collect_st_records(
@@
 ) -> list[dict]:
     pairs = build_translation_pairs(languages)
     target_cache: dict[str, dict[int, dict]] = {}
+    source_cache: dict[str, list[dict]] = {}
@@
     def get_target_index(locale: str) -> dict[int, dict]:
         if locale not in target_cache:
             target_cache[locale] = index_by_id(load_fleurs(locale, split, local_dir=local_dir))
         return target_cache[locale]
+
+    def get_source_rows(locale: str) -> list[dict]:
+        if locale not in source_cache:
+            source_cache[locale] = load_fleurs(locale, split, local_dir=local_dir)
+        return source_cache[locale]
@@
-        for source_row in tqdm(load_fleurs(src_locale, split, local_dir=local_dir), desc=tag):
+        for source_row in tqdm(get_source_rows(src_locale), desc=tag):
```

As per coding guidelines, "Keep code simple and elegant; reuse/extend existing functionality when possible".
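As a design note, the same one-load-per-locale behavior could also come from memoization rather than a hand-rolled dict; a sketch assuming `load_fleurs` is deterministic for a fixed `(split, local_dir)` (the factory function here is hypothetical):

```python
from functools import lru_cache

def make_source_loader(split, local_dir, load_fleurs):
    """Build a per-locale loader that calls load_fleurs at most once per locale."""

    @lru_cache(maxsize=None)
    def get_source_rows(locale: str) -> tuple:
        # Return a tuple so the cached rows cannot be mutated in place by callers.
        return tuple(load_fleurs(locale, split, local_dir=local_dir))

    return get_source_rows
```

Either way, en_us is no longer re-read for every (src_locale, tgt_locale) pair.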
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nemo_skills/dataset/fleurs/prepare.py` around lines 237 - 244, The loop
currently calls load_fleurs(src_locale, ...) inside the inner (src_locale,
tgt_locale) pair loop, causing repeated reloads for the same src_locale (e.g.,
en_us); instead, load and cache the source rows once per src_locale before
iterating its tgt_locales (e.g., build a dict mapping id->row from load_fleurs
for that src_locale and reuse it), move the audio_dir.mkdir(parents=True,
exist_ok=True) to run once per src_locale, and then inside the inner loop look
up target_row from the cached mapping (use existing helper get_target_index only
for target locale); update references to source_row to use the cached data so
load_fleurs is not re-run for every pair.
```python
target_row = target_by_id.get(source_row["id"])
if target_row is None:
    continue
```
Fail fast instead of silently dropping missing aligned targets.
At Line 244, target_by_id.get(...) + Line 246 continue can silently lose ST samples when alignment breaks. Use direct key access so mismatches fail loudly.
🔧 Suggested fix

```diff
-        target_row = target_by_id.get(source_row["id"])
-        if target_row is None:
-            continue
+        target_row = target_by_id[source_row["id"]]
```

As per coding guidelines, "Don't use .get() for accessing dictionary keys if the code expects them to be present; use direct access data[key_name] to fail with a clear error instead of silently corrupting data".

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
target_row = target_by_id[source_row["id"]]
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nemo_skills/dataset/fleurs/prepare.py` around lines 244 - 246, The code
silently skips missing aligned targets by using
target_by_id.get(source_row["id"]) and continue; change this to directly access
the mapping (e.g., target_row = target_by_id[source_row["id"]]) so a KeyError is
raised when alignment is broken, or wrap that access in a small try/except that
raises a descriptive error mentioning the source id and file/context; update the
logic in the loop where target_row, target_by_id and source_row["id"] are used
so failures fail fast instead of being silently dropped.
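If a more descriptive failure is preferred over a bare KeyError, the try/except variant the prompt describes might look like this (exception type and message wording are assumptions):

```python
try:
    target_row = target_by_id[source_row["id"]]
except KeyError:
    raise RuntimeError(
        f"Broken FLEURS alignment: source id {source_row['id']!r} "
        f"({src_locale}) has no matching row for target locale {tgt_locale}"
    ) from None
```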
```diff
 _build_record(
     expected_answer=target_row["raw_transcription"],
     instruction=get_st_instruction(tgt_locale),
     container_audio_path=cpath,
     duration=duration,
-    subset_for_metrics=subset_for_metrics,
-    task_type=task_type,
-    extra_fields=extra_fields,
-    source=source_text,
-    reference=reference_text,
+    subset_for_metrics=tag,
+    task_type="AST",
+    extra_fields=extra,
```
ST records are still tagged as AST.
Line 270 sets task_type="AST" even though this PR moves to st subtasks. This can create inconsistent metadata across generated manifests and downstream grouping.
🔧 Suggested fix

```diff
-    task_type="AST",
+    task_type="ST",
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
_build_record(
    expected_answer=target_row["raw_transcription"],
    instruction=get_st_instruction(tgt_locale),
    container_audio_path=cpath,
    duration=duration,
    subset_for_metrics=tag,
    task_type="ST",
    extra_fields=extra,
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nemo_skills/dataset/fleurs/prepare.py` around lines 264 - 271, The
_build_record call currently hardcodes task_type="AST" which is inconsistent
with the new subtask naming; change the task_type argument in the _build_record
invocation (the call that passes expected_answer,
instruction=get_st_instruction(tgt_locale), container_audio_path=cpath, etc.) to
use the correct subtask label "st" (or whatever canonical lowercase "st" is used
elsewhere) so ST records are consistently tagged as st across manifests and
downstream grouping.
Summary
Reshape fleurs and covost2 into benchmark groups with asr/ and st/ subtasks, mirroring the contextasr-bench layout. One prepare_data invocation per dataset now produces both subtask manifests in one shot.
Summary by CodeRabbit

Release Notes

New Features

Documentation

Improvements

- Removed the `--task` flag requirement.