NVIDIA-NeMo
diff --git a/core/requirements.txt b/core/requirements.txt
Lines changed: 1 addition & 0 deletions
diff --git a/docs/evaluation/speech-audio.md b/docs/evaluation/speech-audio.md
Lines changed: 107 additions & 0 deletions
diff --git a/nemo_skills/dataset/contextasr-bench/__init__.py b/nemo_skills/dataset/contextasr-bench/__init__.py
Lines changed: 36 additions & 0 deletions
diff --git a/nemo_skills/dataset/contextasr-bench/coarse/__init__.py b/nemo_skills/dataset/contextasr-bench/coarse/__init__.py
Lines changed: 19 additions & 0 deletions
diff --git a/nemo_skills/dataset/contextasr-bench/contextasr_score.py b/nemo_skills/dataset/contextasr-bench/contextasr_score.py
Lines changed: 87 additions & 0 deletions
diff --git a/nemo_skills/dataset/contextasr-bench/contextless/__init__.py b/nemo_skills/dataset/contextasr-bench/contextless/__init__.py
Lines changed: 19 additions & 0 deletions
diff --git a/nemo_skills/dataset/contextasr-bench/fine/__init__.py b/nemo_skills/dataset/contextasr-bench/fine/__init__.py
Lines changed: 19 additions & 0 deletions
@@ -5,6 +5,7 @@
 
 bs4
 compute-eval @ git+https://github.com/NVIDIA/compute-eval.git@e01a5d2
+contractions
 datasets
 editdistance
 evalplus @ git+https://github.com/evalplus/evalplus@c91370f
 
@@ -483,3 +483,110 @@ Numb3rs reports the following metrics:
 - **success_rate**: Percentage of samples with WER < 0.5
 
 Per-category breakdowns (e.g., `numb3rs-numb3rs_CARDINAL`, `numb3rs-numb3rs_MONEY`) are included automatically.
+
+## ContextASR-Bench
+
+ContextASR-Bench evaluates contextual ASR performance by measuring how well models transcribe speech when given different levels of contextual information. Alongside standard WER, it reports how accurately named entities are transcribed.
+
+**Dataset:** [MrSupW/ContextASR-Bench](https://huggingface.co/datasets/MrSupW/ContextASR-Bench) (English Speech subset: 15,326 samples, ~188 hours, 116,167 named entities across 10+ domains)
+
+**Evaluation Modes:**
+
+- `contextasr-bench.contextless`: Plain transcription (no context)
+- `contextasr-bench.coarse`: Domain label provided as context
+- `contextasr-bench.fine`: Domain label + entity list provided as context
+
+**Metrics:**
+
+- **WER**: Word Error Rate (corpus-level)
+- **NE-WER**: Named Entity WER — WER computed on fuzzy-matched entity token sequences
+- **NE-FNR**: Named Entity False Negative Rate — fraction of reference entities not found in the transcription
+
+### Dataset Location
+
+* Benchmark is defined in `nemo_skills/dataset/contextasr-bench/__init__.py`
+* Original dataset is hosted on [HuggingFace](https://huggingface.co/datasets/MrSupW/ContextASR-Bench)
+
+### Preparing ContextASR-Bench Data
+
+ContextASR-Bench requires audio files for meaningful evaluation. **Audio files are downloaded
+automatically by default** from HuggingFace (~22 GB, may take 30-60 minutes).
+
+```bash
+ns prepare_data contextasr-bench
+```
+
+!!! warning "Large download"
+
+    The automatic download fetches ~22 GB of audio data (JSONL + 8 tar files) from HuggingFace.
+    This can take 30-60 minutes depending on network speed. If you already have the data
+    downloaded, use `--data_dir` to skip the download.
+
+To download to a specific directory, or to use pre-downloaded data:
+
+```bash
+ns prepare_data contextasr-bench --data_dir=/path/to/ContextASR-Bench
+```
+
+If the directory already contains `ContextASR-Speech_English.jsonl`, the existing data is
+used directly. If the file is missing, data is downloaded there automatically.
+
+To use a custom audio path prefix (e.g., for container mount points):
+
+```bash
+ns prepare_data contextasr-bench --data_dir=/path/to/ContextASR-Bench --audio-prefix /data/contextasr
+```
+
+### Running ContextASR-Bench Evaluation
+
+Evaluate all three modes:
+
+```bash
+ns eval \
+    --cluster=local \
+    --benchmarks=contextasr-bench \
+    --server_type=openai \
+    --server_address=http://localhost:8000/v1 \
+    --model=Qwen/Qwen3-Omni-7B \
+    --output_dir=/workspace/contextasr-eval \
+    --data_dir=/path/to/ContextASR-Bench
+```
+
+Evaluate a single mode:
+
+```bash
+ns eval --benchmarks=contextasr-bench.fine ...
+```
+
+### Understanding ContextASR-Bench Results
+
+```
+<output_dir>/
+└── eval-results/
+    └── contextasr-bench/
+        ├── metrics.json                          # Overall aggregate
+        ├── contextasr-bench.contextless/
+        │   └── metrics.json
+        ├── contextasr-bench.coarse/
+        │   └── metrics.json
+        └── contextasr-bench.fine/
+            └── metrics.json
+```
+
+Example output:
+
+```
+----------------------- contextasr-bench.contextless -----------------------
+evaluation_mode | avg_tokens | gen_seconds | success_rate | wer    | ne_wer | ne_fnr | num_entries
+pass@1          | 128        | 12000       | 97.73%       | 2.27%  | 7.83%  | 9.08%  | 15326
+
+------------------------- contextasr-bench.coarse --------------------------
+evaluation_mode | avg_tokens | gen_seconds | success_rate | wer    | ne_wer | ne_fnr | num_entries
+pass@1          | 128        | 12000       | 97.83%       | 2.17%  | 8.11%  | 9.32%  | 15326
+
+-------------------------- contextasr-bench.fine ---------------------------
+evaluation_mode | avg_tokens | gen_seconds | success_rate | wer    | ne_wer | ne_fnr | num_entries
+pass@1          | 128        | 12000       | 98.87%       | 1.13%  | 1.55%  | 0.53%  | 15326
+```
+
+Per-domain breakdowns are included automatically based on the `domain_label` field.
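Reviewer sketch for the metric definitions in the docs hunk above. It illustrates what "corpus-level" WER and NE-FNR mean under simplifying assumptions; it is not the scorer added by this PR. The function names (`edit_distance`, `corpus_wer`, `ne_fnr`) are hypothetical, text normalization is reduced to lowercasing and whitespace splitting, and entity lookup uses exact phrase matching where the benchmark is described as using fuzzy matching. NE-WER is left out because the fuzzy alignment it depends on is not shown in this diff.

```python
from typing import Iterable, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors += edit_distance(ref, hyp)
        ref_words += len(ref)
    return errors / ref_words if ref_words else 0.0


def ne_fnr(samples: Iterable[tuple[list[str], str]]) -> float:
    """Fraction of reference entities that do not appear in the hypothesis.

    Assumption: exact, case-insensitive phrase match on word boundaries.
    """
    missed = 0
    total = 0
    for entities, hypothesis in samples:
        hyp = " " + " ".join(hypothesis.lower().split()) + " "
        for entity in entities:
            total += 1
            needle = " " + " ".join(entity.lower().split()) + " "
            if needle not in hyp:
                missed += 1
    return missed / total if total else 0.0


pairs = [
    ("the cat sat", "the cat sat"),
    ("hello world", "hello word"),
]
print(corpus_wer(pairs))  # 1 word error over 5 reference words

samples = [(["new york", "ibm"], "she moved to new york to join i b m")]
print(ne_fnr(samples))  # "new york" is found, "ibm" is missed
```

Pooling errors and reference words over the whole corpus, instead of averaging per-utterance WER, keeps short utterances from dominating the score.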
@@ -0,0 +1,36 @@
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""ContextASR-Bench: Contextual ASR evaluation benchmark.
+
+Evaluates ASR models across three context settings:
+- Contextless: Plain transcription
+- Coarse-grained: Domain label provided as context
+- Fine-grained: Domain label + entity list provided as context
+
+Metrics: WER, NE-WER (entity-focused WER with fuzzy matching), NE-FNR (entity miss rate)
+
+Dataset: https://huggingface.co/datasets/MrSupW/ContextASR-Bench
+Paper: ContextASR-Bench (English Speech subset, 15,326 samples, ~188 hours)
+"""
+
+REQUIRES_DATA_DIR = True
+IS_BENCHMARK_GROUP = True
+SCORE_MODULE = "nemo_skills.dataset.contextasr-bench.contextasr_score"
+
+BENCHMARKS = {
+    "contextasr-bench.contextless": {},
+    "contextasr-bench.coarse": {},
+    "contextasr-bench.fine": {},
+}
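Reviewer note on `SCORE_MODULE`: the dotted path contains a hyphen (`contextasr-bench`), so it cannot be written in a plain `import` statement, but a string like this can still be resolved with `importlib.import_module`. How the framework actually loads it is not shown in this diff. The sketch below demonstrates only the mechanism, using a throwaway package whose names (`demo_pkg`, `sub-bench`, `score`) are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a hypothetical package tree with a hyphenated directory name.
    pkg = Path(tmp) / "demo_pkg" / "sub-bench"
    pkg.mkdir(parents=True)
    (pkg.parent / "__init__.py").write_text("")
    (pkg / "__init__.py").write_text("")
    (pkg / "score.py").write_text(
        "def compute_score(m):\n    return {'n': len(m)}\n"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the new files are visible
    try:
        # The module name is passed as a string, so the hyphen is allowed.
        module = importlib.import_module("demo_pkg.sub-bench.score")
        result = module.compute_score({"a": 1, "b": 2})
    finally:
        sys.path.remove(tmp)

print(result)
```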
@@ -0,0 +1,19 @@
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""ContextASR-Bench coarse mode: domain label provided as context."""
+
+METRICS_TYPE = "contextasr"
+EVAL_ARGS = "++eval_type=contextasr"
+GENERATION_ARGS = "++prompt_format=openai ++enable_audio=true"
@@ -0,0 +1,87 @@
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+def compute_score(combined_metrics: dict) -> dict:
+    """Aggregate metrics from the three ContextASR-Bench sub-benchmarks.
+
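+    For each evaluation mode, averages WER, NE-WER, NE-FNR, success rate, and
+    average token count across the contextless, coarse, and fine sub-benchmarks,
+    weighting each sub-benchmark by its ``num_entries``. ``gen_seconds`` is summed.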
+    """
+    main_names = ["contextless", "coarse", "fine"]
+    benchmarks = {k: v for k, v in combined_metrics.items() if k.split(".")[-1] in main_names}
+
+    if not benchmarks:
+        return {}
+
+    first_benchmark = next(iter(benchmarks.values()))
+    eval_modes = list(first_benchmark.keys())
+
+    aggregated = {}
+    for eval_mode in eval_modes:
+        total_entries = 0
+        weighted_success = 0.0
+        total_gen_seconds = 0
+        weighted_tokens = 0.0
+        weighted_wer = 0.0
+        weighted_ne_wer = 0.0
+        weighted_ne_fnr = 0.0
+        wer_entries = 0
+        ne_wer_entries = 0
+        ne_fnr_entries = 0
+
+        for benchmark_data in benchmarks.values():
+            if eval_mode not in benchmark_data:
+                continue
+
+            metrics = benchmark_data[eval_mode]
+            num_entries = metrics["num_entries"]
+            if num_entries == 0:
+                continue
+
+            total_entries += num_entries
+            weighted_success += metrics["success_rate"] * num_entries
+            total_gen_seconds += metrics["gen_seconds"]
+            weighted_tokens += metrics["avg_tokens"] * num_entries
+
+            if "wer" in metrics:
+                weighted_wer += metrics["wer"] * num_entries
+                wer_entries += num_entries
+            if "ne_wer" in metrics:
+                weighted_ne_wer += metrics["ne_wer"] * num_entries
+                ne_wer_entries += num_entries
+            if "ne_fnr" in metrics:
+                weighted_ne_fnr += metrics["ne_fnr"] * num_entries
+                ne_fnr_entries += num_entries
+
+        if total_entries == 0:
+            continue
+
+        agg = {
+            "avg_tokens": int(weighted_tokens / total_entries),
+            "gen_seconds": total_gen_seconds,
+            "success_rate": weighted_success / total_entries,
+            "num_entries": total_entries,
+        }
+
+        if wer_entries > 0:
+            agg["wer"] = round(weighted_wer / wer_entries, 2)
+        if ne_wer_entries > 0:
+            agg["ne_wer"] = round(weighted_ne_wer / ne_wer_entries, 2)
+        if ne_fnr_entries > 0:
+            agg["ne_fnr"] = round(weighted_ne_fnr / ne_fnr_entries, 2)
+
+        aggregated[eval_mode] = agg
+
+    return aggregated
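Reviewer sketch of the aggregation performed by `compute_score` above, condensed to the entry-weighted mean at its core. The helper name `entry_weighted_mean` is hypothetical, and the inputs reuse the `pass@1` numbers from the example output in the docs hunk, on the assumption that metrics are stored as percentages (for example `2.27`, not `0.0227`).

```python
def entry_weighted_mean(sub_metrics: dict, key: str) -> float:
    """Average `key` over sub-benchmarks, weighting each by its num_entries."""
    present = [
        m for m in sub_metrics.values() if key in m and m["num_entries"] > 0
    ]
    total = sum(m["num_entries"] for m in present)
    return sum(m[key] * m["num_entries"] for m in present) / total


# pass@1 metrics per sub-benchmark, taken from the docs' example output.
pass_at_1 = {
    "contextasr-bench.contextless": {
        "num_entries": 15326, "wer": 2.27, "ne_wer": 7.83, "ne_fnr": 9.08,
    },
    "contextasr-bench.coarse": {
        "num_entries": 15326, "wer": 2.17, "ne_wer": 8.11, "ne_fnr": 9.32,
    },
    "contextasr-bench.fine": {
        "num_entries": 15326, "wer": 1.13, "ne_wer": 1.55, "ne_fnr": 0.53,
    },
}

aggregate = {
    key: round(entry_weighted_mean(pass_at_1, key), 2)
    for key in ("wer", "ne_wer", "ne_fnr")
}
print(aggregate)  # {'wer': 1.86, 'ne_wer': 5.83, 'ne_fnr': 6.31}
```

Because all three modes have the same number of entries here, the weighted mean reduces to a plain mean. Note that an entry-weighted mean of corpus-level WERs matches a pooled corpus-level WER only when the sub-benchmarks have the same number of reference words per entry; that holds if all three modes share the same audio and references, which this diff suggests but does not confirm.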
@@ -0,0 +1,19 @@
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""ContextASR-Bench contextless mode: plain transcription without any context."""
+
+METRICS_TYPE = "contextasr"
+EVAL_ARGS = "++eval_type=contextasr"
+GENERATION_ARGS = "++prompt_format=openai ++enable_audio=true"
@@ -0,0 +1,19 @@
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""ContextASR-Bench fine mode: domain label and entity list provided as context."""
+
+METRICS_TYPE = "contextasr"
+EVAL_ARGS = "++eval_type=contextasr"
+GENERATION_ARGS = "++prompt_format=openai ++enable_audio=true"