docs/evaluation/speech-audio.md
Lines changed: 107 additions & 0 deletions
@@ -483,3 +483,110 @@ Numb3rs reports the following metrics:
- **success_rate**: Percentage of samples with WER < 0.5

Per-category breakdowns (e.g., `numb3rs-numb3rs_CARDINAL`, `numb3rs-numb3rs_MONEY`) are included automatically.

## ContextASR-Bench

ContextASR-Bench evaluates contextual ASR performance by measuring how well models transcribe speech when given different levels of contextual information. It focuses on named entity recognition accuracy alongside standard WER.

**Dataset:** [MrSupW/ContextASR-Bench](https://huggingface.co/datasets/MrSupW/ContextASR-Bench) (English Speech subset: 15,326 samples, ~188 hours, 116,167 named entities across 10+ domains)

**Evaluation Modes:**

- `contextasr-bench.contextless`: Plain transcription (no context)
- `contextasr-bench.coarse`: Domain label provided as context
- `contextasr-bench.fine`: Domain label + entity list provided as context
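
The three modes differ only in how much context accompanies the audio. As a rough illustration, the sketch below builds a text instruction for each mode. The function name `build_instruction` and the prompt wording are hypothetical and are not taken from the benchmark; the actual prompts may be phrased differently.

```python
from typing import Sequence

# Hypothetical base prompt; the benchmark's real wording is not shown in this document.
BASE_INSTRUCTION = "Transcribe the audio."


def build_instruction(mode: str, domain: str = "", entities: Sequence[str] = ()) -> str:
    """Return an instruction string for one of the three context levels."""
    if mode == "contextless":
        # No context: plain transcription request.
        return BASE_INSTRUCTION
    if mode == "coarse":
        # Coarse context: only the domain label is added.
        return f"{BASE_INSTRUCTION} Domain: {domain}."
    if mode == "fine":
        # Fine context: domain label plus the list of entities.
        entity_list = ", ".join(entities)
        return f"{BASE_INSTRUCTION} Domain: {domain}. Entities that may appear: {entity_list}."
    raise ValueError(f"unknown mode: {mode!r}")
```

Calling `build_instruction("fine", "medicine", ["ibuprofen", "Dr. Chen"])` appends both the domain and the entity list, while `"contextless"` returns the base instruction unchanged.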

**Metrics:**

- **WER**: Word Error Rate (corpus-level)
- **NE-WER**: Named Entity WER, i.e. WER computed on fuzzy-matched entity token sequences
- **NE-FNR**: Named Entity False Negative Rate, i.e. the fraction of reference entities not found in the transcription
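
To make the metric definitions concrete, here is a minimal sketch of corpus-level WER and NE-FNR. It is not the benchmark's implementation: the function names are hypothetical, text normalization is reduced to lowercasing and whitespace splitting, and NE-FNR uses exact phrase matching as a simplifying assumption (the benchmark describes fuzzy matching for entities, which is not reproduced here). NE-WER is omitted because it depends on those fuzzy-matching details.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """Total word errors divided by total reference words over all (ref, hyp) pairs."""
    errors = 0
    ref_words = 0
    for ref, hyp in pairs:
        ref_tokens = ref.lower().split()
        hyp_tokens = hyp.lower().split()
        errors += edit_distance(ref_tokens, hyp_tokens)
        ref_words += len(ref_tokens)
    return errors / ref_words if ref_words else 0.0


def ne_fnr(samples: List[Tuple[List[str], str]]) -> float:
    """Fraction of reference entities missing from the hypothesis.

    Assumption: exact phrase match after lowercasing, not fuzzy matching.
    """
    missed = 0
    total = 0
    for entities, hyp in samples:
        # Pad with spaces so entities only match on whole-word boundaries.
        hyp_norm = " " + " ".join(hyp.lower().split()) + " "
        for ent in entities:
            ent_norm = " " + " ".join(ent.lower().split()) + " "
            total += 1
            if ent_norm not in hyp_norm:
                missed += 1
    return missed / total if total else 0.0
```

Corpus-level WER pools errors and reference words across the whole dataset before dividing, so long utterances weigh more than short ones; this differs from averaging per-sample WER values. For example, `corpus_wer([("the cat sat", "the cat sat"), ("hello big world", "hello world")])` counts one deletion over six reference words.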

### Dataset Location

* Benchmark is defined in `nemo_skills/dataset/contextasr-bench/__init__.py`
* Original dataset is hosted on [HuggingFace](https://huggingface.co/datasets/MrSupW/ContextASR-Bench)

### Preparing ContextASR-Bench Data

ContextASR-Bench requires audio files for meaningful evaluation. **Audio files are downloaded
automatically by default** from HuggingFace (~22 GB, may take 30-60 minutes).

```bash
ns prepare_data contextasr-bench
```

!!! warning "Large download"

    The automatic download fetches ~22 GB of audio data (JSONL + 8 tar files) from HuggingFace.
    This can take 30-60 minutes depending on network speed. If you already have the data
    downloaded, use `--data_dir` to skip the download.

To download to a specific directory, or to use pre-downloaded data: