Omnilingual ASR is an open-source speech recognition system supporting over 1,600 languages.
</div>
## December 2025 Update
We release two suites of models:
- Checkpoints with improved accuracy (CER) for the CTC and LLM-ASR models compared to our existing releases (`omniASR_{CTC,LLM}_{300M,1B,3B,7B}_v2`).
- A new variant of the LLM-ASR model that supports decoding audio of unlimited length (`omniASR_LLM_Unlimited_{300M,1B,3B,7B}_v2`). The unlimited-length models are briefly described in the [architecture overview section](src/omnilingual_asr/models/README.md). Their accuracy is comparable to that of the limited-length models; however, fine-tuning recipes for them are not yet supported.
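For reference, the two v2 naming patterns above can be expanded into the concrete model-card names (a quick sketch; the patterns are quoted from the bullets above, so check the release for the exact set of published checkpoints):

```python
from itertools import product

# Expand `omniASR_{CTC,LLM}_{300M,1B,3B,7B}_v2` and
# `omniASR_LLM_Unlimited_{300M,1B,3B,7B}_v2` into concrete names.
sizes = ["300M", "1B", "3B", "7B"]
v2_cards = [f"omniASR_{family}_{size}_v2"
            for family, size in product(["CTC", "LLM"], sizes)]
v2_cards += [f"omniASR_LLM_Unlimited_{size}_v2" for size in sizes]

print(len(v2_cards))  # 12 model cards in total
```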
## Documentation
### Quick Start
Install the package with `uv add omnilingual-asr`, then run a quick transcription:
```python
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

# Illustrative usage sketch — see the inference README below for exact arguments.
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B_v2")
transcriptions = pipeline.transcribe(["audio.wav"], lang=["eng_Latn"], batch_size=1)
```
More details on running specific models can be found in the [src/omnilingual_asr/models/inference](/src/omnilingual_asr/models/inference/README.md) directory.
> **⚠️ Important:** The CTC and LLM model suites currently accept only audio files shorter than 40 seconds for inference; for longer recordings, use the `omniASR_LLM_Unlimited` variants.
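When working with the length-limited suites, it can help to validate audio duration before submitting files. A minimal sketch using Python's standard `wave` module (the 40-second limit comes from the note above; the helper name and the WAV-only assumption are ours):

```python
import wave

MAX_SECONDS = 40.0  # limit for the CTC and LLM suites, per the note above

def fits_length_limit(path: str, max_seconds: float = MAX_SECONDS) -> bool:
    """Return True if a WAV file is short enough for the length-limited models."""
    with wave.open(path, "rb") as f:
        duration = f.getnframes() / f.getframerate()
    return duration < max_seconds
```

Files at or over the limit could then be routed to an `omniASR_LLM_Unlimited` variant instead.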
|[`omniASR_tokenizer_v1`](https://dl.fbaipublicfiles.com/mms/omniASR_tokenizer.model)| Tokenizer for all non-v2 models except omniASR_LLM_7B | - | 100 KiB | - |
|[`omniASR_tokenizer_v1_variant7`](https://dl.fbaipublicfiles.com/mms/omniASR_tokenizer_v7.model)| Tokenizer for the omniASR_LLM_7B architecture | - | 100 KiB | - |
|[`omniASR_tokenizer_written_v2`](https://dl.fbaipublicfiles.com/mms/omniASR_tokenizer_written_v2.model)| Tokenizer for all v2 architectures | - | 100 KiB | - |
¹ (batch=1, audio_len=30s, BF16, A100)
² Relative speed to `omniASR_LLM_7B`
³ (batch=1, audio_len=15min, BF16, A100)
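The tokenizer rows above amount to a simple selection rule; a hypothetical helper encoding it (the asset names come from the table, the function itself is illustrative):

```python
def tokenizer_for(model_card: str) -> str:
    """Pick the tokenizer asset for a model card, per the table above."""
    if model_card.endswith("_v2"):
        return "omniASR_tokenizer_written_v2"   # all v2 architectures
    if model_card == "omniASR_LLM_7B":
        return "omniASR_tokenizer_v1_variant7"  # dedicated 7B tokenizer
    return "omniASR_tokenizer_v1"               # all other non-v2 models
```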
### Model Download & Storage
Omnilingual ASR code and models are released under the [Apache 2.0](./LICENSE) license.
## Citation
If you use the Omnilingual ASR model suite in your research and wish to cite us, please use the following BibTeX entry!
```bibtex
@misc{omnilingualasr2025,
  title={Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages},
  author={Omnilingual ASR team and Gil Keren and Artyom Kozhevnikov and Yen Meng and Christophe Ropers and Matthew Setzler and Skyler Wang and Ife Adebara and Michael Auli and Can Balioglu and Kevin Chan and Chierh Cheng and Joe Chuang and Caley Droof and Mark Duppenthaler and Paul-Ambroise Duquenne and Alexander Erben and Cynthia Gao and Gabriel Mejia Gonzalez and Kehan Lyu and Sagar Miglani and Vineel Pratap and Kaushik Ram Sadagopan and Safiyyah Saleem and Arina Turkatenko and Albert Ventayol-Boada and Zheng-Xin Yong and Yu-An Chung and Jean Maillard and Rashel Moritz and Alexandre Mourachko and Mary Williamson and Shireen Yates},
  year={2025}
}
```
Or in a training recipe configuration (e.g., [`/workflows/recipes/wav2vec2/asr/configs/ctc-finetune.yaml`](/workflows/recipes/wav2vec2/asr/configs/ctc-finetune.yaml)):
```yaml
model:
  name: "omniASR_CTC_300M_v2"
trainer:
(...)
```
* `model_arch`: Specific configuration for the model family (e.g., [`1b`](/src/omnilingual_asr/models/wav2vec2_llama/config.py) for `wav2vec2_llama`)
* `checkpoint`: Model storage URI; can be a local path (`"$HOME/.cache/"`), a direct download link (`"https://dl.fbaipublicfiles.com/mms/omniASR_LLM_300M_v2.pt"`), or a reference to a Hugging Face repository (`"hg://qwen/qwen2.5-7b"`) if the model is in `.safetensors` format.
* `tokenizer_ref`: Links to tokenizer asset for training.
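The three `checkpoint` URI shapes described above can be told apart mechanically; an illustrative sketch (the URI shapes come from the bullet above, the classifier itself is ours):

```python
def checkpoint_kind(uri: str) -> str:
    """Classify a `checkpoint` URI into the three shapes described above."""
    if uri.startswith("hg://"):
        return "huggingface"  # e.g. "hg://qwen/qwen2.5-7b", .safetensors format
    if uri.startswith(("http://", "https://")):
        return "url"          # direct download link
    return "local"            # local filesystem path

print(checkpoint_kind("$HOME/.cache/"))  # local
```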