Commit 1caecb7

Update package version and README.md
1 parent a2998be commit 1caecb7

2 files changed

Lines changed: 33 additions & 12 deletions

README.md

Lines changed: 29 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -1,15 +1,21 @@
11
# Automatic Speech Recognition in Python using ONNX models
22

3+
[![CI](https://github.com/istupakov/onnx-asr/actions/workflows/python-package.yml/badge.svg)](https://github.com/istupakov/onnx-asr/actions/workflows/python-package.yml)
4+
[![PyPI - Version](https://img.shields.io/pypi/v/onnx-asr.svg)](https://pypi.org/project/onnx-asr)
5+
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/onnx-asr.svg)](https://pypi.org/project/onnx-asr)
6+
37
A simple speech recognition package with minimal dependencies:
48
* NumPy ([numpy](https://numpy.org/))
59
* ONNX Runtime ([onnxruntime](https://onnxruntime.ai/))
610
* (*optional*) Hugging Face Hub ([huggingface_hub](https://huggingface.co/))
711

12+
The package does not yet have built-in voice activity detection (VAD) support, so long audio files must be split into parts before recognition.
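
One simple way to split long audio is to cut it into fixed-length pieces. The sketch below only computes the sample ranges; `chunk_bounds` and the 30-second default are illustrative assumptions, not part of `onnx_asr`, and the 16 kHz rate is taken from the supported formats described in this README.

```python
SAMPLE_RATE = 16_000  # assumed input rate, as stated in this README


def chunk_bounds(n_samples: int, chunk_seconds: float = 30.0,
                 sample_rate: int = SAMPLE_RATE) -> list[tuple[int, int]]:
    """Return (start, end) sample indices covering n_samples in fixed-length chunks.

    Hypothetical helper: the last chunk may be shorter than the others.
    """
    chunk_size = int(chunk_seconds * sample_rate)
    if chunk_size <= 0:
        raise ValueError("chunk_seconds must be positive")
    return [(start, min(start + chunk_size, n_samples))
            for start in range(0, n_samples, chunk_size)]
```

Each range can then be used to slice a waveform array (`waveform[start:end]`) before passing it to `model.recognize`. Fixed cuts can fall in the middle of a word, which is the problem a real VAD would avoid.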
13+
814
## Supported models
915
* Nvidia NeMo Conformer/FastConformer (with CTC and RNN-T decoders)
1016
* Kaldi Icefall Zipformer (with stateless RNN-T decoder) including Alpha Cephei Vosk 0.52+
1117
* Sber GigaAM v2 (with CTC and RNN-T decoders)
12-
* OpenAI Whisper (*experimental*)
18+
* OpenAI Whisper (with simple decoding)
1319

1420
## Installation
1521

@@ -19,15 +25,19 @@ The package can be installed from [PyPI](https://pypi.org/project/onnx-asr/):
1925
```shell
2026
pip install onnx-asr[cpu,hub]
2127
```
22-
2. With CPU `onnxruntime`
28+
2. With GPU `onnxruntime` and `huggingface_hub`
29+
```shell
30+
pip install onnx-asr[gpu,hub]
31+
```
32+
3. With CPU `onnxruntime`
2333
```shell
2434
pip install onnx-asr[cpu]
2535
```
26-
3. With GPU `onnxruntime`
36+
4. With GPU `onnxruntime`
2737
```shell
2838
pip install onnx-asr[gpu]
2939
```
30-
4. Without `onnxruntime` (if you already have some `onnxruntime` version installed)
40+
5. Without `onnxruntime` (if you already have some `onnxruntime` version installed)
3141
```shell
3242
pip install onnx-asr
3343
```
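
Before choosing between these options, it can help to check which `onnxruntime` distribution is already present. This is a minimal sketch using only the standard library; `installed_onnxruntime` is a hypothetical helper, and the assumption that the relevant PyPI distributions are `onnxruntime` and `onnxruntime-gpu` is mine, not something this README states.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names; other onnxruntime builds are not covered here.
CANDIDATES = ("onnxruntime", "onnxruntime-gpu")


def installed_onnxruntime() -> dict[str, str]:
    """Map each installed candidate distribution to its version string."""
    found = {}
    for name in CANDIDATES:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed: leave it out of the result.
            pass
    return found
```

An empty result suggests installing with one of the `cpu` or `gpu` extras; a non-empty one suggests the plain `pip install onnx-asr` variant may be enough.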
@@ -39,7 +49,7 @@ pip install onnx-asr
3949
Load an ONNX model from Hugging Face and recognize a wav file:
4050
```py
4151
import onnx_asr
42-
model = onnx_asr.load_model("nemo-fastconformer-ru-ctc")
52+
model = onnx_asr.load_model("gigaam-v2-rnnt")
4353
print(model.recognize("test.wav"))
4454
```
4555

@@ -48,9 +58,10 @@ print(model.recognize("test.wav"))
4858
* `gigaam-v2-rnnt` for Sber GigaAM v2 RNN-T ([origin](https://github.com/salute-developers/GigaAM), [onnx](https://huggingface.co/istupakov/gigaam-v2-onnx))
4959
* `nemo-fastconformer-ru-ctc` for Nvidia FastConformer-Hybrid Large (ru) with CTC decoder ([origin](https://huggingface.co/nvidia/stt_ru_fastconformer_hybrid_large_pc), [onnx](https://huggingface.co/istupakov/stt_ru_fastconformer_hybrid_large_pc_onnx))
5060
* `nemo-fastconformer-ru-rnnt` for Nvidia FastConformer-Hybrid Large (ru) with RNN-T decoder ([origin](https://huggingface.co/nvidia/stt_ru_fastconformer_hybrid_large_pc), [onnx](https://huggingface.co/istupakov/stt_ru_fastconformer_hybrid_large_pc_onnx))
51-
* `vosk-model-ru` for Alpha Cephei Vosk 0.54-ru ([origin](https://huggingface.co/alphacep/vosk-model-ru))
52-
* `vosk-model-small-ru` for Alpha Cephei Vosk 0.52-small-ru ([origin](https://huggingface.co/alphacep/vosk-model-small-ru))
53-
* `whisper-base-ort` for OpenAI Whisper Base exported with onnxruntime ([origin](https://huggingface.co/openai/whisper-base), [onnx](https://huggingface.co/istupakov/whisper-base-onnx))
61+
* `whisper-base` for OpenAI Whisper Base exported with onnxruntime ([origin](https://huggingface.co/openai/whisper-base), [onnx](https://huggingface.co/istupakov/whisper-base-onnx))
62+
* `alphacep/vosk-model-ru` for Alpha Cephei Vosk 0.54-ru ([origin](https://huggingface.co/alphacep/vosk-model-ru))
63+
* `alphacep/vosk-model-small-ru` for Alpha Cephei Vosk 0.52-small-ru ([origin](https://huggingface.co/alphacep/vosk-model-small-ru))
64+
* `onnx-community/whisper-tiny`, `onnx-community/whisper-base`, `onnx-community/whisper-small`, `onnx-community/whisper-large-v3-turbo`, etc. for OpenAI Whisper exported with Hugging Face optimum ([onnx-community](https://huggingface.co/onnx-community?search_models=whisper))
5465

5566
Supported wav file formats: PCM_U8, PCM_16, PCM_24 and PCM_32 with a 16 kHz sample rate. Other formats must either be converted first or read into a numpy array with a library that supports them.
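
To see whether a file matches these constraints before loading a model, its header can be inspected with the standard `wave` module. This is a rough sketch: `check_wav` is a hypothetical helper, and mapping the four PCM formats to sample widths of 1 to 4 bytes is my assumption about what the format names mean.

```python
import wave

REQUIRED_SAMPLE_RATE = 16_000
# Assumed mapping: PCM_U8, PCM_16, PCM_24, PCM_32 -> 1, 2, 3, 4 bytes per sample.
SUPPORTED_SAMPLE_WIDTHS = {1, 2, 3, 4}


def check_wav(path: str) -> list[str]:
    """Return a list of problems found in the wav header (empty if none)."""
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            width = wav.getsampwidth()
    except wave.Error as exc:
        # The wave module rejects files it cannot parse, e.g. non-PCM encodings.
        return [f"not a readable PCM wav file: {exc}"]
    if rate != REQUIRED_SAMPLE_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {REQUIRED_SAMPLE_RATE} Hz")
    if width not in SUPPORTED_SAMPLE_WIDTHS:
        problems.append(f"unsupported sample width: {width} bytes")
    return problems
```

A file that fails the check would need to be converted or read through another library, as in the `soundfile` example.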
5667

@@ -59,7 +70,7 @@ Example with `soundfile`:
5970
import onnx_asr
6071
import soundfile as sf
6172

62-
model = onnx_asr.load_model("gigaam-v2-ctc")
73+
model = onnx_asr.load_model("whisper-base")
6374

6475
waveform, sample_rate = sf.read("test.wav", dtype="float32")
6576
model.recognize(waveform)
@@ -72,6 +83,13 @@ model = onnx_asr.load_model("nemo-fastconformer-ru-ctc")
7283
print(model.recognize(["test1.wav", "test2.wav", "test3.wav", "test4.wav"]))
7384
```
7485

86+
Some models have quantized versions:
87+
```py
88+
import onnx_asr
89+
model = onnx_asr.load_model("alphacep/vosk-model-ru", quantization="int8")
90+
print(model.recognize(["test1.wav", "test2.wav", "test3.wav", "test4.wav"]))
91+
```
92+
7593
### CLI
7694

7795
The package has a simple command-line interface.
@@ -162,7 +180,7 @@ Read onnxruntime [instruction](https://github.com/microsoft/onnxruntime/blob/mai
162180

163181
Download model and export with *Beam Search* and *Forced Decoder Input Ids*:
164182
```shell
165-
python3 -m onnxruntime.transformers.models.whisper.convert_to_onnx -m openai/whisper-base --output whisper-onnx --use_external_data_format --use_forced_decoder_ids --optimize_onnx --precision fp32
183+
python3 -m onnxruntime.transformers.models.whisper.convert_to_onnx -m openai/whisper-base --output ./whisper-onnx --use_external_data_format --use_forced_decoder_ids --optimize_onnx --precision fp32
166184
```
167185

168186
Save preprocessor and tokenizer configs
@@ -177,5 +195,5 @@ processor.save_pretrained("whisper-onnx")
177195

178196
Export model to ONNX with Hugging Face `optimum-cli`
179197
```shell
180-
optimum-cli export onnx --model openai/whisper-base ./whisper-onnx/
198+
optimum-cli export onnx --model openai/whisper-base ./whisper-onnx
181199
```

pyproject.toml

Lines changed: 4 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
[project]
22
name = "onnx-asr"
3-
version = "0.2.0"
3+
dynamic = ["version"]
44
description = "Automatic Speech Recognition in Python using ONNX models"
55
authors = [{ name = "Ilya Stupakov", email = "istupakov@gmail.com" }]
66
keywords = ["asr", "speech recognition", "onnx"]
@@ -59,6 +59,9 @@ lint = ["ruff>=0.11.6"]
5959
[tool.pdm]
6060
distribution = true
6161

62+
[tool.pdm.version]
63+
source = "scm"
64+
6265
[tool.pdm.build]
6366
source-includes = ["preprocessors", "tests"]
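
With `dynamic = ["version"]` and the `scm` source, the version is no longer written in `pyproject.toml` but derived from source control at build time. A minimal sketch of how the resulting version could be read from an installed package follows; `package_version` is a hypothetical helper, not something this commit adds.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def package_version(dist_name: str = "onnx-asr") -> Optional[str]:
    """Return the installed version of dist_name, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

The lookup reads the metadata recorded at install time, so it works the same whether the version came from a static field or from the SCM source.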
6467
