NeMoASR

Automatic speech recognition with speaker diarisation.

Based on:

NVIDIA NeMo Parakeet TDT 0.6b V3: Multilingual Speech-to-Text Model for automatic speech recognition
NVIDIA NeMo Sortformer Diarizer 4spk v1 for speaker diarisation

Requirements

Setup

Linux:

sudo apt install ffmpeg

conda create -n nemoasr python=3.12
conda activate nemoasr

pip install git+https://github.com/HanBnrd/NeMoASR.git

macOS:

brew install ffmpeg

conda create -n nemoasr python=3.12
conda activate nemoasr

pip install git+https://github.com/HanBnrd/NeMoASR.git
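After either setup, a quick sanity check is to confirm that both command-line tools are reachable from the active environment. The snippet below is a minimal sketch, not part of NeMoASR: `missing_tools` is a hypothetical helper name, and it only assumes that `ffmpeg` and the `nemoasr` command (as used in the Usage section) should be on the `PATH`.

```python
import shutil


def missing_tools(tools=("ffmpeg", "nemoasr")):
    """Return the names from `tools` that cannot be found on the PATH.

    Hypothetical helper for illustration; it is not shipped with NeMoASR.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("ffmpeg and nemoasr are both available.")
```

If `nemoasr` is reported as missing, the conda environment is probably not activated in the current shell.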

Update NeMoASR

pip install --upgrade git+https://github.com/HanBnrd/NeMoASR.git

Usage

To transcribe a WAV or MPEG file:

nemoasr myfile.mp3

Note: the first run may take a while, as the models need to be downloaded.

The default configuration cuts long audio files into 7-minute chunks, which should work well on machines with limited RAM or VRAM. The chunk duration can be adjusted if needed. For example, with more RAM or VRAM:

nemoasr myfile.mp3 --max-duration=12

This cuts a long audio file into chunks of at most 12 minutes.
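To get a feel for how `--max-duration` affects the number of chunks, the arithmetic can be sketched as below. This is an illustration under stated assumptions, not NeMoASR's actual splitting code: `chunk_boundaries` is a hypothetical function, and it assumes plain fixed-length cuts with a shorter final chunk, whereas the tool may choose its cut points differently.

```python
import math


def chunk_boundaries(total_seconds, max_minutes=7.0):
    """Return (start, end) offsets in seconds for consecutive chunks.

    Assumption: fixed-length cuts of at most `max_minutes` each, with the
    last chunk holding whatever remains. Hypothetical helper for illustration.
    """
    if max_minutes <= 0:
        raise ValueError("max_minutes must be positive")
    if total_seconds <= 0:
        return []
    max_seconds = max_minutes * 60
    n_chunks = math.ceil(total_seconds / max_seconds)
    return [
        (i * max_seconds, min((i + 1) * max_seconds, total_seconds))
        for i in range(n_chunks)
    ]


if __name__ == "__main__":
    # A 30-minute recording with the default 7-minute limit and with 12 minutes.
    print(len(chunk_boundaries(30 * 60)))
    print(chunk_boundaries(30 * 60, max_minutes=12))
```

Under these assumptions, a 30-minute recording yields five chunks at the default setting and three chunks with `--max-duration=12`; fewer, longer chunks need more memory per chunk.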
