NeMoASR

Automatic speech recognition with speaker diarisation.

Based on:

Requirements

Python 3.12+

Setup

Linux:

sudo apt install ffmpeg
conda create -n nemoasr python=3.12
conda activate nemoasr
pip install git+https://github.com/HanBnrd/NeMoASR.git

macOS:

brew install ffmpeg
conda create -n nemoasr python=3.12
conda activate nemoasr
pip install git+https://github.com/HanBnrd/NeMoASR.git

Update NeMoASR

pip install --upgrade git+https://github.com/HanBnrd/NeMoASR.git

Usage

To transcribe a WAV or MPEG file:

nemoasr myfile.mp3

Note: the first run may take a long time because the models need to be downloaded.
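To transcribe several files in a row, the command above can be wrapped in a small script. The following is a minimal sketch, not part of NeMoASR: the helper names (`find_audio_files`, `build_command`, `transcribe_folder`) and the list of extensions are assumptions made for illustration. It relies only on the `nemoasr <file>` invocation shown above, called once per file.

```python
import subprocess
from pathlib import Path

# Assumption: these extensions cover the "WAV or MPEG" inputs mentioned above.
AUDIO_EXTENSIONS = {".wav", ".mp3"}


def find_audio_files(folder):
    """Return the audio files found directly in `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def build_command(audio_path):
    """Build the same command line as `nemoasr myfile.mp3`."""
    return ["nemoasr", str(audio_path)]


def transcribe_folder(folder):
    """Run nemoasr once per file; raises CalledProcessError if a run fails."""
    for audio_path in find_audio_files(folder):
        subprocess.run(build_command(audio_path), check=True)
```

Calling `transcribe_folder("recordings")` from within the `nemoasr` conda environment would then process every matching file in a hypothetical `recordings` folder, one after the other.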

The default configuration cuts long audio files into 7-minute chunks, which should work well on machines with limited RAM or VRAM. The chunk duration can be adjusted if needed, for example on a machine with more RAM or VRAM:

nemoasr myfile.mp3 --max-duration=12

This cuts a long audio file into chunks of at most 12 minutes.
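To get a feel for how the chunk duration affects the number of chunks, the arithmetic can be sketched as below. This is only an illustration that assumes fixed-length cuts at the maximum duration; it is not NeMoASR's actual chunking code, which may choose cut points differently. The function name `chunk_boundaries` is hypothetical.

```python
def chunk_boundaries(total_seconds, max_minutes=7):
    """Split [0, total_seconds) into consecutive (start, end) pairs in seconds.

    Assumption: every chunk except possibly the last has exactly the
    maximum duration.
    """
    if max_minutes <= 0:
        raise ValueError("max_minutes must be positive")
    max_seconds = max_minutes * 60
    boundaries = []
    start = 0
    while start < total_seconds:
        # The last chunk is shortened so it ends with the file.
        end = min(start + max_seconds, total_seconds)
        boundaries.append((start, end))
        start = end
    return boundaries


# A 30-minute file (1800 s) with the default 7-minute limit: 5 chunks.
print(len(chunk_boundaries(1800)))
# The same file with a 12-minute limit: 3 chunks.
print(len(chunk_boundaries(1800, max_minutes=12)))
```

Under this assumption, a 30-minute recording gives five chunks with the default setting and three chunks with `--max-duration=12`.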