Canary ASR API

A FastAPI-based REST API for NVIDIA's family of Canary-1B ASR (Automatic Speech Recognition) models with support for translation.

Features

Fast and accurate speech-to-text transcription
Support for both local audio files and URLs
Optional translation to other languages
Real-time factor (RTF) calculation
Processing time metrics
Simple REST API interface

Installation Options

Option 1: Quick Start with Docker

Clone the repository:

git clone https://github.com/zetaphor/nemo-canary-fastapi.git
cd nemo-canary-fastapi

Start the service:

docker-compose up --build

The API will be available at http://localhost:8000

Option 2: Local Installation

Prerequisites

Python 3.10
NVIDIA GPU with CUDA support
UV package manager

Clone the repository:

git clone https://github.com/zetaphor/nemo-canary-fastapi.git
cd nemo-canary-fastapi

Install dependencies:

uv sync
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Change the model in api.py if desired:

Available models:

canary_model = EncDecMultiTaskModel.from_pretrained('nvidia/canary-1b')

Run the API:

uvicorn api:app --host 0.0.0.0 --port 8000

The API will be available at http://localhost:8000

API Usage

Transcribe Endpoint

POST /transcribe

The endpoint accepts JSON with the following parameters:

audio_source: Path to a local audio file, or a URL (required)
target_lang: Target language code for translation, such as "de" (optional)
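
As a rough illustration of the two parameters above, the request body could be assembled in Python like this. The helper names `is_url` and `build_payload` are hypothetical, and the rule used to tell a URL from a local path (an `http` or `https` scheme plus a host) is an assumption about how a client might check its input, not a description of what api.py does.

```python
from typing import Optional
from urllib.parse import urlparse


def is_url(audio_source: str) -> bool:
    """Return True when audio_source looks like an http(s) URL (assumed rule)."""
    parsed = urlparse(audio_source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def build_payload(audio_source: str, target_lang: Optional[str] = None) -> dict:
    """Build the JSON body for POST /transcribe."""
    if not audio_source:
        raise ValueError("audio_source is required")
    payload = {"audio_source": audio_source}
    # target_lang is optional; it is left out for plain transcription.
    if target_lang is not None:
        payload["target_lang"] = target_lang
    return payload
```

Omitting `target_lang` instead of sending an empty value keeps the plain transcription request identical to the first curl example.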

Examples

Transcribe from local file:

curl -X POST "http://localhost:8000/transcribe" \
    -H "Content-Type: application/json" \
    -d '{"audio_source": "/path/to/audio.wav"}'

Transcribe from URL:

curl -X POST "http://localhost:8000/transcribe" \
    -H "Content-Type: application/json" \
    -d '{"audio_source": "https://example.com/audio.wav"}'

Translate from local file:

curl -X POST "http://localhost:8000/transcribe" \
    -H "Content-Type: application/json" \
    -d '{"audio_source": "/path/to/audio.wav", "target_lang": "de"}'
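
The same requests can be sent from Python. The following is a minimal sketch using only the standard library; the function names `make_request` and `transcribe` are hypothetical, and the URL is the default address given above.

```python
import json
import urllib.request
from typing import Optional

API_URL = "http://localhost:8000/transcribe"  # default address from this README


def make_request(audio_source: str, target_lang: Optional[str] = None,
                 url: str = API_URL) -> urllib.request.Request:
    """Prepare the same POST request that the curl examples send."""
    body = {"audio_source": audio_source}
    if target_lang is not None:
        body["target_lang"] = target_lang
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(audio_source: str, target_lang: Optional[str] = None) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(make_request(audio_source, target_lang)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running, `transcribe("/path/to/audio.wav", target_lang="de")` corresponds to the third curl example.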

Response Format

For ASR:

{
    "text": "Hello, world!",
    "processing_time_seconds": 0.12,
    "audio_duration_seconds": 1.0,
    "rtf": 0.12
}

For translation:

{
    "text": "Hallo, Welt!",
    "processing_time_seconds": 0.15,
    "audio_duration_seconds": 1.0,
    "rtf": 0.15,
    "source_lang": "en",
    "target_lang": "de"
}
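
The `rtf` field in both responses is consistent with the usual definition of real-time factor: processing time divided by audio duration, so values below 1 mean faster than real time. The sketch below assumes that definition; `real_time_factor` and `summarize` are hypothetical helper names, and treating the presence of `target_lang` as the sign of a translation response is inferred from the two examples above.

```python
def real_time_factor(processing_time_seconds: float,
                     audio_duration_seconds: float) -> float:
    """Processing time divided by audio duration (assumed RTF definition)."""
    if audio_duration_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_time_seconds / audio_duration_seconds


def summarize(response: dict) -> str:
    """Format a response dict shaped like the examples above as one line."""
    # Only the translation example carries a target_lang field.
    kind = "translation" if "target_lang" in response else "transcription"
    return f"{kind}: {response['text']} (RTF {response['rtf']:.2f})"


asr_response = {
    "text": "Hello, world!",
    "processing_time_seconds": 0.12,
    "audio_duration_seconds": 1.0,
    "rtf": 0.12,
}
print(summarize(asr_response))
```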

Development

To access the interactive API documentation, visit:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc
