A FastAPI-based REST API for NVIDIA's family of Canary-1B ASR (Automatic Speech Recognition) models with support for translation.
- Fast and accurate speech-to-text transcription
- Support for both local audio files and URLs
- Optional translation to other languages
- Real-time factor (RTF) calculation
- Processing time metrics
- Simple REST API interface
- Clone the repository:
git clone https://github.com/zetaphor/nemo-canary-fastapi.git
cd nemo-canary-fastapi
- Start the service:
docker-compose up --build
The API will be available at http://localhost:8000
- Python 3.10
- NVIDIA GPU with CUDA support
- UV package manager
- Clone the repository:
git clone https://github.com/zetaphor/nemo-canary-fastapi.git
cd nemo-canary-fastapi
- Install dependencies:
uv sync
source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Change the model in api.py if desired:
Available models:
canary_model = EncDecMultiTaskModel.from_pretrained('nvidia/canary-1b')
- Run the API:
uvicorn api:app --host 0.0.0.0 --port 8000
The API will be available at http://localhost:8000
POST /transcribe
The endpoint accepts JSON with the following parameters:
- audio_source: Path to local audio file or URL (required)
- target_lang: Target language for translation (optional)
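Since audio_source carries either a local path or a URL, a caller may want to tell the two apart before sending a request. Note that a local path has to be readable by the server process, which matters when the service runs inside Docker. The helper below is a minimal sketch of one way to classify the value; `classify_audio_source` is a hypothetical name, and the check actually performed in api.py may differ.

```python
from urllib.parse import urlparse


def classify_audio_source(audio_source: str) -> str:
    """Return "url" for http(s) sources and "path" for anything else.

    Hypothetical helper: this README does not show how api.py itself
    distinguishes the two kinds of source.
    """
    if not audio_source or not audio_source.strip():
        raise ValueError("audio_source is required")
    scheme = urlparse(audio_source).scheme.lower()
    # Only http and https count as remote; a Windows drive letter such as
    # "C:" parses as a scheme too, so it falls through to "path".
    return "url" if scheme in ("http", "https") else "path"
```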
- Transcribe from local file:
curl -X POST "http://localhost:8000/transcribe" \
-H "Content-Type: application/json" \
-d '{"audio_source": "/path/to/audio.wav"}'
- Transcribe from URL:
curl -X POST "http://localhost:8000/transcribe" \
-H "Content-Type: application/json" \
-d '{"audio_source": "https://example.com/audio.wav"}'
- Translate from local file:
curl -X POST "http://localhost:8000/transcribe" \
-H "Content-Type: application/json" \
-d '{"audio_source": "/path/to/audio.wav", "target_lang": "de"}'
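The same requests can be made from Python. The sketch below uses only the standard library and mirrors the curl calls above; `build_request` and `transcribe` are illustrative names rather than part of this project, and the 300-second timeout is an arbitrary assumption.

```python
import json
import urllib.request
from typing import Optional

API_URL = "http://localhost:8000/transcribe"  # default address from this README


def build_request(audio_source: str, target_lang: Optional[str] = None,
                  url: str = API_URL) -> urllib.request.Request:
    """Build the POST request; target_lang is sent only when provided."""
    payload = {"audio_source": audio_source}
    if target_lang is not None:
        payload["target_lang"] = target_lang
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(audio_source: str, target_lang: Optional[str] = None,
               timeout: float = 300.0) -> dict:
    """Send the request to a running service and return the decoded JSON body."""
    request = build_request(audio_source, target_lang)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the service running, `transcribe("/path/to/audio.wav", target_lang="de")` returns a dictionary whose "text" key holds the result.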
For ASR:
{
"text": "Hello, world!",
"processing_time_seconds": 0.12,
"audio_duration_seconds": 1.0,
"rtf": 0.12
}
For translation:
{
"text": "Hallo, Welt!",
"processing_time_seconds": 0.15,
"audio_duration_seconds": 1.0,
"rtf": 0.15,
"source_lang": "en",
"target_lang": "de"
}
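The rtf field is consistent with the usual definition of real-time factor: processing time divided by audio duration, so values below 1 mean the audio was processed faster than real time. The sketch below recomputes it from the sample translation response above. `real_time_factor` is a hypothetical helper, and whether the server rounds the value it returns is not stated here, so the comparison uses a tolerance.

```python
import json
import math


def real_time_factor(processing_time_seconds: float,
                     audio_duration_seconds: float) -> float:
    """RTF = processing time / audio duration (below 1 is faster than real time)."""
    if audio_duration_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_time_seconds / audio_duration_seconds


# Sample translation response copied from this README.
response = json.loads("""
{
  "text": "Hallo, Welt!",
  "processing_time_seconds": 0.15,
  "audio_duration_seconds": 1.0,
  "rtf": 0.15,
  "source_lang": "en",
  "target_lang": "de"
}
""")

recomputed = real_time_factor(response["processing_time_seconds"],
                              response["audio_duration_seconds"])
# Tolerant comparison, in case the server rounds the reported value.
matches_reported = math.isclose(recomputed, response["rtf"], rel_tol=1e-6)
faster_than_real_time = recomputed < 1.0
```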
To access the interactive API documentation, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc