Whisper OpenAI API

A FastAPI-based wrapper around WhisperX that provides an OpenAI-compatible API for transcription and speaker diarization.

Deployment

docker run -d \
  --gpus all \
  -p 8000:8000 \
  -e API_KEY=your-api-key \
  -e HF_TOKEN=your-hf-token \
  -v /data/models:/data/models \
  ghcr.io/etalab-ia/whisperx-openai-api:latest

Models are downloaded on first startup and cached in /data/models. Mount a persistent volume to avoid re-downloading on restart.

A single worker is recommended. GPU inference is serialized internally, so additional workers do not improve throughput unless you have multiple GPUs, and each worker loads a full copy of the model into VRAM.
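Because every worker holds its own model copy, VRAM use grows linearly with the worker count. The sketch below only illustrates that arithmetic. The function names are hypothetical and the figures in the usage line are placeholders, not measurements of this image; check the real footprint with a tool such as `nvidia-smi`.

```python
def total_vram_gib(workers: int, per_worker_gib: float) -> float:
    """VRAM needed when each worker loads its own copy of the models."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return workers * per_worker_gib


def max_workers_that_fit(gpu_vram_gib: float, per_worker_gib: float) -> int:
    """Largest worker count whose combined footprint fits on one GPU."""
    return int(gpu_vram_gib // per_worker_gib)


# Placeholder figures for illustration only (not measured values).
workers = max_workers_that_fit(gpu_vram_gib=16.0, per_worker_gib=6.0)
needed = total_vram_gib(workers, per_worker_gib=6.0)
```

Fitting in memory does not mean more throughput: on a single GPU the extra workers still compete for the same device.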

To scale workers (each worker loads its own model in VRAM):

docker run -d --gpus all ... -e WORKERS=2 whisperx-openai-api
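Once the container is running, a transcription request can be sketched with the Python standard library. This is a minimal illustration, not the project's documented client. It assumes the server follows the OpenAI conventions: a `/v1/audio/transcriptions` route, `Authorization: Bearer` authentication using the `API_KEY` value, and multipart `file` and `model` fields. The helper name `build_transcription_request` is hypothetical.

```python
import urllib.request
import uuid


def build_transcription_request(base_url, api_key, audio_bytes,
                                filename="audio.wav", model="large-v3-turbo"):
    """Build (but do not send) a multipart POST for a transcription.

    The route, auth scheme, and field names are assumptions based on the
    OpenAI API conventions; verify them against the running server.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Plain text form field carrying the model name.
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"'
        f"\r\n\r\n{model}\r\n".encode(),
        # File form field carrying the raw audio bytes.
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode(),
        audio_bytes,
        b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the result to `urllib.request.urlopen` and decode the response body, for example `build_transcription_request("http://localhost:8000", "your-api-key", open("audio.wav", "rb").read())`. If the server is OpenAI-compatible as assumed, the official `openai` client pointed at the same base URL should also work.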

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| API_KEY | API key for API access | Required |
| HF_TOKEN | Hugging Face token (required for diarization) | Required |
| TRANSCRIBE_MODEL | WhisperX model to load | `large-v3-turbo` |
| BATCH_SIZE | Transcription batch size | `32` |
| DIARIZE_MODEL | Pyannote diarization model | `pyannote/speaker-diarization-community-1` |
| PRELOADED_ALIGN_MODEL_LANGUAGES | Languages to pre-load alignment models for | `["en", "fr", "nl", "de"]` |
| RETURN_CHAR_ALIGNMENTS | Return character-level alignments (diarization only) | `false` |
| INTERPOLATE_METHOD | WhisperX interpolation method (diarization only) | `nearest` |
| FILL_NEAREST | Fill nearest gaps in speaker assignment (diarization only) | `false` |
| TIMEOUT_KEEP_ALIVE | Keep-alive timeout (seconds) | `60` |
| PORT | Server port | `8000` |
| WORKERS | Number of uvicorn workers (each loads its own model in VRAM) | `1` |
| RELOAD | Enable auto-reload | `false` |
| ROOT_PATH | API root path | `None` |
| LOGGING_CONFIG | Path to logging config file | `logging-config.yaml` |
| DEBUG | Enable debug logging | `false` |
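To illustrate how a few of these variables might be read, here is a standard-library sketch. It is not the project's actual settings code, and `load_settings` is a hypothetical helper. It assumes that list values such as `PRELOADED_ALIGN_MODEL_LANGUAGES` are JSON-encoded (as the default above suggests) and that booleans are passed as `true`/`false` strings.

```python
import json
import os


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None) -> dict:
    """Read a subset of the variables above, applying the listed defaults."""
    env = os.environ if env is None else env

    # API_KEY and HF_TOKEN are listed as required, so fail early without them.
    missing = [name for name in ("API_KEY", "HF_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )

    return {
        "api_key": env["API_KEY"],
        "hf_token": env["HF_TOKEN"],
        "transcribe_model": env.get("TRANSCRIBE_MODEL", "large-v3-turbo"),
        "batch_size": int(env.get("BATCH_SIZE", "32")),
        # Assumption: the list is supplied as a JSON array string.
        "preloaded_align_model_languages": json.loads(
            env.get("PRELOADED_ALIGN_MODEL_LANGUAGES", '["en", "fr", "nl", "de"]')
        ),
        "fill_nearest": _as_bool(env.get("FILL_NEAREST", "false")),
        "port": int(env.get("PORT", "8000")),
        "workers": int(env.get("WORKERS", "1")),
    }
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to test, for example `load_settings({"API_KEY": "a", "HF_TOKEN": "b", "BATCH_SIZE": "8"})`.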

Local Development

Install uv

Follow the installation instructions in the uv documentation.

API-only Development

Inference libraries (whisperx, PyTorch, etc.) are heavy and may not run on all devices. We provide a dev dependency group that lets you run the API tests locally and get IDE autocompletion without them. To install:

uv sync --group dev

Full Inference Development

To develop with a fully functional transcription pipeline:

uv sync --group dev --group inference

Run the server locally:

export PORT=8010
export RELOAD=true
export LOGGING_CONFIG=logging-config.yaml
python app/main.py

Testing

Tests mock actual inference and can be run locally:

cd app
python -m pytest tests/ -v
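As an illustration of what mocking inference can look like, the sketch below replaces the model with a `unittest.mock.MagicMock`, so no GPU or model download is needed. The handler `transcribe_endpoint`, the `pipeline` object, and the result shape are hypothetical stand-ins, not the project's actual code or test suite.

```python
from unittest.mock import MagicMock


def transcribe_endpoint(audio_bytes: bytes, pipeline) -> dict:
    """Hypothetical handler that delegates to an injected inference pipeline."""
    result = pipeline.transcribe(audio_bytes)
    # Join the segment texts into a single transcript string.
    text = " ".join(segment["text"].strip() for segment in result["segments"])
    return {"text": text}


def test_transcribe_endpoint_uses_mocked_pipeline():
    # The mock stands in for the real model and returns a canned result.
    fake_pipeline = MagicMock()
    fake_pipeline.transcribe.return_value = {
        "segments": [{"text": " Hello"}, {"text": " world"}]
    }

    response = transcribe_endpoint(b"\x00\x01", fake_pipeline)

    assert response == {"text": "Hello world"}
    fake_pipeline.transcribe.assert_called_once_with(b"\x00\x01")
```

Injecting the pipeline as an argument keeps the API logic testable on machines that cannot run the inference libraries.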

Integration tests

Check the documentation to run integration tests on GPU.
