FastAPI-based wrapper around WhisperX, providing an OpenAI-compatible API for transcription and speaker diarization.

    docker run -d \
      --gpus all \
      -p 8000:8000 \
      -e API_KEY=your-api-key \
      -e HF_TOKEN=your-hf-token \
      -v /data/models:/data/models \
      ghcr.io/etalab-ia/whisperx-openai-api:latest

Models are downloaded on first startup and cached in /data/models. Mount a persistent volume to avoid re-downloading them on restart.
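
Once the container is running, a client can send audio to it. The sketch below builds (without sending) a multipart request using only the Python standard library. It rests on assumptions not stated above: the path `/v1/audio/transcriptions` and the `Authorization: Bearer` header follow the usual OpenAI conventions and may differ in this server, and `build_transcription_request` is a hypothetical helper name.

```python
import uuid
import urllib.request


def build_transcription_request(base_url, api_key, audio_bytes,
                                filename="audio.wav", model="large-v3-turbo"):
    """Build (but do not send) a multipart POST for an OpenAI-style endpoint.

    Assumptions: the endpoint path and Bearer auth mirror the OpenAI API;
    check the server's own documentation before relying on them.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field carrying the model name.
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"'
        f'\r\n\r\n{model}\r\n'.encode(),
        # Form field carrying the audio file.
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode(),
        audio_bytes,
        # Closing boundary.
        f'\r\n--{boundary}--\r\n'.encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/transcriptions",  # assumed path
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To actually send it, pass the returned object to `urllib.request.urlopen`, with `base_url` set to `http://localhost:8000` for the container started above.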
One worker is recommended. GPU inference is serialized internally, so additional workers each load a full copy of the model into VRAM without improving throughput, unless you have multiple GPUs.
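
To illustrate why concurrency does not help when inference is serialized, here is a toy sketch. It assumes the serialization behaves like a single lock around the GPU call; the real mechanism inside the server may differ, and `transcribe` and `run_demo` are hypothetical names with a sleep standing in for inference.

```python
import asyncio


async def transcribe(name, gpu_lock, events):
    # Only one request may hold the (assumed) GPU lock at a time.
    async with gpu_lock:
        events.append(f"{name}:start")
        await asyncio.sleep(0.01)  # stand-in for GPU inference
        events.append(f"{name}:end")


async def run_demo():
    gpu_lock = asyncio.Lock()
    events = []
    # Both requests are submitted concurrently...
    await asyncio.gather(
        transcribe("request-a", gpu_lock, events),
        transcribe("request-b", gpu_lock, events),
    )
    # ...but the recorded events show they ran one after the other.
    return events


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

The second request only starts after the first one ends, so total time is the sum of both inference times regardless of how many requests are in flight.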
To scale workers (each worker loads its own model in VRAM):

    docker run -d --gpus all ... -e WORKERS=2 whisperx-openai-api

| Variable | Description | Default |
|---|---|---|
| API_KEY | API key for API access | Required |
| HF_TOKEN | Hugging Face token (required for diarization) | Required |
| TRANSCRIBE_MODEL | WhisperX model to load | large-v3-turbo |
| BATCH_SIZE | Transcription batch size | 32 |
| DIARIZE_MODEL | Pyannote diarization model | pyannote/speaker-diarization-community-1 |
| PRELOADED_ALIGN_MODEL_LANGUAGES | Languages to pre-load alignment models for | ["en", "fr", "nl", "de"] |
| RETURN_CHAR_ALIGNMENTS | Return character-level alignments (diarization only) | false |
| INTERPOLATE_METHOD | WhisperX interpolation method (diarization only) | nearest |
| FILL_NEAREST | Fill nearest gaps in speaker assignment (diarization only) | false |
| TIMEOUT_KEEP_ALIVE | Keep-alive timeout (seconds) | 60 |
| PORT | Server port | 8000 |
| WORKERS | Number of uvicorn workers (each loads its own model in VRAM) | 1 |
| RELOAD | Enable auto-reload | false |
| ROOT_PATH | API root path | None |
| LOGGING_CONFIG | Path to logging config file | logging-config.yaml |
| DEBUG | Enable debug logging | false |
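
As a rough illustration of how a few of these variables and their defaults could be read, here is a minimal sketch. It is not the project's actual settings code: `Settings`, `load_settings`, the JSON parsing of `PRELOADED_ALIGN_MODEL_LANGUAGES`, and the accepted boolean spellings are all assumptions.

```python
import json
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    # Hypothetical container for a subset of the variables in the table.
    api_key: str
    hf_token: str
    transcribe_model: str = "large-v3-turbo"
    batch_size: int = 32
    preloaded_align_model_languages: list = field(
        default_factory=lambda: ["en", "fr", "nl", "de"]
    )
    port: int = 8000
    workers: int = 1
    debug: bool = False


def _as_bool(value):
    # Assumption: "true", "1" and "yes" (any case) count as enabled.
    return value.strip().lower() in {"1", "true", "yes"}


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in ("API_KEY", "HF_TOKEN") if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    defaults = Settings(api_key="", hf_token="")
    languages = env.get("PRELOADED_ALIGN_MODEL_LANGUAGES")
    return Settings(
        api_key=env["API_KEY"],
        hf_token=env["HF_TOKEN"],
        transcribe_model=env.get("TRANSCRIBE_MODEL", defaults.transcribe_model),
        batch_size=int(env.get("BATCH_SIZE", defaults.batch_size)),
        # Assumption: the list is passed as a JSON array string.
        preloaded_align_model_languages=(
            json.loads(languages) if languages
            else defaults.preloaded_align_model_languages
        ),
        port=int(env.get("PORT", defaults.port)),
        workers=int(env.get("WORKERS", defaults.workers)),
        debug=_as_bool(env.get("DEBUG", "false")),
    )
```

For example, `load_settings({"API_KEY": "k", "HF_TOKEN": "t", "WORKERS": "2"})` keeps every default from the table except the worker count.
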
Install instructions in this link.
Inference libraries (whisperx, pytorch, etc.) are heavy and may not run on all devices. We provide a dev dependency group that supports running the API tests locally and IDE autocompletion. To install:

    uv sync --group dev

To develop with a fully functional transcription pipeline:

    uv sync --group dev --group inference

Run the server locally:

    export PORT=8010
    export RELOAD=true
    export LOGGING_CONFIG=logging-config.yaml
    python app/main.py

Tests mock actual inference and can be run locally:

    cd app
    python -m pytest tests/ -v

Check the documentation to run integration tests on GPU.