Whisper OpenAI API

A FastAPI-based wrapper around WhisperX that provides an OpenAI-compatible API for transcription and speaker diarization.

Deployment

docker run -d \
  --gpus all \
  -p 8000:8000 \
  -e API_KEY=your-api-key \
  -e HF_TOKEN=your-hf-token \
  -v /data/models:/data/models \
  ghcr.io/etalab-ia/whisperx-openai-api:latest

Models are downloaded on first startup and cached in /data/models. Mount a persistent volume to avoid re-downloading on restart.

A single worker is recommended. GPU inference is serialized internally, so each additional worker loads a full copy of the model into VRAM without improving throughput, unless you have multiple GPUs.

To scale workers (each worker loads its own model in VRAM):

docker run -d --gpus all ... -e WORKERS=2 whisperx-openai-api
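Once the container is running, a client request could be built along the following lines. This is a minimal sketch using only the Python standard library, not the project's documented client. It assumes the server follows the OpenAI convention of a `POST /v1/audio/transcriptions` endpoint with `file` and `model` form fields, and that `API_KEY` is sent as a Bearer token; the helper names `build_transcription_request` and `transcribe` are hypothetical, and the `model` value the server accepts may differ.

```python
import json
import os
import urllib.request
import uuid


def build_transcription_request(base_url, api_key, audio_bytes, filename, model):
    """Build (but do not send) a multipart/form-data POST request.

    Assumption: the endpoint path and field names mirror the OpenAI audio API.
    """
    boundary = uuid.uuid4().hex
    parts = [
        # Plain form field carrying the model name.
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="model"\r\n\r\n'
            f"{model}\r\n"
        ).encode(),
        # File field carrying the raw audio bytes.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio_bytes
        + b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",  # assumed path
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(base_url, api_key, path, model="large-v3-turbo"):
    """Send an audio file to the server and return the decoded JSON response."""
    with open(path, "rb") as f:
        audio = f.read()
    request = build_transcription_request(
        base_url, api_key, audio, os.path.basename(path), model
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the deployment command above, a call might look like `transcribe("http://localhost:8000", "your-api-key", "meeting.wav")`, where `meeting.wav` is a placeholder file name.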

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| API_KEY | API key for API access | Required |
| HF_TOKEN | Hugging Face token (required for diarization) | Required |
| TRANSCRIBE_MODEL | WhisperX model to load | `large-v3-turbo` |
| BATCH_SIZE | Transcription batch size | `32` |
| DIARIZE_MODEL | Pyannote diarization model | `pyannote/speaker-diarization-community-1` |
| PRELOADED_ALIGN_MODEL_LANGUAGES | Languages to pre-load alignment models for | `["en", "fr", "nl", "de"]` |
| RETURN_CHAR_ALIGNMENTS | Return character-level alignments (diarization only) | `false` |
| INTERPOLATE_METHOD | WhisperX interpolation method (diarization only) | `nearest` |
| FILL_NEAREST | Fill nearest gaps in speaker assignment (diarization only) | `false` |
| TIMEOUT_KEEP_ALIVE | Keep-alive timeout (seconds) | `60` |
| PORT | Server port | `8000` |
| WORKERS | Number of uvicorn workers (each loads its own model in VRAM) | `1` |
| RELOAD | Enable auto-reload | `false` |
| ROOT_PATH | API root path | `None` |
| LOGGING_CONFIG | Path to logging config file | `logging-config.yaml` |
| DEBUG | Enable debug logging | `false` |
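To illustrate how variables like these are typically consumed, here is a minimal sketch that reads a subset of them into a typed settings object. It is not the project's actual configuration code. The `Settings` class, `load_settings` function, and the accepted boolean spellings are hypothetical, and it assumes `PRELOADED_ALIGN_MODEL_LANGUAGES` is passed as a JSON list, as the default value in the table suggests.

```python
import json
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    # Assumption: these spellings count as "true"; the real parser may differ.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class Settings:
    api_key: str
    hf_token: str
    transcribe_model: str
    batch_size: int
    preloaded_align_model_languages: list
    port: int
    workers: int
    debug: bool


def load_settings(env=None) -> Settings:
    """Read a subset of the documented variables, applying the documented defaults."""
    env = os.environ if env is None else env
    # API_KEY and HF_TOKEN have no default, so fail early if either is absent.
    missing = [name for name in ("API_KEY", "HF_TOKEN") if not env.get(name)]
    if missing:
        raise ValueError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        api_key=env["API_KEY"],
        hf_token=env["HF_TOKEN"],
        transcribe_model=env.get("TRANSCRIBE_MODEL", "large-v3-turbo"),
        batch_size=int(env.get("BATCH_SIZE", "32")),
        preloaded_align_model_languages=json.loads(
            env.get("PRELOADED_ALIGN_MODEL_LANGUAGES", '["en", "fr", "nl", "de"]')
        ),
        port=int(env.get("PORT", "8000")),
        workers=int(env.get("WORKERS", "1")),
        debug=_as_bool(env.get("DEBUG", "false")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check, for example `load_settings({"API_KEY": "k", "HF_TOKEN": "t", "WORKERS": "2"})`.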

Local Development

Install uv

Follow the official uv installation instructions.

API-only Development

Inference libraries (whisperx, pytorch, etc.) are heavy and may not run on all devices. The dev dependency group lets you run the API tests and get IDE autocompletion locally without installing them. To install:

uv sync --group dev

Full Inference Development

To develop with a fully functional transcription pipeline:

uv sync --group dev --group inference

Run the server locally:

export PORT=8010
export RELOAD=true
export LOGGING_CONFIG=logging-config.yaml
python app/main.py

Testing

Tests mock actual inference and can be run locally:

cd app
python -m pytest tests/ -v
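The idea of mocking inference can be sketched as follows. This is an illustration of the technique with `unittest.mock`, not a copy of the project's tests: `InferencePipeline`, `handle_transcription`, and the response shape are hypothetical stand-ins for whatever the application actually defines.

```python
from unittest import mock


class InferencePipeline:
    """Hypothetical stand-in for the GPU-backed transcription pipeline."""

    def transcribe(self, audio_path: str) -> dict:
        raise RuntimeError("requires a GPU and downloaded models")


def handle_transcription(pipeline: InferencePipeline, audio_path: str) -> dict:
    """Hypothetical API-layer function: run the pipeline and shape the response."""
    result = pipeline.transcribe(audio_path)
    text = " ".join(segment["text"].strip() for segment in result["segments"])
    return {"text": text}


def test_handle_transcription_without_gpu():
    pipeline = InferencePipeline()
    fake_result = {"segments": [{"text": " Hello"}, {"text": " world."}]}
    # Replace the expensive call with a canned result for the duration of the test.
    with mock.patch.object(pipeline, "transcribe", return_value=fake_result) as mocked:
        response = handle_transcription(pipeline, "sample.wav")
    mocked.assert_called_once_with("sample.wav")
    assert response == {"text": "Hello world."}
```

Because the pipeline call is patched, the test exercises only the API-layer logic and runs on machines without a GPU.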

Integration tests

Check the documentation to run integration tests on GPU.
