Whisper OpenAI API

A FastAPI-based wrapper around WhisperX that provides an OpenAI-compatible API for transcription and speaker diarization.

Deployment

docker run -d \
  --gpus all \
  -p 8000:8000 \
  -e API_KEY=your-api-key \
  -e HF_TOKEN=your-hf-token \
  -v /data/models:/data/models \
  ghcr.io/etalab-ia/whisperx-openai-api:latest

Models are downloaded on first startup and cached in /data/models. Mount a persistent volume to avoid re-downloading on restart.
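Once the container is running, a client can send audio to it. The sketch below builds a multipart request using only the Python standard library. It assumes the server exposes the standard OpenAI path `/v1/audio/transcriptions`, accepts the API key as a Bearer token, and takes `file` and `model` form fields. None of this is confirmed by this README, so adjust as needed. The function names are illustrative.

```python
import json
import urllib.request
import uuid


def build_transcription_request(base_url, api_key, audio_bytes,
                                filename="audio.wav", model="large-v3-turbo"):
    """Build (but do not send) a multipart POST for a transcription call.

    Assumptions: OpenAI-style endpoint path, Bearer auth, and the
    `file` / `model` form field names.
    """
    boundary = uuid.uuid4().hex

    # Plain text form field carrying the model name.
    model_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()

    # File form field carrying the raw audio bytes.
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + audio_bytes + b"\r\n"

    closing = f"--{boundary}--\r\n".encode()

    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/transcriptions",
        data=model_part + file_part + closing,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (requires a running server and a local audio file):
#   with open("sample.wav", "rb") as f:
#       req = build_transcription_request("http://localhost:8000", "your-api-key", f.read())
#   print(send(req))
```

Because the API is meant to be OpenAI-compatible, an existing OpenAI client pointed at this server's base URL is likely to work as well.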

One worker is recommended. GPU inference is serialized internally, so each additional worker loads a full copy of the model into VRAM without improving throughput, unless you have multiple GPUs.

To scale workers (each worker loads its own model in VRAM):

docker run -d --gpus all ... -e WORKERS=2 whisperx-openai-api

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| API_KEY | API key for API access | Required |
| HF_TOKEN | Hugging Face token (required for diarization) | Required |
| TRANSCRIBE_MODEL | WhisperX model to load | large-v3-turbo |
| BATCH_SIZE | Transcription batch size | 32 |
| DIARIZE_MODEL | Pyannote diarization model | pyannote/speaker-diarization-community-1 |
| PRELOADED_ALIGN_MODEL_LANGUAGES | Languages to pre-load alignment models for | ["en", "fr", "nl", "de"] |
| RETURN_CHAR_ALIGNMENTS | Return character-level alignments (diarization only) | false |
| INTERPOLATE_METHOD | WhisperX interpolation method (diarization only) | nearest |
| FILL_NEAREST | Fill nearest gaps in speaker assignment (diarization only) | false |
| TIMEOUT_KEEP_ALIVE | Keep-alive timeout (seconds) | 60 |
| PORT | Server port | 8000 |
| WORKERS | Number of uvicorn workers (each loads its own model in VRAM) | 1 |
| RELOAD | Enable auto-reload | false |
| ROOT_PATH | API root path | None |
| LOGGING_CONFIG | Path to logging config file | logging-config.yaml |
| DEBUG | Enable debug logging | false |
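To illustrate how values like these are typically interpreted, here is a minimal sketch that reads a few of the variables from a mapping and applies the documented defaults. It is not the project's actual settings code. The `load_settings` helper, the set of accepted boolean spellings, and the assumption that `PRELOADED_ALIGN_MODEL_LANGUAGES` is passed as a JSON list are all illustrative.

```python
import json
import os


def _as_bool(raw):
    """Interpret common truthy spellings (assumed, not confirmed by the project)."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None):
    """Hypothetical loader applying the defaults listed in the table."""
    env = os.environ if env is None else env

    # API_KEY is documented as required, so fail early when it is missing.
    if not env.get("API_KEY"):
        raise ValueError("API_KEY must be set")

    return {
        "api_key": env["API_KEY"],
        "hf_token": env.get("HF_TOKEN"),
        "transcribe_model": env.get("TRANSCRIBE_MODEL", "large-v3-turbo"),
        "batch_size": int(env.get("BATCH_SIZE", "32")),
        # Assumed to be a JSON-encoded list, matching how the default is written.
        "preloaded_align_model_languages": json.loads(
            env.get("PRELOADED_ALIGN_MODEL_LANGUAGES", '["en", "fr", "nl", "de"]')
        ),
        "port": int(env.get("PORT", "8000")),
        "workers": int(env.get("WORKERS", "1")),
        "debug": _as_bool(env.get("DEBUG", "false")),
    }
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check without touching the real environment, for example `load_settings({"API_KEY": "k", "WORKERS": "2"})`.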

Local Development

Install uv

Follow the official uv installation instructions.

API-only Development

Inference libraries (whisperx, pytorch, etc.) are heavy and may not run on all devices. The dev dependency group lets you run the API tests locally and get IDE autocompletion without installing them. To install:

uv sync --group dev

Full Inference Development

To develop with a fully functional transcription pipeline:

uv sync --group dev --group inference

Run the server locally:

export PORT=8010
export RELOAD=true
export LOGGING_CONFIG=logging-config.yaml
python app/main.py

Testing

Tests mock actual inference and can be run locally:

cd app
python -m pytest tests/ -v
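The general pattern of mocking inference can be sketched as follows. The `transcribe` handler and the shape of the model's return value are hypothetical stand-ins, not the project's real functions or fixtures. The point is only that a `MagicMock` replaces the GPU-backed model so the surrounding logic can be tested on any machine.

```python
from unittest.mock import MagicMock


def transcribe(audio_bytes, model):
    """Hypothetical handler: delegates to a model object and joins segment text."""
    result = model.transcribe(audio_bytes)
    return {"text": " ".join(seg["text"].strip() for seg in result["segments"])}


def test_transcribe_uses_mocked_model():
    # The fake model returns canned segments instead of running inference.
    fake_model = MagicMock()
    fake_model.transcribe.return_value = {
        "segments": [{"text": " Hello"}, {"text": " world"}]
    }

    response = transcribe(b"fake-audio", fake_model)

    # The handler should call the model exactly once with the raw audio.
    fake_model.transcribe.assert_called_once_with(b"fake-audio")
    assert response == {"text": "Hello world"}
```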

Integration tests

Check the documentation to run integration tests on GPU.