
Commit 9ce6c90

[Feature] Add v1 STT integration for Whisper models (#143)
## Summary Wire the STT pipeline (PRs #96, #98, #126, #133, #137) into vLLM's v1 engine so that `vllm serve openai/whisper-small` serves `/v1/audio/transcriptions` and `/v1/audio/translations` endpoints. - **model_runner**: STT model loading with caching, dummy KV cache spec (Whisper self-manages KV), warm-up from model config, greedy decode loop, audio feature extraction (handles `MultiModalKwargsItem`/UserDict, bfloat16 torch tensors, shape transpose) - **platform**: STT auto-detection, tokenizer fallback (only when unset), disable `async_scheduling` - **worker**: skip paged attention for STT, nominal memory for scheduler, `get_supported_tasks` returns `("transcription",)` - **docs**: add `whisper-large-v3-turbo` to model table ## Test <img width="1453" height="432" alt="截圖 2026-03-07 中午12 54 23" src="https://github.com/user-attachments/assets/241d81ee-1437-4ecd-8a93-ea5eca6937ca" /> <img width="1436" height="328" alt="截圖 2026-03-07 中午12 55 16" src="https://github.com/user-attachments/assets/6ff2b61d-bdd0-4023-911d-266296933ae2" /> <img width="864" height="193" alt="截圖 2026-03-07 下午1 06 37" src="https://github.com/user-attachments/assets/f2af98c7-083c-44d9-ba31-4f50f9e62ae6" /> ## Verification ### Unit tests ``` pytest tests/test_v1_stt_integration.py tests/test_stt.py tests/test_whisper.py tests/test_transcribe.py -v -m "not slow" ``` ### End-to-end ``` pytest tests/test_v1_stt_integration.py -v -m slow ``` ### Server smoke test (requires local model) ``` vllm serve /path/to/whisper-small-mlx --port 8000 ``` ``` curl http://localhost:8000/v1/audio/transcriptions \ -F file=@test.wav \ -F model=whisper ``` --------- Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>
1 parent 4ab472c commit 9ce6c90

7 files changed

Lines changed: 988 additions & 18 deletions

File tree

docs/stt.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -50,6 +50,7 @@ Any OpenAI Whisper checkpoint (HuggingFace or MLX format):
 | Whisper Small | 244M | `openai/whisper-small` |
 | Whisper Medium | 769M | `openai/whisper-medium` |
 | Whisper Large V3 | 1.5B | `openai/whisper-large-v3` |
+| Whisper Large V3 Turbo | 809M | `openai/whisper-large-v3-turbo` |

 MLX-format weights (e.g. from `mlx-community`) are also supported.
```