Commit 9ce6c90
[Feature] Add v1 STT integration for Whisper models (#143)
## Summary
Wire the STT pipeline (PRs #96, #98, #126, #133, #137) into vLLM's v1
engine so that `vllm serve openai/whisper-small` serves
`/v1/audio/transcriptions` and `/v1/audio/translations` endpoints.
- **model_runner**: STT model loading with caching, a dummy KV cache spec
  (Whisper manages its own KV cache), warm-up from the model config, a greedy
  decode loop, and audio feature extraction (handles
  `MultiModalKwargsItem`/UserDict, bfloat16 torch tensors, shape transpose)
- **platform**: STT auto-detection, tokenizer fallback (only when
  unset), and disabling `async_scheduling`
- **worker**: skip paged attention for STT, report nominal memory to the
  scheduler, and have `get_supported_tasks` return `("transcription",)`
- **docs**: add `whisper-large-v3-turbo` to the model table
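
For readers unfamiliar with the greedy decode loop mentioned in the **model_runner** item, the following is a minimal, framework-free sketch of the general technique, not the code in this commit. The names `greedy_decode`, `step_fn`, and `toy_step` are hypothetical, and the toy "model" simply replays a fixed script of token ids.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    step_fn: Callable[[Sequence[int]], Sequence[float]],
    start_tokens: Sequence[int],
    eos_token: int,
    max_new_tokens: int,
) -> List[int]:
    """Repeatedly pick the highest-scoring token until EOS or the length cap.

    `step_fn` is a hypothetical stand-in for one decoder forward pass: it
    takes the tokens so far and returns one score (logit) per vocabulary id.
    """
    tokens = list(start_tokens)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits = step_fn(tokens)
        # Greedy choice: index of the largest logit (first one wins on ties).
        next_token = max(range(len(logits)), key=logits.__getitem__)
        if next_token == eos_token:
            break
        tokens.append(next_token)
        generated.append(next_token)
    return generated


def toy_step(tokens: Sequence[int]) -> List[float]:
    """Toy decoder (assumption for illustration): replays 3, 1, 4, then EOS (0)."""
    script = [3, 1, 4, 0]
    position = len(tokens) - 1  # assumes exactly one start token
    target = script[min(position, len(script) - 1)]
    return [1.0 if i == target else 0.0 for i in range(5)]


result = greedy_decode(toy_step, start_tokens=[2], eos_token=0, max_new_tokens=10)
```

A real runner would replace `toy_step` with the Whisper decoder and would typically reuse cached keys and values between steps rather than re-reading the whole token list.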
## Test
<img width="1453" height="432" alt="Screenshot 2026-03-07 at 12:54:23 PM"
src="https://github.com/user-attachments/assets/241d81ee-1437-4ecd-8a93-ea5eca6937ca"
/>
<img width="1436" height="328" alt="Screenshot 2026-03-07 at 12:55:16 PM"
src="https://github.com/user-attachments/assets/6ff2b61d-bdd0-4023-911d-266296933ae2"
/>
<img width="864" height="193" alt="Screenshot 2026-03-07 at 1:06:37 PM"
src="https://github.com/user-attachments/assets/f2af98c7-083c-44d9-ba31-4f50f9e62ae6"
/>
## Verification
### Unit tests
```
pytest tests/test_v1_stt_integration.py tests/test_stt.py tests/test_whisper.py tests/test_transcribe.py -v -m "not slow"
```
### End-to-end
```
pytest tests/test_v1_stt_integration.py -v -m slow
```
### Server smoke test (requires local model)
```
vllm serve /path/to/whisper-small-mlx --port 8000
```
```
curl http://localhost:8000/v1/audio/transcriptions \
-F file=@test.wav \
-F model=whisper
```
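
The same smoke test can be scripted. The sketch below mirrors the `curl` call using only the Python standard library; `build_multipart` and `transcribe` are hypothetical helper names, and the assumption that the server replies with a JSON body follows the OpenAI transcription API rather than anything shown in this commit.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Optional, Tuple


def build_multipart(
    fields: Dict[str, str],
    file_field: str,
    filename: str,
    file_bytes: bytes,
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(
    path: str,
    url: str = "http://localhost:8000/v1/audio/transcriptions",
    model: str = "whisper",
) -> dict:
    """POST an audio file to the endpoint and return the parsed JSON reply.

    Assumes the server from the smoke test above is already running.
    """
    audio = Path(path)
    body, content_type = build_multipart(
        {"model": model}, "file", audio.name, audio.read_bytes()
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server up, `transcribe("test.wav")` sends the same form fields as the `curl` command above.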
---------
Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>

1 parent 4ab472c commit 9ce6c90
7 files changed
Lines changed: 988 additions & 18 deletions
File tree
- docs
- tests
- vllm_metal
- stt
- v1