Commit ea16013
[Feature] Add Qwen3-ASR MLX model and transcriber (1/2) (#150)
## Summary
Add Qwen3-ASR-0.6B as a second STT model alongside Whisper. This PR covers the
model implementation and offline transcriber only; it makes no vLLM server changes.
- MLX model: Conv2d audio encoder → Qwen3 causal LM (GQA, QK-norm, RoPE,
tied embeddings)
- `Qwen3ASRTranscriber`: prompt construction, greedy decode,
post-processing
- Weight sanitization: HF `thinker.*` prefix mapping, Conv2d NCHW→NHWC
transpose
- 38 unit tests
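
The weight sanitization step can be sketched roughly as below. This is a minimal illustration, not the code in this PR: it assumes the prefix mapping simply drops a leading `thinker.`, that Conv2d kernels can be recognized as 4-D tensors with `conv` in their name, and it uses NumPy instead of MLX arrays. The function name `sanitize_weights` and the example key names are hypothetical.

```python
import numpy as np


def sanitize_weights(weights, prefix="thinker."):
    """Hypothetical sketch: strip a checkpoint prefix and reorder conv kernels."""
    sanitized = {}
    for name, value in weights.items():
        # Assumption: the prefix mapping just drops the leading "thinker.".
        if name.startswith(prefix):
            name = name[len(prefix):]
        # Assumption: a 4-D "*.weight" tensor with "conv" in its name is a
        # Conv2d kernel stored channels-first as (out, in, kH, kW).
        if value.ndim == 4 and "conv" in name and name.endswith(".weight"):
            # Move the input-channel axis last: (out, kH, kW, in).
            value = np.transpose(value, (0, 2, 3, 1))
        sanitized[name] = value
    return sanitized


# Illustrative key names only; the real checkpoint keys may differ.
conv_kernel = np.arange(8 * 2 * 3 * 3, dtype=np.float32).reshape(8, 2, 3, 3)
example = {
    "thinker.audio_tower.conv1.weight": conv_kernel,
    "thinker.model.norm.weight": np.ones((16,), dtype=np.float32),
}
result = sanitize_weights(example)
```

Non-convolutional tensors pass through unchanged apart from the key rename, so the same helper could be applied to the whole state dict in one pass.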
Part 1 of 2. Part 2 (`stt/qwen3-asr-integration`) wires this into
`vllm serve`.
## Test
```bash
# Unit tests (38, <1s)
pytest tests/test_qwen3_asr.py -v -m "not slow"
# Regression (existing tests unaffected)
pytest tests/ -m "not slow" -q
# Slow tests (requires local model)
pytest tests/test_qwen3_asr.py -v -m slow
```
<img width="1433" height="276" alt="Screenshot 2026-03-09 8:59:16 PM"
src="https://github.com/user-attachments/assets/be5f3571-3f6c-411b-bc47-6f909c2e44de"
/>
<img width="1418" height="356" alt="Screenshot 2026-03-09 9:01:57 PM"
src="https://github.com/user-attachments/assets/632a5da3-2de5-41fb-abac-635f9e27bdcd"
/>
<img width="1423" height="477" alt="Screenshot 2026-03-09 9:02:19 PM"
src="https://github.com/user-attachments/assets/efd2de93-bf15-4d28-95d1-e219c3d2df60"
/>
---------
Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>
Signed-off-by: RickyChen / 陳昭儒 <rickychen@infinirc.com>
1 parent 4b3166d commit ea16013
5 files changed
Lines changed: 1643 additions & 31 deletions