Add Qwen3-TTS VoiceDesign vLLM-Omni launcher #135
a3b824c to d1fca8c
d1fca8c to b7f21f4
vllm-omni)
    FRAMEWORK_ENV_SETUP="export RAY_CGRAPH_get_timeout=1800; export no_proxy=\"0.0.0.0,\$no_proxy\"; export NO_PROXY=\"0.0.0.0,\$NO_PROXY\""
    FRAMEWORK_LAUNCH="vllm serve"
    ;;
This line does seem redundant to me, as it is identical to the vLLM case. We could replace `python3 -m vllm.entrypoints.openai.api_server` with `vllm serve`, since the two are equivalent and the former is deprecated. Please note, however, that we heavily refactored the codebase in #100 and the `template.jinja` file has been removed entirely. The job script is now rendered at runtime in `framework.py`.
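For readers unfamiliar with the refactor, runtime rendering can be pictured roughly as follows. This is a minimal sketch, not the actual contents of `framework.py`: the `LAUNCH_TEMPLATES` table, the `render_job_script` helper, and the script layout are hypothetical names chosen for illustration. Only the `vllm serve` launch command and the `RAY_CGRAPH_get_timeout` export come from the diff under review.

```python
# Hypothetical per-framework launch templates (assumption, not the repo's table).
# Only "vllm serve" is taken from the diff under review.
LAUNCH_TEMPLATES = {
    "vllm": "vllm serve {model}",
    "sglang": "python3 -m sglang.launch_server --model-path {model}",
}


def render_job_script(framework: str, model: str, env_setup: str = "") -> str:
    """Build the job script as a string at runtime, with no static template file."""
    if framework not in LAUNCH_TEMPLATES:
        raise ValueError(f"unsupported framework: {framework!r}")
    lines = ["#!/bin/bash"]
    if env_setup:
        # Environment exports run before the server is launched.
        lines.append(env_setup)
    lines.append(LAUNCH_TEMPLATES[framework].format(model=model))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_job_script(
            "vllm",
            "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
            env_setup="export RAY_CGRAPH_get_timeout=1800",
        )
    )
```

Because the launch command in the proposed `vllm-omni)` case is the same `vllm serve`, a table like this would not need a separate entry for it, which is the redundancy the comment points at.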
  model: str
- framework: Literal["sglang", "vllm"]
+ framework: Literal["sglang", "vllm", "vllm-omni"]
Adding vLLM-Omni alongside vLLM as a new framework should be well justified. Our long-term vision is to have two golden base images, one for vLLM and one for SGLang (ref: #118). I would therefore suggest dropping `vllm-omni` as a new framework for now and simply using `--environment`/`--slurm-environment` to specify which TOML file you want to use.
Thank you for clarifying. I will remove `vllm-omni` as a new framework and stick to the original `vllm`.
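The approach agreed on here, keeping the framework set unchanged and selecting the TTS environment through a flag, might look roughly like the sketch below. It is an illustration only: the parser, the `DEFAULT_ENVS` table and its paths, and `resolve_environment` are hypothetical, and treating `--environment` and `--slurm-environment` as aliases of a single option is an assumption, since the thread does not say how the real CLI defines them.

```python
import argparse
from pathlib import Path

# Framework set as it stands before this PR (from the diff above).
FRAMEWORKS = ("sglang", "vllm")

# Hypothetical default environment files per framework (illustrative paths).
DEFAULT_ENVS = {
    "sglang": Path("envs/sglang.toml"),
    "vllm": Path("envs/vllm.toml"),
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="launcher-sketch")
    parser.add_argument("--framework", choices=FRAMEWORKS, required=True)
    # Assumption: both spellings set the same value.
    parser.add_argument(
        "--environment",
        "--slurm-environment",
        dest="environment",
        type=Path,
        default=None,
        help="TOML environment file overriding the framework default",
    )
    return parser


def resolve_environment(args: argparse.Namespace) -> Path:
    """Prefer an explicit environment file, else fall back to the framework default."""
    if args.environment is not None:
        return args.environment
    return DEFAULT_ENVS[args.framework]


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--framework", "vllm", "--environment", "vllm_qwen3_tts_cuda13.toml"]
    )
    print(resolve_environment(args))
```

With this shape, a TTS-specific environment is an override on the existing `vllm` framework, so the `Literal["sglang", "vllm"]` type does not need a third member.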
Isn't it possible to patch the current vLLM image?
Could you clarify what you mean by “patch the current vLLM image”?
I'm not sure whether you meant one of the following:
- Use the existing Docker vLLM CUDA13 base image, if it exists, and make `vllm_qwen3_tts_cuda13` (a derived image) that only adds vllm-omni and audio dependencies.
- Modify the current `vllm_cuda13` Dockerfile (image) itself to include vllm-omni and audio dependencies.
The second one. In general, we are working toward keeping the number of images and environments to a minimum, so adding a new image and environment for only a small class of models does not fit our long-term goals.
Adds `examples/clariden/cli/qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign-vllm-omni.sh`, a single-node launcher for `Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign`, serving text-to-speech via vLLM-Omni on Clariden GH200.

Adds `images/vllm_qwen3_tts_cuda13/Dockerfile` and `src/swiss_ai_model_launch/assets/envs/vllm_qwen3_tts_cuda13.toml`, a CUDA13 vLLM-Omni TTS environment with `vllm==0.20.2`, `vllm-omni==0.20.0`, `transformers==5.8.0`, and audio dependencies such as FFmpeg, libsndfile, and soundfile.

Adds `Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign` to `src/swiss_ai_model_launch/assets/models.json`, an interactive SML catalog entry using vLLM-Omni with `--max-model-len 8192` and `--gpu-memory-utilization 0.40`. VoiceDesign was tested with `task_type=VoiceDesign` and text instructions rather than preset CustomVoice speakers.

Also adds `vllm-omni` as a supported framework where required, matching the existing vLLM-Omni serving pattern.

Validated from a clean checkout:
- `sml advanced` launch works
- `sml` catalog launch works
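As a rough illustration of how the catalog settings in the description map onto a server invocation, here is a minimal sketch. The `build_serve_command` helper is hypothetical and not part of this PR. `vllm serve`, `--max-model-len`, and `--gpu-memory-utilization` are taken from the description and diff above, while any additional flags the real launcher passes for vLLM-Omni are not shown in this thread and are left out.

```python
import shlex


def build_serve_command(
    model: str, max_model_len: int, gpu_memory_utilization: float
) -> list[str]:
    """Assemble a `vllm serve` argument list from catalog-style settings (hypothetical helper)."""
    if max_model_len <= 0:
        raise ValueError("max_model_len must be positive")
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return [
        "vllm",
        "serve",
        model,
        "--max-model-len",
        str(max_model_len),
        "--gpu-memory-utilization",
        str(gpu_memory_utilization),
    ]


if __name__ == "__main__":
    cmd = build_serve_command("Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign", 8192, 0.40)
    # shlex.join gives a copy-pasteable shell line for a job script or log.
    print(shlex.join(cmd))
```

Returning a list rather than a preformatted string keeps the arguments safe to hand to `subprocess.run` without shell quoting concerns.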