Add Qwen3-TTS VoiceDesign vLLM-Omni launcher by yfchoco208 · Pull Request #135 · swiss-ai/model-launch

yfchoco208 · 2026-05-20T04:36:17Z

Adds examples/clariden/cli/qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign-vllm-omni.sh, single-node launcher for Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign, serving text-to-speech via vLLM-Omni on Clariden GH200.

Adds images/vllm_qwen3_tts_cuda13/Dockerfile and src/swiss_ai_model_launch/assets/envs/vllm_qwen3_tts_cuda13.toml, a CUDA13 vLLM-Omni TTS environment with vllm==0.20.2, vllm-omni==0.20.0, transformers==5.8.0, and audio dependencies such as FFmpeg, libsndfile, and soundfile.
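Because the environment pins exact versions, a quick check inside the container can confirm that the image matches the description. The following is a minimal sketch, not part of the PR: the pins are copied from the paragraph above, while the helper name `find_pin_mismatches` and its `get_version` parameter are assumptions made for illustration.

```python
from importlib import metadata

# Pins copied from the PR description; adjust if the Dockerfile changes.
PINS = {"vllm": "0.20.2", "vllm-omni": "0.20.0", "transformers": "5.8.0"}


def find_pin_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that does not match.

    `found` is None when the package is not installed. `get_version` is
    injectable so the check can be exercised without the real packages.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_pin_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Run inside the built image, the script prints nothing when all three pins match.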

Adds Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign to src/swiss_ai_model_launch/assets/models.json, an interactive SML catalog entry using vLLM-Omni with --max-model-len 8192 and --gpu-memory-utilization 0.40. VoiceDesign was tested with task_type=VoiceDesign and text instructions rather than preset CustomVoice speakers.
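For reference, the serving arguments named above can be assembled as follows. This is a sketch under assumptions: the helper `build_serve_command` is hypothetical, and it includes only the two flags mentioned in this description. Any additional vLLM-Omni specific arguments used by the actual launcher script are not shown in this thread and are omitted here.

```python
import shlex


def build_serve_command(model, max_model_len=8192, gpu_memory_utilization=0.40):
    """Assemble a `vllm serve` argument list using the values from the catalog entry."""
    return [
        "vllm",
        "serve",
        model,
        "--max-model-len",
        str(max_model_len),
        "--gpu-memory-utilization",
        f"{gpu_memory_utilization:.2f}",
    ]


if __name__ == "__main__":
    cmd = build_serve_command("Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign")
    # shlex.join quotes arguments only where the shell would need it.
    print(shlex.join(cmd))
```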

Also adds vllm-omni as a supported framework where required, matching the existing vLLM-Omni serving pattern.

Validated from a clean checkout:

sml advanced launch works
interactive sml catalog launch works

AryanAhadinia

Thanks a lot for your contribution! We would love to merge your PR once the comments below are addressed. Keep up the great work!

Please also note that your PR now has conflicts that need to be resolved before merging.

AryanAhadinia · 2026-05-20T19:02:42Z

+    vllm-omni)
+        FRAMEWORK_ENV_SETUP="export RAY_CGRAPH_get_timeout=1800; export no_proxy=\"0.0.0.0,\$no_proxy\"; export NO_PROXY=\"0.0.0.0,\$NO_PROXY\""
+        FRAMEWORK_LAUNCH="vllm serve"
+        ;;


This case seems redundant to me, as it is identical to the vLLM case. We could replace python3 -m vllm.entrypoints.openai.api_server with vllm serve, since the two are equivalent and the former is deprecated. Note, however, that we massively refactored the codebase in #100 and the template.jinja file has been removed entirely. The job script is now rendered at runtime in framework.py.
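To illustrate what rendering the launch portion of a job script at runtime could look like, here is a minimal sketch. It does not reproduce the actual framework.py, whose structure is not shown in this thread: `FrameworkSpec`, `FRAMEWORKS`, and `render_launch` are hypothetical names, and the environment exports are copied from the diff above, which the review describes as identical to the vLLM case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameworkSpec:
    """Shell lines to run before launch, plus the launch command prefix."""
    env_setup: tuple
    launch: str


# Values copied from the diff in this review comment.
VLLM = FrameworkSpec(
    env_setup=(
        "export RAY_CGRAPH_get_timeout=1800",
        'export no_proxy="0.0.0.0,$no_proxy"',
        'export NO_PROXY="0.0.0.0,$NO_PROXY"',
    ),
    launch="vllm serve",
)

# A single vLLM entry suffices if vLLM-Omni reuses the same setup and command.
FRAMEWORKS = {"vllm": VLLM}


def render_launch(framework, model, extra_args=()):
    """Render the env setup lines followed by the serve command."""
    spec = FRAMEWORKS[framework]
    command = " ".join([spec.launch, model, *extra_args])
    return "\n".join([*spec.env_setup, command])


if __name__ == "__main__":
    print(render_launch("vllm", "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign"))
```

Keeping the per-framework data in one mapping means an identical case, such as the vllm-omni branch in the diff, adds no new code path.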

AryanAhadinia · 2026-05-20T19:04:26Z


    model: str
-    framework: Literal["sglang", "vllm"]
+    framework: Literal["sglang", "vllm", "vllm-omni"]


Adding vLLM-Omni alongside vLLM as a new framework needs to be well justified. Our long-term vision is to have two golden base images, one for vLLM and one for SGL (ref: #118). I would therefore suggest dropping vllm-omni as a new framework for now and using --environment/--slurm-environment to specify which TOML file you want to use.

yfchoco208 · 2026-05-21

Thank you for clarifying. I will remove vllm-omni as a new framework and stick to the original vllm.
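The outcome of this thread can be sketched as follows: the framework type keeps only the two original values, and the TTS-specific setup is selected through an environment file instead. This is an illustrative sketch, not the project's actual class. `LaunchRequest` and its `environment` field are hypothetical, with the field standing in for the --environment option mentioned above, and the example TOML path is taken from the PR description.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

# The framework set stays unchanged; no "vllm-omni" entry.
Framework = Literal["sglang", "vllm"]


@dataclass
class LaunchRequest:
    model: str
    framework: Framework
    # Hypothetical field: path to the environment TOML file to use.
    environment: Optional[str] = None

    def __post_init__(self):
        # Literal is not enforced at runtime, so validate explicitly.
        allowed = get_args(Framework)
        if self.framework not in allowed:
            raise ValueError(
                f"unsupported framework {self.framework!r}; expected one of {allowed}"
            )


if __name__ == "__main__":
    request = LaunchRequest(
        model="Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
        framework="vllm",
        environment="src/swiss_ai_model_launch/assets/envs/vllm_qwen3_tts_cuda13.toml",
    )
    print(request)
```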

AryanAhadinia · 2026-05-20T19:10:04Z

Isn't it possible to patch the current vLLM image?

yfchoco208 · 2026-05-21

Just to clarify: what do you mean by “patch the current vLLM image”?

I'm not sure which of the following you meant:

Use the existing Docker vLLM CUDA13 base image, if one exists, and build vllm_qwen3_tts_cuda13 as a derived image that only adds vllm-omni and the audio dependencies.

Modify the current vllm_cuda13 Dockerfile (image) itself to include vllm-omni and the audio dependencies.

AryanAhadinia · 2026-05-21

The second one. In general, we are working toward keeping the number of images and environments as small as possible, so adding a new image and environment for only a small class of models is not well aligned with our long-term goals.

yfchoco208 force-pushed the add-qwen3-tts-voicedesign branch from a3b824c to d1fca8c Compare May 20, 2026 04:58

Added Qwen3-TTS VoiceDesign vLLM-Omni launcher

b7f21f4

yfchoco208 force-pushed the add-qwen3-tts-voicedesign branch from d1fca8c to b7f21f4 Compare May 20, 2026 05:06

AryanAhadinia requested changes May 20, 2026

AryanAhadinia assigned yfchoco208 May 20, 2026

AryanAhadinia added the model-support Adding support for a new model label May 20, 2026

yfchoco208 wants to merge 1 commit into swiss-ai:main from yfchoco208:add-qwen3-tts-voicedesign