Add Qwen3-TTS VoiceDesign vLLM-Omni launcher#135

Open
yfchoco208 wants to merge 1 commit into
swiss-ai:mainfrom
yfchoco208:add-qwen3-tts-voicedesign

@yfchoco208
Collaborator

Adds examples/clariden/cli/qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign-vllm-omni.sh, a single-node launcher for Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign that serves text-to-speech via vLLM-Omni on Clariden GH200.

Adds images/vllm_qwen3_tts_cuda13/Dockerfile and src/swiss_ai_model_launch/assets/envs/vllm_qwen3_tts_cuda13.toml, a CUDA13 vLLM-Omni TTS environment with vllm==0.20.2, vllm-omni==0.20.0, transformers==5.8.0, and audio dependencies such as FFmpeg, libsndfile, and soundfile.

Adds Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign to src/swiss_ai_model_launch/assets/models.json, an interactive SML catalog entry using vLLM-Omni with --max-model-len 8192 and --gpu-memory-utilization 0.40. VoiceDesign was tested with task_type=VoiceDesign and text instructions rather than preset CustomVoice speakers.

Also adds vllm-omni as a supported framework where required, matching the existing vLLM-Omni serving pattern.

Validated from a clean checkout:

  • sml advanced launch works
  • interactive sml catalog launch works

@yfchoco208 yfchoco208 force-pushed the add-qwen3-tts-voicedesign branch from a3b824c to d1fca8c Compare May 20, 2026 04:58
@yfchoco208 yfchoco208 force-pushed the add-qwen3-tts-voicedesign branch from d1fca8c to b7f21f4 Compare May 20, 2026 05:06
Member

@AryanAhadinia AryanAhadinia left a comment

Thanks a lot for your contribution! We would love to merge your PR once the comments below are addressed. Keep up the great work!

Please also note that your PR now has conflicts that must be resolved before merging.

Comment on lines +86 to +89
vllm-omni)
FRAMEWORK_ENV_SETUP="export RAY_CGRAPH_get_timeout=1800; export no_proxy=\"0.0.0.0,\$no_proxy\"; export NO_PROXY=\"0.0.0.0,\$NO_PROXY\""
FRAMEWORK_LAUNCH="vllm serve"
;;
Member

@AryanAhadinia AryanAhadinia May 20, 2026

This branch seems redundant to me, as it is identical to the vLLM case. We could replace python3 -m vllm.entrypoints.openai.api_server with vllm serve, since the two are equivalent and the former is deprecated. Note, however, that we heavily refactored the codebase in #100 and the template.jinja file has been removed entirely. The job script is now rendered at runtime in framework.py.
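To illustrate the point about redundancy, here is a minimal sketch of how a runtime renderer could hold a single vLLM launch spec instead of a duplicated `vllm-omni` branch. The names `LaunchSpec`, `FRAMEWORK_SPECS`, and `render_launch` are hypothetical and are not taken from framework.py; only the environment exports, `vllm serve`, and the two flags come from this PR.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LaunchSpec:
    """Hypothetical container for per-framework launch settings."""
    env_setup: tuple[str, ...]
    launch_cmd: str


# Environment exports copied from the diff hunk under review.
VLLM_SPEC = LaunchSpec(
    env_setup=(
        "export RAY_CGRAPH_get_timeout=1800",
        'export no_proxy="0.0.0.0,$no_proxy"',
        'export NO_PROXY="0.0.0.0,$NO_PROXY"',
    ),
    launch_cmd="vllm serve",
)

# One entry is enough: a vLLM-Omni model would reuse the "vllm" spec.
FRAMEWORK_SPECS = {"vllm": VLLM_SPEC}


def render_launch(spec: LaunchSpec, model: str, extra_args: Sequence[str] = ()) -> str:
    """Render the env exports followed by the serve command, one per line."""
    lines = list(spec.env_setup)
    lines.append(" ".join([spec.launch_cmd, model, *extra_args]))
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        render_launch(
            FRAMEWORK_SPECS["vllm"],
            "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
            ["--max-model-len", "8192", "--gpu-memory-utilization", "0.40"],
        )
    )
```

With this shape, supporting the TTS model would only require a different environment file, not a new framework key.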


  model: str
- framework: Literal["sglang", "vllm"]
+ framework: Literal["sglang", "vllm", "vllm-omni"]
Member

Adding vLLM-Omni alongside vLLM as a new framework needs strong justification. Our long-term vision is to have two golden base images, one for vLLM and one for SGL (ref: #118). I would therefore suggest dropping vllm-omni as a new framework for now and using --environment/--slurm-environment to specify which TOML file to use.
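A rough sketch of what this suggestion could look like on the config side: the `framework` literal keeps its two values and the TTS environment is selected through a separate field. The class name `LaunchRequest` and the `environment` field are assumptions made for illustration; the diff only shows the `model` and `framework` fields.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

# Unchanged from main: no "vllm-omni" entry.
Framework = Literal["sglang", "vllm"]


@dataclass
class LaunchRequest:
    """Hypothetical stand-in for the config class touched by this PR."""
    model: str
    framework: Framework
    # Assumed field mirroring --environment/--slurm-environment:
    # the TOML file that selects the container environment.
    environment: Optional[str] = None

    def __post_init__(self) -> None:
        # Literal is not enforced at runtime, so validate explicitly.
        if self.framework not in get_args(Framework):
            raise ValueError(f"unsupported framework: {self.framework!r}")


# The TTS model runs under the plain "vllm" framework with its own environment file.
request = LaunchRequest(
    model="Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    framework="vllm",
    environment="vllm_qwen3_tts_cuda13.toml",
)
```

The trade-off is that the environment choice becomes the caller's responsibility rather than something implied by the framework name.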

Collaborator Author

Thank you for clarifying. I will remove vllm-omni as a new framework and stick with the original vllm.

Member

Isn't it possible to patch the current vLLM image?

Collaborator Author

Could you clarify what you mean by “patch the current vLLM image”?

I'm not sure if you meant one of the following:

  1. Use the existing vLLM CUDA13 Docker base image, if there is one, and build vllm_qwen3_tts_cuda13 as a derived image that only adds vllm-omni and the audio dependencies.
  2. Modify the current vllm_cuda13 Dockerfile itself to include vllm-omni and the audio dependencies.

Member

The second one. In general, we are working toward keeping the number of images and environments as small as possible, so adding a new image and environment for a small class of models does not align well with our long-term goals.

@AryanAhadinia AryanAhadinia added the model-support Adding support for a new model label May 20, 2026