
[Feat][Qwen3-tts]: Add Gradio demo for online serving #1231

Open
lishunyang12 wants to merge 3 commits into vllm-project:main from lishunyang12:tts_gradio

Conversation

@lishunyang12 (Contributor)

Closes part of #938 (Gradio Demo)

Summary

  • Add interactive Gradio web UI for Qwen3-TTS at examples/online_serving/qwen3_tts/
  • Support all 3 task types: CustomVoice, VoiceDesign, Base (voice cloning)
  • Dynamic UI that shows/hides fields based on selected task type
  • Fetches available speakers from /v1/audio/voices endpoint
  • Add run_gradio_demo.sh to launch server + demo together
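
The speaker lookup described above can be sketched as follows. The endpoint path comes from the PR summary; the default server address and the `{"voices": [...]}` response shape are assumptions, not confirmed from the PR's code:

```python
import requests

API_BASE = "http://localhost:8000"  # assumption: default vLLM server address

def fetch_speakers(api_base: str = API_BASE) -> list[str]:
    """Fetch available speaker names from the /v1/audio/voices endpoint."""
    resp = requests.get(f"{api_base}/v1/audio/voices", timeout=5)
    resp.raise_for_status()
    # Assumed payload shape: {"voices": ["Vivian", "Ryan", ...]}
    return resp.json().get("voices", [])
```

The demo would call this once at startup to populate the speaker dropdown.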

Files Changed

  • examples/online_serving/qwen3_tts/gradio_demo.py (new)
  • examples/online_serving/qwen3_tts/run_gradio_demo.sh (new)
  • examples/online_serving/qwen3_tts/README.md (updated)

Test plan

  • Start server with ./run_server.sh CustomVoice, run python gradio_demo.py, generate speech with Vivian/Ryan speakers
  • Start server with VoiceDesign model, verify instructions field is required
  • Start server with Base model, upload reference audio and verify voice cloning
  • Test run_gradio_demo.sh launches both server and Gradio
  • Verify error messages when server is down or inputs are invalid

Notes

Signed-off-by: lishunyang <lishunyang12@163.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b729f7602b


Comment on lines +136 to +137

```shell
--omni 2>&1 | tee "$LOG_FILE" &
SERVER_PID=$!
```


P1: Capture vLLM PID directly instead of pipeline tail process

The launcher backgrounds vllm-omni ... | tee ... and then stores SERVER_PID=$!, but in bash $! for a background pipeline is the PID of the pipeline's last command (tee), not the server process. As a result, the readiness loop and cleanup (kill "$SERVER_PID") monitor and kill tee while the real vLLM server can keep running after Ctrl+C or on error paths, leaving orphaned servers bound to the port.

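
One possible fix, sketched with process substitution so the server itself is the backgrounded command and $! is its PID (this is not the PR's actual script; `sleep 30` stands in for the real command line):

```shell
#!/usr/bin/env bash
# Sketch: with `cmd > >(tee ...)` the server is the backgrounded command
# itself, so $! is the server's PID rather than tee's.
LOG_FILE="server.log"

# `sleep 30` stands in for the real `vllm-omni ... --omni` command line.
sleep 30 > >(tee "$LOG_FILE") 2>&1 &
SERVER_PID=$!   # PID of the server process, not tee

echo "server pid: $SERVER_PID"
kill "$SERVER_PID"
```

Logging still flows through tee, but readiness checks and cleanup now target the right process.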


```python
# Decode audio response
try:
    audio_np, sample_rate = sf.read(io.BytesIO(resp.content))
```


P2: Handle raw PCM responses before decoding audio bytes

The UI allows response_format="pcm", but the response is always decoded via sf.read(io.BytesIO(resp.content)). For PCM, the server emits RAW bytes (no container/header), so this decode path cannot infer format/sample rate and fails with "Failed to decode audio response" whenever users pick PCM. This makes one of the advertised output formats unusable in the demo.

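
One possible workaround is to special-case PCM before falling back to sf.read. This sketch assumes the raw stream is 16-bit little-endian mono at a fixed rate (24 kHz here; the real rate must match the checkpoint's output):

```python
import io
import numpy as np

PCM_SAMPLE_RATE = 24000  # assumption: must match the server's actual output rate

def decode_audio(content: bytes, response_format: str):
    """Decode an audio response; raw PCM carries no header, so it needs special-casing."""
    if response_format == "pcm":
        # Assumed wire format: 16-bit little-endian mono samples.
        audio_np = np.frombuffer(content, dtype="<i2").astype(np.float32) / 32768.0
        return audio_np, PCM_SAMPLE_RATE
    # Containerized formats (wav, flac, ...) self-describe, so libsndfile can decode them.
    import soundfile as sf
    return sf.read(io.BytesIO(content))
```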

@congw729 (Contributor)

congw729 commented Feb 6, 2026

Hi, if you add a Markdown document under ./examples/*, please also run mkdocs serve to sync those changes to ./docs/ before merging this PR.

@linyueqian (Contributor)

The task type selector (CustomVoice / VoiceDesign / Base) in the Gradio UI feels a bit off. Since each checkpoint is already a specific task type, having users pick it again client-side can lead to mismatches, e.g. selecting VoiceDesign when the server loaded CustomVoice. Could we auto-detect which model the server is running and just show the right fields? That way we avoid confusion and the UI stays in sync with whatever checkpoint is actually loaded.
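
The auto-detection suggested above could be sketched against the OpenAI-compatible /v1/models endpoint that vLLM serves, assuming the loaded checkpoint's model id contains its task type (the id format and the "no suffix means Base" fallback are assumptions):

```python
import requests

def detect_task_type(api_base: str = "http://localhost:8000") -> str:
    """Infer the loaded checkpoint's task type from the server's model id."""
    resp = requests.get(f"{api_base}/v1/models", timeout=5)
    resp.raise_for_status()
    model_id = resp.json()["data"][0]["id"]  # e.g. ".../Qwen3-TTS-...-CustomVoice"
    for task in ("CustomVoice", "VoiceDesign"):
        if task.lower() in model_id.lower():
            return task
    return "Base"  # assumption: base/voice-cloning checkpoints carry no task suffix
```

The UI could then show only the fields relevant to the detected task instead of offering a selector.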

Also, this PR depends on #1203 (the multimodal_output fix) which hasn't been merged yet, so the server currently returns "TTS model did not produce audio output" instead of actual audio. I wasn't able to test this end to end. Was this tested before submitting?

@hsliuustc0106 (Collaborator)

Shall we move the Gradio demo to the app folder after the ComfyUI PR is merged?

@linyueqian (Contributor)

> shall we move gradio to app folder after comfyui PR merged?

makes sense.
