From 0751a5ed83ec5590b2132e5b68f9a9e17f6ceb8a Mon Sep 17 00:00:00 2001
From: linyueqian
Date: Thu, 5 Feb 2026 11:13:21 -0500
Subject: [PATCH 1/4] [Doc] Update Qwen3-TTS docs for consistency with Omni
 examples

Signed-off-by: linyueqian
---
 .../examples/online_serving/qwen3_tts.md      | 198 +++++++++++++-----
 examples/online_serving/qwen3_tts/README.md   |  98 ++++++---
 .../qwen3_tts/openai_speech_client.py         |  10 +-
 .../online_serving/qwen3_tts/run_server.sh    |   2 +-
 4 files changed, 220 insertions(+), 88 deletions(-)

diff --git a/docs/user_guide/examples/online_serving/qwen3_tts.md b/docs/user_guide/examples/online_serving/qwen3_tts.md
index 48bd58b00c..f16039bc81 100644
--- a/docs/user_guide/examples/online_serving/qwen3_tts.md
+++ b/docs/user_guide/examples/online_serving/qwen3_tts.md
@@ -1,9 +1,11 @@
-# Qwen3-TTS Online Serving
+# Qwen3-TTS
 
 Source .
 
-This directory contains examples for running Qwen3-TTS models with vLLM-Omni's online serving API.
+## 🛠️ Installation
+
+Please refer to [README.md](https://github.com/vllm-project/vllm-omni/tree/main/README.md).
 
 ## Supported Models
 
@@ -12,34 +14,65 @@
 | `Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice` | CustomVoice | Predefined speaker voices with optional style control |
 | `Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign` | VoiceDesign | Natural language voice style description |
 | `Qwen/Qwen3-TTS-12Hz-1.7B-Base` | Base | Voice cloning from reference audio |
+| `Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice` | CustomVoice | Smaller/faster variant |
+| `Qwen/Qwen3-TTS-12Hz-0.6B-Base` | Base | Smaller/faster variant for voice cloning |
 
-## Quick Start
+## Run examples (Qwen3-TTS)
 
-### 1. Start the Server
+### Launch the Server
 
 ```bash
-# CustomVoice model (default)
-./run_server.sh
+# CustomVoice model (predefined speakers)
+vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \
+    --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
+    --omni --port 8091 --trust-remote-code --enforce-eager
+
+# VoiceDesign model
+vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign \
+    --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
+    --omni --port 8091 --trust-remote-code --enforce-eager
+
+# Base model (voice cloning)
+vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-Base \
+    --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
+    --omni --port 8091 --trust-remote-code --enforce-eager
+```
 
-# Or specify task type
-./run_server.sh CustomVoice
-./run_server.sh VoiceDesign
-./run_server.sh Base
+If you have a custom stage configs file, launch the server with the command below:
+```bash
+vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \
+    --stage-configs-path /path/to/stage_configs_file \
+    --omni --port 8091 --trust-remote-code --enforce-eager
 ```
 
-### 2. Run the Client
+Alternatively, use the convenience script:
+```bash
+./run_server.sh              # Default: CustomVoice model
+./run_server.sh CustomVoice  # CustomVoice model
+./run_server.sh VoiceDesign  # VoiceDesign model
+./run_server.sh Base         # Base (voice clone) model
+```
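+
+Once the server is up, you can sanity-check it by listing the available
+voices (supported by the CustomVoice models):
+
+```bash
+curl http://localhost:8091/v1/audio/voices
+```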
+
+### Send TTS Request
+
+Get into the example folder:
+```bash
+cd examples/online_serving/qwen3_tts
+```
+
+#### Send request via Python
 
 ```bash
 # CustomVoice: Use predefined speaker
 python openai_speech_client.py \
     --text "你好,我是通义千问" \
-    --voice Vivian \
+    --voice vivian \
     --language Chinese
 
 # CustomVoice with style instruction
 python openai_speech_client.py \
     --text "今天天气真好" \
-    --voice Ryan \
+    --voice ryan \
     --instructions "用开心的语气说"
 
 # VoiceDesign: Describe the voice style
@@ -58,29 +91,86 @@ python openai_speech_client.py \
     --ref-text "Original transcript of the reference audio"
 ```
 
-### 3. Using curl
+The Python client supports the following command-line arguments:
+
+- `--api-base`: API base URL (default: `http://localhost:8091`)
+- `--model` (or `-m`): Model name/path (default: `Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice`)
+- `--task-type` (or `-t`): TTS task type. Options: `CustomVoice`, `VoiceDesign`, `Base`
+- `--text`: Text to synthesize (required)
+- `--voice`: Speaker/voice name (default: `vivian`). Options: `vivian`, `ryan`, `aiden`, etc.
+- `--language`: Language. Options: `Auto`, `Chinese`, `English`, `Japanese`, `Korean`, `German`, `French`, `Russian`, `Portuguese`, `Spanish`, `Italian`
+- `--instructions`: Voice style/emotion instructions
+- `--ref-audio`: Reference audio file path or URL for voice cloning (Base task)
+- `--ref-text`: Reference audio transcript for voice cloning (Base task)
+- `--response-format`: Audio output format (default: `wav`). Options: `wav`, `mp3`, `flac`, `pcm`, `aac`, `opus`
+- `--output` (or `-o`): Output audio file path (default: `tts_output.wav`)
+
+#### Send request via curl
 
 ```bash
 # Simple TTS request
-curl -X POST http://localhost:8000/v1/audio/speech \
+curl -X POST http://localhost:8091/v1/audio/speech \
   -H "Content-Type: application/json" \
   -d '{
     "input": "Hello, how are you?",
-    "voice": "Vivian",
+    "voice": "vivian",
     "language": "English"
   }' --output output.wav
 
 # With style instruction
-curl -X POST http://localhost:8000/v1/audio/speech \
+curl -X POST http://localhost:8091/v1/audio/speech \
   -H "Content-Type: application/json" \
   -d '{
     "input": "I am so excited!",
-    "voice": "Vivian",
+    "voice": "vivian",
     "instructions": "Speak with great enthusiasm"
   }' --output excited.wav
 
 # List available voices in CustomVoice models
-curl http://localhost:8000/v1/audio/voices
+curl http://localhost:8091/v1/audio/voices
 ```
 
+### Using OpenAI SDK
+
+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")
+
+response = client.audio.speech.create(
+    model="Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
+    voice="vivian",
+    input="Hello, how are you?",
+)
+
+response.stream_to_file("output.wav")
+```
+
+### Using Python httpx
+
+```python
+import httpx
+
+response = httpx.post(
+    "http://localhost:8091/v1/audio/speech",
+    json={
+        "input": "Hello, how are you?",
+        "voice": "vivian",
+        "language": "English",
+    },
+    timeout=300.0,
+)
+
+with open("output.wav", "wb") as f:
+    f.write(response.content)
+```
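+
+For voice cloning over the raw API, `ref_audio` accepts a URL or a base64
+data URL (see Voice Clone Parameters below). A minimal sketch, assuming the
+server is running a Base model variant and a local reference clip `ref.wav`
+exists:
+
+```python
+import base64
+
+import httpx
+
+# Encode the local reference clip as a base64 data URL.
+with open("ref.wav", "rb") as f:
+    ref_audio = "data:audio/wav;base64," + base64.b64encode(f.read()).decode()
+
+response = httpx.post(
+    "http://localhost:8091/v1/audio/speech",
+    json={
+        "input": "Text to speak in the cloned voice.",
+        "task_type": "Base",
+        "ref_audio": ref_audio,
+        "ref_text": "Original transcript of the reference audio",
+    },
+    timeout=300.0,
+)
+
+with open("cloned.wav", "wb") as f:
+    f.write(response.content)
+```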
+``` +sudo apt update +sudo apt install ffmpeg ``` ## API Reference @@ -89,16 +179,31 @@ curl http://localhost:8000/v1/audio/voices ``` POST /v1/audio/speech +Content-Type: application/json ``` This endpoint follows the [OpenAI Audio Speech API](https://platform.openai.com/docs/api-reference/audio/createSpeech) format with additional Qwen3-TTS parameters. +### Voices Endpoint + +``` +GET /v1/audio/voices +``` + +Lists available voices for the loaded model: + +```json +{ + "voices": ["aiden", "dylan", "eric", "one_anna", "ryan", "serena", "sohee", "uncle_fu", "vivian"] +} +``` + ### Request Body ```json { "input": "Text to synthesize", - "voice": "Vivian", + "voice": "vivian", "response_format": "wav", "task_type": "CustomVoice", "language": "Auto", @@ -114,56 +219,38 @@ This endpoint follows the [OpenAI Audio Speech API](https://platform.openai.com/ ### Response -Returns audio data in the requested format (default: WAV). +Returns binary audio data with appropriate `Content-Type` header (e.g., `audio/wav`). ## Parameters -### Standard OpenAI Parameters +### OpenAI Standard Parameters | Parameter | Type | Default | Description | |-----------|------|---------|-------------| -| `input` | string | required | Text to synthesize | -| `voice` | string | "Vivian" | Speaker/voice name | +| `input` | string | **required** | Text to synthesize | +| `model` | string | server's model | Model to use (optional, should match server if specified) | +| `voice` | string | "vivian" | Speaker name (e.g., vivian, ryan, aiden) | | `response_format` | string | "wav" | Audio format: wav, mp3, flac, pcm, aac, opus | | `speed` | float | 1.0 | Playback speed (0.25-4.0) | -| `model` | string | optional | Model name (optional when serving single model) | -### Qwen3-TTS Parameters +### vLLM-Omni Extension Parameters | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `task_type` | string | "CustomVoice" | Task: CustomVoice, VoiceDesign, or Base | -| `language` | string | "Auto" | Language: Auto, Chinese, English, Japanese, Korean | +| `language` | string | "Auto" | Language (see supported languages below) | | `instructions` | string | "" | Voice style/emotion instructions | | `max_new_tokens` | int | 2048 | Maximum tokens to generate | +**Supported languages:** Auto, Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian + ### Voice Clone Parameters (Base task) -| Parameter | Type | Required | Description | +| Parameter | Type | Default | Description | |-----------|------|----------|-------------| -| `ref_audio` | string | Yes* | Reference audio (file path, URL, or base64) | -| `ref_text` | string | No | Transcript of reference audio (for ICL mode) | -| `x_vector_only_mode` | bool | false | Use speaker embedding only (no ICL) | - -## Python Usage - -```python -import httpx - -# Simple request -response = httpx.post( - "http://localhost:8000/v1/audio/speech", - json={ - "model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice", - "input": "Hello world", - "voice": "Vivian", - }, - timeout=300.0, -) - -with open("output.wav", "wb") as f: - f.write(response.content) -``` +| `ref_audio` | string | null | Reference audio (URL or base64 data URL) | +| `ref_text` | string | null | Transcript of reference audio | +| `x_vector_only_mode` | bool | null | Use speaker embedding only (no ICL) | ## Limitations @@ -172,10 +259,11 @@ with open("output.wav", "wb") as f: ## Troubleshooting -1. **Connection refused**: Make sure the server is running on the correct port -2. 
 
 ## Parameters
 
-### Standard OpenAI Parameters
+### OpenAI Standard Parameters
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
-| `input` | string | required | Text to synthesize |
-| `voice` | string | "Vivian" | Speaker/voice name |
+| `input` | string | **required** | Text to synthesize |
+| `model` | string | server's model | Model to use (optional, should match server if specified) |
+| `voice` | string | "vivian" | Speaker name (e.g., vivian, ryan, aiden) |
 | `response_format` | string | "wav" | Audio format: wav, mp3, flac, pcm, aac, opus |
 | `speed` | float | 1.0 | Playback speed (0.25-4.0) |
-| `model` | string | optional | Model name (optional when serving single model) |
 
-### Qwen3-TTS Parameters
+### vLLM-Omni Extension Parameters
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
 | `task_type` | string | "CustomVoice" | Task: CustomVoice, VoiceDesign, or Base |
-| `language` | string | "Auto" | Language: Auto, Chinese, English, Japanese, Korean |
+| `language` | string | "Auto" | Language (see supported languages below) |
 | `instructions` | string | "" | Voice style/emotion instructions |
 | `max_new_tokens` | int | 2048 | Maximum tokens to generate |
 
+**Supported languages:** Auto, Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
+
 ### Voice Clone Parameters (Base task)
 
-| Parameter | Type | Required | Description |
+| Parameter | Type | Default | Description |
 |-----------|------|----------|-------------|
-| `ref_audio` | string | Yes* | Reference audio (file path, URL, or base64) |
-| `ref_text` | string | No | Transcript of reference audio (for ICL mode) |
-| `x_vector_only_mode` | bool | false | Use speaker embedding only (no ICL) |
-
-## Python Usage
-
-```python
-import httpx
-
-# Simple request
-response = httpx.post(
-    "http://localhost:8000/v1/audio/speech",
-    json={
-        "model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
-        "input": "Hello world",
-        "voice": "Vivian",
-    },
-    timeout=300.0,
-)
-
-with open("output.wav", "wb") as f:
-    f.write(response.content)
-```
+| `ref_audio` | string | null | Reference audio (URL or base64 data URL) |
+| `ref_text` | string | null | Transcript of reference audio |
+| `x_vector_only_mode` | bool | null | Use speaker embedding only (no ICL) |
 
 ## Limitations
 
@@ -172,10 +259,11 @@ with open("output.wav", "wb") as f:
 
 ## Troubleshooting
 
-1. **Connection refused**: Make sure the server is running on the correct port
-2. **Out of memory**: Reduce `--gpu-memory-utilization` in run_server.sh
-3. **Unsupported speaker**: Check supported speakers via model documentation
-4. **Voice clone fails**: Ensure you're using the Base model variant for voice cloning
+1. **"TTS model did not produce audio output"**: Ensure you're using the correct model variant for your task type (CustomVoice task → CustomVoice model, etc.)
+2. **Connection refused**: Make sure the server is running on the correct port
+3. **Out of memory**: Use a smaller model variant (`Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice`) or reduce `--gpu-memory-utilization`
+4. **Unsupported speaker**: Use `/v1/audio/voices` to list available voices for the loaded model
+5. **Voice clone fails**: Ensure you're using the Base model variant for voice cloning
 
 ## Example materials
 
diff --git a/examples/online_serving/qwen3_tts/README.md b/examples/online_serving/qwen3_tts/README.md
index 1c9bd48203..e1ad3edc20 100644
--- a/examples/online_serving/qwen3_tts/README.md
+++ b/examples/online_serving/qwen3_tts/README.md
@@ -9,6 +9,8 @@ This directory contains examples for running Qwen3-TTS models with vLLM-Omni's o
 | `Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice` | CustomVoice | Predefined speaker voices with optional style control |
 | `Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign` | VoiceDesign | Natural language voice style description |
 | `Qwen/Qwen3-TTS-12Hz-1.7B-Base` | Base | Voice cloning from reference audio |
+| `Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice` | CustomVoice | Smaller/faster variant |
+| `Qwen/Qwen3-TTS-12Hz-0.6B-Base` | Base | Smaller/faster variant for voice cloning |
 
 ## Quick Start
 
@@ -24,19 +26,27 @@ This directory contains examples for running Qwen3-TTS models with vLLM-Omni's o
 ./run_server.sh Base
 ```
 
+Or launch directly with `vllm serve`:
+
+```bash
+vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \
+    --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
+    --omni --port 8091 --trust-remote-code --enforce-eager
+```
+
 ### 2. Run the Client
 
 ```bash
 # CustomVoice: Use predefined speaker
 python openai_speech_client.py \
     --text "你好,我是通义千问" \
-    --voice Vivian \
+    --voice vivian \
     --language Chinese
 
 # CustomVoice with style instruction
 python openai_speech_client.py \
     --text "今天天气真好" \
-    --voice Ryan \
+    --voice ryan \
     --instructions "用开心的语气说"
 
 # VoiceDesign: Describe the voice style
@@ -59,25 +69,25 @@ python openai_speech_client.py \
 
 ```bash
 # Simple TTS request
-curl -X POST http://localhost:8000/v1/audio/speech \
+curl -X POST http://localhost:8091/v1/audio/speech \
   -H "Content-Type: application/json" \
   -d '{
     "input": "Hello, how are you?",
-    "voice": "Vivian",
+    "voice": "vivian",
     "language": "English"
   }' --output output.wav
 
 # With style instruction
-curl -X POST http://localhost:8000/v1/audio/speech \
+curl -X POST http://localhost:8091/v1/audio/speech \
   -H "Content-Type: application/json" \
   -d '{
     "input": "I am so excited!",
-    "voice": "Vivian",
+    "voice": "vivian",
     "instructions": "Speak with great enthusiasm"
   }' --output excited.wav
 
 # List available voices in CustomVoice models
-curl http://localhost:8000/v1/audio/voices
+curl http://localhost:8091/v1/audio/voices
 ```
 
 ## API Reference
 
@@ -90,12 +100,26 @@ POST /v1/audio/speech
 ```
 
 This endpoint follows the [OpenAI Audio Speech API](https://platform.openai.com/docs/api-reference/audio/createSpeech) format with additional Qwen3-TTS parameters.
 
+### Voices Endpoint
+
+```
+GET /v1/audio/voices
+```
+
+Lists available voices for the loaded model:
+
+```json
+{
+  "voices": ["aiden", "dylan", "eric", "one_anna", "ryan", "serena", "sohee", "uncle_fu", "vivian"]
+}
+```
+
 ### Request Body
 
 ```json
 {
   "input": "Text to synthesize",
-  "voice": "Vivian",
+  "voice": "vivian",
   "response_format": "wav",
   "task_type": "CustomVoice",
   "language": "Auto",
@@ -115,45 +139,64 @@ Returns audio data in the requested format (default: WAV).
 
 ## Parameters
 
-### Standard OpenAI Parameters
+### OpenAI Standard Parameters
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
-| `input` | string | required | Text to synthesize |
-| `voice` | string | "Vivian" | Speaker/voice name |
+| `input` | string | **required** | Text to synthesize |
+| `model` | string | server's model | Model to use (optional, should match server if specified) |
+| `voice` | string | "vivian" | Speaker name (e.g., vivian, ryan, aiden) |
 | `response_format` | string | "wav" | Audio format: wav, mp3, flac, pcm, aac, opus |
 | `speed` | float | 1.0 | Playback speed (0.25-4.0) |
-| `model` | string | optional | Model name (optional when serving single model) |
 
-### Qwen3-TTS Parameters
+### vLLM-Omni Extension Parameters
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
 | `task_type` | string | "CustomVoice" | Task: CustomVoice, VoiceDesign, or Base |
-| `language` | string | "Auto" | Language: Auto, Chinese, English, Japanese, Korean |
+| `language` | string | "Auto" | Language (see supported languages below) |
 | `instructions` | string | "" | Voice style/emotion instructions |
 | `max_new_tokens` | int | 2048 | Maximum tokens to generate |
 
+**Supported languages:** Auto, Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
+
 ### Voice Clone Parameters (Base task)
 
 | Parameter | Type | Required | Description |
 |-----------|------|----------|-------------|
-| `ref_audio` | string | Yes* | Reference audio (file path, URL, or base64) |
+| `ref_audio` | string | **Yes** | Reference audio (URL or base64 data URL) |
 | `ref_text` | string | No | Transcript of reference audio (for ICL mode) |
-| `x_vector_only_mode` | bool | false | Use speaker embedding only (no ICL) |
+| `x_vector_only_mode` | bool | No | Use speaker embedding only (no ICL) |
 
 ## Python Usage
 
+### Using OpenAI SDK
+
+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")
+
+response = client.audio.speech.create(
+    model="Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
+    voice="vivian",
+    input="Hello, how are you?",
+)
+
+response.stream_to_file("output.wav")
+```
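+
+The SDK's `create()` method has no named parameters for the vLLM-Omni
+extension fields; one way to send them is the SDK's `extra_body` option (a
+sketch, assuming a CustomVoice model is being served):
+
+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")
+
+# Extension fields (task_type, language, instructions, ...) are merged into
+# the JSON request body via extra_body.
+response = client.audio.speech.create(
+    model="Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
+    voice="vivian",
+    input="今天天气真好",
+    extra_body={"language": "Chinese", "instructions": "用开心的语气说"},
+)
+
+response.stream_to_file("styled.wav")
+```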
+
+### Using httpx
+
 ```python
 import httpx
 
-# Simple request
 response = httpx.post(
-    "http://localhost:8000/v1/audio/speech",
+    "http://localhost:8091/v1/audio/speech",
     json={
-        "model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
-        "input": "Hello world",
-        "voice": "Vivian",
+        "input": "Hello, how are you?",
+        "voice": "vivian",
+        "language": "English",
     },
     timeout=300.0,
 )
@@ -169,7 +212,8 @@ with open("output.wav", "wb") as f:
 
 ## Troubleshooting
 
-1. **Connection refused**: Make sure the server is running on the correct port
-2. **Out of memory**: Reduce `--gpu-memory-utilization` in run_server.sh
-3. **Unsupported speaker**: Check supported speakers via model documentation
-4. **Voice clone fails**: Ensure you're using the Base model variant for voice cloning
+1. **TTS model did not produce audio output**: Ensure you're using the correct model variant for your task type (CustomVoice task → CustomVoice model, etc.)
+2. **Connection refused**: Make sure the server is running on the correct port
+3. **Out of memory**: Use a smaller model variant (`Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice`) or reduce `--gpu-memory-utilization`
+4. **Unsupported speaker**: Use `/v1/audio/voices` to list available voices for the loaded model
+5. **Voice clone fails**: Ensure you're using the Base model variant for voice cloning
diff --git a/examples/online_serving/qwen3_tts/openai_speech_client.py b/examples/online_serving/qwen3_tts/openai_speech_client.py
index 4d70460940..bbc13081da 100644
--- a/examples/online_serving/qwen3_tts/openai_speech_client.py
+++ b/examples/online_serving/qwen3_tts/openai_speech_client.py
@@ -5,10 +5,10 @@
 
 Examples:
     # CustomVoice task (predefined speaker)
-    python openai_speech_client.py --text "Hello, how are you?" --voice Vivian
+    python openai_speech_client.py --text "Hello, how are you?" --voice vivian
 
     # CustomVoice with emotion instruction
-    python openai_speech_client.py --text "I'm so happy!" --voice Vivian \
+    python openai_speech_client.py --text "I'm so happy!" --voice vivian \
         --instructions "Speak with excitement"
 
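+    # CustomVoice with MP3 output (uses the --response-format/--output flags)
+    python openai_speech_client.py --text "Hello, how are you?" \
+        --response-format mp3 --output tts_output.mp3
+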
     # VoiceDesign task (voice from description)
@@ -30,7 +30,7 @@
 import httpx
 
 # Default server configuration
-DEFAULT_API_BASE = "http://localhost:8000"
+DEFAULT_API_BASE = "http://localhost:8091"
 DEFAULT_API_KEY = "EMPTY"
 
 
@@ -173,8 +173,8 @@ def parse_args():
     parser.add_argument(
         "--voice",
         type=str,
-        default="Vivian",
-        help="Speaker/voice name (default: Vivian). Options: Vivian, Ryan, etc.",
+        default="vivian",
+        help="Speaker/voice name (default: vivian). Options: vivian, ryan, aiden, etc.",
     )
     parser.add_argument(
         "--language",
diff --git a/examples/online_serving/qwen3_tts/run_server.sh b/examples/online_serving/qwen3_tts/run_server.sh
index 203ed76e8d..fd020f18b2 100755
--- a/examples/online_serving/qwen3_tts/run_server.sh
+++ b/examples/online_serving/qwen3_tts/run_server.sh
@@ -33,7 +33,7 @@ echo "Starting Qwen3-TTS server with model: $MODEL"
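+# NOTE: keep --port in sync with DEFAULT_API_BASE in openai_speech_client.py.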
 vllm-omni serve "$MODEL" \
     --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
     --host 0.0.0.0 \
-    --port 8000 \
+    --port 8091 \
     --gpu-memory-utilization 0.9 \
     --trust-remote-code \
     --enforce-eager \

From b758f2bfe0d10734c4197544e7092363645d3bf7 Mon Sep 17 00:00:00 2001
From: linyueqian
Date: Thu, 5 Feb 2026 12:33:10 -0500
Subject: [PATCH 2/4] Fix Voice Clone Parameters: mark ref_audio as required

Signed-off-by: linyueqian
---
 docs/user_guide/examples/online_serving/qwen3_tts.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/user_guide/examples/online_serving/qwen3_tts.md b/docs/user_guide/examples/online_serving/qwen3_tts.md
index f16039bc81..3580d16f5d 100644
--- a/docs/user_guide/examples/online_serving/qwen3_tts.md
+++ b/docs/user_guide/examples/online_serving/qwen3_tts.md
@@ -246,11 +246,11 @@ Returns binary audio data with the appropriate `Content-Type` header (e.g., `audio/w
 
 ### Voice Clone Parameters (Base task)
 
-| Parameter | Type | Default | Description |
+| Parameter | Type | Required | Description |
 |-----------|------|----------|-------------|
-| `ref_audio` | string | null | Reference audio (URL or base64 data URL) |
-| `ref_text` | string | null | Transcript of reference audio |
-| `x_vector_only_mode` | bool | null | Use speaker embedding only (no ICL) |
+| `ref_audio` | string | **Yes** | Reference audio (URL or base64 data URL) |
+| `ref_text` | string | No | Transcript of reference audio (for ICL mode) |
+| `x_vector_only_mode` | bool | No | Use speaker embedding only (no ICL) |

From a81ce73f9c9231a2e3212952202f14917a5ee10e Mon Sep 17 00:00:00 2001
From: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Date: Thu, 5 Feb 2026 18:26:59 -0500
Subject: [PATCH 3/4] Update docs/user_guide/examples/online_serving/qwen3_tts.md

Co-authored-by: Hongsheng Liu
Signed-off-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
---
 docs/user_guide/examples/online_serving/qwen3_tts.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user_guide/examples/online_serving/qwen3_tts.md b/docs/user_guide/examples/online_serving/qwen3_tts.md
index 3580d16f5d..f645a8aca8 100644
--- a/docs/user_guide/examples/online_serving/qwen3_tts.md
+++ b/docs/user_guide/examples/online_serving/qwen3_tts.md
@@ -259,7 +259,7 @@ Returns binary audio data with the appropriate `Content-Type` header (e.g., `audio/w
 
 ## Troubleshooting
 
-1. **"TTS model did not produce audio output"**: Ensure you're using the correct model variant for your task type (CustomVoice task → CustomVoice model, etc.)
+1. **TTS model did not produce audio output**: Ensure you're using the correct model variant for your task type (CustomVoice task → CustomVoice model, etc.)
 2. **Connection refused**: Make sure the server is running on the correct port
 3. **Out of memory**: Use a smaller model variant (`Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice`) or reduce `--gpu-memory-utilization`
 4. **Unsupported speaker**: Use `/v1/audio/voices` to list available voices for the loaded model
 5. **Voice clone fails**: Ensure you're using the Base model variant for voice cloning

From a327e8217e1093ff51d7010a50b9c8e20994cfdd Mon Sep 17 00:00:00 2001
From: linyueqian
Date: Thu, 5 Feb 2026 18:38:28 -0500
Subject: [PATCH 4/4] Format server commands with one argument per line

Signed-off-by: linyueqian
---
 .../examples/online_serving/qwen3_tts.md | 20 +++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/docs/user_guide/examples/online_serving/qwen3_tts.md b/docs/user_guide/examples/online_serving/qwen3_tts.md
index f645a8aca8..f899e362ee 100644
--- a/docs/user_guide/examples/online_serving/qwen3_tts.md
+++ b/docs/user_guide/examples/online_serving/qwen3_tts.md
@@ -25,24 +25,36 @@ Please refer to [README.md](https://github.com/vllm-project/vllm-omni/tree/main/
 # CustomVoice model (predefined speakers)
 vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \
     --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
-    --omni --port 8091 --trust-remote-code --enforce-eager
+    --omni \
+    --port 8091 \
+    --trust-remote-code \
+    --enforce-eager
 
 # VoiceDesign model
 vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign \
     --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
-    --omni --port 8091 --trust-remote-code --enforce-eager
+    --omni \
+    --port 8091 \
+    --trust-remote-code \
+    --enforce-eager
 
 # Base model (voice cloning)
 vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-Base \
     --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
-    --omni --port 8091 --trust-remote-code --enforce-eager
+    --omni \
+    --port 8091 \
+    --trust-remote-code \
+    --enforce-eager
 ```
 
 If you have a custom stage configs file, launch the server with the command below:
 ```bash
 vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \
     --stage-configs-path /path/to/stage_configs_file \
-    --omni --port 8091 --trust-remote-code --enforce-eager
+    --omni \
+    --port 8091 \
+    --trust-remote-code \
+    --enforce-eager
 ```
 
 Alternatively, use the convenience script: