
Commit 6cbeea9

dougbtv and claude committed
[Doc] Add Z-Image Base support and fix Z-Image Turbo parameters
Added documentation for Z-Image Base model (Tongyi-MAI/Z-Image) alongside the existing Z-Image Turbo variant. The existing ZImagePipeline already supports both models - this change updates docs to reflect that.

Key changes:
- Updated supported models list to include both Base and Turbo variants
- Added comparison table showing differences between Base and Turbo
- Fixed incorrect Z-Image Turbo example parameters:
  * num_inference_steps: 50 → 8 (Turbo is optimized for 8 steps)
  * cfg_scale (Qwen-specific) → guidance_scale 0.0 (Z-Image Turbo doesn't support CFG)
- Added z_image_examples.py demonstrating both models with correct parameters
- Updated quickstart examples to show both model options

Z-Image Base vs Turbo:
- Base: 28-50 steps, CFG support (guidance_scale 3.0-5.0), negative prompts, fine-tunable
- Turbo: 8 steps, no CFG (guidance_scale must be 0.0), distilled for speed

Tested Z-Image Base generation: 1280x720, 50 steps, CFG=4.0, negative prompts working.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: dougbtv <dosmith@redhat.com>
1 parent 9f00a37 commit 6cbeea9
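For quick reference, here is a minimal sketch of the corrected Turbo call described above. It reuses the `Omni` and `OmniDiffusionSamplingParams` entrypoints from this commit's examples; the output-handling chain mirrors `z_image_examples.py` below and should be read as illustrative, not canonical:

```python
from vllm_omni.entrypoints.omni import Omni
from vllm_omni.inputs.data import OmniDiffusionSamplingParams

# Turbo is a distilled model: 8 steps, and guidance_scale must stay at 0.0.
omni = Omni(model="Tongyi-MAI/Z-Image-Turbo")
outputs = omni.generate(
    "a cup of coffee on the table",
    OmniDiffusionSamplingParams(num_inference_steps=8, guidance_scale=0.0, seed=42),
)
outputs[0].request_output[0].images[0].save("coffee_turbo.png")
```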

File tree

5 files changed: +220 -8 lines changed

- docs/getting_started/quickstart.md
- docs/models/supported_models.md
- examples/offline_inference/text_to_image/README.md
- examples/offline_inference/text_to_image/text_to_image.py
- examples/offline_inference/text_to_image/z_image_examples.py

docs/getting_started/quickstart.md

Lines changed: 8 additions & 1 deletion
````diff
@@ -39,6 +39,8 @@ Text-to-image generation quickstart with vLLM-Omni:
 from vllm_omni.entrypoints.omni import Omni
 
 if __name__ == "__main__":
+    # Use Z-Image-Turbo for fast inference (8 steps)
+    # or Z-Image (Base) for higher quality (28-50 steps with CFG support)
     omni = Omni(model="Tongyi-MAI/Z-Image-Turbo")
     prompt = "a cup of coffee on the table"
     outputs = omni.generate(prompt)
@@ -59,8 +61,9 @@ You can pass a list of prompts and wait for them to process altogether, shown be
 from vllm_omni.entrypoints.omni import Omni
 
 if __name__ == "__main__":
+    # For batch inference with Z-Image models
     omni = Omni(
-        model="Tongyi-MAI/Z-Image-Turbo",
+        model="Tongyi-MAI/Z-Image-Turbo",  # or "Tongyi-MAI/Z-Image" for Base
         # stage_configs_path="./stage-config.yaml", # See below
     )
     prompts = [
@@ -93,7 +96,11 @@ For more usages, please refer to [offline inference](../user_guide/examples/offl
 Text-to-image generation quickstart with vLLM-Omni:
 
 ```bash
+# Fast inference with Turbo (8 steps, no CFG)
 vllm serve Tongyi-MAI/Z-Image-Turbo --omni --port 8091
+
+# Or use Base model for higher quality (50 steps, CFG support)
+# vllm serve Tongyi-MAI/Z-Image --omni --port 8091
 ```
 
 ```bash
````

docs/models/supported_models.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -26,7 +26,7 @@ th {
 | `QwenImageEditPlusPipeline` | Qwen-Image-Edit-2509 | `Qwen/Qwen-Image-Edit-2509` |
 | `QwenImageLayeredPipeline` | Qwen-Image-Layered | `Qwen/Qwen-Image-Layered` |
 | `GlmImagePipeline` | GLM-Image | `zai-org/GLM-Image` |
-|`ZImagePipeline` | Z-Image | `Tongyi-MAI/Z-Image-Turbo` |
+|`ZImagePipeline` | Z-Image | `Tongyi-MAI/Z-Image` (Base), `Tongyi-MAI/Z-Image-Turbo` |
 | `WanPipeline` | Wan2.2-T2V, Wan2.2-TI2V | `Wan-AI/Wan2.2-T2V-A14B-Diffusers`, `Wan-AI/Wan2.2-TI2V-5B-Diffusers` |
 | `WanImageToVideoPipeline` | Wan2.2-I2V | `Wan-AI/Wan2.2-I2V-A14B-Diffusers` |
 | `OvisImagePipeline` | Ovis-Image | `OvisAI/Ovis-Image` |
@@ -60,7 +60,7 @@ th {
 | `QwenImageEditPlusPipeline` | Qwen-Image-Edit-2509 | `Qwen/Qwen-Image-Edit-2509` |
 | `QwenImageLayeredPipeline` | Qwen-Image-Layered | `Qwen/Qwen-Image-Layered` |
 | `QwenImageEditPlusPipeline` | Qwen-Image-Edit-2511 | `Qwen/Qwen-Image-Edit-2511` |
-|`ZImagePipeline` | Z-Image | `Tongyi-MAI/Z-Image-Turbo` |
+|`ZImagePipeline` | Z-Image | `Tongyi-MAI/Z-Image` (Base), `Tongyi-MAI/Z-Image-Turbo` |
 |`LongcatImagePipeline` | LongCat-Image | `meituan-longcat/LongCat-Image` |
 |`Flux2KleinPipeline` | FLUX.2-klein | `black-forest-labs/FLUX.2-klein-4B`, `black-forest-labs/FLUX.2-klein-9B` |
 |`Qwen3TTSForConditionalGeneration` | Qwen3-TTS-12Hz-1.7B-CustomVoice | `Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice` |
```

examples/offline_inference/text_to_image/README.md

Lines changed: 41 additions & 4 deletions
````diff
@@ -1,8 +1,9 @@
 # Text-To-Image
 
-This folder provides several entrypoints for experimenting with `Qwen/Qwen-Image` `Qwen/Qwen-Image-2512` `Tongyi-MAI/Z-Image-Turbo` using vLLM-Omni:
+This folder provides several entrypoints for experimenting with `Qwen/Qwen-Image`, `Qwen/Qwen-Image-2512`, `Tongyi-MAI/Z-Image` (Base), and `Tongyi-MAI/Z-Image-Turbo` using vLLM-Omni:
 
 - `text_to_image.py`: command-line script for single image generation with advanced options.
+- `z_image_examples.py`: comparison examples showing Z-Image Base vs Turbo usage.
 - `web_demo.py`: lightweight Gradio UI for interactive prompt/seed/CFG exploration.
 
 Note that when you pass in multiple independent prompts, they will be processed sequentially. Batching requests is currently not supported.
@@ -74,17 +75,33 @@ if __name__ == "__main__":
 
 ## Local CLI Usage
 
+### Z-Image Turbo (Fast Inference)
 ```bash
 python text_to_image.py \
     --model Tongyi-MAI/Z-Image-Turbo \
     --prompt "a cup of coffee on the table" \
     --seed 42 \
-    --cfg_scale 4.0 \
     --num_images_per_prompt 1 \
-    --num_inference_steps 50 \
+    --num_inference_steps 8 \
+    --guidance_scale 0.0 \
     --height 1024 \
     --width 1024 \
-    --output outputs/coffee.png
+    --output outputs/coffee_turbo.png
+```
+
+### Z-Image Base (High Quality with CFG)
+```bash
+python text_to_image.py \
+    --model Tongyi-MAI/Z-Image \
+    --prompt "a cup of coffee on the table" \
+    --negative_prompt "blurry, low quality, distorted" \
+    --seed 42 \
+    --num_images_per_prompt 1 \
+    --num_inference_steps 50 \
+    --guidance_scale 4.0 \
+    --height 1280 \
+    --width 720 \
+    --output outputs/coffee_base.png
 ```
 
 Key arguments:
@@ -103,6 +120,26 @@ Key arguments:
 
 > ℹ️ If you encounter OOM errors, try using `--vae_use_slicing` and `--vae_use_tiling` to reduce memory usage.
 
+## Z-Image Base vs Turbo Comparison
+
+For detailed comparison and usage examples of both Z-Image variants, see:
+
+```bash
+python z_image_examples.py --example all
+```
+
+Key differences:
+
+| Feature | Z-Image Base | Z-Image Turbo |
+|---------|--------------|---------------|
+| Model | `Tongyi-MAI/Z-Image` | `Tongyi-MAI/Z-Image-Turbo` |
+| Inference Steps | 28-50 (default: 50) | 8 |
+| CFG Support | ✅ Yes (guidance_scale 3.0-5.0) | ❌ Must use 0.0 |
+| Negative Prompts | ✅ Supported | ❌ Not supported |
+| Fine-tunable | ✅ Yes | ❌ No (distilled) |
+| Scheduler Shift | 6.0 | 3.0 |
+| Best For | High quality, fine-tuning | Fast iteration, speed |
+
 > ℹ️ Qwen-Image currently publishes best-effort presets at `1328x1328`, `1664x928`, `928x1664`, `1472x1140`, `1140x1472`, `1584x1056`, and `1056x1584`. Adjust `--height/--width` accordingly for the most reliable outcomes.
 
 ## Web UI Demo
````

examples/offline_inference/text_to_image/text_to_image.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -21,7 +21,7 @@ def parse_args() -> argparse.Namespace:
         "--model",
         default="Qwen/Qwen-Image",
         help="Diffusion model name or local path. Supported models: "
-        "Qwen/Qwen-Image, Tongyi-MAI/Z-Image-Turbo, Qwen/Qwen-Image-2512",
+        "Qwen/Qwen-Image, Tongyi-MAI/Z-Image (Base), Tongyi-MAI/Z-Image-Turbo, Qwen/Qwen-Image-2512",
     )
     parser.add_argument("--prompt", default="a cup of coffee on the table", help="Text prompt for image generation.")
     parser.add_argument(
```
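Note that `text_to_image.py` itself does not enforce the Turbo constraint. A small guard like the following could catch the misconfiguration this commit fixes in the docs; `check_turbo_args` is a hypothetical helper (not part of this commit), and it assumes the argparse destinations `model`, `guidance_scale`, and `num_inference_steps` implied by the CLI flags shown in the README above:

```python
import argparse
import warnings


def check_turbo_args(args: argparse.Namespace) -> None:
    """Hypothetical guard (not in the repo): enforce Z-Image Turbo limits."""
    if args.model == "Tongyi-MAI/Z-Image-Turbo":
        if args.guidance_scale != 0.0:
            # Turbo is distilled without CFG; a nonzero scale degrades output.
            warnings.warn("Z-Image Turbo does not support CFG; resetting guidance_scale to 0.0")
            args.guidance_scale = 0.0
        if args.num_inference_steps != 8:
            warnings.warn("Z-Image Turbo is tuned for 8 inference steps")
```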
examples/offline_inference/text_to_image/z_image_examples.py

Lines changed: 168 additions & 0 deletions
```python
#!/usr/bin/env python3
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

"""
Z-Image Base vs Turbo Comparison Examples

This script demonstrates the differences between Z-Image Base and Z-Image Turbo models.

Key Differences:
- Z-Image Base: Foundation model with full CFG support, fine-tunable, 28-50 steps
- Z-Image Turbo: Distilled model optimized for speed, 8 steps, guidance_scale must be 0.0
"""

from vllm_omni.entrypoints.omni import Omni
from vllm_omni.inputs.data import OmniDiffusionSamplingParams


def z_image_base_example():
    """
    Z-Image Base - High Quality Generation

    Features:
    - Full CFG support (guidance_scale 3.0-5.0)
    - Negative prompts work
    - Fine-tunable
    - 28-50 inference steps (default 50)
    - Scheduler shift: 6.0
    """
    print("\n=== Z-Image Base (High Quality) ===")

    omni_base = Omni(model="Tongyi-MAI/Z-Image")

    outputs_base = omni_base.generate(
        "a majestic mountain landscape at sunset, detailed, photorealistic",
        OmniDiffusionSamplingParams(
            height=1280,
            width=720,
            num_inference_steps=50,
            guidance_scale=4.0,
            negative_prompt="blurry, low quality, distorted, oversaturated",
            seed=42,
        ),
    )

    images = outputs_base[0].request_output[0].images
    images[0].save("z_image_base_output.png")
    print("Saved to: z_image_base_output.png")
    print(f"Generated {len(images)} image(s) with 50 steps and CFG=4.0")


def z_image_turbo_example():
    """
    Z-Image Turbo - Fast Inference

    Features:
    - Optimized for speed
    - guidance_scale MUST be 0.0 (no CFG)
    - Negative prompts not supported
    - 8 inference steps
    - Scheduler shift: 3.0
    """
    print("\n=== Z-Image Turbo (Fast) ===")

    omni_turbo = Omni(model="Tongyi-MAI/Z-Image-Turbo")

    outputs_turbo = omni_turbo.generate(
        "a majestic mountain landscape at sunset, detailed, photorealistic",
        OmniDiffusionSamplingParams(
            height=1024,
            width=1024,
            num_inference_steps=8,
            guidance_scale=0.0,  # MUST be 0.0 for Turbo!
            seed=42,
        ),
    )

    images = outputs_turbo[0].request_output[0].images
    images[0].save("z_image_turbo_output.png")
    print("Saved to: z_image_turbo_output.png")
    print(f"Generated {len(images)} image(s) with 8 steps (no CFG)")


def batch_inference_example():
    """
    Batch inference with Z-Image Base

    Note: Batch processing depends on max_batch_size in stage configs.
    By default, diffusion models process one prompt at a time.
    """
    print("\n=== Batch Inference Example ===")

    omni = Omni(model="Tongyi-MAI/Z-Image")

    prompts = [
        "a cup of coffee on a wooden table",
        "a cat sleeping on a cozy blanket",
        "a futuristic city skyline at night",
    ]

    # Note: These will be processed sequentially unless max_batch_size > 1
    outputs = omni.generate(
        prompts,
        OmniDiffusionSamplingParams(
            height=1024,
            width=1024,
            num_inference_steps=40,
            guidance_scale=4.0,
            negative_prompt="blurry, low quality",
            seed=42,
        ),
    )

    for i, output in enumerate(outputs):
        image = output.request_output[0].images[0]
        image.save(f"batch_output_{i}.png")
        print(f"Saved to: batch_output_{i}.png")


def recommended_settings():
    """
    Print recommended settings for both models
    """
    print("\n=== Recommended Settings ===\n")

    print("Z-Image Base (Tongyi-MAI/Z-Image):")
    print(" - num_inference_steps: 28-50 (default: 50)")
    print(" - guidance_scale: 3.0-5.0 (default: 4.0)")
    print(" - negative_prompt: Supported and recommended")
    print(" - resolution: 1280x720 or 720x1280")
    print(" - cfg_normalization: False (default)")
    print(" - Use when: Quality is priority, fine-tuning needed")

    print("\nZ-Image Turbo (Tongyi-MAI/Z-Image-Turbo):")
    print(" - num_inference_steps: 8")
    print(" - guidance_scale: 0.0 (REQUIRED)")
    print(" - negative_prompt: Not supported")
    print(" - resolution: 1024x1024")
    print(" - Use when: Speed is priority, quick iterations")


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(
        description="Z-Image Base vs Turbo comparison examples"
    )
    parser.add_argument(
        "--example",
        choices=["base", "turbo", "batch", "all"],
        default="all",
        help="Which example to run (default: all)",
    )

    args = parser.parse_args()

    recommended_settings()

    if args.example in ("base", "all"):
        z_image_base_example()

    if args.example in ("turbo", "all"):
        z_image_turbo_example()

    if args.example in ("batch", "all"):
        batch_inference_example()

    print("\nDone!")
```
