
Commit c062acc

dougbtv and claude committed
[Doc] Add Z-Image Base support docs (and turbo tweaks)
Add documentation and examples for Z-Image Base model alongside existing Z-Image Turbo support.

Changes:

- Add z_image_examples.py with Base vs Turbo comparison examples
- Update quickstart.md with comments distinguishing Base and Turbo
- Update supported_models.md to list both Z-Image variants
- Update text_to_image README with usage examples and comparison table
- Fix negative_prompt to be passed in prompt dict (not sampling params)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: dougbtv <dosmith@redhat.com>
1 parent 9f00a37 commit c062acc
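The last bullet in the commit message is the behavioral fix: `negative_prompt` belongs in the prompt dict, not in the sampling params. A minimal sketch of the corrected call shape, using the `Omni` and `OmniDiffusionSamplingParams` types imported by this commit's new example file (prompt text and values are illustrative):

```python
from vllm_omni.entrypoints.omni import Omni
from vllm_omni.inputs.data import OmniDiffusionSamplingParams

omni = Omni(model="Tongyi-MAI/Z-Image")

outputs = omni.generate(
    # negative_prompt rides along with the prompt...
    {"prompt": "a cup of coffee on the table", "negative_prompt": "blurry, low quality"},
    # ...while step count, CFG scale, and seed stay in the sampling params.
    OmniDiffusionSamplingParams(num_inference_steps=50, guidance_scale=4.0, seed=42),
)
```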

File tree

5 files changed: +219 −8 lines changed


docs/getting_started/quickstart.md

Lines changed: 8 additions & 1 deletion
````diff
@@ -39,6 +39,8 @@ Text-to-image generation quickstart with vLLM-Omni:
 from vllm_omni.entrypoints.omni import Omni
 
 if __name__ == "__main__":
+    # Use Z-Image-Turbo for fast inference (8 steps)
+    # or Z-Image (Base) for higher quality (28-50 steps with CFG support)
     omni = Omni(model="Tongyi-MAI/Z-Image-Turbo")
     prompt = "a cup of coffee on the table"
     outputs = omni.generate(prompt)
@@ -59,8 +61,9 @@ You can pass a list of prompts and wait for them to process altogether, shown below:
 from vllm_omni.entrypoints.omni import Omni
 
 if __name__ == "__main__":
+    # For batch inference with Z-Image models
     omni = Omni(
-        model="Tongyi-MAI/Z-Image-Turbo",
+        model="Tongyi-MAI/Z-Image-Turbo", # or "Tongyi-MAI/Z-Image" for Base
         # stage_configs_path="./stage-config.yaml", # See below
     )
     prompts = [
@@ -93,7 +96,11 @@ For more usages, please refer to [offline inference](../user_guide/examples/offl
 Text-to-image generation quickstart with vLLM-Omni:
 
 ```bash
+# Fast inference with Turbo (8 steps, no CFG)
 vllm serve Tongyi-MAI/Z-Image-Turbo --omni --port 8091
+
+# Or use Base model for higher quality (50 steps, CFG support)
+# vllm serve Tongyi-MAI/Z-Image --omni --port 8091
 ```
 
 ```bash
````
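The batch comment above mirrors the batch example added in z_image_examples.py further down. A condensed sketch of the list-of-prompts form with the Base checkpoint (values illustrative; prompts run sequentially unless max_batch_size > 1 in the stage configs):

```python
from vllm_omni.entrypoints.omni import Omni
from vllm_omni.inputs.data import OmniDiffusionSamplingParams

if __name__ == "__main__":
    omni = Omni(model="Tongyi-MAI/Z-Image")  # Base; swap in Z-Image-Turbo for speed
    prompts = [
        {"prompt": "a cup of coffee on a wooden table", "negative_prompt": "blurry, low quality"},
        {"prompt": "a futuristic city skyline at night", "negative_prompt": "blurry, low quality"},
    ]
    # Each prompt dict shares the same sampling params; outputs come back in order.
    outputs = omni.generate(
        prompts,
        OmniDiffusionSamplingParams(height=1024, width=1024, num_inference_steps=40, guidance_scale=4.0, seed=42),
    )
    for i, output in enumerate(outputs):
        output.request_output[0].images[0].save(f"batch_output_{i}.png")
```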

docs/models/supported_models.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -26,7 +26,7 @@ th {
 | `QwenImageEditPlusPipeline` | Qwen-Image-Edit-2509 | `Qwen/Qwen-Image-Edit-2509` |
 | `QwenImageLayeredPipeline` | Qwen-Image-Layered | `Qwen/Qwen-Image-Layered` |
 | `GlmImagePipeline` | GLM-Image | `zai-org/GLM-Image` |
-|`ZImagePipeline` | Z-Image | `Tongyi-MAI/Z-Image-Turbo` |
+|`ZImagePipeline` | Z-Image | `Tongyi-MAI/Z-Image` (Base), `Tongyi-MAI/Z-Image-Turbo` |
 | `WanPipeline` | Wan2.2-T2V, Wan2.2-TI2V | `Wan-AI/Wan2.2-T2V-A14B-Diffusers`, `Wan-AI/Wan2.2-TI2V-5B-Diffusers` |
 | `WanImageToVideoPipeline` | Wan2.2-I2V | `Wan-AI/Wan2.2-I2V-A14B-Diffusers` |
 | `OvisImagePipeline` | Ovis-Image | `OvisAI/Ovis-Image` |
@@ -60,7 +60,7 @@ th {
 | `QwenImageEditPlusPipeline` | Qwen-Image-Edit-2509 | `Qwen/Qwen-Image-Edit-2509` |
 | `QwenImageLayeredPipeline` | Qwen-Image-Layered | `Qwen/Qwen-Image-Layered` |
 | `QwenImageEditPlusPipeline` | Qwen-Image-Edit-2511 | `Qwen/Qwen-Image-Edit-2511` |
-|`ZImagePipeline` | Z-Image | `Tongyi-MAI/Z-Image-Turbo` |
+|`ZImagePipeline` | Z-Image | `Tongyi-MAI/Z-Image` (Base), `Tongyi-MAI/Z-Image-Turbo` |
 |`LongcatImagePipeline` | LongCat-Image | `meituan-longcat/LongCat-Image` |
 |`Flux2KleinPipeline` | FLUX.2-klein | `black-forest-labs/FLUX.2-klein-4B`, `black-forest-labs/FLUX.2-klein-9B` |
 |`Qwen3TTSForConditionalGeneration` | Qwen3-TTS-12Hz-1.7B-CustomVoice | `Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice` |
```

examples/offline_inference/text_to_image/README.md

Lines changed: 41 additions & 4 deletions
````diff
@@ -1,8 +1,9 @@
 # Text-To-Image
 
-This folder provides several entrypoints for experimenting with `Qwen/Qwen-Image` `Qwen/Qwen-Image-2512` `Tongyi-MAI/Z-Image-Turbo` using vLLM-Omni:
+This folder provides several entrypoints for experimenting with `Qwen/Qwen-Image`, `Qwen/Qwen-Image-2512`, `Tongyi-MAI/Z-Image` (Base), and `Tongyi-MAI/Z-Image-Turbo` using vLLM-Omni:
 
 - `text_to_image.py`: command-line script for single image generation with advanced options.
+- `z_image_examples.py`: comparison examples showing Z-Image Base vs Turbo usage.
 - `web_demo.py`: lightweight Gradio UI for interactive prompt/seed/CFG exploration.
 
 Note that when you pass in multiple independent prompts, they will be processed sequentially. Batching requests is currently not supported.
@@ -74,17 +75,33 @@ if __name__ == "__main__":
 
 ## Local CLI Usage
 
+### Z-Image Turbo (Fast Inference)
 ```bash
 python text_to_image.py \
     --model Tongyi-MAI/Z-Image-Turbo \
     --prompt "a cup of coffee on the table" \
     --seed 42 \
-    --cfg_scale 4.0 \
     --num_images_per_prompt 1 \
-    --num_inference_steps 50 \
+    --num_inference_steps 8 \
+    --guidance_scale 0.0 \
     --height 1024 \
     --width 1024 \
-    --output outputs/coffee.png
+    --output outputs/coffee_turbo.png
+```
+
+### Z-Image Base (High Quality with CFG)
+```bash
+python text_to_image.py \
+    --model Tongyi-MAI/Z-Image \
+    --prompt "a cup of coffee on the table" \
+    --negative_prompt "blurry, low quality, distorted" \
+    --seed 42 \
+    --num_images_per_prompt 1 \
+    --num_inference_steps 50 \
+    --guidance_scale 4.0 \
+    --height 1280 \
+    --width 720 \
+    --output outputs/coffee_base.png
 ```
 
 Key arguments:
@@ -103,6 +120,26 @@ Key arguments:
 
 > ℹ️ If you encounter OOM errors, try using `--vae_use_slicing` and `--vae_use_tiling` to reduce memory usage.
 
+## Z-Image Base vs Turbo Comparison
+
+For detailed comparison and usage examples of both Z-Image variants, see:
+
+```bash
+python z_image_examples.py --example all
+```
+
+Key differences:
+
+| Feature | Z-Image Base | Z-Image Turbo |
+|---------|--------------|---------------|
+| Model | `Tongyi-MAI/Z-Image` | `Tongyi-MAI/Z-Image-Turbo` |
+| Inference Steps | 28-50 (default: 50) | 8 |
+| CFG Support | ✅ Yes (guidance_scale 3.0-5.0) | ❌ Must use 0.0 |
+| Negative Prompts | ✅ Supported | ❌ Not supported |
+| Fine-tunable | ✅ Yes | ❌ No (distilled) |
+| Scheduler Shift | 6.0 | 3.0 |
+| Best For | High quality, fine-tuning | Fast iteration, speed |
+
 > ℹ️ Qwen-Image currently publishes best-effort presets at `1328x1328`, `1664x928`, `928x1664`, `1472x1140`, `1140x1472`, `1584x1056`, and `1056x1584`. Adjust `--height/--width` accordingly for the most reliable outcomes.
 
 ## Web UI Demo
````
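The constraints in the comparison table can also be expressed as a small guard. A hypothetical helper (not part of this commit) that returns table-consistent sampling params and rejects options Turbo does not support:

```python
from vllm_omni.inputs.data import OmniDiffusionSamplingParams

TURBO = "Tongyi-MAI/Z-Image-Turbo"

def zimage_params(model: str, negative_prompt: str | None = None) -> OmniDiffusionSamplingParams:
    """Hypothetical helper: defaults per the Base vs Turbo comparison table."""
    if model == TURBO:
        if negative_prompt is not None:
            # Per the table: Turbo does not support negative prompts
            raise ValueError("Z-Image Turbo does not support negative prompts")
        # Turbo: 8 steps, CFG disabled (guidance_scale must be 0.0)
        return OmniDiffusionSamplingParams(num_inference_steps=8, guidance_scale=0.0)
    # Base: 28-50 steps with guidance_scale in the 3.0-5.0 range
    return OmniDiffusionSamplingParams(num_inference_steps=50, guidance_scale=4.0)
```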

examples/offline_inference/text_to_image/text_to_image.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -21,7 +21,7 @@ def parse_args() -> argparse.Namespace:
         "--model",
         default="Qwen/Qwen-Image",
         help="Diffusion model name or local path. Supported models: "
-        "Qwen/Qwen-Image, Tongyi-MAI/Z-Image-Turbo, Qwen/Qwen-Image-2512",
+        "Qwen/Qwen-Image, Tongyi-MAI/Z-Image (Base), Tongyi-MAI/Z-Image-Turbo, Qwen/Qwen-Image-2512",
     )
     parser.add_argument("--prompt", default="a cup of coffee on the table", help="Text prompt for image generation.")
     parser.add_argument(
```
examples/offline_inference/text_to_image/z_image_examples.py

Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@

```python
#!/usr/bin/env python3
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

"""
Z-Image Base vs Turbo Comparison Examples

This script demonstrates the differences between Z-Image Base and Z-Image Turbo models.

Key Differences:
- Z-Image Base: Foundation model with full CFG support, fine-tunable, 28-50 steps
- Z-Image Turbo: Distilled model optimized for speed, 8 steps, guidance_scale must be 0.0
"""

from vllm_omni.entrypoints.omni import Omni
from vllm_omni.inputs.data import OmniDiffusionSamplingParams


def z_image_base_example():
    """
    Z-Image Base - High Quality Generation

    Features:
    - Full CFG support (guidance_scale 3.0-5.0)
    - Negative prompts work
    - Fine-tunable
    - 28-50 inference steps (default 50)
    - Scheduler shift: 6.0
    """
    print("\n=== Z-Image Base (High Quality) ===")

    omni_base = Omni(model="Tongyi-MAI/Z-Image")

    outputs_base = omni_base.generate(
        {
            "prompt": "a majestic mountain landscape at sunset, detailed, photorealistic",
            "negative_prompt": "blurry, low quality, distorted, oversaturated",
        },
        OmniDiffusionSamplingParams(
            height=1280,
            width=720,
            num_inference_steps=50,
            guidance_scale=4.0,
            seed=42,
        ),
    )

    images = outputs_base[0].request_output[0].images
    images[0].save("z_image_base_output.png")
    print("Saved to: z_image_base_output.png")
    print(f"Generated {len(images)} image(s) with 50 steps and CFG=4.0")


def z_image_turbo_example():
    """
    Z-Image Turbo - Fast Inference

    Features:
    - Optimized for speed
    - guidance_scale MUST be 0.0 (no CFG)
    - Negative prompts not supported
    - 8 inference steps
    - Scheduler shift: 3.0
    """
    print("\n=== Z-Image Turbo (Fast) ===")

    omni_turbo = Omni(model="Tongyi-MAI/Z-Image-Turbo")

    outputs_turbo = omni_turbo.generate(
        "a majestic mountain landscape at sunset, detailed, photorealistic",
        OmniDiffusionSamplingParams(
            height=1024,
            width=1024,
            num_inference_steps=8,
            guidance_scale=0.0,  # MUST be 0.0 for Turbo!
            seed=42,
        ),
    )

    images = outputs_turbo[0].request_output[0].images
    images[0].save("z_image_turbo_output.png")
    print("Saved to: z_image_turbo_output.png")
    print(f"Generated {len(images)} image(s) with 8 steps (no CFG)")


def batch_inference_example():
    """
    Batch inference with Z-Image Base

    Note: Batch processing depends on max_batch_size in stage configs.
    By default, diffusion models process one prompt at a time.
    """
    print("\n=== Batch Inference Example ===")

    omni = Omni(model="Tongyi-MAI/Z-Image")

    prompts = [
        {"prompt": "a cup of coffee on a wooden table", "negative_prompt": "blurry, low quality"},
        {"prompt": "a cat sleeping on a cozy blanket", "negative_prompt": "blurry, low quality"},
        {"prompt": "a futuristic city skyline at night", "negative_prompt": "blurry, low quality"},
    ]

    # Note: These will be processed sequentially unless max_batch_size > 1
    outputs = omni.generate(
        prompts,
        OmniDiffusionSamplingParams(
            height=1024,
            width=1024,
            num_inference_steps=40,
            guidance_scale=4.0,
            seed=42,
        ),
    )

    for i, output in enumerate(outputs):
        image = output.request_output[0].images[0]
        image.save(f"batch_output_{i}.png")
        print(f"Saved to: batch_output_{i}.png")


def recommended_settings():
    """
    Print recommended settings for both models
    """
    print("\n=== Recommended Settings ===\n")

    print("Z-Image Base (Tongyi-MAI/Z-Image):")
    print("  - num_inference_steps: 28-50 (default: 50)")
    print("  - guidance_scale: 3.0-5.0 (default: 4.0)")
    print("  - negative_prompt: Supported and recommended")
    print("  - resolution: 1280x720 or 720x1280")
    print("  - cfg_normalization: False (default)")
    print("  - Use when: Quality is priority, fine-tuning needed")

    print("\nZ-Image Turbo (Tongyi-MAI/Z-Image-Turbo):")
    print("  - num_inference_steps: 8")
    print("  - guidance_scale: 0.0 (REQUIRED)")
    print("  - negative_prompt: Not supported")
    print("  - resolution: 1024x1024")
    print("  - Use when: Speed is priority, quick iterations")


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Z-Image Base vs Turbo comparison examples")
    parser.add_argument(
        "--example",
        choices=["base", "turbo", "batch", "all"],
        default="all",
        help="Which example to run (default: all)",
    )

    args = parser.parse_args()

    recommended_settings()

    if args.example in ("base", "all"):
        z_image_base_example()

    if args.example in ("turbo", "all"):
        z_image_turbo_example()

    if args.example in ("batch", "all"):
        batch_inference_example()

    print("\nDone!")
```
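To run a single example instead of the full set, pass one of the script's `--example` choices:

```bash
python z_image_examples.py --example turbo   # base | turbo | batch | all
```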
