[Doc] Add Z-Image Base support and fix Z-Image Turbo parameters #1229
@@ -1,8 +1,9 @@

# Text-To-Image

This folder provides several entrypoints for experimenting with `Qwen/Qwen-Image`, `Qwen/Qwen-Image-2512`, `Tongyi-MAI/Z-Image` (Base), and `Tongyi-MAI/Z-Image-Turbo` using vLLM-Omni:

- `text_to_image.py`: command-line script for single image generation with advanced options.
- `z_image_examples.py`: comparison examples showing Z-Image Base vs Turbo usage.
- `web_demo.py`: lightweight Gradio UI for interactive prompt/seed/CFG exploration.

Note that when you pass in multiple independent prompts, they will be processed sequentially. Batching requests is currently not supported.
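For reference, a sketch of the payload shape for multiple independent prompts (mirroring the batch example in `z_image_examples.py`; each dict is one request, and they run one after another):

```python
# Each dict is one independent request; vLLM-Omni processes them sequentially.
# negative_prompt applies to Z-Image Base only (not supported on Turbo).
prompts = [
    {"prompt": "a cup of coffee on a wooden table",
     "negative_prompt": "blurry, low quality"},
    {"prompt": "a cat sleeping on a cozy blanket",
     "negative_prompt": "blurry, low quality"},
]
```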
@@ -74,17 +75,33 @@ if __name__ == "__main__":

## Local CLI Usage

### Z-Image Turbo (Fast Inference)

> **Collaborator:** If we add examples for every model, then it will blow up.

```bash
python text_to_image.py \
    --model Tongyi-MAI/Z-Image-Turbo \
    --prompt "a cup of coffee on the table" \
    --seed 42 \
    --num_images_per_prompt 1 \
    --num_inference_steps 8 \
    --guidance_scale 0.0 \
    --height 1024 \
    --width 1024 \
    --output outputs/coffee_turbo.png
```

### Z-Image Base (High Quality with CFG)

```bash
python text_to_image.py \
    --model Tongyi-MAI/Z-Image \
    --prompt "a cup of coffee on the table" \
    --negative_prompt "blurry, low quality, distorted" \
    --seed 42 \
    --num_images_per_prompt 1 \
    --num_inference_steps 50 \
    --guidance_scale 4.0 \
    --height 1280 \
    --width 720 \
    --output outputs/coffee_base.png
```

Key arguments:
@@ -103,6 +120,26 @@ Key arguments:

> ℹ️ If you encounter OOM errors, try using `--vae_use_slicing` and `--vae_use_tiling` to reduce memory usage.

## Z-Image Base vs Turbo Comparison

For a detailed comparison and usage examples of both Z-Image variants, see:

```bash
python z_image_examples.py --example all
```

Key differences:

| Feature | Z-Image Base | Z-Image Turbo |
|---------|--------------|---------------|
| Model | `Tongyi-MAI/Z-Image` | `Tongyi-MAI/Z-Image-Turbo` |
| Inference Steps | 28-50 (default: 50) | 8 |
| CFG Support | ✅ Yes (guidance_scale 3.0-5.0) | ❌ Must use 0.0 |
| Negative Prompts | ✅ Supported | ❌ Not supported |
| Fine-tunable | ✅ Yes | ❌ No (distilled) |
| Scheduler Shift | 6.0 | 3.0 |
| Best For | High quality, fine-tuning | Fast iteration, speed |
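The rows of this table translate directly into sampling parameters. As an illustration, a small picker for the recommended defaults (a hypothetical helper for this sketch, not part of vLLM-Omni):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZImageSettings:
    num_inference_steps: int
    guidance_scale: float
    supports_negative_prompt: bool
    scheduler_shift: float


def settings_for(model: str) -> ZImageSettings:
    """Return the recommended defaults from the comparison table."""
    if model == "Tongyi-MAI/Z-Image-Turbo":
        # Turbo is distilled: 8 steps, and CFG must stay at 0.0.
        return ZImageSettings(8, 0.0, False, 3.0)
    # Base: default 50 steps, CFG 4.0 within the 3.0-5.0 range.
    return ZImageSettings(50, 4.0, True, 6.0)
```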
> ℹ️ Qwen-Image currently publishes best-effort presets at `1328x1328`, `1664x928`, `928x1664`, `1472x1140`, `1140x1472`, `1584x1056`, and `1056x1584`. Adjust `--height/--width` accordingly for the most reliable outcomes.
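To target one of those presets programmatically, one could snap a requested size to the published preset with the nearest aspect ratio (a hypothetical helper, not part of the repo):

```python
# Hypothetical helper: snap a requested size to the Qwen-Image preset
# whose aspect ratio is closest to the requested one.
PRESETS = [
    (1328, 1328), (1664, 928), (928, 1664), (1472, 1140),
    (1140, 1472), (1584, 1056), (1056, 1584),
]


def closest_preset(width: int, height: int) -> tuple[int, int]:
    target = width / height
    return min(PRESETS, key=lambda wh: abs(wh[0] / wh[1] - target))
```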
## Web UI Demo
New file: `z_image_examples.py` (+167 lines)

```python
#!/usr/bin/env python3
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

"""
Z-Image Base vs Turbo Comparison Examples

This script demonstrates the differences between the Z-Image Base and Z-Image Turbo models.

Key Differences:
- Z-Image Base: Foundation model with full CFG support, fine-tunable, 28-50 steps
- Z-Image Turbo: Distilled model optimized for speed, 8 steps, guidance_scale must be 0.0
"""

from vllm_omni.entrypoints.omni import Omni
from vllm_omni.inputs.data import OmniDiffusionSamplingParams


def z_image_base_example():
    """
    Z-Image Base - High Quality Generation

    Features:
    - Full CFG support (guidance_scale 3.0-5.0)
    - Negative prompts work
    - Fine-tunable
    - 28-50 inference steps (default 50)
    - Scheduler shift: 6.0
    """
    print("\n=== Z-Image Base (High Quality) ===")

    omni_base = Omni(model="Tongyi-MAI/Z-Image")

    outputs_base = omni_base.generate(
        {
            "prompt": "a majestic mountain landscape at sunset, detailed, photorealistic",
            "negative_prompt": "blurry, low quality, distorted, oversaturated",
        },
        OmniDiffusionSamplingParams(
            height=1280,
            width=720,
            num_inference_steps=50,
            guidance_scale=4.0,
            seed=42,
        ),
    )

    images = outputs_base[0].request_output[0].images
    images[0].save("z_image_base_output.png")
    print("Saved to: z_image_base_output.png")
    print(f"Generated {len(images)} image(s) with 50 steps and CFG=4.0")


def z_image_turbo_example():
    """
    Z-Image Turbo - Fast Inference

    Features:
    - Optimized for speed
    - guidance_scale MUST be 0.0 (no CFG)
    - Negative prompts not supported
    - 8 inference steps
    - Scheduler shift: 3.0
    """
    print("\n=== Z-Image Turbo (Fast) ===")

    omni_turbo = Omni(model="Tongyi-MAI/Z-Image-Turbo")

    outputs_turbo = omni_turbo.generate(
        "a majestic mountain landscape at sunset, detailed, photorealistic",
        OmniDiffusionSamplingParams(
            height=1024,
            width=1024,
            num_inference_steps=8,
            guidance_scale=0.0,  # MUST be 0.0 for Turbo!
            seed=42,
        ),
    )

    images = outputs_turbo[0].request_output[0].images
    images[0].save("z_image_turbo_output.png")
    print("Saved to: z_image_turbo_output.png")
    print(f"Generated {len(images)} image(s) with 8 steps (no CFG)")


def batch_inference_example():
    """
    Batch inference with Z-Image Base

    Note: Batch processing depends on max_batch_size in stage configs.
    By default, diffusion models process one prompt at a time.
    """
    print("\n=== Batch Inference Example ===")

    omni = Omni(model="Tongyi-MAI/Z-Image")

    prompts = [
        {"prompt": "a cup of coffee on a wooden table", "negative_prompt": "blurry, low quality"},
        {"prompt": "a cat sleeping on a cozy blanket", "negative_prompt": "blurry, low quality"},
        {"prompt": "a futuristic city skyline at night", "negative_prompt": "blurry, low quality"},
    ]

    # Note: These will be processed sequentially unless max_batch_size > 1
    outputs = omni.generate(
        prompts,
        OmniDiffusionSamplingParams(
            height=1024,
            width=1024,
            num_inference_steps=40,
            guidance_scale=4.0,
            seed=42,
        ),
    )

    for i, output in enumerate(outputs):
        image = output.request_output[0].images[0]
        image.save(f"batch_output_{i}.png")
        print(f"Saved to: batch_output_{i}.png")


def recommended_settings():
    """Print recommended settings for both models."""
    print("\n=== Recommended Settings ===\n")

    print("Z-Image Base (Tongyi-MAI/Z-Image):")
    print("  - num_inference_steps: 28-50 (default: 50)")
    print("  - guidance_scale: 3.0-5.0 (default: 4.0)")
    print("  - negative_prompt: Supported and recommended")
    print("  - resolution: 1280x720 or 720x1280")
    print("  - cfg_normalization: False (default)")
    print("  - Use when: Quality is priority, fine-tuning needed")

    print("\nZ-Image Turbo (Tongyi-MAI/Z-Image-Turbo):")
    print("  - num_inference_steps: 8")
    print("  - guidance_scale: 0.0 (REQUIRED)")
    print("  - negative_prompt: Not supported")
    print("  - resolution: 1024x1024")
    print("  - Use when: Speed is priority, quick iterations")


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Z-Image Base vs Turbo comparison examples")
    parser.add_argument(
        "--example",
        choices=["base", "turbo", "batch", "all"],
        default="all",
        help="Which example to run (default: all)",
    )

    args = parser.parse_args()

    recommended_settings()

    if args.example in ("base", "all"):
        z_image_base_example()

    if args.example in ("turbo", "all"):
        z_image_turbo_example()

    if args.example in ("batch", "all"):
        batch_inference_example()

    print("\nDone!")
```

Review thread on lines 39 to 44 (the Base example's sampling parameters):

> **Reviewer:** This example passes […]
>
> **Author:** great, refactored it to correctly match being packed in a prompt dict.
> **Reviewer:** I don't think we need to modify this file.