We aim to optimize the performance of diffusion models on AMD platforms. This issue serves as the central hub for tracking all related PRs, so that users can follow the optimization progress in one place.
Qwen-image Task List
1. Workflow
The workflow, together with the corresponding image processing sizes and sequence lengths, is recorded here for reference.

2. Accuracy and bug fix
- LoRA accuracy issue: [FixBug] [Diffusion] Fix Qwen-Image-Edit Lightning LoRA alpha/rank scaling (read per-layer *.alpha) #16935
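For readers unfamiliar with the alpha/rank issue, here is a minimal sketch of per-layer LoRA scaling in plain Python. It is not the code from #16935. The key layout (`<prefix>.lora_down.weight`, `<prefix>.alpha`), the helper name `lora_scales`, and the fallback of `alpha = rank` when a layer has no alpha entry are all assumptions made for illustration.

```python
def lora_scales(state_dict):
    """Return {layer_prefix: alpha / rank} for each LoRA layer.

    Assumed (hypothetical) layout: "<prefix>.lora_down.weight" is a
    rank x in_features matrix and "<prefix>.alpha" is that layer's alpha.
    """
    suffix = ".lora_down.weight"
    scales = {}
    for key, weight in state_dict.items():
        if not key.endswith(suffix):
            continue
        prefix = key[: -len(suffix)]
        rank = len(weight)  # rows of the down-projection = LoRA rank
        # Read this layer's own alpha; assume alpha == rank (scale 1.0) if absent.
        alpha = state_dict.get(prefix + ".alpha", rank)
        scales[prefix] = float(alpha) / rank
    return scales


# Toy checkpoint: nested lists stand in for tensors.
state_dict = {
    "blocks.0.attn.to_q.lora_down.weight": [[0.0] * 8 for _ in range(4)],   # rank 4
    "blocks.0.attn.to_q.alpha": 8.0,
    "blocks.1.attn.to_q.lora_down.weight": [[0.0] * 8 for _ in range(16)],  # rank 16
    "blocks.1.attn.to_q.alpha": 8.0,
    "blocks.2.attn.to_q.lora_down.weight": [[0.0] * 8 for _ in range(4)],   # no alpha key
}
scales = lora_scales(state_dict)
```

The point of reading alpha per layer is that two layers with the same alpha but different ranks get different scales, which a single global scale would get wrong.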
3. Framework performance boost
- Skip duplicate computation for negative prompt: [diffusion] Skip negative prompt encoding when guidance_scale <= 1.0 or negative_prompt is None #16919
- Fix duplicate computation for Ulysses SP: [Diffusion] Fix Ulysses SP for QwenImageEdit: shard text/cond/noisy tokens correctly #18516
- Text encoder: Qwen2.5 ViT and LLM optimization
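The condition behind the negative-prompt item can be sketched as follows. This is an illustration of the rule stated in the title of #16919, not the PR's code; `should_encode_negative`, `encode_prompts`, and the stand-in encoder `fake_encode` are hypothetical names.

```python
def should_encode_negative(guidance_scale, negative_prompt):
    """Negative conditioning is only needed when CFG is actually active."""
    return guidance_scale > 1.0 and negative_prompt is not None


def encode_prompts(encode, prompt, negative_prompt, guidance_scale):
    """Encode the positive prompt, and the negative one only if it will be used."""
    positive = encode(prompt)
    negative = None
    if should_encode_negative(guidance_scale, negative_prompt):
        negative = encode(negative_prompt)
    return positive, negative


# Stand-in for the text encoder that records how often it is called.
calls = []

def fake_encode(text):
    calls.append(text)
    return "emb(" + text + ")"


pos_a, neg_a = encode_prompts(fake_encode, "a cat", "blurry", guidance_scale=1.0)
calls_without_cfg = len(calls)   # only the positive prompt was encoded

pos_b, neg_b = encode_prompts(fake_encode, "a cat", "blurry", guidance_scale=4.0)
calls_with_cfg = len(calls) - calls_without_cfg  # both prompts were encoded
```

When the guidance scale is at most 1.0 or no negative prompt is given, the second text-encoder pass is skipped entirely.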
4. Kernel optimization
- Fuse image_text_qk_norm_rope
5. Tool
- Improve the profiling tool: [Diffusion] [Profiling] Add end-to-end profiling support for diffusion serving pipeline #18367
Wan Task List
1. Workflow
(1) VAE image size computation

(2) DiT structure analysis
2. Framework performance boost
- Batch the positive and negative prompts together so inference runs in a single loop
- Add torch.compile for VAE and apply ChannelLast for CausalConv, ~2% uplift
- Refine the frame padding logic for Ulysses
- VAE parallelism with FSDP or another recipe
3. Kernel optimization
- Integrate SageAttention in AITER
- Fuse layernorm+scale+shift, ~1% uplift
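As a reference for what the layernorm+scale+shift fusion computes, here is a plain-Python sketch for a single token vector. It assumes the common DiT-style modulation `layernorm(x) * (1 + scale) + shift` with no learned affine parameters; the actual kernel and its exact convention may differ, and the function names are hypothetical.

```python
import math


def layernorm(x, eps=1e-6):
    """Normalize one vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


def modulate_unfused(x, scale, shift, eps=1e-6):
    """Three separate elementwise passes, each producing an intermediate."""
    normed = layernorm(x, eps)                                 # pass 1
    scaled = [n * (1.0 + s) for n, s in zip(normed, scale)]   # pass 2
    return [v + b for v, b in zip(scaled, shift)]              # pass 3


def modulate_fused(x, scale, shift, eps=1e-6):
    """Same math, with normalize, scale, and shift applied in one output pass."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * (1.0 + s) + b
            for v, s, b in zip(x, scale, shift)]


x = [1.0, 2.0, 3.0, 4.0]
scale = [0.1, -0.2, 0.0, 0.5]
shift = [0.0, 1.0, -1.0, 0.25]
ref = modulate_unfused(x, scale, shift)
out = modulate_fused(x, scale, shift)
```

On a GPU the benefit of fusing comes from not writing the normalized and scaled intermediates back to memory between passes; the numerical result should match the unfused version.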