[Tracking][Diffusion][AMD] Diffusion models performance optimization on AMD Platform

This issue tracks performance optimization of diffusion models on AMD platforms. It serves as a central hub for all related PRs so that users can follow the optimization progress in one place.

# Qwen-image Task List
### 1. Workflow
The workflow, with the corresponding image processing sizes and sequence lengths, is shown here for reference.
<img width="1204" height="725" alt="Image" src="https://github.com/user-attachments/assets/58c5653d-fe34-4a24-bcae-55d67dfaf3ec" />

### 2. Accuracy and bug fix
- [x] LoRA accuracy issue https://github.com/sgl-project/sglang/pull/16935

### 3. Framework performance boost
- [x] Skip duplicate computation for negative prompt https://github.com/sgl-project/sglang/pull/16919
- [ ] Fix duplicate computation for Ulysses SP #18516
- [ ] Text encoder: Qwen2.5 ViT and LLM optimization

### 4. Kernel optimization
- [ ] Fuse image_text_qk_norm_rope
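
For reference when validating a fused kernel, here is a minimal unfused sketch of what a QK-norm + RoPE step is assumed to compute for a single head vector of one stream (image or text). The function names, the use of RMSNorm, the interleaved pairing convention, and `base=10000.0` are all illustrative assumptions, not taken from the sglang or AITER code.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm over the head dimension (assumed norm type for Q/K).
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def rope(x, position, base=10000.0):
    # Rotate interleaved pairs (x[i], x[i+1]) by position * base**(-i/d).
    # The interleaved layout is an assumption; some models pair halves instead.
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out

def qk_norm_rope(q, k, q_weight, k_weight, position):
    # Unfused reference: normalize, then rotate, separately for Q and K.
    # A fused kernel would do both in one pass over the data.
    q_out = rope(rms_norm(q, q_weight), position)
    k_out = rope(rms_norm(k, k_weight), position)
    return q_out, k_out
```

A fused implementation can be checked against this reference element by element. Two quick sanity properties are that position 0 leaves the normalized vector unchanged and that the rotation preserves the vector norm.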

### 5. Tool
- [ ] Improve the profiling tool #18367

# Wan Task List
### 1. Workflow
(1) VAE image size computation
<img width="480" height="759" alt="Image" src="https://github.com/user-attachments/assets/d172436d-c1f9-4fd3-8a05-c595913cb5fe" />

(2) DiT structure analysis

<img width="1322" height="614" alt="Image" src="https://github.com/user-attachments/assets/3a15f942-f124-4536-819f-015b23839bad" />

### 2. Framework performance boost
- [ ] Batch the positive and negative prompts together so inference runs in a single loop
- [ ] Add torch.compile for the VAE and apply channels-last memory format to CausalConv, ~2% uplift
- [ ] Refine the frame padding logic for Ulysses
- [ ] VAE parallelism with FSDP or another recipe

### 3. Kernel optimization
- [ ] Integrate SageAttention into AITER
- [ ] Fuse layernorm+scale+shift, ~1% uplift
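
As a reference for the layernorm+scale+shift item, the sketch below shows the unfused computation a fused kernel would need to reproduce for one token vector. It assumes the common DiT-style modulation `norm(x) * (1 + scale) + shift` with a LayerNorm that has no learned affine parameters. The function name, the `eps` value, and the exact modulation form are assumptions and should be checked against the model code.

```python
import math

def layernorm_scale_shift(x, scale, shift, eps=1e-6):
    # Step 1: LayerNorm over the hidden dimension (no learned affine, assumed).
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    # Steps 2 and 3: modulate with (1 + scale), then add shift.
    # Run unfused, each step is a separate pass over memory;
    # a fused kernel would read x once and write the result once.
    return [(v - mean) * inv_std * (1.0 + s) + b
            for v, s, b in zip(x, scale, shift)]
```

With `scale` and `shift` set to zero the output is plain LayerNorm, which gives a simple baseline for comparing a fused kernel against the unfused path.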
