[Tracking][Diffusion][AMD] Diffusion models performance optimization on AMD Platform #18138

@wuhuikx

We aim to optimize the performance of diffusion models on AMD platforms. This issue serves as a central hub for tracking all related PRs so that users can follow the optimization progress.

Qwen-image Task List

1. Workflow

The workflow, with the image size and sequence length at each stage, is shown here for reference.
Image

2. Accuracy and bug fix

3. Framework performance boost

4. Kernel optimization

  • Fuse image_text_qk_norm_rope

5. Tool

Wan Task List

1. Workflow

(1) VAE image size computation
Image

(2) DiT structure analysis

Image
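
As a rough illustration of how input video size maps to VAE latent size and DiT sequence length, the sketch below assumes the publicly documented Wan2.1 configuration: a causal VAE with 4x temporal and 8x spatial compression, and a DiT patch size of (1, 2, 2). These factors and both function names are assumptions for illustration; they are not taken from the figures in this issue.

```python
from typing import Tuple


def vae_latent_shape(
    frames: int,
    height: int,
    width: int,
    t_stride: int = 4,  # assumed temporal compression of the causal VAE
    s_stride: int = 8,  # assumed spatial compression of the VAE
) -> Tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width).

    A causal video VAE keeps the first frame and compresses the
    remaining frames in groups of `t_stride`.
    """
    if (frames - 1) % t_stride != 0:
        raise ValueError("frames must have the form t_stride * k + 1")
    if height % s_stride != 0 or width % s_stride != 0:
        raise ValueError("height and width must be divisible by s_stride")
    return ((frames - 1) // t_stride + 1, height // s_stride, width // s_stride)


def dit_seq_len(
    latent_shape: Tuple[int, int, int],
    patch: Tuple[int, int, int] = (1, 2, 2),  # assumed DiT patch size
) -> int:
    """Number of tokens the DiT sees after patchifying the latent."""
    t, h, w = latent_shape
    pt, ph, pw = patch
    return (t // pt) * (h // ph) * (w // pw)


latent = vae_latent_shape(frames=81, height=480, width=832)
print(latent, dit_seq_len(latent))
```

Under these assumptions, an 81-frame 480x832 video gives a latent of (21, 60, 104) and a sequence length of 32760 tokens.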

2. Framework performance boost

  • Batch the positive and negative prompts together so that inference runs in a single loop
  • Apply torch.compile to the VAE and use the channels-last memory format for CausalConv, ~2% uplift
  • Refine the frame padding logic for Ulysses
  • VAE parallelism with FSDP or another recipe
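
To make the first item concrete, here is a minimal sketch of prompt batching, assuming the positive and negative prompts are combined through classifier-free guidance. `toy_denoiser` is a hypothetical stand-in for the DiT forward pass, and plain Python lists stand in for tensors; none of these names come from the actual PRs.

```python
from typing import Callable, List

Vector = List[float]
Model = Callable[[List[Vector], List[Vector]], List[Vector]]


def toy_denoiser(latents: List[Vector], prompts: List[Vector]) -> List[Vector]:
    """Hypothetical batched model: one output per (latent, prompt) pair."""
    return [
        [x + 0.1 * p for x, p in zip(latent, prompt)]
        for latent, prompt in zip(latents, prompts)
    ]


def guide(cond: Vector, uncond: Vector, scale: float) -> Vector:
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def cfg_two_passes(model: Model, latent: Vector, pos: Vector, neg: Vector,
                   scale: float) -> Vector:
    """Baseline: two separate forward passes per denoising step."""
    cond = model([latent], [pos])[0]
    uncond = model([latent], [neg])[0]
    return guide(cond, uncond, scale)


def cfg_batched(model: Model, latent: Vector, pos: Vector, neg: Vector,
                scale: float) -> Vector:
    """Batched: duplicate the latent and run both prompts in one pass."""
    cond, uncond = model([latent, latent], [pos, neg])
    return guide(cond, uncond, scale)
```

Both functions return the same guided prediction. The batched form halves the number of model invocations per step at the cost of a batch size of 2, which tends to help when per-call launch overhead dominates.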

3. Kernel optimization

  • Integrate SageAttention in AITER
  • Fuse LayerNorm + scale + shift, ~1% uplift
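
For reference, the computation behind the LayerNorm fusion can be sketched as follows, assuming the DiT-style modulation `LayerNorm(x) * (1 + scale) + shift` with no learned affine parameters. The formula and the function names are assumptions for illustration; the actual kernel may differ. This pure-Python version only shows the numerics, where the fused variant avoids materializing the normalized intermediate.

```python
import math
from typing import List, Tuple


def _stats(x: List[float], eps: float) -> Tuple[float, float]:
    """Mean and inverse standard deviation over the hidden dimension."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return mean, 1.0 / math.sqrt(var + eps)


def modulate_unfused(x: List[float], scale: List[float], shift: List[float],
                     eps: float = 1e-6) -> List[float]:
    """Two steps: LayerNorm writes an intermediate, then scale and shift."""
    mean, inv_std = _stats(x, eps)
    normed = [(v - mean) * inv_std for v in x]  # intermediate buffer
    return [n * (1.0 + s) + b for n, s, b in zip(normed, scale, shift)]


def modulate_fused(x: List[float], scale: List[float], shift: List[float],
                   eps: float = 1e-6) -> List[float]:
    """One step: normalize, scale, and shift each element in a single pass."""
    mean, inv_std = _stats(x, eps)
    return [
        (v - mean) * inv_std * (1.0 + s) + b
        for v, s, b in zip(x, scale, shift)
    ]
```

On a GPU, the saving comes from reading and writing the activation once instead of once per elementwise op.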
