[Feature]: enhance diffusion dynamic batching scope #3846

@asukaqaq-s

🚀 The feature, motivation and pitch

Related to #874.

Dynamic batching for diffusion currently only batches requests that share the same resolution. As a result, when benchmarking on diffusion dataset C, many requests with different resolutions cannot be batched together, which leads to low MFU on a single instance.

The goal is to relax the constraints inside SamplingParamsKey and allow more heterogeneous requests to share one forward pass.

  1. Remove CFG-related constraints:

    • When every request has a CFG scale != 1, requests with different CFG scales can still be batched into the same forward pass.

    • Additional considerations:

      • whether positive/negative prompts should be batched together
      • how to support CFG parallelism efficiently
  2. Remove height/width/frame constraints:

    • Support mixed-resolution batching with FlashAttention varlen functions.
    • Pass attention metadata through the existing attn_metadata interface.
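
To make the two relaxations above concrete, here is a minimal sketch of how a relaxed batching key and varlen metadata could fit together. It is an illustration under stated assumptions, not the project's implementation: the `Request` dataclass, `relaxed_key`, `num_tokens`, `build_batches`, and the 16-pixel patch size are all hypothetical, and the real `SamplingParamsKey` almost certainly carries more fields. Requests are grouped only by whether CFG is active, and each group gets the cumulative sequence lengths that varlen attention kernels typically expect.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class Request:
    # Hypothetical request shape, not the project's actual request type.
    height: int
    width: int
    num_frames: int
    cfg_scale: float


def relaxed_key(req: Request) -> bool:
    # Assumption: only "is CFG active" stays in the key; the exact CFG
    # scale and the height/width/frame count no longer split batches.
    return req.cfg_scale != 1.0


def num_tokens(req: Request, patch: int = 16) -> int:
    # Assumed patchification: one token per patch x patch tile per frame.
    return (req.height // patch) * (req.width // patch) * req.num_frames


def build_batches(requests: list[Request], patch: int = 16) -> list[dict]:
    # Group requests by the relaxed key, preserving arrival order.
    groups: dict[bool, list[Request]] = {}
    for req in requests:
        groups.setdefault(relaxed_key(req), []).append(req)

    batches = []
    for uses_cfg, reqs in groups.items():
        lens = [num_tokens(r, patch) for r in reqs]
        batches.append({
            "uses_cfg": uses_cfg,
            # Sequence i occupies tokens cu_seqlens[i]:cu_seqlens[i + 1]
            # in the packed (padding-free) token dimension.
            "cu_seqlens": [0, *accumulate(lens)],
            "max_seqlen": max(lens),
            # Per-request scales are kept so guidance can be applied
            # per sequence after the shared forward pass.
            "cfg_scales": [r.cfg_scale for r in reqs],
        })
    return batches
```

In this sketch, `cu_seqlens` and `max_seqlen` are the kind of values that would travel through `attn_metadata`; for example, `flash_attn_varlen_func` from the `flash_attn` package takes `cu_seqlens_q`, `cu_seqlens_k`, `max_seqlen_q`, and `max_seqlen_k` (as int32 tensors and ints, so the Python lists here would still need converting). How positive and negative prompts are packed is left open, as in the considerations above.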

Alternatives

#3638
MixFusion is relatively complex, so we should also benchmark it against the simpler varlen batching approach and evaluate which one delivers the larger performance gain.

Additional context

Test plan:

  • Single-GPU correctness and performance validation
  • Support CFG parallelism
  • Support Ring Attention / Ulysses Attention with padding-free execution
  • Optimize scheduling between StagePool and the diffusion scheduler

@hsliuustc0106 @Semmer2 @wtomin @wuhang2014 @Gaohan123 @cherhh
