🚀 The feature, motivation and pitch
Related to #874.
Current dynamic batching for diffusion only supports requests with the same resolution. As a result, when benchmarking on diffusion dataset C, many requests with different resolutions cannot be batched together, which leads to low MFU on a single instance.
The goal is to relax the constraints inside SamplingParamsKey and allow more heterogeneous requests to share one forward pass.
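To make the intended relaxation concrete, here is a minimal sketch of how a looser grouping key changes which requests can share a batch. It does not reproduce the real SamplingParamsKey: the `Request` record, its fields (including `num_inference_steps`), and both key functions are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    # Hypothetical request record; field names are illustrative only.
    request_id: str
    height: int
    width: int
    num_frames: int
    cfg_scale: float
    num_inference_steps: int


def strict_key(req: Request) -> tuple:
    # Strict grouping: resolution, frame count and exact CFG scale must match.
    return (req.height, req.width, req.num_frames,
            req.cfg_scale, req.num_inference_steps)


def relaxed_key(req: Request) -> tuple:
    # Relaxed grouping: ignore resolution and frame count, and only record
    # whether CFG is active (scale != 1) instead of the exact scale.
    return (req.cfg_scale != 1.0, req.num_inference_steps)


def group_requests(requests, key_fn):
    # Bucket request ids by key; each bucket is a candidate batch.
    groups = defaultdict(list)
    for req in requests:
        groups[key_fn(req)].append(req.request_id)
    return list(groups.values())


requests = [
    Request("a", 512, 512, 1, 5.0, 30),
    Request("b", 768, 512, 1, 7.5, 30),
    Request("c", 1024, 1024, 1, 3.0, 30),
    Request("d", 512, 512, 1, 1.0, 30),
]
strict_groups = group_requests(requests, strict_key)    # four single-request batches
relaxed_groups = group_requests(requests, relaxed_key)  # [["a", "b", "c"], ["d"]]
```

With the strict key every request ends up alone; with the relaxed key the three CFG-enabled requests form one candidate batch, while the request with scale 1 stays separate.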
- Remove CFG-related constraints:
  - When every request in a batch has a CFG scale != 1, requests with different CFG scales can still be batched into the same forward pass.
  - Additional considerations:
    - whether positive/negative prompts should be batched together
    - how to support CFG parallelism efficiently
- Remove height/width/frame constraints:
  - Support mixed-resolution batching with FlashAttention varlen functions.
  - Pass attention metadata through the existing attn_metadata interface.
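For the mixed-resolution item, varlen attention expects the tokens of all requests packed into one sequence plus cumulative offsets that mark where each request starts and ends. The sketch below computes those offsets in plain Python. `VarlenMetadata`, `num_tokens`, and the `pixels_per_token=16` factor are assumptions made for illustration; the actual attn_metadata fields and the model's patch size may differ.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class VarlenMetadata:
    # Hypothetical container; the real attn_metadata fields may differ.
    cu_seqlens: list  # cumulative token offsets, length = batch + 1
    max_seqlen: int   # longest sequence in the batch


def num_tokens(height, width, num_frames=1, pixels_per_token=16):
    # Assumption: one token covers a pixels_per_token x pixels_per_token
    # pixel area (e.g. 8x VAE downsampling followed by 2x2 patchify).
    return (height // pixels_per_token) * (width // pixels_per_token) * num_frames


def build_varlen_metadata(shapes):
    # shapes: list of (height, width, num_frames), one entry per request.
    seqlens = [num_tokens(h, w, f) for (h, w, f) in shapes]
    cu_seqlens = [0] + list(accumulate(seqlens))
    return VarlenMetadata(cu_seqlens=cu_seqlens, max_seqlen=max(seqlens))


shapes = [(512, 512, 1), (768, 512, 1), (1024, 1024, 1)]
meta = build_varlen_metadata(shapes)
# meta.cu_seqlens == [0, 1024, 2560, 6656], meta.max_seqlen == 4096
```

In the flash-attn package, `flash_attn_varlen_func` takes such offsets as int32 tensors (`cu_seqlens_q`, `cu_seqlens_k`) together with `max_seqlen_q` and `max_seqlen_k`, with q/k/v packed along the token dimension, so no padding to a common resolution is needed.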
Alternatives
#3638
MixFusion is relatively complex, so we should also compare it with the simpler varlen batching approach and evaluate which one provides better performance gains.
Additional context
Test plan:
@hsliuustc0106 @Semmer2 @wtomin @wuhang2014 @Gaohan123 @cherhh