
[Bugfix] Fix Wan2.2 cross-attention with Ulysses Sequence Parallelism (USP)#1233

Open
lishunyang12 wants to merge 1 commit into vllm-project:main from lishunyang12:sp_bug


@lishunyang12
Contributor

Summary

  • Fix Wan2.2 TI2V (and I2V) failing when ulysses_degree >= 2 by correctly handling cross-attention under Ulysses Sequence Parallelism
  • Add skip_parallel parameter to Attention.forward() to allow callers to bypass built-in parallel communication when managing it externally

Fixes #1219

Root Cause

When Ulysses SP is enabled, every Attention object applies AllToAll communication to all of Q, K, V. This is correct for self-attention (where Q/K/V all come from SP-split hidden_states), but incorrect for cross-attention:

  • Q comes from hidden_states which IS split across SP ranks → AllToAll correctly reconstructs the full sequence
  • K/V come from encoder_hidden_states which is replicated (NOT split) across SP ranks → AllToAll incorrectly duplicates the encoder context P times ([B, T, H, D] → [B, T*P, H/P, D])

This produces incorrect attention results and causes failures.
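
To make the shape mismatch concrete, here is a minimal shape-only sketch in plain Python. The helper `ulysses_all_to_all_shape` and the sizes are hypothetical illustrations, not code from this PR. It assumes the AllToAll gathers the sequence dimension and scatters the head dimension, as described above.

```python
def ulysses_all_to_all_shape(local_shape, sp_degree):
    """Shape after an AllToAll that gathers sequence and scatters heads.

    Hypothetical helper for illustration: it only does shape arithmetic
    and performs no communication.
    """
    batch, seq, heads, dim = local_shape
    if heads % sp_degree != 0:
        raise ValueError("head count must be divisible by the SP degree")
    return (batch, seq * sp_degree, heads // sp_degree, dim)


# Illustrative sizes (assumptions, not taken from Wan2.2).
B, S, T, H, D, P = 1, 8, 3, 4, 16, 2

q_local = (B, S // P, H, D)    # hidden_states: split across SP ranks
kv_replicated = (B, T, H, D)   # encoder_hidden_states: replicated on every rank

q_after = ulysses_all_to_all_shape(q_local, P)
kv_after = ulysses_all_to_all_shape(kv_replicated, P)

print("Q  :", q_local, "->", q_after)          # sequence restored to S
print("K/V:", kv_replicated, "->", kv_after)   # context length inflated to T*P
```

For Q the gathered sequence length is the true length S, while for K/V it becomes T*P because each rank contributed a full copy of the same T tokens.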

Fix

In WanCrossAttention.forward(), when Ulysses SP is active:

  1. AllToAll on Q only: [B, S/P, H, D] → [B, S, H/P, D] (correct sequence reconstruction)
  2. Head-slice K/V for current Ulysses rank: [B, T, H, D] → [B, T, H/P, D] (match Q's head count without duplicating context)
  3. Skip built-in USP in the attention kernel via skip_parallel=True
  4. Reverse AllToAll on output: [B, S, H/P, D] → [B, S/P, H, D]

This follows the same pattern used by other models (Qwen Image, LongCat) where replicated encoder context is head-sliced rather than AllToAll'ed.
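
The four steps can be sanity-checked with a single-process simulation. The sketch below is plain Python with no real communication: SP ranks are simulated as list entries, tensors are stored head-major as nested lists (`x[head][token][dim]`), and every name is hypothetical rather than taken from the vllm-omni code. It only illustrates that head-slicing the replicated K/V gives the same result as unsharded cross-attention.

```python
import math
import random

P, H, S, T, D = 2, 4, 6, 3, 2   # SP degree, heads, video tokens, text tokens, head dim
HP, SP = H // P, S // P         # heads per rank, sequence tokens per rank


def rand(n_heads, n_tokens):
    return [[[random.uniform(-1, 1) for _ in range(D)]
             for _ in range(n_tokens)] for _ in range(n_heads)]


def attend(q, k, v):
    """Scaled dot-product attention for one head: q is [S][D], k and v are [T][D]."""
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(D) for k_row in k]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) / total
                    for j in range(D)])
    return out


def all_to_all_seq_to_head(shards):
    """Simulated AllToAll: rank r receives head chunk r of every rank's tokens."""
    return [[sum((shards[p][r * HP + h] for p in range(P)), []) for h in range(HP)]
            for r in range(P)]


def all_to_all_head_to_seq(shards):
    """Simulated reverse AllToAll: rank r receives sequence chunk r of every head."""
    return [[shards[h // HP][h % HP][r * SP:(r + 1) * SP] for h in range(H)]
            for r in range(P)]


random.seed(0)
Q, K, V = rand(H, S), rand(H, T), rand(H, T)

# Each rank holds its own sequence shard of Q (all heads); K/V are replicated.
q_local = [[Q[h][r * SP:(r + 1) * SP] for h in range(H)] for r in range(P)]

q_gathered = all_to_all_seq_to_head(q_local)                     # step 1
out_head_sharded = []
for r in range(P):
    k_r, v_r = K[r * HP:(r + 1) * HP], V[r * HP:(r + 1) * HP]    # step 2: head-slice
    # Step 3: plain local attention, no further communication.
    out_head_sharded.append([attend(q_gathered[r][h], k_r[h], v_r[h]) for h in range(HP)])
out_local = all_to_all_head_to_seq(out_head_sharded)             # step 4

# Reference: unsharded cross-attention over all heads and all tokens.
reference = [attend(Q[h], K[h], V[h]) for h in range(H)]
```

Because attention is independent per head, each rank only needs its own head slice of the full-length encoder context, which is why no communication is required for K/V.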

Test plan

  • Run Wan2.2 TI2V with ulysses_degree=2 (the failing case from [Bug]: Wan2.2 TI2V USP=2 failed #1219)
  • Verify Wan2.2 TI2V without USP still works (regression check)
  • Verify Wan2.2 T2V with USP still works (self-attention path unchanged)

Signed-off-by: lishunyang <lishunyang12@163.com>
@lishunyang12
Contributor Author

lishunyang12 commented Feb 5, 2026

@gcanlin PTAL

@hsliuustc0106
Collaborator

@wtomin PTAL

@gcanlin
Contributor

gcanlin commented Feb 6, 2026

It doesn't work. The same error.

[Stage-0] ERROR 02-06 00:56:42 [omni_stage.py:999] Failed on batch ['0_a54737a4-4c4e-4eb4-a413-308a9994f755']: The size of tensor a (2535) must match the size of tensor b (5070) at non-singleton dimension 1
[Stage-0] ERROR 02-06 00:56:42 [omni_stage.py:999] Traceback (most recent call last):
[Stage-0] ERROR 02-06 00:56:42 [omni_stage.py:999]   File "/home/guocanlin/vllm-omni-workspace/vllm-omni/vllm_omni/entrypoints/omni_stage.py", line 907, in _stage_worker
[Stage-0] ERROR 02-06 00:56:42 [omni_stage.py:999]     diffusion_results = stage_engine.generate(
[Stage-0] ERROR 02-06 00:56:42 [omni_stage.py:999]                         ^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-06 00:56:42 [omni_stage.py:999]   File "/home/guocanlin/vllm-omni-workspace/vllm-omni/vllm_omni/entrypoints/omni_diffusion.py", line 104, in generate
[Stage-0] ERROR 02-06 00:56:42 [omni_stage.py:999]     return self._run_engine(request)
[Stage-0] ERROR 02-06 00:56:42 [omni_stage.py:999]            ^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-06 00:56:42 [omni_stage.py:999]   File "/home/guocanlin/vllm-omni-workspace/vllm-omni/vllm_omni/entrypoints/omni_diffusion.py", line 107, in _run_engine
[Stage-0] ERROR 02-06 00:56:42 [omni_stage.py:999]     return self.engine.step(request)
[Stage-0] ERROR 02-06 00:56:42 [omni_stage.py:999]            ^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-06 00:56:42 [omni_stage.py:999]   File "/home/guocanlin/vllm-omni-workspace/vllm-omni/vllm_omni/diffusion/diffusion_engine.py", line 79, in step
[Stage-0] ERROR 02-06 00:56:42 [omni_stage.py:999]     raise Exception(f"{output.error}")
[Stage-0] ERROR 02-06 00:56:42 [omni_stage.py:999] Exception: The size of tensor a (2535) must match the size of tensor b (5070) at non-singleton dimension 1

@wtomin
Contributor

wtomin commented Feb 6, 2026

This PR's solution is very similar to #1004, but it targets a different issue, @JustQJ.

I think this PR is targeting the same issue as #1221, @mxuax.

@mxuax
Contributor

mxuax commented Feb 6, 2026

For SP failure in ti2v, I fixed it in #1221.

@mxuax mxuax mentioned this pull request Feb 6, 2026
1 task
@lishunyang12
Contributor Author

> For SP failure in ti2v, I fixed it in #1221.

Please help close this PR, as the target issue has been solved.


Development

Successfully merging this pull request may close these issues.

[Bug]: Wan2.2 TI2V USP=2 failed

5 participants