Stabilize WAN MPS generation on Apple Silicon #1868

Open
SquishedSquirrel wants to merge 1 commit into deepbeepmeep:main from SquishedSquirrel:mps-wan-stability-current

SquishedSquirrel commented Jun 5, 2026

This PR improves WAN generation stability on Apple Silicon/MPS, especially for WAN 2.2 5B and WAN 2.1 1.3B.

Changes:

  • Default MPS scaled dot-product attention (SDPA) to a synchronized manual fallback; native SDPA remains available as an opt-in via WAN2GP_MPS_NATIVE_SDPA=1.
  • Add WAN-specific MPS synchronization/cache cleanup boundaries around transformer calls, sampler steps, generation boundaries, and cache clearing.
  • Run quantized WAN WebUI generation inline on MPS to avoid worker-thread Metal command-buffer crashes.
  • Disable live WAN latent preview frames by default on MPS while preserving progress updates.
  • Keep diagnostic env switches for native SDPA, live previews, inline worker behavior, and joint CFG.
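The synchronization/cache-cleanup boundaries in the second bullet can be pictured as a context manager that drains queued Metal work before and after a transformer call or sampler step. This is a minimal sketch, not the PR's code: `mps_sync_boundary` is a hypothetical helper, and the synchronize and cache-clearing functions are injected so it runs without a GPU. On Apple Silicon they would typically be `torch.mps.synchronize` and `torch.mps.empty_cache`.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def mps_sync_boundary(
    synchronize: Callable[[], None],
    empty_cache: Callable[[], None],
    clear_cache: bool = False,
) -> Iterator[None]:
    """Hypothetical helper: fence a block of GPU work with synchronization.

    On MPS, pass torch.mps.synchronize and torch.mps.empty_cache.
    """
    synchronize()  # finish work queued before entering the block
    try:
        yield
    finally:
        synchronize()  # wait for the block's own work, even on error
        if clear_cache:
            empty_cache()  # release cached allocator blocks


# Usage with recording stubs in place of the torch.mps functions.
calls = []
with mps_sync_boundary(
    lambda: calls.append("sync"),
    lambda: calls.append("empty"),
    clear_cache=True,
):
    calls.append("transformer")
print(calls)  # ['sync', 'transformer', 'sync', 'empty']
```

Synchronizing on both sides trades some throughput for a clean point where no command buffer is still in flight, which is the property the PR is after.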

Validation:

  • WAN 2.2 5B passed multiple WebUI generations.
  • WAN 2.1 1.3B passed multiple WebUI generations.
  • Selected 14B smoke test passed.
  • Encode/post-processing sanity check passed.

To be clear, these patches were written with Codex and GPT 5.5, and I can't vouch for their quality or durability. They center on WAN, since that was my primary goal. Memory usage is high: a Mac with 36 GB or more of unified memory is effectively required if you plan to do anything with 14B, and even then the frame size has to stay small.
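A quick calculation shows why 14B is tight. This is a rough lower bound only: it counts transformer weights at 2 bytes per parameter for bf16/fp16 and 1 byte for int8/fp8, and ignores activations (which grow with frame size and frame count), the VAE, the text encoder, and the memory macOS itself needs. Real usage is therefore higher.

```python
# Lower bound on weight memory for a 14B-parameter transformer.
# Weights only: activations, VAE, text encoder and OS overhead are ignored.
PARAMS = 14e9

BYTES_PER_PARAM = {
    "fp32": 4,
    "bf16/fp16": 2,
    "int8/fp8": 1,
}


def weight_gib(params: float, bytes_per_param: int) -> float:
    """Convert a parameter count into GiB of weight storage."""
    return params * bytes_per_param / 2**30


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>10}: {weight_gib(PARAMS, nbytes):5.1f} GiB")
# fp32 ~52.2 GiB, bf16/fp16 ~26.1 GiB, int8/fp8 ~13.0 GiB
```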

The manual SDPA fallback was made the default because it had a lower memory footprint and was slightly faster than native SDPA.
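For reference, a "manual" SDPA computes softmax(Q·Kᵀ/√d)·V with explicit matrix products instead of calling the fused `torch.nn.functional.scaled_dot_product_attention` kernel. The sketch below shows only that math on plain Python lists; the PR's fallback works on batched tensors and adds synchronization, neither of which is modeled here. The env check assumes that only the exact value "1" enables the native path; the PR states the opt-in is WAN2GP_MPS_NATIVE_SDPA=1 but not how other values are parsed.

```python
import math
import os
from typing import List

Matrix = List[List[float]]


def manual_sdpa(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V, computed one query row at a time."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q_row in q:
        # Scaled dot product of this query with every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # Numerically stable softmax: subtract the max before exponentiating.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted sum of the value rows.
        out.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return out


def use_native_sdpa() -> bool:
    """Assumed parsing: native SDPA only when WAN2GP_MPS_NATIVE_SDPA == "1"."""
    return os.environ.get("WAN2GP_MPS_NATIVE_SDPA") == "1"


# A zero query attends equally to both keys, so it averages the value rows.
print(manual_sdpa([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]))
# [[2.0, 3.0]]
```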

WebUI worker-thread note

During testing, the quantized WAN transformer path behaved differently in headless queue processing than in the Gradio/WebUI path.

The genuinely quantized WAN 2.2 5B checkpoint completed repeated headless generations, but the same model and configuration intermittently crashed when run from the WebUI, with Metal command-buffer assertions such as:

  • commit an already committed command buffer
  • commit command buffer with uncommitted encoder

The instability appeared when generation was dispatched through the extra WebUI async_run_in("generation", ...) worker thread. Running the queued WAN generation inline in the Gradio request path for quantized WAN transformer tasks avoided those crashes in local testing.
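The difference between the two paths is which thread issues the Metal work. The sketch below is a hypothetical `dispatch_generation` function, not the WebUI's `async_run_in` helper, whose signature and behavior are not shown in this PR. Joining the worker thread is an assumption made so the example is deterministic; the real helper may not block.

```python
import threading
from typing import Callable


def dispatch_generation(run: Callable[[], None], inline: bool) -> None:
    """Run a generation job on the calling thread or on a worker thread."""
    if inline:
        run()  # same thread as the request handler that called us
        return
    worker = threading.Thread(target=run, name="generation")
    worker.start()
    worker.join()  # assumption: wait so the caller sees the result


# Record whether each job ran on the calling thread.
main_thread = threading.current_thread()
seen = []
dispatch_generation(lambda: seen.append(threading.current_thread() is main_thread), inline=True)
dispatch_generation(lambda: seen.append(threading.current_thread() is main_thread), inline=False)
print(seen)  # [True, False]
```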

This is intentionally scoped narrowly:

  • only on MPS
  • only WAN video tasks
  • only when the selected transformer checkpoint appears quantized (quanto, int8, or fp8)
  • overrideable with WAN2GP_MPS_WEBUI_INLINE_WORKER

This keeps the WebUI behavior unchanged for CUDA and for non-quantized/non-WAN tasks.
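The scoping rules above can be expressed as a single predicate. A minimal sketch with a hypothetical function name: it assumes "appears quantized" means the checkpoint filename contains one of the listed markers, and that on MPS the WAN2GP_MPS_WEBUI_INLINE_WORKER override takes precedence, with "1" forcing inline and any other value disabling it. The PR does not spell out either detail.

```python
import os
from typing import Mapping, Optional

# Assumed filename markers for a quantized transformer checkpoint.
QUANTIZED_MARKERS = ("quanto", "int8", "fp8")


def run_wan_inline(
    device_type: str,
    is_wan_video_task: bool,
    checkpoint_name: str,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical gate for running WebUI generation inline."""
    env = os.environ if env is None else env
    if device_type != "mps":
        return False  # CUDA and CPU keep the existing worker-thread path
    override = env.get("WAN2GP_MPS_WEBUI_INLINE_WORKER")
    if override is not None:
        return override == "1"  # assumed override semantics
    name = checkpoint_name.lower()
    looks_quantized = any(marker in name for marker in QUANTIZED_MARKERS)
    return is_wan_video_task and looks_quantized


# Hypothetical checkpoint names, for illustration only.
print(run_wan_inline("mps", True, "wan2.2_5B_quanto_int8.safetensors", env={}))  # True
print(run_wan_inline("cuda", True, "wan2.2_5B_quanto_int8.safetensors", env={}))  # False
```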
