Stabilize WAN MPS generation on Apple Silicon by SquishedSquirrel · Pull Request #1868 · deepbeepmeep/Wan2GP

SquishedSquirrel · 2026-06-05T23:44:38Z

This PR improves WAN generation stability on Apple Silicon/MPS, especially for WAN 2.2 5B and WAN 2.1 1.3B.

Changes:

Default MPS SDPA to a synchronized manual fallback, with native SDPA opt-in via WAN2GP_MPS_NATIVE_SDPA=1.
Add WAN-specific MPS synchronization/cache cleanup boundaries around transformer calls, sampler steps, generation boundaries, and cache clearing.
Run quantized WAN WebUI generation inline on MPS to avoid worker-thread Metal command-buffer crashes.
Disable live WAN latent preview frames by default on MPS while preserving progress updates.
Keep diagnostic env switches for native SDPA, live previews, inline worker behavior, and joint CFG.
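
As a rough illustration of the first item, the opt-in switch could be read like this. This is a minimal sketch, not the code in the patch: the helper name select_sdpa_backend and the "native"/"manual" labels are hypothetical, and only the variable name WAN2GP_MPS_NATIVE_SDPA and its value 1 come from the description above.

```python
import os


def select_sdpa_backend(device_type, environ=None):
    """Pick an attention backend label for the given device type.

    Hypothetical helper: returns "native" or "manual". Only the env
    variable name and the "=1" opt-in are taken from the PR text.
    """
    env = os.environ if environ is None else environ

    # Devices other than MPS are left alone and keep the native kernel.
    if device_type != "mps":
        return "native"

    # On MPS the manual fallback is the default; native SDPA is opt-in.
    if env.get("WAN2GP_MPS_NATIVE_SDPA", "").strip() == "1":
        return "native"
    return "manual"
```

Passing the environment as a plain dict keeps the decision easy to check without touching the real process environment, for example select_sdpa_backend("mps", {"WAN2GP_MPS_NATIVE_SDPA": "1"}).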

Validation:

WAN 2.2 5B passed multiple WebUI generations.
WAN 2.1 1.3B passed multiple WebUI generations.
Selected 14B smoke test passed.
Encode/post-processing sanity check passed.

To be clear, these patches were written with Codex and GPT 5.5, and I can't speak to their quality or durability. They center on WAN, since that was my primary goal. Memory usage is quite high: a Mac with 36GB or more is really needed for anything with 14B, and even then the frame size has to be fairly small.

The manual fallback SDPA was chosen as the default because it has a lower memory impact and is slightly faster.

WebUI worker-thread note

During testing, the quantized WAN transformer path behaved differently in headless queue processing versus the Gradio/WebUI path.

The true quantized WAN 2.2 5B checkpoint could complete repeated headless generations, but the same model/configuration would intermittently crash from the WebUI with Metal command-buffer assertions such as:

commit an already committed command buffer
commit command buffer with uncommitted encoder

The instability appeared when generation was dispatched through the extra WebUI async_run_in("generation", ...) worker thread. Running the queued WAN generation inline in the Gradio request path for quantized WAN transformer tasks avoided those crashes in local testing.
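
The difference between the two dispatch modes can be sketched in plain Python. This is an illustration only: run_generation is a hypothetical stand-in, and it does not reproduce how async_run_in is implemented, which is not shown here. The inline path runs the task on the calling thread, while the worker path runs it on a separate thread and waits for the result.

```python
import threading


def run_generation(task, inline):
    """Run task() on the calling thread, or on a worker thread and wait.

    Hypothetical stand-in for the two dispatch modes described above.
    """
    if inline:
        # Inline: the work stays on the thread that handles the request.
        return task()

    outcome = {}

    def target():
        # Capture either the result or the exception for the caller.
        try:
            outcome["value"] = task()
        except BaseException as exc:
            outcome["error"] = exc

    worker = threading.Thread(target=target, name="generation")
    worker.start()
    worker.join()

    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

Calling run_generation(threading.get_ident, inline=True) returns the caller's own thread id, while inline=False returns the id of the worker thread, which is the property that matters for the crashes described above.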

This is intentionally scoped narrowly:

only on MPS
only WAN video tasks
only when the selected transformer checkpoint appears quantized (quanto, int8, or fp8)
overrideable with WAN2GP_MPS_WEBUI_INLINE_WORKER

This keeps the WebUI behavior unchanged for CUDA and for non-quantized/non-WAN tasks.
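
The scoping rules above can be summarized as a small predicate. This is a sketch under assumptions, not the patch itself: the function names and the example checkpoint file names are hypothetical, and the override semantics (1 forces inline, 0 forces the worker thread, anything else falls back to auto-detection) are assumed, since only the variable name is given above.

```python
import os

# Substrings that mark a checkpoint name as quantized, per the list above.
QUANT_MARKERS = ("quanto", "int8", "fp8")


def looks_quantized(checkpoint_name):
    """Return True when the checkpoint name contains a quantization marker."""
    name = checkpoint_name.lower()
    return any(marker in name for marker in QUANT_MARKERS)


def should_run_inline(device_type, is_wan_video_task, checkpoint_name, environ=None):
    """Decide whether a WebUI generation should run inline (hypothetical helper)."""
    env = os.environ if environ is None else environ

    # Only MPS is affected; CUDA and other devices keep the worker thread.
    if device_type != "mps":
        return False

    # Assumed override semantics: "1" forces inline, "0" forces the worker.
    override = env.get("WAN2GP_MPS_WEBUI_INLINE_WORKER", "").strip()
    if override == "1":
        return True
    if override == "0":
        return False

    # Default: inline only for WAN video tasks with a quantized checkpoint.
    return bool(is_wan_video_task) and looks_quantized(checkpoint_name)
```

For example, should_run_inline("mps", True, "wan2.2_5B_quanto_int8.safetensors", {}) evaluates to True, while the same call with "cuda" evaluates to False.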

Stabilize WAN MPS generation on Apple Silicon

6ec7805

SquishedSquirrel wants to merge 1 commit into deepbeepmeep:main from SquishedSquirrel:mps-wan-stability-current
