Stabilize WAN MPS generation on Apple Silicon#1868
Open
SquishedSquirrel wants to merge 1 commit into
This PR improves WAN generation stability on Apple Silicon/MPS, especially for WAN 2.2 5B and WAN 2.1 1.3B.
Changes:
WAN2GP_MPS_NATIVE_SDPA=1.
Validation:
Let me be clear that these patches were done with Codex and GPT 5.5. I can't speak to the quality or durability of the patches. They center around WAN, as that was my primary goal. Memory usage is quite high, so a 36GB Mac or better is really needed if you plan to do anything with 14B, and even then the frame size has to be pretty small.
Manual fallback SDPA was chosen for lower memory impact and slightly faster speed.
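For readers unfamiliar with the term, here is a minimal sketch of what a "manual" scaled dot-product attention computes, next to a helper that reads the WAN2GP_MPS_NATIVE_SDPA flag. It is not the code in this PR: the function names are hypothetical, the pure-Python lists stand in for tensors, and the flag semantics ("1" opts in to the native kernel, anything else keeps the manual fallback) are an assumption based only on the variable name.

```python
import math
import os


def manual_sdpa(q, k, v):
    """Reference scaled dot-product attention for a single head.

    q, k, v are lists of float vectors. Each output row is
    softmax(q_i . k_j / sqrt(d)) applied as weights over the rows of v.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out


def use_native_sdpa(env=None):
    """Hypothetical helper: True only when the flag is set to "1" (assumed semantics)."""
    env = os.environ if env is None else env
    return env.get("WAN2GP_MPS_NATIVE_SDPA", "0") == "1"
```

With two identical keys the softmax weights are equal, so the output row is the plain average of the two value rows, which is a quick sanity check for any manual implementation.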
WebUI worker-thread note
During testing, the quantized WAN transformer path behaved differently in headless queue processing versus the Gradio/WebUI path.
The true quantized WAN 2.2 5B checkpoint could complete repeated headless generations, but the same model/configuration would intermittently crash from the WebUI with Metal command-buffer assertions such as:
commit an already committed command buffer
commit command buffer with uncommitted encoder
The instability appeared when generation was dispatched through the extra WebUI async_run_in("generation", ...) worker thread. Running the queued WAN generation inline in the Gradio request path for quantized WAN transformer tasks avoided those crashes in local testing.
This is intentionally scoped narrowly:
quanto, int8, or fp8)
WAN2GP_MPS_WEBUI_INLINE_WORKER
This keeps the WebUI behavior unchanged for CUDA and for non-quantized/non-WAN tasks.
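The narrow scoping described above can be sketched roughly as follows. This is an illustration, not the code in this PR: should_run_inline and dispatch are hypothetical names, the plain threading.Thread is only a stand-in for the WebUI's async_run_in("generation", ...) worker, the quantization label is assumed to be one of the three tags named above, and treating WAN2GP_MPS_WEBUI_INLINE_WORKER=0 as an opt-out is an assumption about how that variable behaves.

```python
import os
import threading

# Quantization labels named in the PR description.
QUANTIZED_TAGS = ("quanto", "int8", "fp8")


def should_run_inline(device, is_wan, quantization, env=None):
    """Decide whether a queued task runs in the request path (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumed semantics: "0" disables the inline path entirely.
    if env.get("WAN2GP_MPS_WEBUI_INLINE_WORKER") == "0":
        return False
    # Only quantized WAN transformer tasks on MPS take the inline path.
    return device == "mps" and is_wan and quantization in QUANTIZED_TAGS


def dispatch(task, inline):
    """Run task on the calling thread, or on a separate worker thread."""
    if inline:
        return task()
    result = []
    worker = threading.Thread(target=lambda: result.append(task()))
    worker.start()
    worker.join()
    return result[0]
```

Under these assumptions a CUDA task, a non-WAN task, or a non-quantized WAN task still goes through the worker thread, which matches the stated goal of leaving those paths unchanged.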