Fix GPU Hang: flush Metal graph with eval() after MTPTokenIterator verification pass

Aegis-AI · Aegis-AI · commit e3fb72adbc54 · 2026-05-12T15:23:38.000-07:00
On hybrid SSM/attention models (Qwen35), the recurrent GatedDeltaNet layers
carry un-evaluated MLX graph nodes over from one speculateRound() to the next.
Without an explicit eval() after callMTP(), the Metal command buffer keeps
growing across speculation rounds until it triggers the GPU Watchdog
(kIOGPUCommandBufferCallbackErrorHang).

Adding eval(mtpResult) immediately after the verification forward pass forces
evaluation of the accumulated graph in every round, preventing the Metal timeout.
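
The accumulation described above can be illustrated with a toy model in plain
Swift. This is only a sketch and does not use the MLX API: LazyValue,
peakPending, and the round and op counts are hypothetical names and numbers
chosen for illustration. It assumes that each round queues a fixed number of
recurrent updates and that evaluating the state drains the queue, which is the
role eval() plays in this fix.

```swift
// Toy model of lazy evaluation; NOT the MLX API. All names are hypothetical.
final class LazyValue {
    private var pending: [(Double) -> Double] = []  // queued, un-evaluated ops
    private var value: Double

    init(_ value: Double) { self.value = value }

    /// Number of operations recorded but not yet executed.
    var pendingCount: Int { pending.count }

    /// Record an operation without running it (stands in for graph construction).
    func apply(_ op: @escaping (Double) -> Double) { pending.append(op) }

    /// Run all queued operations and clear the queue (stands in for eval()).
    @discardableResult
    func evaluate() -> Double {
        for op in pending { value = op(value) }
        pending.removeAll()
        return value
    }
}

/// Runs `rounds` rounds of `opsPerRound` recurrent updates and returns the
/// largest queue length observed at the end of a round, before any flush.
func peakPending(rounds: Int, opsPerRound: Int, flushEachRound: Bool) -> Int {
    let state = LazyValue(0)
    var peak = 0
    for _ in 0..<rounds {
        for _ in 0..<opsPerRound { state.apply { $0 * 0.5 + 1 } }
        peak = max(peak, state.pendingCount)
        if flushEachRound { state.evaluate() }
    }
    return peak
}

let unflushed = peakPending(rounds: 8, opsPerRound: 3, flushEachRound: false)
let flushed = peakPending(rounds: 8, opsPerRound: 3, flushEachRound: true)
print("peak pending without flush: \(unflushed), with flush: \(flushed)")
// prints "peak pending without flush: 24, with flush: 3"
```

Without a per-round flush the queue grows linearly with the number of rounds;
with one it never exceeds a single round's worth of work. The real graph and
command buffer are more involved, but the bound has the same shape.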
diff --git a/Libraries/MLXLMCommon/Evaluate.swift b/Libraries/MLXLMCommon/Evaluate.swift
@@ -1227,7 +1227,13 @@ public struct MTPTokenIterator: TokenIteratorProtocol {
         
         let mtpResult = model.callMTP(verifyInput.tokens[.newAxis], cache: cache, mtpCaches: mtpCaches)
         guard !mtpResult.isEmpty else { return }
-        
+
+        // Flush the Metal command buffer immediately after the verification forward pass.
+        // On hybrid SSM/attention models (e.g. Qwen35), the recurrent SSM layers accumulate
+        // un-evaluated graph nodes across rounds. Without an explicit sync here the Metal
+        // command buffer grows until it triggers the GPU Watchdog.
+        eval(mtpResult)
+
         let mainLogits = mtpResult[0]
 
         let mainTokens: MLXArray