Sync with latest fabric-8189 version on LLM/Embed/NMT #1874
Stack of three logical changes squashed into one commit so the test ports stay self-consistent with the build/runtime they depend on:

1. qvac-fabric overlay ports (LLM + embed + nmtcpp):
   - Pin to fabric 78db8bf4 (PR tetherto/qvac-fabric-llm.cpp#121 HEAD, includes c79a8851 "ggml-vulkan: Fix NaN outputs on Mali").
   - Drop -DGGML_VULKAN_DISABLE_COOPMAT*=ON for Android so coopmat shaders are compiled in. With coopmat off, runtime device->coopmat_support is false and the Mali fix's ARM-gated branches were skipped, leaving Qwen3-Q8_0 finetuning NaN on Pixel 9 Pro Mali.
   - Wire up overlay-ports in each package's vcpkg-configuration.json.
   - Add find_package(OpenSSL) before find_package(llama) in the LLM CMakeLists so llama-targets.cmake's transitive OpenSSL::SSL reference (via cpp-httplib) resolves on local builds.

2. utils.js downloadFile redirect race:
   - Track a handedOff flag set when the redirect branch hands off dest to a recursive call. All cleanup paths now skip fs.unlink once ownership is transferred, so a late error from the outer writestream can't delete the freshly-downloaded file (Pixel ENOENT after "successful" mmproj download).

3. Three new integration tests + their mobile harness wiring:
   - qwen3-5.test.js: basic / multi-turn / tool-calling
   - gemma4.test.js: text / multi-turn / image (forced to CPU on darwin + mobile because gemma4v projector SIGSEGVs on Metal and Adreno OpenCL) / tool-calling
   - ocr-paddle.test.js: OCR; mobile maxTokens capped to 768
   - Ported to the new addon API (files: { model: [absPath], projectionModel?: absPath }, config: …).
   - Added matching unit test test_text_llm_context_qwen3.cpp.
   - integration.auto.cjs registers runQwen35Test, runGemma4Test, runOcrPaddleTest dispatchers.
   - test-groups.json: iOS heavy4 cluster (Gemma4+OcrLighton+OcrPaddle), iOS lightB adds Qwen35, Android groupB has Qwen35 first then Gemma4 / OcrPaddle.
   - Workflow: Android GroupB Device Farm jobTimeout 60→90 min.
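The handedOff idea from item 2 can be sketched roughly as follows. This is not the actual utils.js code: it assumes Node-style `fs`, `http` and `https` modules, and `createCleanupGuard`, the `redirectsLeft` parameter and the error messages are hypothetical names introduced only for illustration.

```javascript
const fs = require('fs')
const http = require('http')
const https = require('https')

// Hypothetical helper: decides whether this call may still delete `dest`.
// `unlink` is injectable so the decision can be exercised without touching disk.
function createCleanupGuard (dest, unlink = (p) => fs.unlink(p, () => {})) {
  let handedOff = false
  return {
    // Called when the redirect branch passes `dest` on to a recursive call.
    handOff () { handedOff = true },
    // Issues the unlink unless ownership was transferred.
    // Returns true if an unlink was issued, false if it was skipped.
    cleanup () {
      if (handedOff) return false
      unlink(dest)
      return true
    }
  }
}

// Sketch of a redirect-following download built on the guard.
function downloadFile (url, dest, redirectsLeft = 5) {
  return new Promise((resolve, reject) => {
    const guard = createCleanupGuard(dest)
    const file = fs.createWriteStream(dest)

    // After a hand-off the recursive call decides the outcome, so a late
    // error on this outer stream neither deletes the file nor rejects.
    const fail = (err) => {
      if (!guard.cleanup()) return
      file.destroy()
      reject(err)
    }
    file.on('error', fail)

    const client = url.startsWith('https:') ? https : http
    const req = client.get(url, (res) => {
      const { statusCode, headers } = res
      if (statusCode >= 300 && statusCode < 400 && headers.location) {
        res.resume() // discard the redirect body
        if (redirectsLeft <= 0) return fail(new Error('Too many redirects'))
        guard.handOff() // ownership of dest moves to the recursive call
        const next = new URL(headers.location, url).toString()
        file.close(() => {
          downloadFile(next, dest, redirectsLeft - 1).then(resolve, reject)
        })
        return
      }
      if (statusCode !== 200) {
        res.resume()
        return fail(new Error('Unexpected HTTP status ' + statusCode))
      }
      res.pipe(file)
      file.on('finish', () => file.close(() => resolve(dest)))
    })
    req.on('error', fail)
  })
}
```

Keeping the ownership decision in one small object means every cleanup path goes through the same check, which is the property the commit message describes.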
❌ E2E Mobile Test Results - iOS
Overall Status: FAILED
Automated E2E mobile testing powered by AWS Device Farm

❌ E2E Mobile Test Results - Android
Overall Status: FAILED
Automated E2E mobile testing powered by AWS Device Farm
🎯 What problem does this PR solve?
- Syncs with fabric `8189.0.2` to pick up upstream fixes (Mali/Adreno F16 coopmat1 NaN, Qwen3.5 OpenCL kernels, Gemma 4 vision/audio, Vulkan VMA migration) accumulated since `7248.x`.
- […] `</think>` tags.

📝 How does it solve it?
Fabric / build
- Bumps `vcpkg.json` to `qvac-fabric >= 8189.0.2`, drops the temporary overlay-ports pin, ports the addon to the new fabric ABI (`common_init_result` → `common_init_result_ptr`; `llama_clear_adapter_lora` + `llama_set_adapter_lora` → `llama_set_adapters_lora`; `LLAMA_EXAMPLE_MAIN` → `LLAMA_EXAMPLE_COMMON`).
- Defaults to `flash-attn=off` on OpenCL backends (not reliably supported); user-supplied `flash-attn` / `flash_attn` overrides are honored.

Reasoning budget
- Adds a `reasoning_budget` config (`-1` unrestricted = default; `0` disabled). Wired through to fabric's `inputs.enable_thinking`. Accepts both kebab (`reasoning-budget`) and snake-case (`reasoning_budget`) keys; rejects any other integer.
- Per-request `generationParams.reasoning_budget` on `model.run()`, same shape as `temp` / `top_p`. Snapshot/restore through the existing `applyGenerationParamsToContext` path so the override is request-scoped and not sticky.
- Rejects non-integer values (`0.5`, `-1.1`, …) before they reach the integer cast.

Synthetic `<think>` opener
- […] `<think>\n`), `getPrompt` now propagates the `thinking_forced_open` flag back to the context.
- `TextLlmContext` / `MtmdLlmContext` prepend `"<think>\n"` to the visible stream so consumers see balanced `<think>…</think>` markup. `TextLlmContext` also flips `reasoningState_.inside_reasoning = true` so the streaming state machine starts in the right state.
- […] `<|channel>thought…<channel|>`) and explicitly does not set the force-open flag; the path is a no-op there.

Antiprompt matching
- `checkAntiprompt` now lowercases both the recent-output window and each antiprompt before `find()`, so a single `Pizza` entry catches every casing the model emits.

Tool-call streaming
- […] `<tool_call>{json}</tool_call>`), Qwen3.5 (HF function-call XML wrapped in `<tool_call>…</tool_call>`), Gemma 4 (`<|tool_call>call:…<tool_call|>`), Mistral / DeepSeek-R1 / Functionary / GPT-OSS (their own markers). The previous synthetic `<tool_call>{json}</tool_call>` post-processing layer is removed.

Dead code / cleanup
- Removes `sawMali` plumbing, Apple-M1 detection + projector-CPU routing, the `selectToolsCompactMarker(string)` overload, Gemma 4 markers in `Qwen3ReasoningUtils`, the model-name-based Qwen3 fallback (architecture-only now), and the `useCpuForVision` test alias.

🧪 How was it tested?
- […] `verify` label):
  - `qwen3-5.test.js`: basic, multi-turn, tool calling, image describe, `reasoning-budget=0`, and per-request `generationParams.reasoning_budget` override (verifies it's request-scoped, not sticky).
  - `gemma4.test.js`: basic, multi-turn, image describe on GPU on mobile, tool calling via the native-dialect parser, `reasoning-budget=0` (tolerant of model-emitted reasoning).
  - `ocr-paddle.test.js`: PaddleOCR-VL.
- `reasoning.test.js` for Qwen3.
- `tuneConfigMap` coverage (`test_text_llm_context_qwen3.cpp`, `test_tune_config_map.cpp`).
- Built on `darwin-arm64` (`bare-make build && bare-make install`) and ran the relevant `config-parameters.test.js` scenarios end-to-end; `Reverse prompt stops generation` passes with a mixed-case `PiZzA` antiprompt, confirming case-insensitive matching live against the model.

🔌 API Changes
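The `reasoning_budget` rules listed under "Reasoning budget" can be sketched in JavaScript as below. This is an illustration under stated assumptions, not the addon's implementation: `parseReasoningBudget` and `runWithOverrides` are hypothetical names, the kebab-case key winning when both spellings are present is an assumption, and the real snapshot/restore goes through `applyGenerationParamsToContext`, whose signature is not shown in this PR text.

```javascript
// Hypothetical validator: accepts only -1 (unrestricted, default) or 0 (disabled).
function parseReasoningBudget (config) {
  const has = (key) => Object.prototype.hasOwnProperty.call(config, key)
  if (!has('reasoning-budget') && !has('reasoning_budget')) return -1

  // Assumption: the kebab-case key takes precedence if both are supplied.
  const raw = has('reasoning-budget')
    ? config['reasoning-budget']
    : config.reasoning_budget

  // Convert without truncating, so 0.5 or "-1.1" never reaches an integer cast.
  let value = NaN
  if (typeof raw === 'number') value = raw
  else if (typeof raw === 'string' && raw.trim() !== '') value = Number(raw)

  if (!Number.isInteger(value)) {
    throw new TypeError('reasoning_budget must be an integer, got ' + String(raw))
  }
  if (value !== -1 && value !== 0) {
    throw new RangeError('reasoning_budget must be -1 or 0, got ' + value)
  }
  return value
}

// Hypothetical request-scoped override: snapshot, apply, run, restore.
function runWithOverrides (context, generationParams, run) {
  const snapshot = { ...context }
  try {
    Object.assign(context, generationParams)
    return run(context)
  } finally {
    // Restore exactly the pre-request state, dropping any keys the override added.
    for (const key of Object.keys(context)) delete context[key]
    Object.assign(context, snapshot)
  }
}
```

For example, `runWithOverrides(ctx, { reasoning_budget: 0 }, fn)` lets `fn` see the override while `ctx.reasoning_budget` is back to its previous value once the call returns, which is the "not sticky" behaviour the tests check.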
💥 Breaking Changes
Tool-call streaming format
Consumers that previously parsed the synthetic, normalized `<tool_call>{json}</tool_call>` envelope must now parse the model's native tool-call dialect.

BEFORE (every model normalized to the same envelope):
AFTER (model-specific):
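As a rough consumer-side illustration only, here is a minimal sketch for the one dialect that wraps a JSON object in `<tool_call>…</tool_call>`. The function name `extractToolCalls` and the `{ name, arguments }` payload in the usage note are assumptions, not part of the addon's API, and the other dialects listed above (XML bodies, `<|tool_call>call:…<tool_call|>`, and so on) would each need their own parser.

```javascript
// Hypothetical helper: collects JSON payloads from <tool_call>…</tool_call> blocks.
function extractToolCalls (text) {
  const calls = []
  const pattern = /<tool_call>([\s\S]*?)<\/tool_call>/g
  let match
  while ((match = pattern.exec(text)) !== null) {
    try {
      calls.push(JSON.parse(match[1]))
    } catch (err) {
      // Body is not JSON (for instance an XML-style dialect); skip it here
      // and leave it to a dialect-specific parser.
    }
  }
  return calls
}
```

Called on `'<tool_call>{"name":"get_weather","arguments":{"city":"Rome"}}</tool_call>'`, it returns a one-element array holding the parsed object; a block with an XML body yields an empty array.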