Sync with latest fabric-8189 version on LLM/Embed/NMT #1874
Sync with latest fabric-8189 version on LLM/Embed/NMT #1874zoq wants to merge 82 commits intotetherto:mainfrom
Conversation
zoq
commented
May 3, 2026
- Qwen3.5 / Gemma4 / PaddleOCR-VL tests
- Mali coopmat fix to prevent NaN
Stack of three logical changes squashed into one commit so the test ports stay self-consistent with the build/runtime they depend on: 1. qvac-fabric overlay ports (LLM + embed + nmtcpp): - Pin to fabric 78db8bf4 (PR tetherto/qvac-fabric-llm.cpp#121 HEAD, includes c79a8851 "ggml-vulkan: Fix NaN outputs on Mali"). - Drop -DGGML_VULKAN_DISABLE_COOPMAT*=ON for Android so coopmat shaders are compiled in. With coopmat off, runtime device->coopmat_support is false and the Mali fix's ARM-gated branches were skipped, leaving Qwen3-Q8_0 finetuning NaN on Pixel 9 Pro Mali. - Wire up overlay-ports in each package's vcpkg-configuration.json. - Add find_package(OpenSSL) before find_package(llama) in the LLM CMakeLists so llama-targets.cmake's transitive OpenSSL::SSL reference (via cpp-httplib) resolves on local builds. 2. utils.js downloadFile redirect race: - Track a handedOff flag set when the redirect branch hands off dest to a recursive call. All cleanup paths now skip fs.unlink once ownership is transferred, so a late error from the outer writestream can't delete the freshly-downloaded file (Pixel ENOENT after "successful" mmproj download). 3. Three new integration tests + their mobile harness wiring: - qwen3-5.test.js — basic / multi-turn / tool-calling - gemma4.test.js — text / multi-turn / image (forced to CPU on darwin + mobile because gemma4v projector SIGSEGVs on Metal and Adreno OpenCL) / tool-calling - ocr-paddle.test.js — OCR; mobile maxTokens capped to 768 - Ported to the new addon API (files: { model: [absPath], projectionModel?: absPath }, config: …). - Added matching unit test test_text_llm_context_qwen3.cpp. - integration.auto.cjs registers runQwen35Test, runGemma4Test, runOcrPaddleTest dispatchers. - test-groups.json: iOS heavy4 cluster (Gemma4+OcrLighton+OcrPaddle), iOS lightB adds Qwen35, Android groupB has Qwen35 first then Gemma4 / OcrPaddle. - Workflow: Android GroupB Device Farm jobTimeout 60→90 min.
…nd bump gemma4 basic-test n_predict so the answer fits after the thinking preamble. Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
…soning-budget=0 disables the model's <think> reasoning channel, and add coverage for Qwen3, Qwen3.5, and Gemma 4. Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
The 8189.0.1 port (tetherto/qvac-registry-vcpkg#138) drops port-version 1's BUILD_LLAMA=OFF portfile workaround and ships the new fabric tip 739b309ae. Notable upstream fixes pulled in: - Inject enable_thinking into the Jinja template context so Qwen 3.5 and Gemma 4 actually emit <think> reasoning content. - GGML_OP_DELTA_NET_AR Vulkan compute shader (Qwen 3.5 / DeltaNet decode no longer falls back to CPU per token). - vulkan: f32 src1 strided cpy fix (embedding-model crash). Validated on macOS-arm64: vcpkg resolves qvac-fabric[core,gpu-backends,llama]:arm64-osx@8189.0.1 and the addon builds end-to-end. Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
🧪 C++ Test Coverage ReportCoverage: 📊 Detailed Coverage |
✅ E2E Mobile Test Results - iOSOverall Status: PASSED Test Summary
Links
Automated E2E mobile testing powered by AWS Device Farm |
✅ E2E Mobile Test Results - AndroidOverall Status: PASSED Test Summary
Links
Automated E2E mobile testing powered by AWS Device Farm |
…orce-opens the reasoning channel. Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
… runs on Mali GPU. Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
…so the model reliably reaches </think>. Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
# Conflicts: # packages/embed-llamacpp/CMakeLists.txt # packages/embed-llamacpp/vcpkg.json # packages/llm-llamacpp/CMakeLists.txt # packages/llm-llamacpp/test/integration/gemma4.test.js # packages/llm-llamacpp/test/integration/ocr-paddle.test.js # packages/llm-llamacpp/test/integration/qwen3-5.test.js # packages/llm-llamacpp/test/unit/test_text_llm_context_qwen3.cpp # packages/llm-llamacpp/vcpkg.json # packages/translation-nmtcpp/vcpkg.json
Per review: cull tests that exercise models we no longer want covered
in the LLM/SDK CI matrix.
LLM (packages/llm-llamacpp):
- Delete integration tests:
- test/integration/afriquegemma-edge-cases.test.js
- test/integration/afriquegemma-translation.test.js
- test/integration/moe.test.js (dolphin-mixtral-2x7b)
- Delete docs/afriquegemma-translation.md (only documents the
now-removed integration tests).
- Strip the medgemma-4b-it variant from:
- test/integration/tool-calling.test.js (collapses
ALL_TOOL_MODEL_VARIANTS / TOOL_MODEL_VARIANTS to qwen3-1.7b only,
drops the now-unused isMobile derived var).
- test/integration/finetuning-pause-resume.test.js (drops the
medgemma-4b-it-q4_0 entry from FINETUNE_MODELS).
- test/unit/test_model_metadata.cpp: drop the gemma3Model_ fixture +
the two Gemma3-specific TEST_F cases
(DiskSingleFile_Gemma3Arch_*); update the comment block listing
exercised arches accordingly.
- test/unit/pick-primary-gguf-path.test.js: keep the tensors.txt-first
ordering test, but rebase the fixture filenames on
Qwen3-4B-Q4_K_M-* so no medgemma names remain in the test corpus.
- test/mobile/test-groups.json + test/mobile/integration.auto.cjs:
drop runAfriquegemmaEdgeCasesTest, runAfriquegemmaTranslationTest,
runMoeTest from both ios and android groups; auto.cjs trimmed to
match. `validate-mobile-tests.js` is green.
SDK (packages/sdk/tests-qvac):
- Delete tests/translation-afriquegemma-tests.ts.
- tests/test-definitions.ts: drop translationAfriquegemmaTests
import + spread.
- tests/shared/executors/translation-executor.ts: drop the import,
the spread, and the |afriquegemma branch from the dispatch regex.
- tests/mobile/consumer.ts + tests/desktop/consumer.ts: drop the
AFRICAN_4B_TRANSLATION_Q4_K_M import and the
resources.define("afriquegemma", ...) block; mobile also drops the
afriquegemma-only SkipExecutor.
- tests/shared/resource-lifecycle.ts: rephrase the eviction-comment
example to a generic "large translation model" so it no longer
references the deleted resource.
Not touched: NOTICE/CHANGELOG (auto-generated/historical),
sdk/models/registry/* (model constants in the registry are data, not
tests), sdk/examples/translation/translation-llm-afriquegemma.ts
(consumer-facing example, not a test).
Per review: keep packages/sdk/tests-qvac/ untouched. Restore the SDK afriquegemma test file, the test-definitions / translation-executor / desktop+mobile consumer / resource-lifecycle edits to their state prior to commit 36de6ec. Only the LLM-side cull (packages/llm-llamacpp + the deleted afrique / moe / medgemma test files there) from 36de6ec is kept.
✅ E2E Mobile Test Results - iOSOverall Status: PASSED Test Summary
Links
Automated E2E mobile testing powered by AWS Device Farm |
🧪 C++ Test Coverage ReportCoverage: 📊 Detailed Coverage |
✅ E2E Mobile Test Results - AndroidOverall Status: PASSED Test Summary
Links
Automated E2E mobile testing powered by AWS Device Farm |
Adds an overlay port copy of qvac-fabric pointing at v8189.0.2 of tetherto/qvac-fabric-llm.cpp (tetherto/qvac-registry-vcpkg#140) to llm-llamacpp, embed-llamacpp, and translation-nmtcpp, declared via each package's vcpkg-configuration.json. Lets this PR exercise the new fabric build (incl. the Mali coopmat1 BitNet TQ NaN fix) without waiting for the registry baseline bump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Point REF at the latest qvac-fabric-llm.cpp temp-8189 commit (f686a1324e13184d3257cb74c1ba17f9cf8ef575) instead of v8189.0.2 so the overlay tracks branch tip while the branch is still moving. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🧪 C++ Test Coverage ReportCoverage: 📊 Detailed Coverage |
1 similar comment
🧪 C++ Test Coverage ReportCoverage: 📊 Detailed Coverage |
✅ E2E Mobile Test Results - iOSOverall Status: PASSED Test Summary
Links
Automated E2E mobile testing powered by AWS Device Farm |
✅ E2E Mobile Test Results - AndroidOverall Status: PASSED Test Summary
Links
Automated E2E mobile testing powered by AWS Device Farm |
Allow slower Android Device Farm runs to finish model-heavy LLM tests before the harness marks them as timed out. Co-authored-by: Cursor <cursoragent@cursor.com>
tetherto/qvac-registry-vcpkg#140 publishes qvac-fabric@8189.0.2 in the default registry, so the temporary per-package overlay we used while the new fabric build was still being shaken out is no longer necessary. For llm-llamacpp, embed-llamacpp, and translation-nmtcpp: - Delete `packages/<pkg>/vcpkg/ports/qvac-fabric/` (portfile.cmake, vcpkg.json, android-vulkan-version.cmake) — the overlay copy. - Drop the `overlay-ports` entry from each package's vcpkg-configuration.json. The `default-registry` baseline is left untouched intentionally; the `version>=` constraints below are what forces vcpkg to resolve to the new fabric revision against the unchanged baseline. - Bump the `qvac-fabric` `version>=` pin from `8189.0.1` -> `8189.0.2` in each package's vcpkg.json. Co-authored-by: Cursor <cursoragent@cursor.com>
`sawMali` was threaded through `emplaceIfValidDevice` / `tryEmplaceDevice` / `chooseBackend` but never read by any caller — leftover from the earlier "Force BERT/Qwen3.5 to CPU on Mali" iterations. The embed-side cleanup already landed in 2ac5de0 ("Remove the Mali detection plumbing from the embed addon now that BERT runs on Mali GPU."); this finishes the symmetric removal on the LLM side. `sawAppleM1` plumbing is preserved unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>
TextLlmContext flips reasoningState_.inside_reasoning = true alongside the forced "<think>\n" opener; MtmdLlmContext doesn't because it doesn't carry a reasoningState_ today. Add an inline note so the asymmetry isn't read as a bug, and point at the symmetric site to update if reasoning-aware EOS replacement is later added on the multimodal path. Co-authored-by: Cursor <cursoragent@cursor.com>
The previous post-generation regex (`([{,])(\s*)([A-Za-z_]…)(\s*):` -> quote
the ident) was too broad: it also matched `, ident:` substrings sitting
inside JSON string values, so a tool call with a free-form string argument
like `{"query":"phase one, step: validate"}` came out corrupted as
`{"query":"phase one, "step": validate"}`, which then failed JSON.parse on
the consumer side.
In practice the rewrite is only needed for one upstream quirk: the Gemma 4
parser's `gemma4_args_to_json` (common/chat-parser.cpp) uses an
`at_key_start()` helper that peeks backwards in the output buffer for a
`{`/`,` -- so the very first top-level key is left bare while every nested
or post-comma key is already quoted. All other tool dialects reach us via
`json::dump()` upstream and already start with a quoted key.
Replace the broad regex with one anchored at `^\{(\s*)<ident>\s*:`, which
fixes exactly that single leading-bare-key case and cannot match anywhere
inside a JSON string value.
Verified end-to-end on linux-x64 against gemma-4-E2B-it-Q8_0 (CPU):
- Adversarial prompt forcing `phase one, step: validate` as a tool arg
string: baseline produced invalid JSON
`{"query":"phase one, "step": validate"}` (parse fail at pos 55);
this fix yields `{"query":"phase one, step: validate"}` and the test
passes 7/7 assertions.
- Existing simple-args happy path (`get_weather` with city/unit) still
passes 5/5.
Co-authored-by: Cursor <cursoragent@cursor.com>