Sync with latest fabric-8189 version on LLM/Embed/NMT #1874

Merged
gianni-cor merged 104 commits into tetherto:main from zoq:cpp-sanity-fixes-rebased on May 11, 2026

zoq (Collaborator) commented May 3, 2026

🎯 What problem does this PR solve?

  • Syncs the LLM / Embed / NMT addons with qvac-fabric 8189.0.2 to pick up upstream fixes (Mali/Adreno F16 coopmat1 NaN, Qwen3.5 OpenCL kernels, Gemma 4 vision/audio, Vulkan VMA migration) accumulated since 7248.x.
  • Adds first-class support for reasoning-capable models (Qwen3, Qwen3.5, DeepSeek-R1, Gemma 4) — both as a load-time knob and per-request override — and fixes a chat-template gap that left consumers seeing dangling </think> tags.
  • Expands integration coverage for the model families this PR touches and trims dead code paths exposed by the sync.

📝 How does it solve it?

Fabric / build

  • Bumps vcpkg.json to qvac-fabric >= 8189.0.2, drops the temporary overlay-ports pin, and ports the addon to the new fabric ABI (common_init_result → common_init_result_ptr; llama_clear_adapter_lora + llama_set_adapter_lora → llama_set_adapters_lora; LLAMA_EXAMPLE_MAIN → LLAMA_EXAMPLE_COMMON).
  • Defaults flash-attn=off on OpenCL backends (not reliably supported); user-supplied flash-attn / flash_attn overrides are honored.
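
The flash-attn defaulting rule above can be sketched roughly as follows. This is illustrative JavaScript, not the addon's actual code: `resolveFlashAttn`, the `backend` string, and the `'on'` / `'off'` values are assumptions; only the two key spellings and the "off on OpenCL unless overridden" rule come from the description.

```javascript
// Hypothetical helper: decide the flash-attn setting for a given backend.
// Only the key names 'flash-attn' / 'flash_attn' are taken from the PR text.
function resolveFlashAttn(config, backend) {
  // A user-supplied value under either spelling always wins.
  for (const key of ['flash-attn', 'flash_attn']) {
    if (Object.prototype.hasOwnProperty.call(config, key)) return config[key];
  }
  // No override: default to off on OpenCL, otherwise leave the value unset
  // so the underlying library's own default applies.
  return backend === 'opencl' ? 'off' : undefined;
}

console.log(resolveFlashAttn({}, 'opencl'));                     // off
console.log(resolveFlashAttn({ 'flash-attn': 'on' }, 'opencl')); // on
```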

Reasoning budget

  • New reasoning_budget config: -1 means unrestricted (the default) and 0 disables reasoning. Wired through to fabric's inputs.enable_thinking. Accepts both kebab-case (reasoning-budget) and snake-case (reasoning_budget) keys; rejects any other integer.
  • Per-request override via generationParams.reasoning_budget on model.run() — same shape as temp / top_p. Snapshot/restore through the existing applyGenerationParamsToContext path so the override is request-scoped and not sticky.
  • Validation lives at the JS→C++ boundary and rejects fractional inputs (0.5, -1.1, …) before they reach the integer cast.
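
The validation rules above might look roughly like this on the JS side. It is a minimal sketch under stated assumptions: `parseReasoningBudget` is a hypothetical name, the error types are invented for illustration, and accepting numeric strings such as `'-1'` is an assumption rather than confirmed behavior.

```javascript
// Hypothetical validator mirroring the rules described above.
function parseReasoningBudget(config) {
  const keys = ['reasoning_budget', 'reasoning-budget'].filter((k) =>
    Object.prototype.hasOwnProperty.call(config, k));
  if (keys.length === 0) return -1; // key absent: unrestricted (the default)

  const raw = config[keys[0]];
  // Assumption: numeric strings such as '-1' are accepted alongside numbers.
  const value = typeof raw === 'string' && raw.trim() !== '' ? Number(raw) : raw;

  // Reject fractions (0.5, -1.1, ...) and non-numbers before any integer cast.
  if (typeof value !== 'number' || !Number.isInteger(value)) {
    throw new TypeError(`reasoning_budget must be an integer, got ${JSON.stringify(raw)}`);
  }
  // Only -1 (unrestricted) and 0 (disabled) are valid integers.
  if (value !== -1 && value !== 0) {
    throw new RangeError(`reasoning_budget must be -1 or 0, got ${value}`);
  }
  return value;
}

console.log(parseReasoningBudget({}));                        // -1
console.log(parseReasoningBudget({ 'reasoning-budget': 0 })); // 0
```

Checking `Number.isInteger` before the range check keeps the two failure modes (wrong type versus out-of-range integer) distinguishable for callers.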

Synthetic <think> opener

  • When the chat template force-opens the reasoning channel (Qwen3-style / DeepSeek-R1 / Seed-OSS / GLM — templates ending in <think>\n), getPrompt now propagates the thinking_forced_open flag back to the context.
  • TextLlmContext / MtmdLlmContext prepend "<think>\n" to the visible stream so consumers see balanced <think>…</think> markup. TextLlmContext also flips reasoningState_.inside_reasoning = true so the streaming state machine starts in the right state.
  • Gemma 4's reasoning channel is model-emitted (<|channel>thought…<channel|>) and explicitly does not set the force-open flag — the path is a no-op there.
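
To make the force-open handling concrete, here is a small JavaScript model of the behavior described above. The real logic lives in the C++ contexts; `createVisibleStream` and its fields are hypothetical names, and the sketch assumes each tag arrives whole inside a single token.

```javascript
// Hypothetical model of the visible stream; not the addon's API.
function createVisibleStream(thinkingForcedOpen) {
  const state = { insideReasoning: thinkingForcedOpen, output: '' };
  // The template already consumed "<think>\n" as part of the prompt, so it is
  // re-emitted on the visible side to keep the markup balanced.
  if (thinkingForcedOpen) state.output += '<think>\n';
  return {
    push(token) {
      state.output += token;
      // Assumption: tags are never split across tokens in this sketch.
      if (token.includes('</think>')) state.insideReasoning = false;
      else if (token.includes('<think>')) state.insideReasoning = true;
    },
    get insideReasoning() { return state.insideReasoning; },
    get output() { return state.output; }
  };
}

const stream = createVisibleStream(true);
for (const token of ['Let me check.', '</think>', 'Paris.']) stream.push(token);
console.log(JSON.stringify(stream.output)); // "<think>\nLet me check.</think>Paris."
```

With `thinkingForcedOpen` set to `false` (the Gemma 4 case), nothing is prepended and the stream starts outside the reasoning state.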

Antiprompt matching

  • checkAntiprompt now lowercases both the recent-output window and each antiprompt before find(), so a single Pizza entry catches every casing the model emits.
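
The matching rule can be illustrated in a few lines of JavaScript. The addon's checkAntiprompt is C++ and may differ in detail (for example in how it handles non-ASCII casing); `matchesAntiprompt` below is a hypothetical stand-in.

```javascript
// Hypothetical JavaScript rendering of case-insensitive antiprompt matching.
function matchesAntiprompt(recentOutput, antiprompts) {
  const haystack = recentOutput.toLowerCase();
  // Lowercase each antiprompt too, so one entry covers every casing.
  return antiprompts.some(
    (entry) => entry.length > 0 && haystack.includes(entry.toLowerCase())
  );
}

console.log(matchesAntiprompt('My favourite food is PiZzA', ['Pizza'])); // true
console.log(matchesAntiprompt('My favourite food is pasta', ['Pizza'])); // false
```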

Tool-call streaming

  • Each model streams its native dialect verbatim: Qwen3 / Hermes (<tool_call>{json}</tool_call>), Qwen3.5 (HF function-call XML wrapped in <tool_call>…</tool_call>), Gemma 4 (<|tool_call>call:…<tool_call|>), Mistral / DeepSeek-R1 / Functionary / GPT-OSS (their own markers). The previous synthetic <tool_call>{json}</tool_call> post-processing layer is removed.

Dead code / cleanup

  • Dropped: sawMali plumbing, Apple-M1 detection + projector-CPU routing, selectToolsCompactMarker(string) overload, Gemma 4 markers in Qwen3ReasoningUtils, model-name-based Qwen3 fallback (architecture-only now), useCpuForVision test alias.

🧪 How was it tested?

  • New integration tests (gated behind the verify label):
    • qwen3-5.test.js — basic, multi-turn, tool calling, image describe, reasoning-budget=0, and per-request generationParams.reasoning_budget override (verifies it's request-scoped, not sticky).
    • gemma4.test.js — basic, multi-turn, image describe on GPU on mobile, tool calling via the native-dialect parser, reasoning-budget=0 (tolerant of model-emitted reasoning).
    • ocr-paddle.test.js — PaddleOCR-VL.
  • Reasoning-budget coverage also extended in reasoning.test.js for Qwen3.
  • C++ unit tests: OpenCL flash-attn auto-disable, Qwen3 tools-at-end double-tokenize, expanded tuneConfigMap coverage (test_text_llm_context_qwen3.cpp, test_tune_config_map.cpp).
  • Local verification: built darwin-arm64 (bare-make build && bare-make install) and ran the relevant config-parameters.test.js scenarios end-to-end; Reverse prompt stops generation passes with a mixed-case PiZzA antiprompt, confirming case-insensitive matching live against the model.

🔌 API Changes

// New load-time knob (LlamaConfig)
new LlmLlamacpp({
  files: { model: [modelPath] },
  config: {
    // ...
    reasoning_budget: 0   // 0 disables reasoning; only -1 (unrestricted, the default) or 0 is accepted
  }
})

// New per-request override (RunOptions.generationParams)
await model.run(messages, {
  generationParams: { reasoning_budget: 0 }   // request-scoped; default restored after
})
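
The "request-scoped, not sticky" behavior amounts to a snapshot/restore around each run. The sketch below shows that pattern in isolation; `runWithOverrides` and the `context.params` shape are hypothetical, and the addon's applyGenerationParamsToContext path may be structured differently.

```javascript
// Hypothetical snapshot/restore wrapper; only the field name
// reasoning_budget is taken from the PR description.
async function runWithOverrides(context, generationParams, generate) {
  const snapshot = { ...context.params };          // remember load-time values
  Object.assign(context.params, generationParams); // apply per-request overrides
  try {
    return await generate(context);
  } finally {
    context.params = snapshot;                     // restore even if generate throws
  }
}

const context = { params: { temp: 0.7, reasoning_budget: -1 } };
runWithOverrides(context, { reasoning_budget: 0 }, async (ctx) => ctx.params.reasoning_budget)
  .then((seenDuringRun) => {
    console.log(seenDuringRun, context.params.reasoning_budget); // 0 -1
  });
```

Restoring in `finally` matters here: a request that fails mid-generation would otherwise leave its override in place for the next caller.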

💥 Breaking Changes

Tool-call streaming format

Consumers that previously parsed the synthetic, normalized <tool_call>{json}</tool_call> envelope must now parse the model's native tool-call dialect.

BEFORE (every model normalized to the same envelope):

<tool_call>{"name":"get_weather","arguments":{"city":"Paris"}}</tool_call>

AFTER (model-specific):

# Qwen3 / Hermes:    <tool_call>{json}</tool_call>
# Qwen3.5:           <tool_call><function_call name="…"><parameter name="…">…</parameter></function_call></tool_call>
# Gemma 4:           <|tool_call>call:get_weather{arg:<|"|>city<|"|>=<|"|>Paris<|"|>}<tool_call|>
# Mistral / DeepSeek-R1 / Functionary / GPT-OSS: their own markers
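
For consumers migrating off the normalized envelope, a consumer-side parser for the Qwen3 / Hermes dialect alone might look like the sketch below. `parseHermesToolCalls` is a hypothetical helper, not part of the addon; the other dialects listed above need their own parsers, and this one deliberately skips payloads that are not JSON.

```javascript
// Hypothetical parser for the Qwen3 / Hermes dialect only:
// <tool_call>{json}</tool_call>
function parseHermesToolCalls(text) {
  const calls = [];
  const pattern = /<tool_call>\s*([\s\S]*?)\s*<\/tool_call>/g;
  for (const match of text.matchAll(pattern)) {
    try {
      const { name, arguments: args } = JSON.parse(match[1]);
      calls.push({ name, arguments: args });
    } catch {
      // Not JSON: probably another dialect (e.g. the Qwen3.5 XML form).
      // Leave it to a dialect-specific parser rather than guessing.
    }
  }
  return calls;
}

const output = 'Sure.<tool_call>{"name":"get_weather","arguments":{"city":"Paris"}}</tool_call>';
console.log(parseHermesToolCalls(output));
// [ { name: 'get_weather', arguments: { city: 'Paris' } } ]
```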

Stack of three logical changes squashed into one commit so the test
ports stay self-consistent with the build/runtime they depend on:

1. qvac-fabric overlay ports (LLM + embed + nmtcpp):
   - Pin to fabric 78db8bf4 (PR tetherto/qvac-fabric-llm.cpp#121 HEAD,
     includes c79a8851 "ggml-vulkan: Fix NaN outputs on Mali").
   - Drop -DGGML_VULKAN_DISABLE_COOPMAT*=ON for Android so coopmat
     shaders are compiled in. With coopmat off, runtime
     device->coopmat_support is false and the Mali fix's ARM-gated
     branches were skipped, leaving Qwen3-Q8_0 finetuning NaN on
     Pixel 9 Pro Mali.
   - Wire up overlay-ports in each package's vcpkg-configuration.json.
   - Add find_package(OpenSSL) before find_package(llama) in the LLM
     CMakeLists so llama-targets.cmake's transitive OpenSSL::SSL
     reference (via cpp-httplib) resolves on local builds.

2. utils.js downloadFile redirect race:
   - Track a handedOff flag set when the redirect branch hands off
     dest to a recursive call. All cleanup paths now skip fs.unlink
     once ownership is transferred, so a late error from the outer
     writestream can't delete the freshly-downloaded file (Pixel
     ENOENT after "successful" mmproj download).

3. Three new integration tests + their mobile harness wiring:
   - qwen3-5.test.js — basic / multi-turn / tool-calling
   - gemma4.test.js — text / multi-turn / image (forced to CPU on
     darwin + mobile because gemma4v projector SIGSEGVs on Metal and
     Adreno OpenCL) / tool-calling
   - ocr-paddle.test.js — OCR; mobile maxTokens capped to 768
   - Ported to the new addon API (files: { model: [absPath],
     projectionModel?: absPath }, config: …).
   - Added matching unit test test_text_llm_context_qwen3.cpp.
   - integration.auto.cjs registers runQwen35Test, runGemma4Test,
     runOcrPaddleTest dispatchers.
   - test-groups.json: iOS heavy4 cluster
     (Gemma4+OcrLighton+OcrPaddle), iOS lightB adds Qwen35,
     Android groupB has Qwen35 first then Gemma4 / OcrPaddle.
   - Workflow: Android GroupB Device Farm jobTimeout 60→90 min.
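
The handedOff fix from item 2 above boils down to an ownership flag that guards cleanup. The following is a minimal sketch of that idea without any networking; `createDownloadCleanup` is a hypothetical helper and the real downloadFile in utils.js is not reproduced here.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical cleanup guard: once the redirect branch hands `dest` to a
// recursive download, the outer call must no longer delete it.
function createDownloadCleanup(dest) {
  let handedOff = false;
  return {
    handOff() { handedOff = true; },
    cleanup() {
      if (handedOff) return false;                  // the recursive call owns dest
      if (fs.existsSync(dest)) fs.unlinkSync(dest); // still ours: remove partial file
      return true;
    }
  };
}

// Usage: simulate a redirect followed by a late error on the outer stream.
const dest = path.join(os.tmpdir(), `handoff-demo-${process.pid}.bin`);
const outer = createDownloadCleanup(dest);
outer.handOff();                      // redirect: recursive call now owns dest
fs.writeFileSync(dest, 'downloaded'); // recursive call finishes the download
outer.cleanup();                      // late outer error: must not delete dest
console.log(fs.existsSync(dest));     // true
fs.unlinkSync(dest);                  // tidy up the demo file
```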
@zoq zoq requested review from a team as code owners May 3, 2026 23:03
github-actions Bot commented Jun 9, 2026

❌ E2E Mobile Test Results - iOS

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: iOS
Addon: @qvac/translation-nmtcpp
PR: #1874
Commit: ce2ea93

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

github-actions Bot commented Jun 9, 2026

❌ E2E Mobile Test Results - Android

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: Android
Addon: @qvac/translation-nmtcpp
PR: #1874
Commit: ce2ea93

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

@github-actions

❌ E2E Mobile Test Results - iOS

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: iOS
Addon: @qvac/translation-nmtcpp
PR: #1874
Commit: da9bcb0

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

@github-actions

❌ E2E Mobile Test Results - Android

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: Android
Addon: @qvac/translation-nmtcpp
PR: #1874
Commit: da9bcb0

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

@github-actions

❌ E2E Mobile Test Results - iOS

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: iOS
Addon: @qvac/translation-nmtcpp
PR: #1874
Commit: e2e6745

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

@github-actions

❌ E2E Mobile Test Results - Android

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: Android
Addon: @qvac/translation-nmtcpp
PR: #1874
Commit: e2e6745

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/
