Skip to content

Sync with latest fabric-8189 version on LLM/Embed/NMT #1874

Open
zoq wants to merge 82 commits intotetherto:mainfrom
zoq:cpp-sanity-fixes-rebased
Open

Sync with latest fabric-8189 version on LLM/Embed/NMT #1874
zoq wants to merge 82 commits intotetherto:mainfrom
zoq:cpp-sanity-fixes-rebased

Conversation

@zoq
Copy link
Copy Markdown
Collaborator

@zoq zoq commented May 3, 2026

  • Qwen3.5 / Gemma4 / PaddleOCR-VL tests
  • Mali coopmat fix to prevent NaN

Stack of three logical changes squashed into one commit so the test
ports stay self-consistent with the build/runtime they depend on:

1. qvac-fabric overlay ports (LLM + embed + nmtcpp):
   - Pin to fabric 78db8bf4 (PR tetherto/qvac-fabric-llm.cpp#121 HEAD,
     includes c79a8851 "ggml-vulkan: Fix NaN outputs on Mali").
   - Drop -DGGML_VULKAN_DISABLE_COOPMAT*=ON for Android so coopmat
     shaders are compiled in. With coopmat off, runtime
     device->coopmat_support is false and the Mali fix's ARM-gated
     branches were skipped, leaving Qwen3-Q8_0 finetuning NaN on
     Pixel 9 Pro Mali.
   - Wire up overlay-ports in each package's vcpkg-configuration.json.
   - Add find_package(OpenSSL) before find_package(llama) in the LLM
     CMakeLists so llama-targets.cmake's transitive OpenSSL::SSL
     reference (via cpp-httplib) resolves on local builds.

2. utils.js downloadFile redirect race:
   - Track a handedOff flag set when the redirect branch hands off
     dest to a recursive call. All cleanup paths now skip fs.unlink
     once ownership is transferred, so a late error from the outer
     writestream can't delete the freshly-downloaded file (Pixel
     ENOENT after "successful" mmproj download).

3. Three new integration tests + their mobile harness wiring:
   - qwen3-5.test.js — basic / multi-turn / tool-calling
   - gemma4.test.js — text / multi-turn / image (forced to CPU on
     darwin + mobile because gemma4v projector SIGSEGVs on Metal and
     Adreno OpenCL) / tool-calling
   - ocr-paddle.test.js — OCR; mobile maxTokens capped to 768
   - Ported to the new addon API (files: { model: [absPath],
     projectionModel?: absPath }, config: …).
   - Added matching unit test test_text_llm_context_qwen3.cpp.
   - integration.auto.cjs registers runQwen35Test, runGemma4Test,
     runOcrPaddleTest dispatchers.
   - test-groups.json: iOS heavy4 cluster
     (Gemma4+OcrLighton+OcrPaddle), iOS lightB adds Qwen35,
     Android groupB has Qwen35 first then Gemma4 / OcrPaddle.
   - Workflow: Android GroupB Device Farm jobTimeout 60→90 min.
@zoq zoq requested review from a team as code owners May 3, 2026 23:03
zoq and others added 5 commits May 8, 2026 01:20
…nd bump gemma4 basic-test n_predict so the answer fits after the thinking preamble.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
…soning-budget=0 disables the model's <think> reasoning channel, and add coverage for Qwen3, Qwen3.5, and Gemma 4.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
The 8189.0.1 port (tetherto/qvac-registry-vcpkg#138) drops
port-version 1's BUILD_LLAMA=OFF portfile workaround and ships the
new fabric tip 739b309ae. Notable upstream fixes pulled in:

- Inject enable_thinking into the Jinja template context so Qwen 3.5
  and Gemma 4 actually emit <think> reasoning content.
- GGML_OP_DELTA_NET_AR Vulkan compute shader (Qwen 3.5 / DeltaNet
  decode no longer falls back to CPU per token).
- vulkan: f32 src1 strided cpy fix (embedding-model crash).

Validated on macOS-arm64: vcpkg resolves
qvac-fabric[core,gpu-backends,llama]:arm64-osx@8189.0.1 and the
addon builds end-to-end.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

🧪 C++ Test Coverage Report

Coverage:

📊 Detailed Coverage
Filename                         Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NmtLazyInitializeBackend.cpp          99                40    59.60%          11                 2    81.82%         157                58    63.06%          66                33    50.00%
NmtLazyInitializeBackend.hpp           2                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
TranslationModel.cpp                 296               168    43.24%          28                 8    71.43%         506               213    57.91%         181               122    32.60%
TranslationModel.hpp                   1                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
nmt.cpp                               72                22    69.44%           9                 1    88.89%         137                28    79.56%          44                16    63.64%
nmt.hpp                               51                 4    92.16%          11                 2    81.82%          53                 4    92.45%          28                 0   100.00%
nmt_beam_search.cpp                  116                25    78.45%          10                 3    70.00%         254                32    87.40%          76                19    75.00%
nmt_graph_decoder.cpp                164                78    52.44%          15                 7    53.33%         540               161    70.19%         112                69    38.39%
nmt_graph_encoder.cpp                 54                13    75.93%           3                 0   100.00%         268                33    87.69%          37                16    56.76%
nmt_loader.cpp                       270                67    75.19%          14                 0   100.00%         774                97    87.47%         161                67    58.39%
nmt_state_backend.cpp                253                94    62.85%          21                 0   100.00%         489               128    73.82%         165                87    47.27%
nmt_tokenization.cpp                  88                21    76.14%           8                 0   100.00%         135                36    73.33%          61                26    57.38%
nmt_utils.cpp                        120                89    25.83%           8                 3    62.50%         180               135    25.00%          78                63    19.23%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                               1586               621    60.84%         140                26    81.43%        3495               925    73.53%        1009               518    48.66%

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

✅ E2E Mobile Test Results - iOS

Overall Status: PASSED
Device Farm Result: PASSED
Platform: iOS
Addon: @qvac/translation-nmtcpp
PR: #1874
Commit: c6d3e69

Test Summary

Metric Count
Total Tests 3
✅ Passed 3
❌ Failed 0
⏭️ Skipped 0

Links

  • 🔗 Device Farm Run: View on AWS Device Farm
  • 🔗 Workflow: View Details
  • 📋 Run ARN: arn:aws:devicefarm:us-west-2:833707431398:run:cef7531e-44ed-4ddf-a349-a4b0f72f680d/1b8e0d4c-f85f-420c-bce7-56fbe3b0fbd2

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

✅ E2E Mobile Test Results - Android

Overall Status: PASSED
Device Farm Result: PASSED
Platform: Android
Addon: @qvac/translation-nmtcpp
PR: #1874
Commit: c6d3e69

Test Summary

Metric Count
Total Tests 6
✅ Passed 6
❌ Failed 0
⏭️ Skipped 0

Links

  • 🔗 Device Farm Run: View on AWS Device Farm
  • 🔗 Workflow: View Details
  • 📋 Run ARN: arn:aws:devicefarm:us-west-2:833707431398:run:cef7531e-44ed-4ddf-a349-a4b0f72f680d/af555cbf-65ce-4301-8b40-d859090181bd

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

zoq and others added 8 commits May 8, 2026 18:18
…orce-opens the reasoning channel.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
… runs on Mali GPU.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
…so the model reliably reaches </think>.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
# Conflicts:
#	packages/embed-llamacpp/CMakeLists.txt
#	packages/embed-llamacpp/vcpkg.json
#	packages/llm-llamacpp/CMakeLists.txt
#	packages/llm-llamacpp/test/integration/gemma4.test.js
#	packages/llm-llamacpp/test/integration/ocr-paddle.test.js
#	packages/llm-llamacpp/test/integration/qwen3-5.test.js
#	packages/llm-llamacpp/test/unit/test_text_llm_context_qwen3.cpp
#	packages/llm-llamacpp/vcpkg.json
#	packages/translation-nmtcpp/vcpkg.json
Per review: cull tests that exercise models we no longer want covered
in the LLM/SDK CI matrix.

LLM (packages/llm-llamacpp):
- Delete integration tests:
  - test/integration/afriquegemma-edge-cases.test.js
  - test/integration/afriquegemma-translation.test.js
  - test/integration/moe.test.js (dolphin-mixtral-2x7b)
- Delete docs/afriquegemma-translation.md (only documents the
  now-removed integration tests).
- Strip the medgemma-4b-it variant from:
  - test/integration/tool-calling.test.js (collapses
    ALL_TOOL_MODEL_VARIANTS / TOOL_MODEL_VARIANTS to qwen3-1.7b only,
    drops the now-unused isMobile derived var).
  - test/integration/finetuning-pause-resume.test.js (drops the
    medgemma-4b-it-q4_0 entry from FINETUNE_MODELS).
- test/unit/test_model_metadata.cpp: drop the gemma3Model_ fixture +
  the two Gemma3-specific TEST_F cases
  (DiskSingleFile_Gemma3Arch_*); update the comment block listing
  exercised arches accordingly.
- test/unit/pick-primary-gguf-path.test.js: keep the tensors.txt-first
  ordering test, but rebase the fixture filenames on
  Qwen3-4B-Q4_K_M-* so no medgemma names remain in the test corpus.
- test/mobile/test-groups.json + test/mobile/integration.auto.cjs:
  drop runAfriquegemmaEdgeCasesTest, runAfriquegemmaTranslationTest,
  runMoeTest from both ios and android groups; auto.cjs trimmed to
  match. `validate-mobile-tests.js` is green.

SDK (packages/sdk/tests-qvac):
- Delete tests/translation-afriquegemma-tests.ts.
- tests/test-definitions.ts: drop translationAfriquegemmaTests
  import + spread.
- tests/shared/executors/translation-executor.ts: drop the import,
  the spread, and the |afriquegemma branch from the dispatch regex.
- tests/mobile/consumer.ts + tests/desktop/consumer.ts: drop the
  AFRICAN_4B_TRANSLATION_Q4_K_M import and the
  resources.define("afriquegemma", ...) block; mobile also drops the
  afriquegemma-only SkipExecutor.
- tests/shared/resource-lifecycle.ts: rephrase the eviction-comment
  example to a generic "large translation model" so it no longer
  references the deleted resource.

Not touched: NOTICE/CHANGELOG (auto-generated/historical),
sdk/models/registry/* (model constants in the registry are data, not
tests), sdk/examples/translation/translation-llm-afriquegemma.ts
(consumer-facing example, not a test).
Per review: keep packages/sdk/tests-qvac/ untouched. Restore the SDK
afriquegemma test file, the test-definitions / translation-executor /
desktop+mobile consumer / resource-lifecycle edits to their state
prior to commit 36de6ec.

Only the LLM-side cull (packages/llm-llamacpp + the deleted afrique /
moe / medgemma test files there) from 36de6ec is kept.
Per review: keep the AfriqueGemma translation doc. Commit 36de6ec
removed it together with the LLM AfriqueGemma test files; restore it
unchanged from the merge tip (e29836d).
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

✅ E2E Mobile Test Results - iOS

Overall Status: PASSED
Device Farm Result: PASSED
Platform: iOS
Addon: @qvac/translation-nmtcpp
PR: #1874
Commit: ce2ea93

Test Summary

Metric Count
Total Tests 3
✅ Passed 3
❌ Failed 0
⏭️ Skipped 0

Links

  • 🔗 Device Farm Run: View on AWS Device Farm
  • 🔗 Workflow: View Details
  • 📋 Run ARN: arn:aws:devicefarm:us-west-2:833707431398:run:cef7531e-44ed-4ddf-a349-a4b0f72f680d/71bc77c7-0125-48e2-b038-8e666e5b6e70

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

🧪 C++ Test Coverage Report

Coverage:

📊 Detailed Coverage
Filename                         Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NmtLazyInitializeBackend.cpp          99                40    59.60%          11                 2    81.82%         157                58    63.06%          66                33    50.00%
NmtLazyInitializeBackend.hpp           2                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
TranslationModel.cpp                 296               168    43.24%          28                 8    71.43%         506               213    57.91%         181               122    32.60%
TranslationModel.hpp                   1                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
nmt.cpp                               72                22    69.44%           9                 1    88.89%         137                28    79.56%          44                16    63.64%
nmt.hpp                               51                 4    92.16%          11                 2    81.82%          53                 4    92.45%          28                 0   100.00%
nmt_beam_search.cpp                  116                25    78.45%          10                 3    70.00%         254                32    87.40%          76                19    75.00%
nmt_graph_decoder.cpp                164                78    52.44%          15                 7    53.33%         540               161    70.19%         112                69    38.39%
nmt_graph_encoder.cpp                 54                13    75.93%           3                 0   100.00%         268                33    87.69%          37                16    56.76%
nmt_loader.cpp                       270                67    75.19%          14                 0   100.00%         774                97    87.47%         161                67    58.39%
nmt_state_backend.cpp                253                94    62.85%          21                 0   100.00%         489               128    73.82%         165                87    47.27%
nmt_tokenization.cpp                  88                21    76.14%           8                 0   100.00%         135                36    73.33%          61                26    57.38%
nmt_utils.cpp                        120                89    25.83%           8                 3    62.50%         180               135    25.00%          78                63    19.23%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                               1586               621    60.84%         140                26    81.43%        3495               925    73.53%        1009               518    48.66%

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

✅ E2E Mobile Test Results - Android

Overall Status: PASSED
Device Farm Result: PASSED
Platform: Android
Addon: @qvac/translation-nmtcpp
PR: #1874
Commit: ce2ea93

Test Summary

Metric Count
Total Tests 6
✅ Passed 6
❌ Failed 0
⏭️ Skipped 0

Links

  • 🔗 Device Farm Run: View on AWS Device Farm
  • 🔗 Workflow: View Details
  • 📋 Run ARN: arn:aws:devicefarm:us-west-2:833707431398:run:cef7531e-44ed-4ddf-a349-a4b0f72f680d/5aa44e64-10da-4de3-8ef8-ef36835ecbd5

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

gianni-cor and others added 2 commits May 9, 2026 09:26
Adds an overlay port copy of qvac-fabric pointing at v8189.0.2 of
tetherto/qvac-fabric-llm.cpp (tetherto/qvac-registry-vcpkg#140)
to llm-llamacpp, embed-llamacpp, and translation-nmtcpp, declared via
each package's vcpkg-configuration.json. Lets this PR exercise the new
fabric build (incl. the Mali coopmat1 BitNet TQ NaN fix) without
waiting for the registry baseline bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Point REF at the latest qvac-fabric-llm.cpp temp-8189 commit
(f686a1324e13184d3257cb74c1ba17f9cf8ef575) instead of v8189.0.2 so the
overlay tracks branch tip while the branch is still moving.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

🧪 C++ Test Coverage Report

Coverage:

📊 Detailed Coverage
Filename                         Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NmtLazyInitializeBackend.cpp          99                40    59.60%          11                 2    81.82%         157                58    63.06%          66                33    50.00%
NmtLazyInitializeBackend.hpp           2                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
TranslationModel.cpp                 296               168    43.24%          28                 8    71.43%         506               213    57.91%         181               122    32.60%
TranslationModel.hpp                   1                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
nmt.cpp                               72                22    69.44%           9                 1    88.89%         137                28    79.56%          44                16    63.64%
nmt.hpp                               51                 4    92.16%          11                 2    81.82%          53                 4    92.45%          28                 0   100.00%
nmt_beam_search.cpp                  116                25    78.45%          10                 3    70.00%         254                32    87.40%          76                19    75.00%
nmt_graph_decoder.cpp                164                78    52.44%          15                 7    53.33%         540               161    70.19%         112                69    38.39%
nmt_graph_encoder.cpp                 54                13    75.93%           3                 0   100.00%         268                33    87.69%          37                16    56.76%
nmt_loader.cpp                       270                67    75.19%          14                 0   100.00%         774                97    87.47%         161                67    58.39%
nmt_state_backend.cpp                253                94    62.85%          21                 0   100.00%         489               128    73.82%         165                87    47.27%
nmt_tokenization.cpp                  88                21    76.14%           8                 0   100.00%         135                36    73.33%          61                26    57.38%
nmt_utils.cpp                        120                89    25.83%           8                 3    62.50%         180               135    25.00%          78                63    19.23%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                               1586               621    60.84%         140                26    81.43%        3495               925    73.53%        1009               518    48.66%

1 similar comment
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

🧪 C++ Test Coverage Report

Coverage:

📊 Detailed Coverage
Filename                         Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NmtLazyInitializeBackend.cpp          99                40    59.60%          11                 2    81.82%         157                58    63.06%          66                33    50.00%
NmtLazyInitializeBackend.hpp           2                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
TranslationModel.cpp                 296               168    43.24%          28                 8    71.43%         506               213    57.91%         181               122    32.60%
TranslationModel.hpp                   1                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
nmt.cpp                               72                22    69.44%           9                 1    88.89%         137                28    79.56%          44                16    63.64%
nmt.hpp                               51                 4    92.16%          11                 2    81.82%          53                 4    92.45%          28                 0   100.00%
nmt_beam_search.cpp                  116                25    78.45%          10                 3    70.00%         254                32    87.40%          76                19    75.00%
nmt_graph_decoder.cpp                164                78    52.44%          15                 7    53.33%         540               161    70.19%         112                69    38.39%
nmt_graph_encoder.cpp                 54                13    75.93%           3                 0   100.00%         268                33    87.69%          37                16    56.76%
nmt_loader.cpp                       270                67    75.19%          14                 0   100.00%         774                97    87.47%         161                67    58.39%
nmt_state_backend.cpp                253                94    62.85%          21                 0   100.00%         489               128    73.82%         165                87    47.27%
nmt_tokenization.cpp                  88                21    76.14%           8                 0   100.00%         135                36    73.33%          61                26    57.38%
nmt_utils.cpp                        120                89    25.83%           8                 3    62.50%         180               135    25.00%          78                63    19.23%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                               1586               621    60.84%         140                26    81.43%        3495               925    73.53%        1009               518    48.66%

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

✅ E2E Mobile Test Results - iOS

Overall Status: PASSED
Device Farm Result: PASSED
Platform: iOS
Addon: @qvac/translation-nmtcpp
PR: #1874
Commit: ce2ea93

Test Summary

Metric Count
Total Tests 3
✅ Passed 3
❌ Failed 0
⏭️ Skipped 0

Links

  • 🔗 Device Farm Run: View on AWS Device Farm
  • 🔗 Workflow: View Details
  • 📋 Run ARN: arn:aws:devicefarm:us-west-2:833707431398:run:cef7531e-44ed-4ddf-a349-a4b0f72f680d/0f5a9421-8a14-430e-bed0-15dddd82e383

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

✅ E2E Mobile Test Results - Android

Overall Status: PASSED
Device Farm Result: PASSED
Platform: Android
Addon: @qvac/translation-nmtcpp
PR: #1874
Commit: ce2ea93

Test Summary

Metric Count
Total Tests 6
✅ Passed 6
❌ Failed 0
⏭️ Skipped 0

Links

  • 🔗 Device Farm Run: View on AWS Device Farm
  • 🔗 Workflow: View Details
  • 📋 Run ARN: arn:aws:devicefarm:us-west-2:833707431398:run:cef7531e-44ed-4ddf-a349-a4b0f72f680d/7f659385-645f-4705-9171-d13cc7099931

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

gianni-cor and others added 5 commits May 9, 2026 15:30
Allow slower Android Device Farm runs to finish model-heavy LLM tests before the harness marks them as timed out.

Co-authored-by: Cursor <cursoragent@cursor.com>
tetherto/qvac-registry-vcpkg#140 publishes qvac-fabric@8189.0.2 in the
default registry, so the temporary per-package overlay we used while the
new fabric build was still being shaken out is no longer necessary.

For llm-llamacpp, embed-llamacpp, and translation-nmtcpp:

- Delete `packages/<pkg>/vcpkg/ports/qvac-fabric/` (portfile.cmake,
  vcpkg.json, android-vulkan-version.cmake) — the overlay copy.
- Drop the `overlay-ports` entry from each package's
  vcpkg-configuration.json. The `default-registry` baseline is left
  untouched intentionally; the `version>=` constraints below are what
  forces vcpkg to resolve to the new fabric revision against the
  unchanged baseline.
- Bump the `qvac-fabric` `version>=` pin from `8189.0.1` -> `8189.0.2`
  in each package's vcpkg.json.

Co-authored-by: Cursor <cursoragent@cursor.com>
`sawMali` was threaded through `emplaceIfValidDevice` / `tryEmplaceDevice` /
`chooseBackend` but never read by any caller — leftover from the earlier
"Force BERT/Qwen3.5 to CPU on Mali" iterations. The embed-side cleanup
already landed in 2ac5de0 ("Remove the Mali detection plumbing from the
embed addon now that BERT runs on Mali GPU."); this finishes the symmetric
removal on the LLM side. `sawAppleM1` plumbing is preserved unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
TextLlmContext flips reasoningState_.inside_reasoning = true alongside the
forced "<think>\n" opener; MtmdLlmContext doesn't because it doesn't carry a
reasoningState_ today. Add an inline note so the asymmetry isn't read as a
bug, and point at the symmetric site to update if reasoning-aware EOS
replacement is later added on the multimodal path.

Co-authored-by: Cursor <cursoragent@cursor.com>
The previous post-generation regex (`([{,])(\s*)([A-Za-z_]…)(\s*):` -> quote
the ident) was too broad: it also matched `, ident:` substrings sitting
inside JSON string values, so a tool call with a free-form string argument
like `{"query":"phase one, step: validate"}` came out corrupted as
`{"query":"phase one, "step": validate"}`, which then failed JSON.parse on
the consumer side.

In practice the rewrite is only needed for one upstream quirk: the Gemma 4
parser's `gemma4_args_to_json` (common/chat-parser.cpp) uses an
`at_key_start()` helper that peeks backwards in the output buffer for a
`{`/`,` -- so the very first top-level key is left bare while every nested
or post-comma key is already quoted. All other tool dialects reach us via
`json::dump()` upstream and already start with a quoted key.

Replace the broad regex with one anchored at `^\{(\s*)<ident>\s*:`, which
fixes exactly that single leading-bare-key case and cannot match anywhere
inside a JSON string value.

Verified end-to-end on linux-x64 against gemma-4-E2B-it-Q8_0 (CPU):

- Adversarial prompt forcing `phase one, step: validate` as a tool arg
  string: baseline produced invalid JSON
  `{"query":"phase one, "step": validate"}` (parse fail at pos 55);
  this fix yields `{"query":"phase one, step: validate"}` and the test
  passes 7/7 assertions.
- Existing simple-args happy path (`get_weather` with city/unit) still
  passes 5/5.

Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants