Commit 60f70ef
QVAC-16769 feat[bc]: tool calls chaining compact (#1379)
* feat: anchored tools placement for multi-round tool chains
Replace tools-at-end placement with anchored placement: tools are
positioned after the last user message and stay in the KV cache
across chain rounds instead of being removed and re-added each round.
Changes:
- Template: anchor tools after last user message (two-pass Jinja2)
- PostInfer: keep tools when output contains <tool_call>, remove
only when chain completes (no tool call in output)
- Boundary tracking: recordToolBoundary sets anchor once, preserves
across chain rounds
- Streaming: capture output when toolsAtEnd is active for tool call
detection
- Stats: forward nPastBeforeTools, firstMsgTokens, toolsTrimmed
- Generation prompt: treat role "tool" same as "user" for
add_generation_prompt (fixes empty response on tool chain
continuation)
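
The PostInfer and boundary-tracking rules above can be illustrated with a minimal sketch. It is not the addon's code: `ToolAnchor`, `ToolsAction` and `decideToolsAction` are hypothetical names, and only `recordToolBoundary`, `nPastBeforeTools` and the `<tool_call>` marker come from the commit message.

```cpp
#include <string_view>

// Hypothetical holder for the anchor; -1 means "no anchor recorded yet".
struct ToolAnchor {
  int nPastBeforeTools = -1;
};

// Set the anchor once; later chain rounds keep the original value.
inline void recordToolBoundary(ToolAnchor& anchor, int boundary) {
  if (anchor.nPastBeforeTools < 0) {
    anchor.nPastBeforeTools = boundary;
  }
}

enum class ToolsAction { Keep, Trim };

// Keep tools in the KV cache while the model is still emitting tool calls;
// trim them only once the chain completes (no tool call in the output).
inline ToolsAction decideToolsAction(std::string_view output) {
  const bool hasToolCall = output.find("<tool_call>") != std::string_view::npos;
  return hasToolCall ? ToolsAction::Keep : ToolsAction::Trim;
}
```

How the trim itself is applied to the cache is not described in the message, so the sketch stops at the decision.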
* fix: prevent output duplication in streaming mode with toolsAtEnd
Use captured output only for internal tool call detection, don't set
it as the return value when streaming. Prevents the JobRunner from
queuing the full text again after it was already streamed token by
token, which caused the SDK to see every tool call twice.
* fix: avoid unnecessary string copy for non-tool completions
Move captured output construction inside the toolsAtEnd guard so
non-tool completions pay zero string overhead. Only the oss.str()
call and tool_call detection happen when dynamic tools are active.
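
A rough sketch of the two streaming fixes above, under the assumption that generated text accumulates in a `std::ostringstream` named `oss`. `StreamOutcome` and `finishStreaming` are hypothetical names used only for illustration.

```cpp
#include <sstream>
#include <string>

struct StreamOutcome {
  bool sawToolCall = false;  // used internally to decide whether the chain continues
  std::string returnValue;   // stays empty: tokens were already streamed one by one
};

// Models the end of a streamed completion.
inline StreamOutcome finishStreaming(const std::ostringstream& oss, bool toolsAtEnd) {
  StreamOutcome outcome;
  if (toolsAtEnd) {
    // The string copy happens only when dynamic tools are active.
    const std::string captured = oss.str();
    outcome.sawToolCall = captured.find("<tool_call>") != std::string::npos;
  }
  // The captured text is never assigned to returnValue, so nothing is
  // queued a second time after streaming.
  return outcome;
}
```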
* fix: context sliding with tools_at_end corrupts tool boundary tracking
When context sliding occurs with tools_at_end enabled, the
nPastBeforeTools boundary was not adjusted after token discard.
This left stale tool tokens in the KV cache, causing incorrect
trim after generation.
Changes:
- Limit discard to conversation-only region (never eat tool tokens)
- Adjust nPastBeforeTools after sliding by the discard delta
- Reset DynamicToolsState in fallback discard path
- Applied to both TextLlmContext and MtmdLlmContext
- Add regression test for sliding during generation with large tools
* refactor: extract sliding helpers into DynamicToolsState, harden edge cases
- Extract clampDiscard() and adjustAfterSlide() into DynamicToolsState
to eliminate 4x duplicated clamping/adjustment blocks
- Remove redundant std::max(safeLimit, 0) — guard already ensures > 0
- Add discard == 0 early return in applyContextDiscard to skip no-op
KV cache operations
- Guard fallback reset() with toolsAtEnd() check for consistency
- Add comment explaining eval vs generation fallback asymmetry
- Use n_predict=-2 (fill context) in test to guarantee sliding
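
For illustration, the two helpers might look roughly like the sketch below. It assumes the conversation-only region is `nPastBeforeTools - firstMsgTokens`; that formula, the struct name and the zero return for an empty region are assumptions, not taken from the source (the real code has a fallback discard path that is not modeled here).

```cpp
#include <algorithm>

// Hypothetical stand-in for DynamicToolsState.
struct DynamicToolsStateSketch {
  int firstMsgTokens = 0;    // prefix that sliding never discards
  int nPastBeforeTools = 0;  // anchor: cache position where the tools start

  // Limit a requested discard to the conversation-only region so that
  // tool tokens are never discarded.
  int clampDiscard(int nDiscarded) const {
    const int safeLimit = nPastBeforeTools - firstMsgTokens;  // assumed definition
    if (safeLimit <= 0) {
      return 0;  // nothing safe to discard in this sketch
    }
    return std::min(nDiscarded, safeLimit);
  }

  // Shift the anchor left by the number of tokens actually discarded.
  void adjustAfterSlide(int discarded) { nPastBeforeTools -= discarded; }
};
```

A caller would clamp first, skip the KV cache operations when the result is 0, and otherwise apply the discard and then call `adjustAfterSlide` with the same value.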
* test: update sliding test for anchored tools behavior
With anchored tools, postInfer keeps tools in cache when the model
produces <tool_call> in output. Update the sliding regression test
to check toolsTrimmed stat instead of assuming tools are always
removed after generation.
* test: two-phase sliding test verifies adjustAfterSlide
Replace single-phase sliding test with two-phase comparison:
Phase 1 (baseline): large context, n_predict=0 → no sliding.
Records nPastBeforeTools as the original anchor.
Phase 2 (sliding): small context, n_predict=-2 → sliding fires.
After trim, nPastBeforeTools must be less than baseline.
Without adjustAfterSlide: both phases have equal nPastBeforeTools → FAIL.
With adjustAfterSlide: phase 2 anchor is smaller → PASS.
* test: exact sliding anchor assertion with session and clamped discard
Three-phase test using session cache:
Phase 1: init session (small firstMsgTokens)
Phase 2: baseline — large context, n_predict=0, records anchor
Phase 3: sliding — small context, n_predict=-2, sliding fires
Simulates per-slide clamped discard (min(nDiscarded, safeLimit))
and asserts slideNPBT == expectedNPBT with exact values. Verifies
adjustAfterSlide reduces anchor by the correct amount per slide.
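
The expected-anchor arithmetic described above can be sketched as a small simulation. `expectedAnchor` is a hypothetical helper, and treating `safeLimit` as the anchor minus `firstMsgTokens` is an assumption about how the conversation-only region is measured.

```cpp
#include <algorithm>

// Simulates `slides` context slides and returns the anchor
// (nPastBeforeTools) expected afterwards. Each slide discards
// min(nDiscarded, safeLimit) tokens from the conversation-only region.
inline int expectedAnchor(int baselineAnchor, int firstMsgTokens,
                          int nDiscarded, int slides) {
  int anchor = baselineAnchor;
  for (int i = 0; i < slides; ++i) {
    const int safeLimit = anchor - firstMsgTokens;  // assumed definition
    if (safeLimit <= 0) {
      break;  // conversation region exhausted; nothing more to discard
    }
    anchor -= std::min(nDiscarded, safeLimit);
  }
  return anchor;
}
```

With a baseline anchor of 350, 10 first-message tokens and `nDiscarded` of 100, two slides are unclamped and leave the anchor at 150. With a baseline of 50, the first slide is clamped to 40 and the anchor stops at 10.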
* test: add unclamped sliding test with long conversation
Second sliding test with longer user message and smaller n_discarded
(20). Verifies at least 1 slide discards the full n_discarded amount
(unclamped). Both tests simulate per-slide clamped discard and assert
exact nPastBeforeTools values.
* test: use n_discarded=100 with long conversation for unclamped sliding
Longer user message (~300 tokens) ensures the conversation region
exceeds n_discarded=100. Each slide discards the full 100 tokens
without clamping. Simpler and more direct than using small n_discarded.
* fix: don't add generation prompt on system-only prefill
When nPast=0 and the only message is a system prompt (role=system),
don't set add_generation_prompt=true. This was adding a stale
<|im_start|>assistant token to the cache that the model would see
as an empty assistant turn before the actual user message.
Now check the actual last message role instead of hardcoding true.
Saves 3 tokens in the cache prefix.
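
A minimal sketch of the role check, combining this fix with the earlier change that treats role "tool" like "user". `ChatMessage` and `shouldAddGenerationPrompt` are hypothetical names, and returning false for an empty message list is an assumption.

```cpp
#include <string>
#include <vector>

struct ChatMessage {
  std::string role;
  std::string content;
};

// Decide add_generation_prompt from the actual last message role
// instead of hardcoding true.
inline bool shouldAddGenerationPrompt(const std::vector<ChatMessage>& messages) {
  if (messages.empty()) {
    return false;  // assumption: nothing to answer yet
  }
  const std::string& lastRole = messages.back().role;
  // A system-only prefill gets no assistant header; a user message or a
  // tool result expects the assistant to speak next.
  return lastRole == "user" || lastRole == "tool";
}
```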
* chore: remove debug prompt logging
* chore: add debug log for tokenizeChat generation prompt flag
Logs nPast, lastRole, nMsgs, nTools, addGenPrompt at DEBUG verbosity.
Helps diagnose issues with stale generation prompt in cache.
* (fix) llamacpp-llm: "tool" role generate prompt tests
* (fix) llamacpp-llm: no "think" blocks in assistant history
* (internal) llamacpp-llm: test qwen3 dynamic tools template
* (chore) llamacpp-llm: upgrade package version
* fix: skip dispatch validation when called via workflow_call
The Validate Dispatch Inputs step fails when the mobile integration
workflow is invoked via workflow_call from a workflow_dispatch parent,
because github.event.inputs.package is empty in that context.
* fix: align prebuild download path with verify step in LLM mobile workflow
Prebuilds are downloaded to runner.temp/qvac-lib-infer-llamacpp-llm but
the verify step looked in runner.temp/prebuilds-download, so prebuilds
were never found.
* (internal) llamacpp-llm: runtimeDebugStats internal method
* (chore) llamacpp-llm: tools_at_end rename to tools_compact
* (improvement) llamacpp-llm: tools_compact feature docs
* (chore) llamacpp-llm: fix test
* (chore) llamacpp-llm: rename, cleanup, tests assertions
* (internal) llamacpp-llm: improve tests
* (internal) llamacpp-llm: reduce test flakiness with 0 temp
* (internal) llamacpp-llm: test rename
* (internal) llamacpp-llm: generate tests correct
* (internal) llamacpp-llm: improve sliding ctx tests
* (chore) llamacpp-llm: version bump
* (chore) llamacpp-llm: clang-format
* (fix) llamacpp-llm: qwen3 template perf and debug null guard
* (chore) llamacpp-llm: discard tokens warning
* (chore) llamacpp-llm: reuse getStatValue at tests
* (fix) llamacpp-llm: first msg sliding guard
* (improvement) llamacpp-llm: tools_compact require tools always
* (chore) llamacpp-llm: fix linter
* (fix) llamacpp-llm: guard regression, integration tests
* (internal) llamacpp-llm: remove over-defensive checks, fix test
* (chore) llamacpp-llm: cleanup linter and unused tests
* refactoring: anchored tools structured (#1658)
* (doc) llamacpp-llm: structure proposal
* (doc) llamacpp-llm: refactoring plan
* (internal) llamacpp-llm: extract tools compact controller from llm contexts
* (internal) llamacpp-llm: extract shared context slider for text and mtmd
* (internal) llamacpp-llm: ContextSlider testable, more tests
* (internal) llamacpp-llm: migrate tools compact coverage to deterministic unit tests
* (chore) llamacpp-llm: follow up minor fixes
* (internal) llamacpp-llm: improve multi-model portability
* (internal) llamacpp-llm: decouple ChatTemplateUtils
* (internal) llamacpp-llm: tools_compact contract, tests
* (internal) llamacpp-llm: ToolsCompactController tests and comments
* (doc) llamacpp-llm: tools_compact refine verify
* (internal) llamacpp-llm: tools compact profile resolution improved
* (chore) llamacpp-llm: clang format
* (chore) llamacpp-llm: tools-compact test improved
* (chore) llamacpp-llm: test condition check style
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* (chore) llamacpp-llm: bump version, remove nested namespace
* (chore) llamacpp-llm: changelog improved
* (chore) llamacpp-llm: cleanup, test tool token count comment
* (chore) llamacpp-llm: tests useless conditional
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* (chore) llamacpp-llm: tests refactor and remove redundant
* (chore) llamacpp-llm: deduplicate cache management tests, context slider edge coverage
* (chore) llamacpp-llm: clang format
* (fix) llamacpp-llm: ToolsCompact tools_calls check
* (internal) llamacpp-llm: oss string handle optimization
* (internal) llamacpp-llm: compute user msg index at cpp
* Revert "(internal) llamacpp-llm: compute user msg index at cpp"
This reverts commit 872eb47.
* (internal) llamacpp-llm: qwen3 dynamic template loop perf improved
* (chore) llamacpp-llm: clang format
---------
Co-authored-by: olyasir <sirkinolya@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
1 parent 8531289 commit 60f70ef
38 files changed
Lines changed: 3171 additions & 1337 deletions
File tree
- packages/qvac-lib-infer-llamacpp-llm
- addon/src
- model-interface
- utils
- docs
- examples
- test
- integration
- mobile
- unit
Lines changed: 3 additions & 9 deletions
Lines changed: 9 additions & 6 deletions
Lines changed: 137 additions & 0 deletions
Lines changed: 80 additions & 0 deletions