QVAC-18717 feat[api]: add Qwen3.5, Gemma4 tool-call dialects and reasoning_budget param #1974

Merged: gianni-cor merged 19 commits into tetherto:main from donriddo:feat/sdk-qwen35-gemma4-reasoning-budget on May 12, 2026

donriddo commented May 11, 2026

🎯 What problem does this PR solve?

  • The SDK had no support for Qwen3.5/Qwen3.6 or Gemma4 tool-call output formats, so calling tools with those models produced no parsed tool calls.
  • @qvac/llm-llamacpp@0.20.0 (llamacpp 8189+) broke all model loads: system_prompt from LlmConfig was forwarded to the C++ arg parser as --system-prompt, which was removed in that release.
  • No way to pass the reasoning_budget parameter introduced in @qvac/llm-llamacpp@0.20.0.

📝 How does it solve it?

  • Adds Qwen3.5 Pythonic-XML parser: <tool_call><function=NAME><parameter=KEY>VALUE</parameter></function></tool_call>. String values are raw text; arrays/objects are JSON-parsed; integers reject non-integer floats. Errors surface as PARSE_ERROR (matches hermes/pythonic pattern).
  • Adds Gemma4 native parser: <|tool_call>call:NAME{key:<|"|>val<|"|>,...}<tool_call|>. Splits on the <|"|> delimiter and quotes bare keys only in the structural parts, so ', key:' patterns inside string values are never misquoted as object keys.
  • Wires both parsers into the dialect dispatch and the default catch-all chain in parser.ts.
  • Adds dialect specs to completion-normalizer.ts: qwen35 reuses <tool_call> framing; gemma4 uses asymmetric <|tool_call>/<tool_call|> + thinking channel frames.
  • Auto-detects qwen35/gemma4 from model name/path in dialect.ts, with guards against two kinds of false match: quantization/size suffixes such as Q4_K_M or 5b, and the Qwen3 5B parameter count.
  • Adds reasoning_budget: -1 | 0 to LlmConfig (load-time) and GenerationParams (per-request). Passes through transformLlmConfig unchanged.
  • Exposes reasoning_budget as a boolean in the CLI SDKGenerationParams interface (true → -1, false → 0); extractGenerationParams parses it from the request body.
  • Fixes system_prompt being forwarded to the C++ arg parser: system_prompt is JS-only (used by completion-stream.ts to seed conversation history). It is now excluded from transformLlmConfig alongside modelType.
  • Adds completion-reasoning-budget-disabled and completion-reasoning-budget-unrestricted to tests-qvac.
  • Adds tool-calling examples for qwen35 and gemma4 under examples/tools/.
  • Wires toolDialect and resourceKey through ToolsExecutor and createToolsTest so dialect-specific e2e tests can be added once model constants are available.
  • Bumps @qvac/llm-llamacpp to ^0.20.0.
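
The Qwen3.5 framing described in the list above can be sketched as a two-level regex scan. This is only an illustration under stated assumptions, not the SDK's qwen35.ts: `parseQwen35Sketch` and `ParsedCall` are hypothetical names, values are kept as raw strings (schema-driven coercion is left out), and trimming whitespace around each value is an assumption.

```typescript
// Hypothetical sketch of the <tool_call><function=NAME><parameter=KEY>VALUE... framing.
// Not the SDK's qwen35.ts; names and trimming behaviour are assumptions.
type ParsedCall = { name: string; arguments: Record<string, string> };

function parseQwen35Sketch(text: string): ParsedCall[] {
  const calls: ParsedCall[] = [];
  // Outer scan: one match per complete <tool_call>...</tool_call> frame.
  const callRe =
    /<tool_call>\s*<function=([^>\s]+)>([\s\S]*?)<\/function>\s*<\/tool_call>/g;
  let call: RegExpExecArray | null;
  while ((call = callRe.exec(text)) !== null) {
    const args: Record<string, string> = {};
    // Inner scan: one match per <parameter=KEY>VALUE</parameter> pair.
    const paramRe = /<parameter=([^>\s]+)>([\s\S]*?)<\/parameter>/g;
    let param: RegExpExecArray | null;
    while ((param = paramRe.exec(call[2])) !== null) {
      args[param[1]] = param[2].trim(); // raw text, no type coercion here
    }
    calls.push({ name: call[1], arguments: args });
  }
  return calls;
}
```

Text without a complete frame yields an empty array, so a caller could fall through to the next parser in a chain.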

🧪 How was it tested?

  • Unit tests: 75/75 pass in tool-parser.test.ts — includes regression tests for integer rejection, array/object PARSE_ERROR propagation, and all dialect negative-case coverage.
  • Security tests: 7/7 pass.
  • CLI tests: 112/112 pass in translate.test.ts — includes reasoning_budget boolean extraction tests.
  • Tests-qvac: completion-reasoning-budget-disabled and completion-reasoning-budget-unrestricted added to e2e suite.
  • Examples: llamacpp-tools-qwen35.ts and llamacpp-tools-gemma4.ts verified locally with Bare runtime.

🔌 API Changes

Qwen3.5 / Qwen3.6 — dialect auto-detected from model name/path:

import { loadModel, completion } from "@qvac/sdk";

const modelId = await loadModel({
  modelSrc: "/models/Qwen3.5-7B-Instruct-Q4_K_M.gguf",
  modelType: "llm",
  modelConfig: { ctx_size: 4096, tools: true },
});

const run = completion({
  modelId,
  history: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: [weatherTool],
  // toolDialect: "qwen35" — auto-detected; override only if needed
});

Gemma4 — dialect auto-detected from model name/path:

const modelId = await loadModel({
  modelSrc: "/models/gemma-4-9b-it-Q4_K_M.gguf",
  modelType: "llm",
  modelConfig: { ctx_size: 4096, tools: true },
});

const run = completion({
  modelId,
  history: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: [weatherTool],
  // toolDialect: "gemma4" — auto-detected; override only if needed
});

reasoning_budget — load-time default and per-request override:

// -1 = unrestricted thinking, 0 = disabled
const modelId = await loadModel({
  modelSrc: "/models/Qwen3.5-7B-Instruct-Q4_K_M.gguf",
  modelType: "llm",
  modelConfig: { ctx_size: 4096, reasoning_budget: -1 },
});

const run = completion({
  modelId,
  history: [{ role: "user", content: "Think step by step." }],
  generationParams: { reasoning_budget: 0 }, // override per-request
});
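
On the CLI side, the description says reasoning_budget is exposed as a boolean and corresponds to -1 or 0. A minimal sketch of that mapping follows; `toReasoningBudget` is a hypothetical helper, and where exactly the CLI performs the conversion is not stated in this PR.

```typescript
// Hypothetical helper: maps a boolean request-body flag to the numeric
// reasoning_budget values named in the PR (-1 = unrestricted, 0 = disabled).
type ReasoningBudget = -1 | 0;

function toReasoningBudget(flag: unknown): ReasoningBudget | undefined {
  // Anything other than a boolean is treated as "not provided" (an assumption).
  if (typeof flag !== "boolean") return undefined;
  return flag ? -1 : 0;
}
```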

…t param

- Extend toolDialectSchema with 'qwen35' and 'gemma4' values
- Add Qwen3.5 Pythonic-XML parser (qwen35.ts): <tool_call><function=NAME>
  <parameter=KEY>VALUE</parameter></function></tool_call>; string values are
  raw text, arrays/objects are JSON; type coercion from tool schema
- Add Gemma4 native parser (gemma4native.ts): <|tool_call>call:NAME{...}<tool_call|>;
  JS-literal args with <|"|> quote tokens, split-then-transliterate approach
  to safely quote bare keys without corrupting string values containing ', key:'
- Wire both parsers into parser.ts dispatch and the default catch-all chain
- Add dialect specs to completion-normalizer.ts: qwen35 reuses <tool_call>
  framing; gemma4 has asymmetric <|tool_call>/<tool_call|> + thinking frames
- Auto-detect qwen35/gemma4 from model name/path in dialect.ts with guards
  against Gemma3+Q4 quant suffix and Qwen3 5B parameter-count collisions
- Add reasoning_budget (-1 | 0) to LlmConfig (load-time) and GenerationParams
  (per-request); passes through transformLlmConfig unchanged (snake_case key
  bypasses camelCase regex, number-to-string conversion handles the value)
- Mirror reasoning_budget in CLI SDKGenerationParams type
- Add tests-qvac completion tests for reasoning_budget passthrough
- Add tool-calling examples for qwen35 and gemma4 in examples/tools/
- Bump @qvac/llm-llamacpp to ^0.20.0 (adds reasoning_budget and new model
  support shipped in fabric-8189)
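
The split-then-transliterate idea for Gemma4 arguments can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in gemma4native.ts: the function names are hypothetical, only a single call per text is handled, and the bare-key regex is a guess at the general shape.

```typescript
// Hypothetical sketch: convert Gemma4 JS-literal args to JSON by splitting on the
// quote token first, so bare-key quoting never touches string contents.
const GEMMA4_QUOTE = '<|"|>';

function gemma4ArgsToJson(body: string): string {
  return body
    .split(GEMMA4_QUOTE)
    .map((segment, index) =>
      index % 2 === 1
        ? JSON.stringify(segment) // odd segments sit between quote tokens: string values
        : segment.replace(/([{,]\s*)([A-Za-z_][\w-]*)\s*:/g, '$1"$2":') // structural part
    )
    .join("");
}

function parseGemma4Sketch(text: string): { name: string; arguments: unknown } | null {
  const match = /<\|tool_call>call:([A-Za-z_][\w-]*)(\{[\s\S]*\})<tool_call\|>/.exec(text);
  if (match === null) return null;
  // JSON.parse throws on malformed args; a real parser would map that to an error result.
  return { name: match[1], arguments: JSON.parse(gemma4ArgsToJson(match[2])) };
}
```

Because string segments are serialized with JSON.stringify before the pieces are rejoined, a value such as "Paris, key: x" survives intact while `city` and `days` still get quoted as keys.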

…udget to completion-executor

llamacpp 8189+ (in @qvac/llm-llamacpp@0.20.0) removed --system-prompt from its
CLI argument parser. The SDK was forwarding system_prompt through transformLlmConfig
causing all model loads to fail with 'invalid argument: --system-prompt'.

system_prompt is JS-only: completion-stream.ts reads it to seed the conversation
history. It has no meaning at the C++ level and must be excluded alongside modelType.

Also mirrors reasoning_budget in completion-executor.ts GenerationParams so the
new tests-qvac reasoning_budget tests type-check correctly.
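
The exclusion described in this commit message can be sketched as a small key filter. This is an illustration only: `transformLlmConfigSketch` is a hypothetical stand-in for the real transformLlmConfig, and the camelCase-to-snake_case conversion is an assumption inferred from the remark that snake_case keys bypass the camelCase regex.

```typescript
// Hypothetical sketch: drop JS-only keys before config reaches the native arg parser.
const JS_ONLY_KEYS: readonly string[] = ["modelType", "system_prompt"];

function transformLlmConfigSketch(config: Record<string, unknown>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(config)) {
    const value = config[key];
    // system_prompt and modelType are consumed in JS and must never be forwarded.
    if (JS_ONLY_KEYS.indexOf(key) !== -1 || value === undefined) continue;
    // Assumed conversion: camelCase -> snake_case; snake_case keys pass unchanged.
    const nativeKey = key.replace(/[A-Z]/g, (c) => "_" + c.toLowerCase());
    out[nativeKey] = String(value); // numbers such as -1 become "-1"
  }
  return out;
}
```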
donriddo added and removed the test-e2e-smoke label (Triggers smoke e2e test suite [Currently SDK-only]) May 11, 2026

…on tests

- Drop the over-broad qwen.*3\.5 alternative from the qwen35 regex and
  tighten the lookahead to (?![a-z0-9]) so qwen3-50b-instruct no longer
  false-matches as qwen35
- Tighten gemma4 lookahead to (?=[^a-z0-9]|$) so gemma-40b no longer
  false-matches as gemma4
- Extract transformLlmConfig to transform.ts (no addon imports) so it
  can be unit-tested without the native addon loading
- Add llm-plugin-transform.test.ts pinning that system_prompt and
  modelType are never forwarded to C++ and that reasoning_budget survives
- Add negative test cases for qwen3-50b and gemma-40b to tool-parser.test.ts
- Fix stale default-chain comment in parser.ts (was 'Harmony first',
  actual order is Gemma4 first)
- Add inline justification for qwen35/gemma4 fallback asymmetry
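
The effect of the two lookaheads can be illustrated with a small detector. The patterns below are assumptions written to reproduce the cases named in this commit message; they are not copied from dialect.ts, and `detectDialectSketch` is a hypothetical name.

```typescript
// Hypothetical patterns illustrating the lookahead guards; the real regexes may differ.
type AutoDialect = "qwen35" | "gemma4";

// (?![a-z0-9]) rejects a version digit followed by more alphanumerics ("qwen3-50b", "qwen3-5b").
const QWEN35_RE = /qwen[-_]?3[._-]?[56](?![a-z0-9])/i;
// (?=[^a-z0-9]|$) requires a separator or end of string after the 4 ("gemma-40b" fails).
const GEMMA4_RE = /gemma[-_]?4(?=[^a-z0-9]|$)/i;

function detectDialectSketch(modelNameOrPath: string): AutoDialect | undefined {
  if (QWEN35_RE.test(modelNameOrPath)) return "qwen35";
  if (GEMMA4_RE.test(modelNameOrPath)) return "gemma4";
  return undefined; // caller falls back to its default parser chain
}
```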
…teToolsTest

ToolsExecutor.generic now reads toolDialect (forwarded to completion()) and
resourceKey (selects which loaded model to use) from test params. The
createToolsTest helper accepts both as optional options, so dialect-specific
e2e test definitions can be added once the model constants are available
from update-models.
…ool names, add qwen35 to default parser chain

- coerceParamValue: reject empty/whitespace-only numeric params before Number() for both
  number and integer types; Number("") === 0 caused silent semantic corruption
- gemma4native callRegex and bare-key quoting regex: broaden [A-Za-z_]\w* to
  [A-Za-z_][\w-]* so hyphenated tool names (and param keys) are matched instead
  of returning matched=false and leaking raw frame markers as contentDelta
- pickFormatParsers default chain: insert parseQwen35Format ahead of parseHermesFormat
  so raw Qwen XML payloads are recovered when the model-name heuristic misses
- regression tests for all three cases
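
The numeric guard in the first bullet can be sketched like this. It is a minimal illustration, not the SDK's coerceParamValue: the function name, the `Coerced` result shape, and the set of handled types are assumptions, and parsed arrays/objects are not shape-checked.

```typescript
// Hypothetical sketch of schema-driven coercion with the empty-string guard.
type ParamType = "string" | "number" | "integer" | "array" | "object";
type Coerced =
  | { ok: true; value: unknown }
  | { ok: false; code: "PARSE_ERROR"; message: string };

function coerceParamValueSketch(raw: string, type: ParamType): Coerced {
  const fail = (message: string): Coerced => ({ ok: false, code: "PARSE_ERROR", message });
  switch (type) {
    case "string":
      return { ok: true, value: raw }; // string values stay raw text
    case "number":
    case "integer": {
      // Number("") and Number("  ") are both 0, so reject blanks before converting.
      if (raw.trim() === "") return fail("empty numeric value");
      const n = Number(raw);
      if (!isFinite(n)) return fail("not a finite number: " + raw);
      if (type === "integer" && n % 1 !== 0) return fail("not an integer: " + raw);
      return { ok: true, value: n };
    }
    case "array":
    case "object":
      try {
        return { ok: true, value: JSON.parse(raw) };
      } catch {
        return fail("invalid JSON for " + type);
      }
  }
}
```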
NamelsKing previously approved these changes May 12, 2026

donriddo (Contributor, Author) commented: /review

gianni-cor (Contributor) commented: /review

gianni-cor (Contributor) commented: /review

gianni-cor merged commit f12b236 into tetherto:main on May 12, 2026
13 of 14 checks passed
opaninakuffo added a commit that referenced this pull request May 14, 2026
Bump @qvac/cli to 0.4.0 and add the v0.4.0 changelog set.

Includes all 5 cli-scoped PRs landed on release-cli-0.4.0 since cli-v0.3.0:

- QVAC-18677 feat[api]: qvac verify deps (#1969)
- QVAC-18717 feat[api]: Qwen3.5 / Gemma4 tool-call dialects + reasoning_budget (#1974)
- QVAC-18678 feat[api]: qvac verify bundle (#1984)
- QVAC-18730 feat[api]: POST /v1/images/generations on qvac serve (#2008)
- chore: consolidate PR templates and hide style note in HTML comment (#1924)

PR #1924's title lacked a ticket or [notask], so the changelog generator's
strict validator dropped it. It is added manually under the Chores section
to keep the changelog truthful to what shipped on release-cli-0.4.0.
opaninakuffo added a commit that referenced this pull request May 14, 2026
This commit was cherry picked from commit 22462c8 and carries the same commit message as the entry above.

Labels

test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only]
