QVAC-19908 fix: recover Qwen hybrid tool-call frames #2677

Merged
simon-iribarren merged 4 commits into tetherto:main from simon-iribarren:fix/qvac-19908-qwen-hybrid-tool-call on Jun 18, 2026

@simon-iribarren simon-iribarren commented Jun 17, 2026

🎯 What problem does this PR solve?

Qwen3.5/3.6 can emit a malformed hybrid tool-call frame that fuses its two
tool templates into one. The model normally emits either:

  • Pythonic-XML: <tool_call><function=NAME><parameter=KEY>VALUE</parameter></function></tool_call>
  • Hermes JSON: <tool_call>{"name":NAME,"arguments":{...}}</tool_call>

When it blends them, it drops the XML <function=NAME> token verbatim into a
JSON envelope, producing an invalid frame whose object key is the bare string
"function=NAME":

<tool_call>
{"function=webfetch","arguments":{"url":"https://docs.opencode.ai","format":"markdown"}}
</tool_call>

This is not valid JSON (a string key with no colon or value), so JSON.parse throws.
The frame then surfaces as a PARSE_ERROR and no structured toolCall event
is emitted: callers see the raw <tool_call> markup as assistant text and
the tool is never invoked.
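
To make the failure concrete, here is a minimal sketch that feeds the frame body quoted above to JSON.parse next to its canonical Hermes equivalent. The helper name parsesAsJson is hypothetical and only exists for this illustration; it is not part of the SDK.

```typescript
// The hybrid frame body exactly as quoted above.
const hybridBody =
  '{"function=webfetch","arguments":{"url":"https://docs.opencode.ai","format":"markdown"}}';

// The canonical Hermes JSON body the model was expected to emit.
const canonicalBody =
  '{"name":"webfetch","arguments":{"url":"https://docs.opencode.ai","format":"markdown"}}';

// Hypothetical helper: true when JSON.parse accepts the text, false when it throws.
function parsesAsJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

console.log(parsesAsJson(hybridBody)); // false: the key is followed by "," instead of ":"
console.log(parsesAsJson(canonicalBody)); // true
```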

📝 How does it solve it?

  • Adds a narrow recovery path in the Hermes JSON-frame parser
    (packages/sdk/server/utils/tools/parsers/hermes.ts) for the observed
    "function=NAME" object-key shape, rewriting it to a canonical
    {"name":NAME,"arguments":{...}} frame before validation.
  • The recovery lives at the JSON-frame layer because that is where these frames
    already route: for qwen35, the parser chain is
    [parseQwen35Format, parseHermesFormat]. parseQwen35Format only claims a
    frame when it contains the <function= angle-bracket token; the hybrid
    has function= without the bracket and starts with {, so it is treated as a
    JSON-looking frame and deferred to the Hermes parser. The Qwen XML parser
    behaviour is unchanged.
  • The repair is applied to both the complete-frame path and the
    incomplete-frame (cutoff/abort) recovery path, and works per-frame so multiple
    tool calls in one message are each recovered.
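
The rewrite described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in hermes.ts: the names HYBRID_HEAD, repairHybridFrame, parseFrames and isToolCall are hypothetical, the regular expression is one possible way to anchor on the observed shape, and only the complete-frame path is covered (the incomplete-frame recovery is left out).

```typescript
// Hypothetical sketch: the real repair in hermes.ts is not shown in this PR
// description and may differ.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Anchored to the observed shape only: the body must open with
// {"function=NAME", and the next key must be "arguments".
const HYBRID_HEAD = /^\s*\{\s*"function=([^"\\]+)"\s*,\s*(?="arguments"\s*:)/;

// Rewrites the hybrid head to {"name":"NAME", and leaves any other input as-is.
function repairHybridFrame(body: string): string {
  return body.replace(
    HYBRID_HEAD,
    (_match: string, name: string) => `{"name":${JSON.stringify(name)},`,
  );
}

// Type guard for the canonical {"name": ..., "arguments": {...}} frame.
function isToolCall(value: unknown): value is ToolCall {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["name"] === "string" &&
    typeof v["arguments"] === "object" &&
    v["arguments"] !== null
  );
}

// Extracts every complete <tool_call> frame, repairs it, then validates it.
function parseFrames(text: string): { toolCalls: ToolCall[]; errors: string[] } {
  const toolCalls: ToolCall[] = [];
  const errors: string[] = [];
  const frameRe = /<tool_call>([\s\S]*?)<\/tool_call>/g;
  let match: RegExpExecArray | null;
  while ((match = frameRe.exec(text)) !== null) {
    const body = repairHybridFrame((match[1] ?? "").trim());
    try {
      const parsed: unknown = JSON.parse(body);
      if (isToolCall(parsed)) toolCalls.push(parsed);
      else errors.push("PARSE_ERROR");
    } catch {
      errors.push("PARSE_ERROR"); // malformed payloads fail cleanly, no crash
    }
  }
  return { toolCalls, errors };
}

// One hybrid frame and one canonical frame in the same message.
const message =
  '<tool_call>\n{"function=webfetch","arguments":{"url":"https://docs.opencode.ai","format":"markdown"}}\n</tool_call>' +
  '<tool_call>{"name":"read","arguments":{"path":"a.txt"}}</tool_call>';
const result = parseFrames(message);
console.log(result.toolCalls.map((c) => c.name).join(", ")); // webfetch, read
```

Because the repair runs per frame body and is a no-op when the anchor does not match, the canonical frame in the example passes through unchanged.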

Why the repair is intentionally narrow

The rewrite is anchored to the exact observed shape and re-serializes the name
with JSON.stringify, so well-formed JSON frames are never touched and a
malformed arguments payload still surfaces as a clean PARSE_ERROR rather
than crashing the parser. The following neighbouring shapes are deliberately
out of scope (no evidence the model emits them, and broadening would risk
accepting genuinely malformed output):

  • {"function":"NAME","arguments":{...}} (valid JSON, wrong key)
  • {"function":{"name":NAME,"arguments":{...}}} (nested OpenAI envelope)
  • {"function=NAME","parameters":{...}} (parameters instead of arguments)
  • reversed key order

If those start appearing in the wild, they can be added as additional anchored
cases.
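
As a rough illustration of that scoping, the sketch below checks each neighbouring shape against an anchored pattern. The pattern HYBRID_ANCHOR and the helper isRecoverableHybrid are assumptions made for this example; the actual anchor used in hermes.ts is not shown in this description.

```typescript
// Hypothetical anchor: an opening {"function=NAME", immediately followed by
// an "arguments" key. Anything else is left for normal validation.
const HYBRID_ANCHOR = /^\s*\{\s*"function=([^"\\]+)"\s*,\s*(?="arguments"\s*:)/;

function isRecoverableHybrid(body: string): boolean {
  return HYBRID_ANCHOR.test(body);
}

const shapes: Record<string, string> = {
  hybrid: '{"function=webfetch","arguments":{"url":"https://docs.opencode.ai"}}',
  wrongKey: '{"function":"webfetch","arguments":{}}',
  nestedOpenAi: '{"function":{"name":"webfetch","arguments":{}}}',
  parametersKey: '{"function=webfetch","parameters":{}}',
  reversedOrder: '{"arguments":{},"function=webfetch"}',
  canonical: '{"name":"webfetch","arguments":{}}',
};

for (const [label, body] of Object.entries(shapes)) {
  console.log(`${label}: ${isRecoverableHybrid(body)}`);
}
// Only "hybrid" prints true; every other shape prints false.
```

Adding one of the out-of-scope shapes later would then mean adding a second, equally anchored pattern rather than loosening this one.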

🧪 How was it tested?

  • Added a regression test for parseToolCalls(..., "qwen35") recovering a
    webfetch call from the "function=webfetch" hybrid frame.
  • Verified the test fails before the parser change (errors.length === 1,
    toolCalls.length === 0) and passes after it.
  • bun run test/unit/tool-parser.test.ts in packages/sdk: 85/85 pass.
  • Manually exercised neighbouring shapes to confirm the recovery is scoped as
    intended: the hybrid frame, multi-frame messages, and the incomplete/cutoff
    frame all recover; the out-of-scope shapes listed above remain unrecovered;
    and canonical {"name":...} frames are unaffected.

📚 Background: why these frames appear (model specificity + runtime)

This is not a one-off corruption — it is a well-documented, model-specific
behaviour of the Qwen3.5/3.6 family, and the recovery here is the same class of
mitigation other local-LLM ecosystems have adopted.

  • Qwen3.5 natively emits an XML-style tool-call format (<tool_call><function=NAME><parameter=KEY>VALUE</parameter></function></tool_call>),
    not JSON. The Qwen team confirms this is by design and that
    framework parsers are expected to convert the XML into structured JSON
    (QwenLM/Qwen3.6#125).
  • Reliability is size/quant-dependent: in that same issue, Qwen3.5-4B is the
    most stable, 9B the least, 35B-A3B in between; the model also intermittently
    emits the tool call inside the reasoning/thinking channel, leaving
    finish_reason="stop" with empty tool_calls. The maintainers state a
    model-side fix was not available for 3.5 and recommend handling thought-state
    / parsing at the application layer (it is reportedly fixed in Qwen3.6).
  • The {"function=NAME","arguments":{...}} frame this PR recovers is the
    cross-contamination of those two native formats — the XML <function=NAME>
    token bleeding into the JSON envelope — so it sits squarely in the
    documented failure space rather than being an isolated glitch.

Other ecosystems hit and patched the exact same thing:

  • claude-code-local documents identical XML-in-JSON hybrids causing re-prompt
    loops and ships a recover_garbled_tool_json() recovery plus retries and a
    lower sampling temperature
    (reliability notes).
  • The vLLM/SGLang community converged on a corrected chat template
    (qwen3.5-enhanced.jinja) plus the qwen3_xml parser — which defers parsing
    until the full parameter block arrives and auto-heals missing closing tags —
    over qwen3_coder, which corrupts on streamed nested JSON / truncation
    (allanchan339/vLLM-Qwen3-3.5-3.6-chat-template-fix,
    NVIDIA DGX forum,
    r/LocalLLaMA thread).

🧭 Path forward for addon teams (lower-level handling)

This SDK-level repair is a deliberate, model-agnostic safety net. The more
robust long-term fixes belong below the SDK, in the inference addons / runtime,
and are worth tracking for the llamacpp addon team:

  1. Grammar-constrained decoding. Constrain generation with a GBNF grammar so
    the model can only emit well-formed tool frames in the first place. This is
    exactly the direction of the in-progress llama.cpp "autoparser" work, which
    derives both a parser and a lazy grammar from the model's chat template and
    already classifies the relevant shapes (JSON_NATIVE, TAG_WITH_JSON,
    TAG_WITH_TAGGED, plus a fun_name_is_key flag)
    (ggml-org/llama.cpp pwilkin:autoparser).
  2. Template correctness. Ship a chat template that matches the model's native
    XML tool format and keeps reasoning separate, so tool calls don't leak into
    the thinking channel (the qwen3.5-enhanced.jinja lesson above).
  3. Deferred / auto-healing parsing at the runtime (the qwen3_xml approach)
    rather than streaming JSON parsing that corrupts on partial frames.
  4. Sampling for tool turns. Lower temperature for tool-heavy steps measurably
    improves format consistency (claude-code-local uses 0.2).
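
The deferred-parsing idea in item 3 can be sketched as a small accumulator that only hands a frame to the parser once its closing tag has arrived. This is a simplified illustration, not the qwen3_xml parser or any llama.cpp code: the class name FrameBuffer is hypothetical, auto-healing of missing closing tags is not attempted, and text outside tool-call frames is discarded once a frame completes.

```typescript
// Hypothetical accumulator: buffers streamed text and releases only the
// bodies of frames whose closing tag has already arrived.
class FrameBuffer {
  private buffer = "";

  // Appends a chunk and returns the bodies of any frames completed so far.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const complete: string[] = [];
    const open = "<tool_call>";
    const close = "</tool_call>";
    for (;;) {
      const start = this.buffer.indexOf(open);
      if (start === -1) break; // no frame has started yet
      const end = this.buffer.indexOf(close, start + open.length);
      if (end === -1) break; // frame still streaming; wait for more chunks
      complete.push(this.buffer.slice(start + open.length, end).trim());
      // Drop everything up to and including the closing tag.
      this.buffer = this.buffer.slice(end + close.length);
    }
    return complete;
  }
}

// A frame split across three chunks, with the opening tag itself cut in half.
const frames = new FrameBuffer();
const chunks = ['<tool_', 'call>{"name":"webfetch",', '"arguments":{}}</tool_call>'];
for (const chunk of chunks) {
  const done = frames.push(chunk);
  console.log(done.length); // 0, then 0, then 1
}
```

Nothing is parsed as JSON until a whole frame body is available, which is the property that avoids corruption on partial frames.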

Until the runtime/grammar path lands, keeping this recovery as defense-in-depth
prevents silent tool-call drops for the affected Qwen builds.

Document why repairFunctionEqualsJson exists: Qwen3.5/3.6 can fuse its
XML and JSON tool templates into a single `{"function=NAME","arguments":{...}}`
frame, and the repair is intentionally scoped to that exact shape so
well-formed JSON frames are never rewritten.
@simon-iribarren

/review

@github-actions

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

@simon-iribarren simon-iribarren merged commit 4b3ac99 into tetherto:main Jun 18, 2026
18 checks passed
simon-iribarren added a commit to simon-iribarren/qvac that referenced this pull request Jun 18, 2026
* fix: recover Qwen hybrid tool-call frames (QVAC-19908)

* QVAC-19908 doc: explain Qwen hybrid tool-call frame origin

Document why repairFunctionEqualsJson exists: Qwen3.5/3.6 can fuse its
XML and JSON tool templates into a single `{"function=NAME","arguments":{...}}`
frame, and the repair is intentionally scoped to that exact shape so
well-formed JSON frames are never rewritten.

(cherry picked from commit 4b3ac99)
simon-iribarren added a commit to simon-iribarren/qvac that referenced this pull request Jun 18, 2026
Fix-only patch release: recover malformed Qwen hybrid tool-call frames (tetherto#2677).
Bumps @qvac/sdk and @qvac/bare-sdk to 0.13.4 and adds the 0.13.4 changelog.
simon-iribarren added a commit to simon-iribarren/qvac that referenced this pull request Jun 18, 2026
Fix-only patch release: recover malformed Qwen hybrid tool-call frames (tetherto#2677).
Bumps @qvac/sdk and @qvac/bare-sdk to 0.13.4 and adds the 0.13.4 changelog.

(cherry picked from commit 92da258)
simon-iribarren added a commit to simon-iribarren/qvac that referenced this pull request Jun 18, 2026
Fix-only patch release: recover malformed Qwen hybrid tool-call frames (tetherto#2677).
Bumps @qvac/sdk and @qvac/bare-sdk to 0.13.4 and adds the 0.13.4 changelog.
simon-iribarren added a commit to simon-iribarren/qvac that referenced this pull request Jun 18, 2026
Fix-only patch release: recover malformed Qwen hybrid tool-call frames (tetherto#2677).
Bumps @qvac/sdk and @qvac/bare-sdk to 0.13.4 and adds the 0.13.4 changelog.

(cherry picked from commit 8fbb222)
simon-iribarren added a commit that referenced this pull request Jun 18, 2026
Fix-only patch release: recover malformed Qwen hybrid tool-call frames (#2677).
Bumps @qvac/sdk and @qvac/bare-sdk to 0.13.4 and adds the 0.13.4 changelog.
simon-iribarren added a commit that referenced this pull request Jun 18, 2026
Fix-only patch release: recover malformed Qwen hybrid tool-call frames (#2677).
Bumps @qvac/sdk and @qvac/bare-sdk to 0.13.4 and adds the 0.13.4 changelog.

(cherry picked from commit 8fbb222)