Skip to content

Fix Qwen3.5 tool calling#6

Merged
atdrendel merged 12 commits into
mainfrom
pr-133-qwen3.5-tool-calling
Mar 11, 2026
Merged

Fix Qwen3.5 tool calling#6
atdrendel merged 12 commits into
mainfrom
pr-133-qwen3.5-tool-calling

Conversation

@atdrendel

Copy link
Copy Markdown

No description provided.

Copilot AI review requested due to automatic review settings March 11, 2026 11:41

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the XML-based tool-calling implementation to handle Qwen3.5/Nemotron-style <tool_call>...</tool_call> wrappers and refactors integration tests to use a shared model-loader.

Changes:

  • Update .xmlFunction documentation/format assumptions to include a <tool_call> wrapper and expand model family mapping (Nemotron, Qwen3.5).
  • Configure .xmlFunction parsing/processing to use <tool_call> start/end tags and add targeted unit tests for Qwen3.5 chunked/wrapped output.
  • Refactor integration tests to lazily load model containers via a shared IntegrationTestModels actor and add (currently opt-in) Nemotron integration tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
skills/mlx-swift-lm/references/tool-calling.md Updates reference docs for .xmlFunction wrapper format and model list.
Tests/MLXLMTests/ToolTests.swift Updates XML parser init usage and adds Qwen3.5 wrapper/chunking tests + inference coverage.
Tests/MLXLMIntegrationTests/ToolCallIntegrationTests.swift Refactors model loading and adds Nemotron integration tests (currently skipped).
Tests/MLXLMIntegrationTests/IntegrationTestModels.swift Centralizes/caches model container loading for integration tests.
Libraries/MLXLMCommon/Tool/ToolCallFormat.swift Makes .xmlFunction tagged (<tool_call>...</tool_call>) and adds model_type inference for Nemotron/Qwen3.5.
Libraries/MLXLMCommon/Tool/Parsers/XMLFunctionParser.swift Changes parser init to accept start/end tags and stores them.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

startTag: "<|tool_call_start|>", endTag: "<|tool_call_end|>")
case .xmlFunction:
return XMLFunctionParser()
return XMLFunctionParser(startTag: "<tool_call>", endTag: "</tool_call>")

Copilot AI Mar 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToolCallFormat.xmlFunction now creates XMLFunctionParser with <tool_call> wrapper tags, which makes ToolCallProcessor(format: .xmlFunction) require those wrapper tags to detect tool calls. However, the XMLFunction unit tests still use the unwrapped Qwen3 Coder-style content (<function=...></function>), so streaming extraction would fail for that style. Consider supporting both wrapped and unwrapped XML function outputs (e.g., a fallback path in ToolCallProcessor/parser) or update the claimed format/tests so they are consistent and don’t regress Qwen3 Coder.

Suggested change
return XMLFunctionParser(startTag: "<tool_call>", endTag: "</tool_call>")
return XMLFunctionParser(startTag: nil, endTag: nil)

Copilot uses AI. Check for mistakes.
Comment on lines 214 to 219
@Test("Test XML Function Parser - Qwen3 Coder Format")
func testXMLFunctionParser() throws {
let parser = XMLFunctionParser()
let parser = XMLFunctionParser(startTag: "<tool_call>", endTag: "</tool_call>")
let content =
"<function=get_weather><parameter=location>Tokyo</parameter><parameter=unit>celsius</parameter></function>"

Copilot AI Mar 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is labeled “Qwen3 Coder Format” but the sample content is unwrapped (<function=...></function>). With the new default .xmlFunction configuration requiring <tool_call>...</tool_call> for streaming detection, this test no longer reflects the actual expected on-the-wire format. Either wrap the sample input here or split into two explicit tests (wrapped vs unwrapped) and ensure the streaming processor supports whatever formats are intended.

Copilot uses AI. Check for mistakes.
Comment on lines 5 to +9
/// Parser for XML function format: <function=name><parameter=key>value</parameter></function>
/// Reference: https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/tool_parsers/qwen3_coder.py
public struct XMLFunctionParser: ToolCallParser, Sendable {
public let startTag: String? = nil // Inline format - no wrapper tags
public let endTag: String? = nil
public let startTag: String?
public let endTag: String?

Copilot AI Mar 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment at the top still describes the XML function format as unwrapped (<function=...><parameter=...>...</parameter></function>), but the parser is now configured/used with <tool_call>...</tool_call> wrapper tags. Please update the comment to match the actual supported/expected format (and ideally mention whether unwrapped output is still supported).

Copilot uses AI. Check for mistakes.
for: [lfm2Expectation, glm4Expectation, mistral3Expectation], timeout: 600)
private var nemotronContainer: ModelContainer {
get async throws {
try XCTSkipIf(true, "Nemotron model is opt-in only because of its size")

Copilot AI Mar 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nemotronContainer always executes XCTSkipIf(true, ...), so all Nemotron integration tests are permanently skipped and will never run in any CI or local environment. If these tests are meant to be opt-in, gate them on an environment variable / launch argument (e.g., RUN_NEMOTRON_TESTS=1) rather than a hard-coded true, so they can be exercised when desired.

Suggested change
try XCTSkipIf(true, "Nemotron model is opt-in only because of its size")
let shouldSkip = ProcessInfo.processInfo.environment["RUN_NEMOTRON_TESTS"] != "1"
try XCTSkipIf(shouldSkip, "Nemotron model is opt-in only because of its size")

Copilot uses AI. Check for mistakes.
@atdrendel atdrendel merged commit 80cefa4 into main Mar 11, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants