Fix Qwen3.5 tool calling by atdrendel · Pull Request #6 · shareup/mlx-swift-lm

atdrendel · 2026-03-11T11:41:20Z

No description provided.

Copilot

Pull request overview

Updates the XML-based tool-calling implementation to handle Qwen3.5/Nemotron-style <tool_call>...</tool_call> wrappers and refactors integration tests to use a shared model-loader.

Changes:

Update .xmlFunction documentation/format assumptions to include a <tool_call> wrapper and expand model family mapping (Nemotron, Qwen3.5).
Configure .xmlFunction parsing/processing to use <tool_call> start/end tags and add targeted unit tests for Qwen3.5 chunked/wrapped output.
Refactor integration tests to lazily load model containers via a shared IntegrationTestModels actor and add (currently opt-in) Nemotron integration tests.
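The lazy, shared loading described in the last item can be sketched as a small caching actor. This is only an illustration of the general pattern, not the PR's `IntegrationTestModels` implementation; `LazyCache`, its `value(for:load:)` method, and the key/value types are hypothetical names chosen for the sketch.

```swift
/// Hypothetical cache (not the PR's IntegrationTestModels actor):
/// each key is loaded at most once, even under concurrent requests.
actor LazyCache<Key: Hashable & Sendable, Value: Sendable> {
    // Storing the in-flight Task lets concurrent callers await the same load.
    private var tasks: [Key: Task<Value, Error>] = [:]

    func value(
        for key: Key,
        load: @escaping @Sendable () async throws -> Value
    ) async throws -> Value {
        if let existing = tasks[key] {
            return try await existing.value
        }
        let task = Task { try await load() }
        tasks[key] = task
        return try await task.value
    }
}
```

Note that a failed load stays cached in this sketch, so later callers see the same error; a real test helper might prefer to evict failed tasks.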

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
skills/mlx-swift-lm/references/tool-calling.md	Updates reference docs for `.xmlFunction` wrapper format and model list.
Tests/MLXLMTests/ToolTests.swift	Updates XML parser init usage and adds Qwen3.5 wrapper/chunking tests + inference coverage.
Tests/MLXLMIntegrationTests/ToolCallIntegrationTests.swift	Refactors model loading and adds Nemotron integration tests (currently skipped).
Tests/MLXLMIntegrationTests/IntegrationTestModels.swift	Centralizes/caches model container loading for integration tests.
Libraries/MLXLMCommon/Tool/ToolCallFormat.swift	Makes `.xmlFunction` tagged (`<tool_call>...</tool_call>`) and adds model_type inference for Nemotron/Qwen3.5.
Libraries/MLXLMCommon/Tool/Parsers/XMLFunctionParser.swift	Changes parser init to accept start/end tags and stores them.


Copilot · 2026-03-11T11:46:24Z

                startTag: "<|tool_call_start|>", endTag: "<|tool_call_end|>")
        case .xmlFunction:
-            return XMLFunctionParser()
+            return XMLFunctionParser(startTag: "<tool_call>", endTag: "</tool_call>")


ToolCallFormat.xmlFunction now creates XMLFunctionParser with <tool_call> wrapper tags, which makes ToolCallProcessor(format: .xmlFunction) require those wrapper tags to detect tool calls. However, the XMLFunction unit tests still use the unwrapped Qwen3 Coder-style content (<function=...></function>), so streaming extraction would fail for that style. Consider supporting both wrapped and unwrapped XML function outputs (e.g., a fallback path in ToolCallProcessor/parser) or update the claimed format/tests so they are consistent and don’t regress Qwen3 Coder.

Suggested change

-            return XMLFunctionParser(startTag: "<tool_call>", endTag: "</tool_call>")
+            return XMLFunctionParser(startTag: nil, endTag: nil)
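The other option raised in the comment, accepting both wrapped and unwrapped output, could look roughly like the sketch below. It is a standalone illustration under stated assumptions and does not use the library's `XMLFunctionParser` or `ToolCallProcessor`; `ParsedCall`, `stripToolCallWrapper`, and `parseXMLFunction` are hypothetical names, and all argument values are kept as strings.

```swift
import Foundation

/// Hypothetical result type for this sketch (not the library's tool-call type).
struct ParsedCall: Equatable {
    let name: String
    let arguments: [String: String]
}

/// Removes an optional <tool_call>...</tool_call> wrapper so that wrapped
/// and unwrapped output go through the same parsing path.
func stripToolCallWrapper(_ text: String) -> String {
    var body = text.trimmingCharacters(in: .whitespacesAndNewlines)
    if body.hasPrefix("<tool_call>") { body.removeFirst("<tool_call>".count) }
    if body.hasSuffix("</tool_call>") { body.removeLast("</tool_call>".count) }
    return body.trimmingCharacters(in: .whitespacesAndNewlines)
}

/// Parses <function=name><parameter=key>value</parameter></function>,
/// with or without the wrapper. Returns nil if no function block is found.
func parseXMLFunction(_ text: String) -> ParsedCall? {
    let body = stripToolCallWrapper(text)
    guard body.hasPrefix("<function="),
          let nameEnd = body.firstIndex(of: ">"),
          let close = body.range(of: "</function>", options: .backwards),
          close.lowerBound > nameEnd
    else { return nil }

    let nameStart = body.index(body.startIndex, offsetBy: "<function=".count)
    let name = String(body[nameStart..<nameEnd])

    // Walk the parameter elements between the opening and closing function tags.
    var arguments: [String: String] = [:]
    var rest = body[body.index(after: nameEnd)..<close.lowerBound]
    while let open = rest.range(of: "<parameter="),
          let keyEnd = rest[open.upperBound...].firstIndex(of: ">"),
          let end = rest[keyEnd...].range(of: "</parameter>") {
        let key = String(rest[open.upperBound..<keyEnd])
        let value = rest[rest.index(after: keyEnd)..<end.lowerBound]
        arguments[key] = value.trimmingCharacters(in: .whitespacesAndNewlines)
        rest = rest[end.upperBound...]
    }
    return ParsedCall(name: name, arguments: arguments)
}
```

This covers only complete, non-streamed strings. Detecting an unwrapped call mid-stream is a separate problem, since without the wrapper tags the processor has no start marker other than `<function=` itself.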

Copilot · 2026-03-11T11:46:24Z

    @Test("Test XML Function Parser - Qwen3 Coder Format")
    func testXMLFunctionParser() throws {
-        let parser = XMLFunctionParser()
+        let parser = XMLFunctionParser(startTag: "<tool_call>", endTag: "</tool_call>")
        let content =
            "<function=get_weather><parameter=location>Tokyo</parameter><parameter=unit>celsius</parameter></function>"



This test is labeled “Qwen3 Coder Format” but the sample content is unwrapped (<function=...></function>). With the new default .xmlFunction configuration requiring <tool_call>...</tool_call> for streaming detection, this test no longer reflects the actual expected on-the-wire format. Either wrap the sample input here or split into two explicit tests (wrapped vs unwrapped) and ensure the streaming processor supports whatever formats are intended.

Copilot · 2026-03-11T11:46:25Z

 /// Parser for XML function format: <function=name><parameter=key>value</parameter></function>
 /// Reference: https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/tool_parsers/qwen3_coder.py
 public struct XMLFunctionParser: ToolCallParser, Sendable {
-    public let startTag: String? = nil  // Inline format - no wrapper tags
-    public let endTag: String? = nil
+    public let startTag: String?
+    public let endTag: String?


The doc comment at the top still describes the XML function format as unwrapped (<function=...><parameter=...>...</parameter></function>), but the parser is now configured/used with <tool_call>...</tool_call> wrapper tags. Please update the comment to match the actual supported/expected format (and ideally mention whether unwrapped output is still supported).

Copilot · 2026-03-11T11:46:25Z

-            for: [lfm2Expectation, glm4Expectation, mistral3Expectation], timeout: 600)
+    private var nemotronContainer: ModelContainer {
+        get async throws {
+            try XCTSkipIf(true, "Nemotron model is opt-in only because of its size")


nemotronContainer always executes XCTSkipIf(true, ...), so all Nemotron integration tests are permanently skipped and will never run in any CI or local environment. If these tests are meant to be opt-in, gate them on an environment variable / launch argument (e.g., RUN_NEMOTRON_TESTS=1) rather than a hard-coded true, so they can be exercised when desired.

Suggested change

-            try XCTSkipIf(true, "Nemotron model is opt-in only because of its size")
+            let shouldSkip = ProcessInfo.processInfo.environment["RUN_NEMOTRON_TESTS"] != "1"
+            try XCTSkipIf(shouldSkip, "Nemotron model is opt-in only because of its size")
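The same gate can be factored into a small pure function so the opt-in logic is checkable without XCTest. This is a sketch only: `shouldSkipOptInTests` is a hypothetical helper, and the `RUN_NEMOTRON_TESTS` variable name is taken from the suggestion above rather than from anything the repository is known to define.

```swift
import Foundation

/// Hypothetical helper: returns true when opt-in tests should be skipped,
/// i.e. unless the given environment flag is set to exactly "1".
func shouldSkipOptInTests(
    flag: String = "RUN_NEMOTRON_TESTS",
    environment: [String: String] = ProcessInfo.processInfo.environment
) -> Bool {
    environment[flag] != "1"
}
```

Passing the environment as a parameter keeps the default behaviour tied to the process environment while letting a caller substitute a dictionary when checking the logic.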

tpae and others added 10 commits February 1, 2026 06:19

added prefix matching for flexible parsing

a47f334

convert to pythonic tool converter

6be5dfc

Merge branch 'ml-explore:main' into main

600f712

added qwen3_5 tool calling support

49110a1

added detection at vlm level

b672eac

Merge branch 'main' into main

7a5b0a9

updated per feedback, added nemotron

2bf261d

Add Nemotron tool integration test

34b828b

Use IntegrationTestModels inside of ToolCallIntegrationTests

ec015a2

Skip Nemotron tests in ToolCallIntegrationTests by default

6efe31c

Copilot AI review requested due to automatic review settings March 11, 2026 11:41

Copilot started reviewing on behalf of atdrendel March 11, 2026 11:41

Copilot AI reviewed Mar 11, 2026


atdrendel added 2 commits March 11, 2026 13:11

Add Qwen3.5 tool call integration tests

233c467

Disable Nemotron thinking because it uses way too many tokens to think

f888b37

atdrendel merged commit 80cefa4 into main Mar 11, 2026
2 checks passed

atdrendel merged 12 commits into main from pr-133-qwen3.5-tool-calling

3 participants
