Fix Qwen3.5 tool calling#6
Conversation
There was a problem hiding this comment.
Pull request overview
Updates the XML-based tool-calling implementation to handle Qwen3.5/Nemotron-style <tool_call>...</tool_call> wrappers and refactors integration tests to use a shared model-loader.
Changes:
- Update
.xmlFunctiondocumentation/format assumptions to include a<tool_call>wrapper and expand model family mapping (Nemotron, Qwen3.5). - Configure
.xmlFunctionparsing/processing to use<tool_call>start/end tags and add targeted unit tests for Qwen3.5 chunked/wrapped output. - Refactor integration tests to lazily load model containers via a shared
IntegrationTestModelsactor and add (currently opt-in) Nemotron integration tests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| skills/mlx-swift-lm/references/tool-calling.md | Updates reference docs for .xmlFunction wrapper format and model list. |
| Tests/MLXLMTests/ToolTests.swift | Updates XML parser init usage and adds Qwen3.5 wrapper/chunking tests + inference coverage. |
| Tests/MLXLMIntegrationTests/ToolCallIntegrationTests.swift | Refactors model loading and adds Nemotron integration tests (currently skipped). |
| Tests/MLXLMIntegrationTests/IntegrationTestModels.swift | Centralizes/caches model container loading for integration tests. |
| Libraries/MLXLMCommon/Tool/ToolCallFormat.swift | Makes .xmlFunction tagged (<tool_call>...</tool_call>) and adds model_type inference for Nemotron/Qwen3.5. |
| Libraries/MLXLMCommon/Tool/Parsers/XMLFunctionParser.swift | Changes parser init to accept start/end tags and stores them. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| startTag: "<|tool_call_start|>", endTag: "<|tool_call_end|>") | ||
| case .xmlFunction: | ||
| return XMLFunctionParser() | ||
| return XMLFunctionParser(startTag: "<tool_call>", endTag: "</tool_call>") |
There was a problem hiding this comment.
ToolCallFormat.xmlFunction now creates XMLFunctionParser with <tool_call> wrapper tags, which makes ToolCallProcessor(format: .xmlFunction) require those wrapper tags to detect tool calls. However, the XMLFunction unit tests still use the unwrapped Qwen3 Coder-style content (<function=...></function>), so streaming extraction would fail for that style. Consider supporting both wrapped and unwrapped XML function outputs (e.g., a fallback path in ToolCallProcessor/parser) or update the claimed format/tests so they are consistent and don’t regress Qwen3 Coder.
| return XMLFunctionParser(startTag: "<tool_call>", endTag: "</tool_call>") | |
| return XMLFunctionParser(startTag: nil, endTag: nil) |
| @Test("Test XML Function Parser - Qwen3 Coder Format") | ||
| func testXMLFunctionParser() throws { | ||
| let parser = XMLFunctionParser() | ||
| let parser = XMLFunctionParser(startTag: "<tool_call>", endTag: "</tool_call>") | ||
| let content = | ||
| "<function=get_weather><parameter=location>Tokyo</parameter><parameter=unit>celsius</parameter></function>" | ||
|
|
There was a problem hiding this comment.
This test is labeled “Qwen3 Coder Format” but the sample content is unwrapped (<function=...></function>). With the new default .xmlFunction configuration requiring <tool_call>...</tool_call> for streaming detection, this test no longer reflects the actual expected on-the-wire format. Either wrap the sample input here or split into two explicit tests (wrapped vs unwrapped) and ensure the streaming processor supports whatever formats are intended.
| /// Parser for XML function format: <function=name><parameter=key>value</parameter></function> | ||
| /// Reference: https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/tool_parsers/qwen3_coder.py | ||
| public struct XMLFunctionParser: ToolCallParser, Sendable { | ||
| public let startTag: String? = nil // Inline format - no wrapper tags | ||
| public let endTag: String? = nil | ||
| public let startTag: String? | ||
| public let endTag: String? |
There was a problem hiding this comment.
The doc comment at the top still describes the XML function format as unwrapped (<function=...><parameter=...>...</parameter></function>), but the parser is now configured/used with <tool_call>...</tool_call> wrapper tags. Please update the comment to match the actual supported/expected format (and ideally mention whether unwrapped output is still supported).
| for: [lfm2Expectation, glm4Expectation, mistral3Expectation], timeout: 600) | ||
| private var nemotronContainer: ModelContainer { | ||
| get async throws { | ||
| try XCTSkipIf(true, "Nemotron model is opt-in only because of its size") |
There was a problem hiding this comment.
nemotronContainer always executes XCTSkipIf(true, ...), so all Nemotron integration tests are permanently skipped and will never run in any CI or local environment. If these tests are meant to be opt-in, gate them on an environment variable / launch argument (e.g., RUN_NEMOTRON_TESTS=1) rather than a hard-coded true, so they can be exercised when desired.
| try XCTSkipIf(true, "Nemotron model is opt-in only because of its size") | |
| let shouldSkip = ProcessInfo.processInfo.environment["RUN_NEMOTRON_TESTS"] != "1" | |
| try XCTSkipIf(shouldSkip, "Nemotron model is opt-in only because of its size") |
No description provided.