
feat(tool_parser): add DeepSeek V3.1 tool call parser#1006

Merged
slin1237 merged 6 commits into main from keyang/deepseek_3_1_tool_call
Apr 1, 2026


@key4ng key4ng commented Apr 1, 2026

Collaborator

Description

Problem

DeepSeek V3.1 uses a different tool call format than V3 — no function type prefix and no markdown code blocks around JSON arguments. The existing deepseek parser cannot parse V3.1 tool calls.

Solution

Add a new deepseek31 tool call parser that handles the V3.1 format:
<|tool▁call▁begin|>{name}<|tool▁sep|>{json_args}<|tool▁call▁end|>
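
As a rough illustration of this format, here is a minimal Python sketch of complete (non-streaming) parsing. It is not the Rust implementation from this PR. The marker strings are copied as written in this thread (the outer `<|tool▁calls▁begin|>` wrapper appears in the review comments), while `parse_complete`, `CALL_RE`, and the choice to skip malformed JSON are assumptions made for the example.

```python
import json
import re

# Marker strings as written in this thread; everything else is illustrative.
CALLS_BEGIN = "<|tool▁calls▁begin|>"
CALL_BEGIN = "<|tool▁call▁begin|>"
SEP = "<|tool▁sep|>"
CALL_END = "<|tool▁call▁end|>"

# re.escape matters: a bare "|" would otherwise be read as regex alternation.
CALL_RE = re.compile(
    re.escape(CALL_BEGIN) + r"(.*?)" + re.escape(SEP) + r"(.*?)" + re.escape(CALL_END),
    re.DOTALL,
)


def parse_complete(text):
    """Return (normal_text, calls) for a fully received model output."""
    starts = [i for i in (text.find(CALLS_BEGIN), text.find(CALL_BEGIN)) if i != -1]
    # Plain text is whatever precedes the first tool marker.
    normal_text = text[: min(starts)] if starts else text
    calls = []
    for match in CALL_RE.finditer(text):
        name = match.group(1).strip()
        try:
            arguments = json.loads(match.group(2))
        except json.JSONDecodeError:
            continue  # assumption: skip a malformed payload, keep the rest
        calls.append({"name": name, "arguments": arguments})
    return normal_text, calls
```

Because the arguments are raw JSON with no code fence around them, the only reliable terminator is the end marker itself, which is why both capture groups are lazy.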

Changes

  • Add DeepSeek31Parser with parse_complete and streaming parse_incremental support
  • Register deepseek31 parser in the factory with model mappings for deepseek-v3.1* and deepseek-ai/DeepSeek-V3.1*
  • Add integration tests covering complete parsing, streaming, malformed JSON handling, and factory registration
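
The model mappings above are wildcard patterns, so a V3.1 model id can also match a broader V3 pattern. The Python sketch below shows one way such a lookup could prefer the more specific entry. It is hypothetical: only the `deepseek-v3.1*` patterns and the `deepseek` / `deepseek31` parser names come from this PR, while the V3 patterns, `parser_for_model`, and the longest-pattern rule are assumptions and may not match how the Rust factory resolves models.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern table. The deepseek-v3.1* entries mirror the PR text;
# the broader V3 entries are assumed for the sake of the example.
MODEL_PATTERNS = {
    "deepseek-v3*": "deepseek",
    "deepseek-ai/DeepSeek-V3*": "deepseek",
    "deepseek-v3.1*": "deepseek31",
    "deepseek-ai/DeepSeek-V3.1*": "deepseek31",
}


def parser_for_model(model_id):
    """Return the parser name for model_id, preferring the longest matching pattern."""
    matches = [p for p in MODEL_PATTERNS if fnmatchcase(model_id, p)]
    if not matches:
        return None
    return MODEL_PATTERNS[max(matches, key=len)]
```

With this rule, `parser_for_model("deepseek-v3.1")` resolves to `deepseek31` even though `deepseek-v3*` also matches, which is the behavior a routing test would want to assert.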

Test Plan

cargo test -p tool-parser --test tool_parser_deepseek31
# 11 tests pass

E2E validated against live DeepSeek V3.1 (FP8) on 8x H200 via sglang gRPC + smg gateway — both non-streaming and streaming tool calls work correctly.

Checklist
  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • New Features

    • Added support for DeepSeek V3.1 models with tool-calling in both complete and incremental (streaming) modes. Parses multiple tool calls, streams function names and argument deltas, preserves preceding normal text, and handles nested JSON arguments while finalizing calls when complete.
  • Tests

    • Added comprehensive integration tests covering detection, complete and incremental parsing, nested-JSON arguments, malformed payload resilience, and multi-call ordering for DeepSeek V3.1.

key4ng added 5 commits March 31, 2026 16:08
Signed-off-by: key4ng <rukeyang@gmail.com>
Signed-off-by: key4ng <rukeyang@gmail.com>
Add streaming and factory registration tests to complement existing
parse_complete coverage.

Signed-off-by: key4ng <rukeyang@gmail.com>
@github-actions github-actions Bot added the tests Test changes label Apr 1, 2026

coderabbitai Bot commented Apr 1, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: eb7e1f0e-f839-4daf-b464-6457d264bfce

📥 Commits

Reviewing files that changed from the base of the PR and between ce6f429 and bb8c2f2.

📒 Files selected for processing (1)
  • crates/tool_parser/tests/tool_parser_deepseek31.rs

Walkthrough

Adds a new DeepSeek31Parser for DeepSeek v3.1 tool-call parsing: implements complete and streaming parsing, registers and re-exports the parser, and includes integration tests covering parsing behaviors and factory/model routing.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Parser Registration & Public Exports (crates/tool_parser/src/factory.rs, crates/tool_parser/src/lib.rs, crates/tool_parser/src/parsers/mod.rs) | Registers a new "deepseek31" parser in the factory, routes DeepSeek v3.1 model identifiers to it, adds pub mod deepseek31;, and re-exports DeepSeek31Parser. |
| DeepSeek V3.1 Parser Implementation (crates/tool_parser/src/parsers/deepseek31.rs) | Adds DeepSeek31Parser implementing ToolParser with buffering, regex-based marker extraction, complete and incremental parsing, streaming ToolCallItem emission, JSON handling/wrapping for non-object values, validation against tool index, and state/reset helpers. |
| Integration Tests (crates/tool_parser/tests/tool_parser_deepseek31.rs) | Adds comprehensive Tokio-based tests for complete and incremental parsing flows, marker detection, malformed payload resilience, multi-tool sequences, streaming edge cases, and factory/model mapping validations. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • CatherineSue

Poem

🐰 I hopped through code with whiskers keen,
Found markers, buffers, and JSON between,
I streamed the chunks and parsed each name,
DeepSeek's new parser joined the game,
A tiny hop — but oh, what a scene!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(tool_parser): add DeepSeek V3.1 tool call parser' directly and accurately summarizes the main change: adding support for the DeepSeek V3.1 tool call format. It is concise, specific, and uses the conventional commit format. |


@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request implements the DeepSeek31Parser to support the DeepSeek V3.1 tool call format, including its registration in the ParserFactory and comprehensive integration tests. The review feedback identifies several critical improvements for the parser's robustness and performance: refining regex patterns to use non-greedy matching and lookaheads to prevent parsing errors, optimizing buffer management by using drain to avoid data loss and unnecessary reallocations, ensuring that parse_complete correctly captures text following tool call blocks, and avoiding redundant string cloning during incremental parsing.

5 comment threads on crates/tool_parser/src/parsers/deepseek31.rs

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tool_parser/src/parsers/deepseek31.rs`:
- Around line 162-181: The current logic sets has_tool_call true for any chunk
containing a tool marker and returns normal_text = "" which drops any plain-text
prefix that appears before the first marker; update the code in the parser
method that builds current_text (the block using self.buffer.push_str,
current_text, has_tool_call and the StreamingParseResult return) to detect the
index of the first tool marker (use has_tool_markers or search for marker
strings like "<|tool▁calls▁begin|>" / "<|tool▁call▁begin|>"), split off and
remove the bytes before that first marker into normal_text to emit immediately,
leave the remainder (starting at the first marker) in self.buffer for tool
parsing, and then proceed to parse tool calls as before; also add a regression
test feeding a chunk shaped like
"prefix<|tool▁calls▁begin|><|tool▁call▁begin|>..." to ensure the prefix is
preserved and emitted and the tool parsing still proceeds.
- Around line 186-188: The code is using
self.partial_tool_call_regex.captures(&current_text) against the entire buffer
which lets later/embedded calls bleed into the current match; change the logic
to only parse the front-most (earliest) call by locating the first match
anchored at the buffer start (use a regex find/anchored match or captures_iter
and pick the lowest start index) and then consume exactly that matched slice
from self.buffer before returning; apply the same front-most-consume fix to the
other block (the code around the other captures usage at lines 246-259) and add
a regression test that feeds a single chunk containing two coalesced tool calls
(e.g., "...<tool_call_end><tool_call_begin>next<tool_sep>...") to ensure both
complete calls are drained and emitted in sequence.

In `@crates/tool_parser/tests/tool_parser_deepseek31.rs`:
- Around line 224-243: The test currently only calls factory.has_parser and
registry().has_parser_for_model which can pass even if "deepseek-v3.1" routes to
the old parser; replace or augment those checks by using
ParserFactory::create_for_model("deepseek-v3.1") (or
registry().get_parser("deepseek-v3.1")) to obtain the actual parser instance and
then assert its identity or behavior (e.g., assert the parser's identifier
equals "deepseek31" or run a small V3.1-specific sample input through the
returned parser and assert the expected V3.1-specific output), ensuring the
mapping is behavioral rather than just present.
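
The first two findings above (keep the plain-text prefix, and consume only the front-most complete call) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Rust parser: `StreamingSketch` and `feed` are hypothetical names, each marker is assumed to arrive whole within one chunk, and only completed calls are returned rather than incremental name and argument deltas.

```python
# Marker strings as written in this thread.
CALLS_BEGIN = "<|tool▁calls▁begin|>"
CALLS_END = "<|tool▁calls▁end|>"
CALL_BEGIN = "<|tool▁call▁begin|>"
SEP = "<|tool▁sep|>"
CALL_END = "<|tool▁call▁end|>"


class StreamingSketch:
    """Hypothetical stand-in for the incremental parser's buffering."""

    def __init__(self):
        self.buffer = ""

    def feed(self, chunk):
        """Return (normal_text, completed_calls) for one incoming chunk."""
        self.buffer += chunk
        calls = []
        starts = [i for i in (self.buffer.find(CALLS_BEGIN), self.buffer.find(CALL_BEGIN)) if i != -1]
        if not starts:
            # No marker seen: everything buffered so far is plain text.
            normal_text, self.buffer = self.buffer, ""
            return normal_text, calls

        # Emit the prefix before the first marker; keep the rest for tool parsing.
        first = min(starts)
        normal_text, self.buffer = self.buffer[:first], self.buffer[first:]

        # Drain complete calls front-first so coalesced calls stay separate.
        while True:
            begin = self.buffer.find(CALL_BEGIN)
            end = self.buffer.find(CALL_END, begin + len(CALL_BEGIN)) if begin != -1 else -1
            if end == -1:
                break  # the front-most call is incomplete; wait for more data
            body = self.buffer[begin + len(CALL_BEGIN):end]
            name, _, raw_args = body.partition(SEP)
            calls.append((name.strip(), raw_args.strip()))
            self.buffer = self.buffer[end + len(CALL_END):]

        # Drop the closing wrapper so it is never emitted as normal text.
        if self.buffer.startswith(CALLS_END):
            self.buffer = self.buffer[len(CALLS_END):]
        return normal_text, calls
```

Feeding a single chunk shaped like `prefix<|tool▁calls▁begin|>` followed by two back-to-back calls returns `"prefix"` plus both calls in order, which matches the two regression cases the review asks for.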

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 576c2cd4-0677-4a7c-aa22-12ddec381c96

📥 Commits

Reviewing files that changed from the base of the PR and between aba58c8 and ce6f429.

📒 Files selected for processing (5)
  • crates/tool_parser/src/factory.rs
  • crates/tool_parser/src/lib.rs
  • crates/tool_parser/src/parsers/deepseek31.rs
  • crates/tool_parser/src/parsers/mod.rs
  • crates/tool_parser/tests/tool_parser_deepseek31.rs

2 comment threads on crates/tool_parser/src/parsers/deepseek31.rs
1 comment thread on crates/tool_parser/tests/tool_parser_deepseek31.rs (outdated)
Comment on lines +221 to +223
let argument_diff = func_args_raw
    .strip_prefix(last_sent)
    .unwrap_or(func_args_raw);


🔴 Important: End markers leak into streamed arguments when <|tool▁call▁end|> arrives in the same chunk as the final JSON content.

The partial_tool_call_regex group 2 ((.*) after <|tool▁sep|>) greedily captures everything to end-of-string, including <|tool▁call▁end|> and <|tool▁calls▁end|> tokens. Unlike the V3 parser, whose ```json\n anchor naturally stops the args capture before closing markers, V3.1's raw format has no such delimiter.

When the end marker arrives in the same chunk as the last argument bytes, func_args_raw becomes e.g. {"location": "Tokyo"}<|tool▁call▁end|>. The strip_prefix(last_sent) here succeeds, producing argument_diff = <|tool▁call▁end|>, which is then pushed to the client as argument content (lines 225-233) before the is_complete_json check runs at line 236.

Fix: strip end markers from func_args_raw before the diff computation, e.g.:

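// Strip the outer <|tool▁calls▁end|> first: it comes last in the stream,
// so trimming the inner marker first would leave it behind.
let func_args_raw = func_args_raw
    .trim_end_matches("<|tool▁calls▁end|>")
    .trim_end_matches("<|tool▁call▁end|>")
    .trim();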

Or adjust the partial regex to exclude them:

(?s)<\|tool▁call▁begin\|>(.*)<\|tool▁sep\|>(.*?)(?:<\|tool▁call▁end\|>|$)
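
To make the leak concrete, here is a small Python sketch that mirrors the diff computation with a greedy and a lazy capture. It is an approximation for illustration only: `GREEDY`, `LAZY`, and `argument_diff` are hypothetical names, and Python's `re` stands in for the Rust `regex` crate.

```python
import re

# Marker strings as written in this thread.
CALL_BEGIN = "<|tool▁call▁begin|>"
SEP = "<|tool▁sep|>"
CALL_END = "<|tool▁call▁end|>"
CALLS_END = "<|tool▁calls▁end|>"

# Greedy group 2, as described above: it runs to end-of-string.
GREEDY = re.compile(re.escape(CALL_BEGIN) + r"(.*)" + re.escape(SEP) + r"(.*)", re.DOTALL)
# Lazy group 2: stops at the end marker, or at end-of-string mid-stream.
LAZY = re.compile(
    re.escape(CALL_BEGIN) + r"(.*)" + re.escape(SEP) + r"(.*?)(?:" + re.escape(CALL_END) + r"|$)",
    re.DOTALL,
)


def argument_diff(pattern, buffer, last_sent):
    """Mimic strip_prefix(last_sent).unwrap_or(func_args_raw) on group 2."""
    args_raw = pattern.search(buffer).group(2)
    return args_raw[len(last_sent):] if args_raw.startswith(last_sent) else args_raw


# The end markers arrive in the same chunk as the last argument bytes.
buffer = CALL_BEGIN + "get_weather" + SEP + '{"location": "Tokyo"}' + CALL_END + CALLS_END
last_sent = '{"location": "Tok'

leaked = argument_diff(GREEDY, buffer, last_sent)
clean = argument_diff(LAZY, buffer, last_sent)
```

With the greedy pattern, `leaked` carries both end markers after `yo"}`. With the lazy pattern, `clean` is just `yo"}`. A partially received end marker at the very end of the buffer would still slip through either pattern, so a real fix likely needs to hold back such a suffix as well.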

Member


Could you add a test to see if this happens, @key4ng?

Collaborator Author


Will create a new PR to add the test, and also validate the <|end_of_sentence|> case captured by the edge-case test.

Comment thread crates/tool_parser/src/parsers/deepseek31.rs
@CatherineSue CatherineSue added the tool-parser Tool/function call parser changes label Apr 1, 2026
@slin1237 slin1237 merged commit f404d5d into main Apr 1, 2026
70 checks passed
@slin1237 slin1237 deleted the keyang/deepseek_3_1_tool_call branch April 1, 2026 21:50

key4ng commented Apr 1, 2026

Collaborator Author

E2E Validation

Tested against a live DeepSeek V3.1 (FP8) deployment on 8x H200.

Setup

sglang backend (gRPC mode):

python -m sglang.launch_server \
  --model deepseek-ai/DeepSeek-V3.1 \
  --tp 8 --trust-remote-code --port 30000 --grpc-mode

smg router:

./target/release/smg \
  --worker-urls grpc://localhost:30000 \
  --model-path deepseek-ai/DeepSeek-V3.1 \
  --tokenizer-path /home/ubuntu/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-V3.1/snapshots/c0781d039fb7a1ba2abc4add0bdc293e92d2b8db \
  --tool-call-parser deepseek31 \
  --port 8080

Test result

============================= test session starts ==============================
collected 23 items

TestDeepSeek31NonStreaming::test_single_tool_call_with_tool_choice_required  PASSED
TestDeepSeek31NonStreaming::test_single_tool_call_auto                        PASSED
TestDeepSeek31NonStreaming::test_tool_call_finish_reason                      PASSED
TestDeepSeek31NonStreaming::test_tool_call_arguments_are_valid_json           PASSED
TestDeepSeek31NonStreaming::test_tool_call_has_id                             PASSED
TestDeepSeek31NonStreaming::test_multiple_tools_available_model_picks_correct_one PASSED
TestDeepSeek31NonStreaming::test_parallel_tool_calls                          PASSED
TestDeepSeek31NonStreaming::test_no_tool_call_when_not_needed                 PASSED
TestDeepSeek31NonStreaming::test_tool_call_with_complex_arguments             PASSED
TestDeepSeek31NonStreaming::test_usage_stats_present                          PASSED
TestDeepSeek31Streaming::test_streaming_single_tool_call                     PASSED
TestDeepSeek31Streaming::test_streaming_arguments_arrive_incrementally       PASSED
TestDeepSeek31Streaming::test_streaming_finish_reason                        PASSED
TestDeepSeek31Streaming::test_streaming_parallel_tool_calls                  PASSED
TestDeepSeek31Streaming::test_streaming_tool_call_ids_are_unique             PASSED
TestDeepSeek31MultiTurn::test_tool_result_followup                           XFAIL
TestDeepSeek31MultiTurn::test_multi_turn_streaming                           XFAIL
TestDeepSeek31EdgeCases::test_specific_tool_choice                           PASSED
TestDeepSeek31EdgeCases::test_tool_choice_none                               PASSED
TestDeepSeek31EdgeCases::test_empty_tools_list                               PASSED
TestDeepSeek31EdgeCases::test_max_tokens_limits_tool_output                  PASSED
TestDeepSeek31EdgeCases::test_unicode_in_tool_arguments                      PASSED
TestDeepSeek31EdgeCases::test_long_user_message                              PASSED

======================== 21 passed, 2 xfailed in 9.75s =========================
Full test file: e2e_test/chat_completions/test_deepseek31_tool_calling.py
"""DeepSeek V3.1 Tool Calling E2E Tests.

Comprehensive end-to-end tests for the DeepSeek V3.1 tool parser via the SMG gateway.
Tests both non-streaming and streaming modes against a live sglang backend.

Usage:
    pytest e2e_test/chat_completions/test_deepseek31_tool_calling.py -v \
        --base-url http://localhost:8080

Or run directly:
    python e2e_test/chat_completions/test_deepseek31_tool_calling.py
"""

from __future__ import annotations

import json
import logging
import os

import openai
import pytest

logger = logging.getLogger(__name__)

BASE_URL = os.environ.get("SMG_BASE_URL", "http://localhost:8080")
MODEL = os.environ.get("SMG_MODEL", "deepseek-ai/DeepSeek-V3.1")

# =============================================================================
# Client fixture
# =============================================================================


@pytest.fixture(scope="module")
def client():
    return openai.OpenAI(base_url=f"{BASE_URL}/v1", api_key="dummy")


# =============================================================================
# Tool definitions
# =============================================================================

WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'San Francisco'",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit",
                },
            },
            "required": ["location"],
        },
    },
}

SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search",
        "description": "Search for information on the web.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query string",
                },
                "num_results": {
                    "type": "integer",
                    "description": "Number of results to return",
                },
            },
            "required": ["query"],
        },
    },
}

CALCULATOR_TOOL = {
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a mathematical expression.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Math expression to evaluate, e.g. '2 + 3 * 4'",
                },
            },
            "required": ["expression"],
        },
    },
}

TRANSLATE_TOOL = {
    "type": "function",
    "function": {
        "name": "translate",
        "description": "Translate text from one language to another.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "Text to translate",
                },
                "source_language": {
                    "type": "string",
                    "description": "Source language code, e.g. 'en'",
                },
                "target_language": {
                    "type": "string",
                    "description": "Target language code, e.g. 'fr'",
                },
            },
            "required": ["text", "target_language"],
        },
    },
}

CREATE_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "create_file",
        "description": "Create a file with given content.",
        "parameters": {
            "type": "object",
            "properties": {
                "filename": {"type": "string", "description": "Name of the file"},
                "content": {"type": "string", "description": "File content"},
                "overwrite": {
                    "type": "boolean",
                    "description": "Whether to overwrite if file exists",
                },
            },
            "required": ["filename", "content"],
        },
    },
}

ALL_TOOLS = [WEATHER_TOOL, SEARCH_TOOL, CALCULATOR_TOOL, TRANSLATE_TOOL, CREATE_FILE_TOOL]


# =============================================================================
# Helper
# =============================================================================


def assert_valid_tool_call(tool_call, expected_name=None):
    """Assert a tool call has valid structure."""
    assert tool_call.function.name, "Tool call must have a function name"
    assert tool_call.function.arguments, "Tool call must have arguments"
    args = json.loads(tool_call.function.arguments)
    assert isinstance(args, dict), "Arguments must be a JSON object"
    if expected_name:
        assert tool_call.function.name == expected_name, (
            f"Expected tool '{expected_name}', got '{tool_call.function.name}'"
        )
    return args


def collect_streaming_tool_calls(stream):
    """Collect tool call name and arguments from a streaming response."""
    tool_calls = {}  # index -> {name, arguments}
    chunks_count = 0
    finish_reason = None

    for chunk in stream:
        chunks_count += 1
        delta = chunk.choices[0].delta if chunk.choices else None
        if not delta:
            continue

        if chunk.choices[0].finish_reason:
            finish_reason = chunk.choices[0].finish_reason

        if delta.tool_calls:
            for tc in delta.tool_calls:
                idx = tc.index
                if idx not in tool_calls:
                    tool_calls[idx] = {"name": "", "arguments": ""}
                if tc.function and tc.function.name:
                    tool_calls[idx]["name"] = tc.function.name
                if tc.function and tc.function.arguments:
                    tool_calls[idx]["arguments"] += tc.function.arguments

    return tool_calls, chunks_count, finish_reason


# =============================================================================
# Non-Streaming Tests
# =============================================================================


class TestDeepSeek31NonStreaming:
    """Non-streaming tool call tests."""

    def test_single_tool_call_with_tool_choice_required(self, client):
        """Single tool call with tool_choice=required forces a tool call."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls, "Expected tool calls with tool_choice=required"
        assert len(msg.tool_calls) >= 1

        args = assert_valid_tool_call(msg.tool_calls[0], "get_weather")
        assert "location" in args, "get_weather should have a 'location' argument"
        logger.info("Tool args: %s", args)

    def test_single_tool_call_auto(self, client):
        """Single tool call with tool_choice=auto — model decides to call tool."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Use the get_weather tool to check weather in Paris.",
                }
            ],
            tools=[WEATHER_TOOL],
            tool_choice="auto",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        # Model should decide to call the tool when explicitly asked
        if msg.tool_calls:
            args = assert_valid_tool_call(msg.tool_calls[0], "get_weather")
            assert "location" in args
            logger.info("Auto tool call args: %s", args)
        else:
            logger.warning("Model chose not to call tool in auto mode (acceptable)")

    def test_tool_call_finish_reason(self, client):
        """Verify finish_reason is 'tool_calls' when tools are returned."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in London?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        assert response.choices[0].finish_reason in ("tool_calls", "stop")
        if response.choices[0].message.tool_calls:
            assert response.choices[0].finish_reason == "tool_calls"

    def test_tool_call_arguments_are_valid_json(self, client):
        """Tool call arguments must be parseable JSON objects."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Search for 'best restaurants in New York'",
                }
            ],
            tools=[SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        args = json.loads(msg.tool_calls[0].function.arguments)
        assert isinstance(args, dict)
        assert "query" in args, "search tool should have 'query' argument"

    def test_tool_call_has_id(self, client):
        """Each tool call should have a unique ID."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Check weather in Berlin"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        assert msg.tool_calls[0].id, "Tool call should have an ID"
        assert msg.tool_calls[0].type == "function"

    def test_multiple_tools_available_model_picks_correct_one(self, client):
        """When multiple tools are available, model should pick the right one."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Translate 'hello' to French. Use the translate tool.",
                }
            ],
            tools=ALL_TOOLS,
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        assert msg.tool_calls[0].function.name == "translate"
        args = json.loads(msg.tool_calls[0].function.arguments)
        assert "text" in args or "target_language" in args

    def test_parallel_tool_calls(self, client):
        """Model can return multiple tool calls in a single response."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": (
                        "I need two things done at once: "
                        "1) Get the weather in Tokyo "
                        "2) Search for 'Tokyo travel guide'. "
                        "Call both tools in parallel."
                    ),
                }
            ],
            tools=[WEATHER_TOOL, SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=1024,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        assert len(msg.tool_calls) >= 2, (
            f"Expected at least 2 tool calls for parallel request, got {len(msg.tool_calls)}"
        )

        names = {tc.function.name for tc in msg.tool_calls}
        logger.info("Parallel tool call names: %s", names)
        # Both tools should be called
        assert "get_weather" in names, "Should have called get_weather"
        assert "search" in names, "Should have called search"

    def test_no_tool_call_when_not_needed(self, client):
        """Model should not call tools when question doesn't need them."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What is 2 + 2?"}],
            tools=[WEATHER_TOOL],
            tool_choice="auto",
            temperature=0,
            max_tokens=256,
        )

        msg = response.choices[0].message
        # Model should answer directly without tools for simple math
        # (weather tool is irrelevant to the question)
        if not msg.tool_calls:
            assert msg.content, "Should have text content when not calling tools"
            logger.info("Model correctly answered without tools: %s", msg.content[:100])

    def test_tool_call_with_complex_arguments(self, client):
        """Tool call with boolean and optional arguments."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Create a file named 'test.txt' with content 'hello world', and overwrite if it exists.",
                }
            ],
            tools=[CREATE_FILE_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        args = assert_valid_tool_call(msg.tool_calls[0], "create_file")
        assert "filename" in args
        assert "content" in args
        logger.info("Complex tool args: %s", args)

    def test_usage_stats_present(self, client):
        """Response should include usage statistics."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Check weather in NYC"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=256,
        )

        assert response.usage is not None
        assert response.usage.prompt_tokens > 0
        assert response.usage.completion_tokens > 0
        assert response.usage.total_tokens > 0
        assert response.usage.total_tokens == (
            response.usage.prompt_tokens + response.usage.completion_tokens
        )


# =============================================================================
# Streaming Tests
# =============================================================================


class TestDeepSeek31Streaming:
    """Streaming tool call tests."""

    def test_streaming_single_tool_call(self, client):
        """Streaming should deliver tool call name and arguments across chunks."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
            stream=True,
        )

        tool_calls, chunks_count, finish_reason = collect_streaming_tool_calls(stream)

        assert chunks_count > 1, "Streaming should return multiple chunks"
        assert len(tool_calls) >= 1, "Should have at least one tool call"

        tc = tool_calls[0]
        assert tc["name"] == "get_weather", f"Expected 'get_weather', got '{tc['name']}'"
        args = json.loads(tc["arguments"])
        assert "location" in args
        logger.info("Streaming tool args: %s", args)

    def test_streaming_arguments_arrive_incrementally(self, client):
        """Arguments should arrive across multiple chunks, not all at once."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Search for 'comprehensive guide to machine learning algorithms and their applications in industry'",
                }
            ],
            tools=[SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
            stream=True,
        )

        arg_chunk_count = 0
        for chunk in stream:
            delta = chunk.choices[0].delta if chunk.choices else None
            if delta and delta.tool_calls:
                for tc in delta.tool_calls:
                    if tc.function and tc.function.arguments:
                        arg_chunk_count += 1

        assert arg_chunk_count > 1, (
            f"Arguments should be streamed in multiple chunks, got {arg_chunk_count}"
        )

    def test_streaming_finish_reason(self, client):
        """Streaming should end with a chunk that has finish_reason."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Weather in London"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=256,
            stream=True,
        )

        _, _, finish_reason = collect_streaming_tool_calls(stream)
        assert finish_reason is not None, "Should have a finish_reason in the final chunk"

    def test_streaming_parallel_tool_calls(self, client):
        """Streaming should handle multiple tool calls when model emits them."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": (
                        "Do two things at once: "
                        "1) Get weather in Paris "
                        "2) Search for 'Paris travel tips'. "
                        "You MUST call BOTH get_weather AND search tools in parallel."
                    ),
                }
            ],
            tools=[WEATHER_TOOL, SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=1024,
            stream=True,
        )

        tool_calls, chunks_count, _ = collect_streaming_tool_calls(stream)

        # At minimum, one tool call should be present
        assert len(tool_calls) >= 1, "Should have at least one streaming tool call"

        # Verify each tool call has valid JSON arguments
        for idx, tc in tool_calls.items():
            assert tc["name"], f"Tool call {idx} should have a name"
            args = json.loads(tc["arguments"])
            assert isinstance(args, dict), f"Tool call {idx} args should be valid JSON object"

        names = {tc["name"] for tc in tool_calls.values()}
        logger.info("Streaming parallel tool names: %s (count: %d)", names, len(tool_calls))

        if len(tool_calls) >= 2:
            assert "get_weather" in names
            assert "search" in names

    def test_streaming_tool_call_ids_are_unique(self, client):
        """Each tool call in streaming should have a unique ID."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Get weather in Tokyo and search for 'Tokyo restaurants'. Call both tools.",
                }
            ],
            tools=[WEATHER_TOOL, SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=1024,
            stream=True,
        )

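        ids_by_index = {}
        for chunk in stream:
            delta = chunk.choices[0].delta if chunk.choices else None
            if delta and delta.tool_calls:
                for tc in delta.tool_calls:
                    if tc.id:
                        # Key by tool call index so repeated chunks for one call
                        # do not count as duplicates.
                        ids_by_index[tc.index] = tc.id

        ids = list(ids_by_index.values())
        if len(ids) > 1:
            assert len(ids) == len(set(ids)), "Tool call IDs should be unique"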


# =============================================================================
# Multi-Turn Conversation Tests
# =============================================================================


class TestDeepSeek31MultiTurn:
    """Multi-turn conversation with tool results."""

    def test_tool_result_followup(self, client):
        """Model should use tool results to form a response."""
        # Step 1: Get tool call
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        tool_call = msg.tool_calls[0]

        # Step 2: Send tool result back and get final answer
        response2 = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": "What's the weather in Tokyo?"},
                {
                    "role": "assistant",
                    "tool_calls": [
                        {
                            "id": tool_call.id,
                            "type": "function",
                            "function": {
                                "name": tool_call.function.name,
                                "arguments": tool_call.function.arguments,
                            },
                        }
                    ],
                },
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(
                        {
                            "temperature": 22,
                            "unit": "celsius",
                            "condition": "partly cloudy",
                            "humidity": 65,
                        }
                    ),
                },
            ],
            tools=[WEATHER_TOOL],
            temperature=0,
            max_tokens=512,
        )

        msg2 = response2.choices[0].message
        assert msg2.content, "Model should respond with text after receiving tool result"
        # Model should mention the temperature or weather condition
        content_lower = msg2.content.lower()
        assert any(
            kw in content_lower for kw in ["22", "celsius", "cloudy", "weather", "tokyo"]
        ), f"Response should reference the tool result, got: {msg2.content[:200]}"
        logger.info("Multi-turn response: %s", msg2.content[:200])

    def test_multi_turn_streaming(self, client):
        """Multi-turn with tool result should work in streaming mode too."""
        # Step 1: Get tool call (non-streaming for simplicity)
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Search for 'Python tutorials'"}],
            tools=[SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        tool_call = msg.tool_calls[0]

        # Step 2: Stream the follow-up with tool result
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": "Search for 'Python tutorials'"},
                {
                    "role": "assistant",
                    "tool_calls": [
                        {
                            "id": tool_call.id,
                            "type": "function",
                            "function": {
                                "name": tool_call.function.name,
                                "arguments": tool_call.function.arguments,
                            },
                        }
                    ],
                },
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(
                        {
                            "results": [
                                {"title": "Learn Python", "url": "https://example.com/python"},
                                {"title": "Python Basics", "url": "https://example.com/basics"},
                            ]
                        }
                    ),
                },
            ],
            tools=[SEARCH_TOOL],
            temperature=0,
            max_tokens=512,
            stream=True,
        )

        content_parts = []
        for chunk in stream:
            delta = chunk.choices[0].delta if chunk.choices else None
            if delta and delta.content:
                content_parts.append(delta.content)

        full_content = "".join(content_parts)
        assert full_content, "Streaming follow-up should produce text content"
        logger.info("Streaming multi-turn response: %s", full_content[:200])


# =============================================================================
# Edge Cases
# =============================================================================


class TestDeepSeek31EdgeCases:
    """Edge case tests."""

    def test_specific_tool_choice(self, client):
        """tool_choice with specific function name."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Hello, how are you?"}],
            tools=ALL_TOOLS,
            tool_choice={"type": "function", "function": {"name": "get_weather"}},
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        assert msg.tool_calls[0].function.name == "get_weather"

    def test_tool_choice_none(self, client):
        """tool_choice=none should prevent tool calls."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in NYC?"}],
            tools=[WEATHER_TOOL],
            tool_choice="none",
            temperature=0,
            max_tokens=256,
        )

        msg = response.choices[0].message
        assert not msg.tool_calls, "tool_choice=none should prevent tool calls"
        assert msg.content, "Should have text content when tools are disabled"

    def test_empty_tools_list(self, client):
        """Empty tools list should work like no tools."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Hello!"}],
            tools=[],
            temperature=0,
            max_tokens=128,
        )

        msg = response.choices[0].message
        assert msg.content

    def test_max_tokens_limits_tool_output(self, client):
        """Very small max_tokens might truncate tool call output."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Search for 'test'"}],
            tools=[SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=4096,
        )

        # Should still produce a valid response
        assert response.choices[0].message is not None

    def test_unicode_in_tool_arguments(self, client):
        """Tool arguments with unicode content."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Translate 'こんにちは世界' to English using the translate tool.",
                }
            ],
            tools=[TRANSLATE_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        args = json.loads(msg.tool_calls[0].function.arguments)
        assert "text" in args
        logger.info("Unicode tool args: %s", args)

    def test_long_user_message(self, client):
        """Tool call with a longer user message."""
        long_msg = (
            "I'm planning a trip and need detailed information. "
            "Can you check the weather forecast for San Francisco? "
            "I need to know the temperature, humidity, wind speed, and precipitation chance. "
            "Also let me know if I should bring an umbrella or sunscreen. "
            "Please use the weather tool to get this information."
        )

        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": long_msg}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )

        msg = response.choices[0].message
        assert msg.tool_calls
        assert_valid_tool_call(msg.tool_calls[0], "get_weather")


# =============================================================================
# Run directly
# =============================================================================

if __name__ == "__main__":
    import sys

    sys.exit(
        pytest.main(
            [__file__, "-v", "--tb=short", "-x", "--no-header", *sys.argv[1:]]
        )
    )

key4ng commented Apr 1, 2026

Collaborator Author

Known Issue: Multi-Turn Tool Call (xfail)

Multi-turn conversations with tool results currently fail with HTTP 400. The 2 xfail tests (test_tool_result_followup, test_multi_turn_streaming) track this.

Error:

Failed to apply chat template: Failed to render template:
invalid operation: tried to use + operator on unsupported types undefined and string (in chat:3)

Root cause: ChatMessage::Assistant uses #[serde_with::skip_serializing_none], so when content is None it is omitted entirely from the serialized JSON. The DeepSeek V3.1 Jinja template checks message['content'] is none, but MiniJinja returns undefined for a missing key, not none. The check therefore evaluates to false, the template takes the else branch, and rendering fails on undefined + string.

Fix: inject "content": null for assistant messages with tool_calls and no content key in process_content_format() — verified safe across Llama 3.2, Qwen 2.5, and Mistral.

Reproduction steps
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V3.1",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"},
      {
        "role": "assistant",
        "tool_calls": [{"id": "call_123", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\": \"Tokyo\"}"}}]
      },
      {"role": "tool", "tool_call_id": "call_123", "content": "{\"temperature\": 22}"}
    ],
    "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}}],
    "temperature": 0,
    "max_tokens": 512
  }'

Response (HTTP 400):

{
  "error": {
    "type": "Bad Request",
    "code": "process_messages_failed",
    "message": "Failed to apply chat template: Failed to render template: invalid operation: tried to use + operator on unsupported types undefined and string (in chat:3)"
  }
}
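
The same request can be built from Python, which may be handy when reproducing the failure from the test suite. This is a minimal sketch using only the standard library; the helper names `build_repro_payload` and `post_chat_completion` are hypothetical, and the URL is simply the one from the curl command above.

```python
import json
import urllib.error
import urllib.request

# Gateway address taken from the curl command; adjust for your deployment.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"


def build_repro_payload() -> dict:
    """Build the multi-turn request whose assistant message has no `content` key."""
    return {
        "model": "deepseek-ai/DeepSeek-V3.1",
        "messages": [
            {"role": "user", "content": "What is the weather in Tokyo?"},
            {
                "role": "assistant",
                # No "content" key on purpose: this is what triggers the error.
                "tool_calls": [
                    {
                        "id": "call_123",
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": json.dumps({"location": "Tokyo"}),
                        },
                    }
                ],
            },
            {
                "role": "tool",
                "tool_call_id": "call_123",
                "content": json.dumps({"temperature": 22}),
            },
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get weather",
                    "parameters": {
                        "type": "object",
                        "properties": {"location": {"type": "string"}},
                        "required": ["location"],
                    },
                },
            }
        ],
        "temperature": 0,
        "max_tokens": 512,
    }


def post_chat_completion(payload: dict, url: str = GATEWAY_URL) -> tuple[int, dict]:
    """POST the payload and return (status, parsed JSON body), including error bodies."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a JSON error body worth inspecting.
        return err.code, json.load(err)
```

Calling `post_chat_completion(build_repro_payload())` against a running gateway should, per the report above, come back with status 400 and the `process_messages_failed` code until the fix is applied.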

Root cause analysis

The DeepSeek V3.1 chat template snippet from tokenizer_config.json:

{%- if message['content'] is none %}
    {{'<|tool▁calls▁begin|>...'}}
{%- else %}
    {{message['content'] + '<|tool▁calls▁begin|>...'}}
{%- endif %}

  1. ChatMessage::Assistant in crates/protocols/src/chat.rs has content: Option<MessageContent> with #[serde_with::skip_serializing_none] on the enum.
  2. When an assistant message has tool_calls but no content, serde_json::to_value() in process_content_format() omits the content key entirely; it does not produce "content": null.
  3. In MiniJinja, accessing a missing key returns undefined. Since undefined is none evaluates to false, the template takes the else branch and attempts undefined + string, which raises the render error shown above.
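
The three steps above can be illustrated with a small plain-Python model of the template branch. This is a sketch only: the `Undefined` class and `render_assistant_prefix` are hypothetical stand-ins that imitate the behaviour described here, not MiniJinja or the real chat template.

```python
# Plain-Python model of the template branch; this is not MiniJinja itself.


class Undefined:
    """Stand-in for the value a template engine yields for a missing key."""

    def __add__(self, other):
        raise TypeError(
            "tried to use + operator on unsupported types undefined and string"
        )


UNDEFINED = Undefined()
TOOL_CALLS_BEGIN = "<|tool▁calls▁begin|>"


def render_assistant_prefix(message: dict) -> str:
    """Mimic the `is none` / else branches of the template snippet."""
    content = message.get("content", UNDEFINED)  # missing key -> undefined
    if content is None:  # corresponds to `message['content'] is none`
        return TOOL_CALLS_BEGIN
    return content + TOOL_CALLS_BEGIN  # undefined + string fails here


with_null = {"role": "assistant", "content": None, "tool_calls": []}
without_key = {"role": "assistant", "tool_calls": []}

print(render_assistant_prefix(with_null))
try:
    render_assistant_prefix(without_key)
except TypeError as exc:
    print("render failed:", exc)
```

The message with an explicit null takes the first branch, while the message without the key falls through to the concatenation and fails, which is the distinction the template depends on.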

Confirmed with Python transformers.AutoTokenizer.apply_chat_template:

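| content field | DeepSeek V3.1 | Llama 3.2 | Qwen 2.5 | Mistral v0.3 |
| --- | --- | --- | --- | --- |
| "content": null | renders | renders | renders | renders |
| content key absent | fails | | | |
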
Proposed fix in chat_utils.rs

In model_gateway/src/routers/grpc/utils/chat_utils.rs, process_content_format():

if let Some(obj) = message_json.as_object_mut() {
    if let Some(content_value) = obj.get_mut("content") {
        transform_content_field(content_value, content_format, image_placeholder);
    }

    // Ensure assistant messages with tool_calls always have a `content` field.
    // `skip_serializing_none` omits `content` when it's `None`, but chat templates
    // (e.g. DeepSeek V3.1) check `message['content'] is none` which fails when the
    // key is absent (undefined != none in Jinja). The OpenAI convention is to send
    // `content: null`, and all major templates (Llama, Qwen, Mistral) handle this.
    if obj.get("role").and_then(|v| v.as_str()) == Some("assistant")
        && obj.contains_key("tool_calls")
        && !obj.contains_key("content")
    {
        obj.insert("content".to_string(), Value::Null);
    }
}

Safe because:

  • Only targets assistant messages with tool_calls and missing content
  • Matches the OpenAI SDK convention (content: null)
  • Verified against the Llama 3.2, Qwen 2.5, and Mistral chat templates
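
For readers following along from the Python test side, the same normalization can be sketched in a few lines. This is an illustrative mirror of the Rust condition above, not code from the gateway; the function name `ensure_assistant_content` is hypothetical.

```python
import copy
import json


def ensure_assistant_content(messages: list[dict]) -> list[dict]:
    """Return a copy where assistant tool-call messages always carry `content`."""
    normalized = copy.deepcopy(messages)  # leave the caller's list untouched
    for message in normalized:
        if (
            message.get("role") == "assistant"
            and "tool_calls" in message
            and "content" not in message
        ):
            message["content"] = None  # serializes as JSON null
    return normalized


messages = [
    {"role": "user", "content": "What is the weather in Tokyo?"},
    {"role": "assistant", "tool_calls": [{"id": "call_123", "type": "function"}]},
    {"role": "assistant", "content": "Plain reply without tools."},
]

for message in ensure_assistant_content(messages):
    print(json.dumps(message))
```

Only the second message gains a `content` field; the user message and the plain assistant reply pass through unchanged, matching the narrow scope listed above.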

Labels

tests Test changes tool-parser Tool/function call parser changes
