feat(tool_parser): add DeepSeek V3.1 tool call parser#1006
Signed-off-by: key4ng <rukeyang@gmail.com>
Add streaming and factory registration tests to complement existing parse_complete coverage. Signed-off-by: key4ng <rukeyang@gmail.com>
No actionable comments were generated in the recent review.
Walkthrough
Adds a new DeepSeek31Parser for DeepSeek v3.1 tool-call parsing: implements complete and streaming parsing, registers and re-exports the parser, and includes integration tests covering parsing behaviors and factory/model routing.
Estimated code review effort: 4 (Complex), ~60 minutes
Pre-merge checks: 2 passed, 1 failed (1 warning)
Code Review
This pull request implements the DeepSeek31Parser to support the DeepSeek V3.1 tool call format, including its registration in the ParserFactory and comprehensive integration tests. The review feedback identifies several critical improvements for the parser's robustness and performance: refining regex patterns to use non-greedy matching and lookaheads to prevent parsing errors, optimizing buffer management by using drain to avoid data loss and unnecessary reallocations, ensuring that parse_complete correctly captures text following tool call blocks, and avoiding redundant string cloning during incremental parsing.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/tool_parser/src/parsers/deepseek31.rs`:
- Around line 162-181: The current logic sets has_tool_call true for any chunk
containing a tool marker and returns normal_text = "" which drops any plain-text
prefix that appears before the first marker; update the code in the parser
method that builds current_text (the block using self.buffer.push_str,
current_text, has_tool_call and the StreamingParseResult return) to detect the
index of the first tool marker (use has_tool_markers or search for marker
strings like "<|tool▁calls▁begin|>" / "<|tool▁call▁begin|>"), split off and
remove the bytes before that first marker into normal_text to emit immediately,
leave the remainder (starting at the first marker) in self.buffer for tool
parsing, and then proceed to parse tool calls as before; also add a regression
test feeding a chunk shaped like
"prefix<|tool▁calls▁begin|><|tool▁call▁begin|>..." to ensure the prefix is
preserved and emitted and the tool parsing still proceeds.
- Around line 186-188: The code is using
self.partial_tool_call_regex.captures(&current_text) against the entire buffer
which lets later/embedded calls bleed into the current match; change the logic
to only parse the front-most (earliest) call by locating the first match
anchored at the buffer start (use a regex find/anchored match or captures_iter
and pick the lowest start index) and then consume exactly that matched slice
from self.buffer before returning; apply the same front-most-consume fix to the
other block (the code around the other captures usage at lines 246-259) and add
a regression test that feeds a single chunk containing two coalesced tool calls
(e.g., "...<tool_call_end><tool_call_begin>next<tool_sep>...") to ensure both
complete calls are drained and emitted in sequence.
In `@crates/tool_parser/tests/tool_parser_deepseek31.rs`:
- Around line 224-243: The test currently only calls factory.has_parser and
registry().has_parser_for_model which can pass even if "deepseek-v3.1" routes to
the old parser; replace or augment those checks by using
ParserFactory::create_for_model("deepseek-v3.1") (or
registry().get_parser("deepseek-v3.1")) to obtain the actual parser instance and
then assert its identity or behavior (e.g., assert the parser's identifier
equals "deepseek31" or run a small V3.1-specific sample input through the
returned parser and assert the expected V3.1-specific output), ensuring the
mapping is behavioral rather than just present.
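The two buffer-handling findings above can be modeled in a few lines of Python. This is only a sketch of the idea, not the Rust implementation: `split_prefix` and `drain_complete_calls` are hypothetical helpers, and the marker strings are copied as they are written in this thread.

```python
from __future__ import annotations

import json
import re

# Marker strings as written in this thread (assumption: the real token text may differ).
CALLS_BEGIN = "<|tool▁calls▁begin|>"
CALL_BEGIN = "<|tool▁call▁begin|>"
SEP = "<|tool▁sep|>"
CALL_END = "<|tool▁call▁end|>"

# Non-greedy groups so that one match never swallows a later call.
COMPLETE_CALL = re.compile(
    re.escape(CALL_BEGIN) + r"(.*?)" + re.escape(SEP) + r"(.*?)" + re.escape(CALL_END),
    re.DOTALL,
)


def split_prefix(buffer: str) -> tuple[str, str]:
    """Return (normal_text, rest), where rest starts at the first tool marker.

    A real streaming parser would also hold back a trailing partial marker;
    this sketch ignores that case.
    """
    positions = [i for i in (buffer.find(CALLS_BEGIN), buffer.find(CALL_BEGIN)) if i != -1]
    if not positions:
        return buffer, ""
    first = min(positions)
    return buffer[:first], buffer[first:]


def drain_complete_calls(buffer: str) -> tuple[list[dict], str]:
    """Consume complete calls from the front of the buffer, one at a time."""
    calls = []
    while True:
        match = COMPLETE_CALL.search(buffer)
        if match is None:
            break
        calls.append({"name": match.group(1).strip(), "arguments": json.loads(match.group(2))})
        # Drop exactly the consumed slice; anything after it stays buffered.
        buffer = buffer[match.end():]
    return calls, buffer
```

Feeding a single chunk that contains a text prefix, two coalesced calls, and the start of a third should yield the prefix as normal text, both complete calls in order, and the unfinished call left in the buffer for the next chunk.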
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 576c2cd4-0677-4a7c-aa22-12ddec381c96
📒 Files selected for processing (5)
crates/tool_parser/src/factory.rs
crates/tool_parser/src/lib.rs
crates/tool_parser/src/parsers/deepseek31.rs
crates/tool_parser/src/parsers/mod.rs
crates/tool_parser/tests/tool_parser_deepseek31.rs
Signed-off-by: key4ng <rukeyang@gmail.com>
let argument_diff = func_args_raw
    .strip_prefix(last_sent)
    .unwrap_or(func_args_raw);
🔴 Important: End markers leak into streamed arguments when <|tool▁call▁end|> arrives in the same chunk as the final JSON content.
The partial_tool_call_regex group 2 ((.*) after <|tool▁sep|>) greedily captures everything to end-of-string, including <|tool▁call▁end|> and <|tool▁calls▁end|> tokens. Unlike the V3 parser, whose ```json\n anchor naturally stops the args capture before closing markers, V3.1's raw format has no such delimiter.
When the end marker arrives in the same chunk as the last argument bytes, func_args_raw becomes e.g. {"location": "Tokyo"}<|tool▁call▁end|>. The strip_prefix(last_sent) here succeeds, producing argument_diff = <|tool▁call▁end|>, which is then pushed to the client as argument content (lines 225-233) before the is_complete_json check runs at line 236.
Fix: strip end markers from func_args_raw before the diff computation, e.g.:
let func_args_raw = func_args_raw
    .trim_end_matches("<|tool▁call▁end|>")
    .trim_end_matches("<|tool▁calls▁end|>")
    .trim();
Or adjust the partial regex to exclude them:
(?s)<|tool▁call▁begin|>(.*)<|tool▁sep|>(.*?)(?:<|tool▁call▁end|>|$)
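The strip-then-diff idea can be sketched in Python to make the ordering concern concrete. The helper names `strip_end_markers` and `argument_diff` are hypothetical, and the marker strings are copied from this thread. One detail worth checking in any variant: when both end markers trail the arguments, trimming them in a fixed order can leave one behind, so this sketch loops until neither marker remains.

```python
CALL_END = "<|tool▁call▁end|>"
CALLS_END = "<|tool▁calls▁end|>"


def strip_end_markers(raw_args: str) -> str:
    """Remove trailing end markers and whitespace from a captured argument string."""
    cleaned = raw_args.rstrip()
    changed = True
    while changed:
        changed = False
        for marker in (CALLS_END, CALL_END):
            if cleaned.endswith(marker):
                cleaned = cleaned[: -len(marker)].rstrip()
                changed = True
    return cleaned


def argument_diff(raw_args: str, last_sent: str) -> str:
    """Return only the argument text that has not been streamed yet."""
    cleaned = strip_end_markers(raw_args)
    # Mirrors strip_prefix(last_sent).unwrap_or(func_args_raw) from the snippet above.
    if cleaned.startswith(last_sent):
        return cleaned[len(last_sent):]
    return cleaned
```

With the markers removed first, a chunk that carries the last JSON bytes together with the end marker produces a diff containing only the JSON bytes.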
Will create a new PR to add the test, and will also validate the <|end_of_sentence|> case captured by the edge case test.
E2E Validation
Tested against a live DeepSeek V3.1 (FP8) deployment on 8x H200.
Setup
sglang backend (gRPC mode):
python -m sglang.launch_server \
    --model deepseek-ai/DeepSeek-V3.1 \
    --tp 8 --trust-remote-code --port 30000 --grpc-mode
smg router:
./target/release/smg \
    --worker-urls grpc://localhost:30000 \
    --model-path deepseek-ai/DeepSeek-V3.1 \
    --tokenizer-path /home/ubuntu/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-V3.1/snapshots/c0781d039fb7a1ba2abc4add0bdc293e92d2b8db \
    --tool-call-parser deepseek31 \
    --port 8080
Test result
Full test file: e2e_test/chat_completions/test_deepseek31_tool_calling.py
"""DeepSeek V3.1 Tool Calling E2E Tests.

Comprehensive end-to-end tests for the DeepSeek V3.1 tool parser via the SMG gateway.
Tests both non-streaming and streaming modes against a live sglang backend.

Usage:
    pytest e2e_test/chat_completions/test_deepseek31_tool_calling.py -v \
        --base-url http://localhost:8080

Or run directly:
    python e2e_test/chat_completions/test_deepseek31_tool_calling.py
"""

from __future__ import annotations

import json
import logging
import os

import openai
import pytest

logger = logging.getLogger(__name__)

BASE_URL = os.environ.get("SMG_BASE_URL", "http://localhost:8080")
MODEL = os.environ.get("SMG_MODEL", "deepseek-ai/DeepSeek-V3.1")

# =============================================================================
# Client fixture
# =============================================================================


@pytest.fixture(scope="module")
def client():
    return openai.OpenAI(base_url=f"{BASE_URL}/v1", api_key="dummy")


# =============================================================================
# Tool definitions
# =============================================================================

WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'San Francisco'",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit",
                },
            },
            "required": ["location"],
        },
    },
}

SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search",
        "description": "Search for information on the web.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query string",
                },
                "num_results": {
                    "type": "integer",
                    "description": "Number of results to return",
                },
            },
            "required": ["query"],
        },
    },
}

CALCULATOR_TOOL = {
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a mathematical expression.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Math expression to evaluate, e.g. '2 + 3 * 4'",
                },
            },
            "required": ["expression"],
        },
    },
}

TRANSLATE_TOOL = {
    "type": "function",
    "function": {
        "name": "translate",
        "description": "Translate text from one language to another.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "Text to translate",
                },
                "source_language": {
                    "type": "string",
                    "description": "Source language code, e.g. 'en'",
                },
                "target_language": {
                    "type": "string",
                    "description": "Target language code, e.g. 'fr'",
                },
            },
            "required": ["text", "target_language"],
        },
    },
}

CREATE_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "create_file",
        "description": "Create a file with given content.",
        "parameters": {
            "type": "object",
            "properties": {
                "filename": {"type": "string", "description": "Name of the file"},
                "content": {"type": "string", "description": "File content"},
                "overwrite": {
                    "type": "boolean",
                    "description": "Whether to overwrite if file exists",
                },
            },
            "required": ["filename", "content"],
        },
    },
}

ALL_TOOLS = [WEATHER_TOOL, SEARCH_TOOL, CALCULATOR_TOOL, TRANSLATE_TOOL, CREATE_FILE_TOOL]


# =============================================================================
# Helper
# =============================================================================


def assert_valid_tool_call(tool_call, expected_name=None):
    """Assert a tool call has valid structure."""
    assert tool_call.function.name, "Tool call must have a function name"
    assert tool_call.function.arguments, "Tool call must have arguments"
    args = json.loads(tool_call.function.arguments)
    assert isinstance(args, dict), "Arguments must be a JSON object"
    if expected_name:
        assert tool_call.function.name == expected_name, (
            f"Expected tool '{expected_name}', got '{tool_call.function.name}'"
        )
    return args


def collect_streaming_tool_calls(stream):
    """Collect tool call name and arguments from a streaming response."""
    tool_calls = {}  # index -> {name, arguments}
    chunks_count = 0
    finish_reason = None
    for chunk in stream:
        chunks_count += 1
        delta = chunk.choices[0].delta if chunk.choices else None
        if not delta:
            continue
        if chunk.choices[0].finish_reason:
            finish_reason = chunk.choices[0].finish_reason
        if delta.tool_calls:
            for tc in delta.tool_calls:
                idx = tc.index
                if idx not in tool_calls:
                    tool_calls[idx] = {"name": "", "arguments": ""}
                if tc.function and tc.function.name:
                    tool_calls[idx]["name"] = tc.function.name
                if tc.function and tc.function.arguments:
                    tool_calls[idx]["arguments"] += tc.function.arguments
    return tool_calls, chunks_count, finish_reason


# =============================================================================
# Non-Streaming Tests
# =============================================================================


class TestDeepSeek31NonStreaming:
    """Non-streaming tool call tests."""

    def test_single_tool_call_with_tool_choice_required(self, client):
        """Single tool call with tool_choice=required forces a tool call."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls, "Expected tool calls with tool_choice=required"
        assert len(msg.tool_calls) >= 1
        args = assert_valid_tool_call(msg.tool_calls[0], "get_weather")
        assert "location" in args, "get_weather should have a 'location' argument"
        logger.info("Tool args: %s", args)

    def test_single_tool_call_auto(self, client):
        """Single tool call with tool_choice=auto: model decides to call tool."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Use the get_weather tool to check weather in Paris.",
                }
            ],
            tools=[WEATHER_TOOL],
            tool_choice="auto",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        # Model should decide to call the tool when explicitly asked
        if msg.tool_calls:
            args = assert_valid_tool_call(msg.tool_calls[0], "get_weather")
            assert "location" in args
            logger.info("Auto tool call args: %s", args)
        else:
            logger.warning("Model chose not to call tool in auto mode (acceptable)")

    def test_tool_call_finish_reason(self, client):
        """Verify finish_reason is 'tool_calls' when tools are returned."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in London?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        assert response.choices[0].finish_reason in ("tool_calls", "stop")
        if response.choices[0].message.tool_calls:
            assert response.choices[0].finish_reason == "tool_calls"

    def test_tool_call_arguments_are_valid_json(self, client):
        """Tool call arguments must be parseable JSON objects."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Search for 'best restaurants in New York'",
                }
            ],
            tools=[SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        args = json.loads(msg.tool_calls[0].function.arguments)
        assert isinstance(args, dict)
        assert "query" in args, "search tool should have 'query' argument"

    def test_tool_call_has_id(self, client):
        """Each tool call should have a unique ID."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Check weather in Berlin"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        assert msg.tool_calls[0].id, "Tool call should have an ID"
        assert msg.tool_calls[0].type == "function"

    def test_multiple_tools_available_model_picks_correct_one(self, client):
        """When multiple tools are available, model should pick the right one."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Translate 'hello' to French. Use the translate tool.",
                }
            ],
            tools=ALL_TOOLS,
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        assert msg.tool_calls[0].function.name == "translate"
        args = json.loads(msg.tool_calls[0].function.arguments)
        assert "text" in args or "target_language" in args

    def test_parallel_tool_calls(self, client):
        """Model can return multiple tool calls in a single response."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": (
                        "I need two things done at once: "
                        "1) Get the weather in Tokyo "
                        "2) Search for 'Tokyo travel guide'. "
                        "Call both tools in parallel."
                    ),
                }
            ],
            tools=[WEATHER_TOOL, SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=1024,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        assert len(msg.tool_calls) >= 2, (
            f"Expected at least 2 tool calls for parallel request, got {len(msg.tool_calls)}"
        )
        names = {tc.function.name for tc in msg.tool_calls}
        logger.info("Parallel tool call names: %s", names)
        # Both tools should be called
        assert "get_weather" in names, "Should have called get_weather"
        assert "search" in names, "Should have called search"

    def test_no_tool_call_when_not_needed(self, client):
        """Model should not call tools when question doesn't need them."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What is 2 + 2?"}],
            tools=[WEATHER_TOOL],
            tool_choice="auto",
            temperature=0,
            max_tokens=256,
        )
        msg = response.choices[0].message
        # Model should answer directly without tools for simple math
        # (weather tool is irrelevant to the question)
        if not msg.tool_calls:
            assert msg.content, "Should have text content when not calling tools"
            logger.info("Model correctly answered without tools: %s", msg.content[:100])

    def test_tool_call_with_complex_arguments(self, client):
        """Tool call with boolean and optional arguments."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Create a file named 'test.txt' with content 'hello world', and overwrite if it exists.",
                }
            ],
            tools=[CREATE_FILE_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        args = assert_valid_tool_call(msg.tool_calls[0], "create_file")
        assert "filename" in args
        assert "content" in args
        logger.info("Complex tool args: %s", args)

    def test_usage_stats_present(self, client):
        """Response should include usage statistics."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Check weather in NYC"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=256,
        )
        assert response.usage is not None
        assert response.usage.prompt_tokens > 0
        assert response.usage.completion_tokens > 0
        assert response.usage.total_tokens > 0
        assert response.usage.total_tokens == (
            response.usage.prompt_tokens + response.usage.completion_tokens
        )


# =============================================================================
# Streaming Tests
# =============================================================================


class TestDeepSeek31Streaming:
    """Streaming tool call tests."""

    def test_streaming_single_tool_call(self, client):
        """Streaming should deliver tool call name and arguments across chunks."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
            stream=True,
        )
        tool_calls, chunks_count, finish_reason = collect_streaming_tool_calls(stream)
        assert chunks_count > 1, "Streaming should return multiple chunks"
        assert len(tool_calls) >= 1, "Should have at least one tool call"
        tc = tool_calls[0]
        assert tc["name"] == "get_weather", f"Expected 'get_weather', got '{tc['name']}'"
        args = json.loads(tc["arguments"])
        assert "location" in args
        logger.info("Streaming tool args: %s", args)

    def test_streaming_arguments_arrive_incrementally(self, client):
        """Arguments should arrive across multiple chunks, not all at once."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Search for 'comprehensive guide to machine learning algorithms and their applications in industry'",
                }
            ],
            tools=[SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
            stream=True,
        )
        arg_chunk_count = 0
        for chunk in stream:
            delta = chunk.choices[0].delta if chunk.choices else None
            if delta and delta.tool_calls:
                for tc in delta.tool_calls:
                    if tc.function and tc.function.arguments:
                        arg_chunk_count += 1
        assert arg_chunk_count > 1, (
            f"Arguments should be streamed in multiple chunks, got {arg_chunk_count}"
        )

    def test_streaming_finish_reason(self, client):
        """Streaming should end with a chunk that has finish_reason."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Weather in London"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=256,
            stream=True,
        )
        _, _, finish_reason = collect_streaming_tool_calls(stream)
        assert finish_reason is not None, "Should have a finish_reason in the final chunk"

    def test_streaming_parallel_tool_calls(self, client):
        """Streaming should handle multiple tool calls when model emits them."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": (
                        "Do two things at once: "
                        "1) Get weather in Paris "
                        "2) Search for 'Paris travel tips'. "
                        "You MUST call BOTH get_weather AND search tools in parallel."
                    ),
                }
            ],
            tools=[WEATHER_TOOL, SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=1024,
            stream=True,
        )
        tool_calls, chunks_count, _ = collect_streaming_tool_calls(stream)
        # At minimum, one tool call should be present
        assert len(tool_calls) >= 1, "Should have at least one streaming tool call"
        # Verify each tool call has valid JSON arguments
        for idx, tc in tool_calls.items():
            assert tc["name"], f"Tool call {idx} should have a name"
            args = json.loads(tc["arguments"])
            assert isinstance(args, dict), f"Tool call {idx} args should be valid JSON object"
        names = {tc["name"] for tc in tool_calls.values()}
        logger.info("Streaming parallel tool names: %s (count: %d)", names, len(tool_calls))
        if len(tool_calls) >= 2:
            assert "get_weather" in names
            assert "search" in names

    def test_streaming_tool_call_ids_are_unique(self, client):
        """Each tool call in streaming should have a unique ID."""
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Get weather in Tokyo and search for 'Tokyo restaurants'. Call both tools.",
                }
            ],
            tools=[WEATHER_TOOL, SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=1024,
            stream=True,
        )
        # Track the ID per tool call index. Collecting straight into a set
        # would hide duplicates, so compare the per-index IDs with their set.
        ids_by_index = {}
        for chunk in stream:
            delta = chunk.choices[0].delta if chunk.choices else None
            if delta and delta.tool_calls:
                for tc in delta.tool_calls:
                    if tc.id:
                        ids_by_index[tc.index] = tc.id
        ids = list(ids_by_index.values())
        if len(ids) > 1:
            assert len(ids) == len(set(ids)), "Tool call IDs should be unique"


# =============================================================================
# Multi-Turn Conversation Tests
# =============================================================================


class TestDeepSeek31MultiTurn:
    """Multi-turn conversation with tool results."""

    def test_tool_result_followup(self, client):
        """Model should use tool results to form a response."""
        # Step 1: Get tool call
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        tool_call = msg.tool_calls[0]
        # Step 2: Send tool result back and get final answer
        response2 = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": "What's the weather in Tokyo?"},
                {
                    "role": "assistant",
                    "tool_calls": [
                        {
                            "id": tool_call.id,
                            "type": "function",
                            "function": {
                                "name": tool_call.function.name,
                                "arguments": tool_call.function.arguments,
                            },
                        }
                    ],
                },
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(
                        {
                            "temperature": 22,
                            "unit": "celsius",
                            "condition": "partly cloudy",
                            "humidity": 65,
                        }
                    ),
                },
            ],
            tools=[WEATHER_TOOL],
            temperature=0,
            max_tokens=512,
        )
        msg2 = response2.choices[0].message
        assert msg2.content, "Model should respond with text after receiving tool result"
        # Model should mention the temperature or weather condition
        content_lower = msg2.content.lower()
        assert any(
            kw in content_lower for kw in ["22", "celsius", "cloudy", "weather", "tokyo"]
        ), f"Response should reference the tool result, got: {msg2.content[:200]}"
        logger.info("Multi-turn response: %s", msg2.content[:200])

    def test_multi_turn_streaming(self, client):
        """Multi-turn with tool result should work in streaming mode too."""
        # Step 1: Get tool call (non-streaming for simplicity)
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Search for 'Python tutorials'"}],
            tools=[SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        tool_call = msg.tool_calls[0]
        # Step 2: Stream the follow-up with tool result
        stream = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": "Search for 'Python tutorials'"},
                {
                    "role": "assistant",
                    "tool_calls": [
                        {
                            "id": tool_call.id,
                            "type": "function",
                            "function": {
                                "name": tool_call.function.name,
                                "arguments": tool_call.function.arguments,
                            },
                        }
                    ],
                },
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(
                        {
                            "results": [
                                {"title": "Learn Python", "url": "https://example.com/python"},
                                {"title": "Python Basics", "url": "https://example.com/basics"},
                            ]
                        }
                    ),
                },
            ],
            tools=[SEARCH_TOOL],
            temperature=0,
            max_tokens=512,
            stream=True,
        )
        content_parts = []
        for chunk in stream:
            delta = chunk.choices[0].delta if chunk.choices else None
            if delta and delta.content:
                content_parts.append(delta.content)
        full_content = "".join(content_parts)
        assert full_content, "Streaming follow-up should produce text content"
        logger.info("Streaming multi-turn response: %s", full_content[:200])


# =============================================================================
# Edge Cases
# =============================================================================


class TestDeepSeek31EdgeCases:
    """Edge case tests."""

    def test_specific_tool_choice(self, client):
        """tool_choice with specific function name."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Hello, how are you?"}],
            tools=ALL_TOOLS,
            tool_choice={"type": "function", "function": {"name": "get_weather"}},
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        assert msg.tool_calls[0].function.name == "get_weather"

    def test_tool_choice_none(self, client):
        """tool_choice=none should prevent tool calls."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "What's the weather in NYC?"}],
            tools=[WEATHER_TOOL],
            tool_choice="none",
            temperature=0,
            max_tokens=256,
        )
        msg = response.choices[0].message
        assert not msg.tool_calls, "tool_choice=none should prevent tool calls"
        assert msg.content, "Should have text content when tools are disabled"

    def test_empty_tools_list(self, client):
        """Empty tools list should work like no tools."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Hello!"}],
            tools=[],
            temperature=0,
            max_tokens=128,
        )
        msg = response.choices[0].message
        assert msg.content

    def test_max_tokens_limits_tool_output(self, client):
        """Very small max_tokens might truncate tool call output."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Search for 'test'"}],
            tools=[SEARCH_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=4096,
        )
        # Should still produce a valid response
        assert response.choices[0].message is not None

    def test_unicode_in_tool_arguments(self, client):
        """Tool arguments with unicode content."""
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "Translate 'こんにちは世界' to English using the translate tool.",
                }
            ],
            tools=[TRANSLATE_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        args = json.loads(msg.tool_calls[0].function.arguments)
        assert "text" in args
        logger.info("Unicode tool args: %s", args)

    def test_long_user_message(self, client):
        """Tool call with a longer user message."""
        long_msg = (
            "I'm planning a trip and need detailed information. "
            "Can you check the weather forecast for San Francisco? "
            "I need to know the temperature, humidity, wind speed, and precipitation chance. "
            "Also let me know if I should bring an umbrella or sunscreen. "
            "Please use the weather tool to get this information."
        )
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": long_msg}],
            tools=[WEATHER_TOOL],
            tool_choice="required",
            temperature=0,
            max_tokens=512,
        )
        msg = response.choices[0].message
        assert msg.tool_calls
        assert_valid_tool_call(msg.tool_calls[0], "get_weather")


# =============================================================================
# Run directly
# =============================================================================

if __name__ == "__main__":
    import sys

    sys.exit(
        pytest.main(
            [__file__, "-v", "--tb=short", "-x", "--no-header", *sys.argv[1:]]
        )
    )
Known Issue: Multi-Turn Tool Call (xfail)
Multi-turn conversations with tool results currently fail with HTTP 400. The 2 xfail tests ( Error: Root cause: Fix: inject
Reproduction steps
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V3.1",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"},
      {
        "role": "assistant",
        "tool_calls": [{"id": "call_123", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\": \"Tokyo\"}"}}]
      },
      {"role": "tool", "tool_call_id": "call_123", "content": "{\"temperature\": 22}"}
    ],
    "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}}],
    "temperature": 0,
    "max_tokens": 512
  }'
Response (HTTP 400):
{
  "error": {
    "type": "Bad Request",
    "code": "process_messages_failed",
    "message": "Failed to apply chat template: Failed to render template: invalid operation: tried to use + operator on unsupported types undefined and string (in chat:3)"
  }
}
Root cause analysis
The DeepSeek V3.1 chat template snippet from
{%- if message['content'] is none %}
{{'<|tool▁calls▁begin|>...'}}
{%- else %}
{{message['content'] + '<|tool▁calls▁begin|>...'}}
{%- endif %}
Confirmed with Python
Proposed fix in chat_utils.rs
In
if let Some(obj) = message_json.as_object_mut() {
    if let Some(content_value) = obj.get_mut("content") {
        transform_content_field(content_value, content_format, image_placeholder);
    }

    // Ensure assistant messages with tool_calls always have a `content` field.
    // `skip_serializing_none` omits `content` when it's `None`, but chat templates
    // (e.g. DeepSeek V3.1) check `message['content'] is none` which fails when the
    // key is absent (undefined != none in Jinja). The OpenAI convention is to send
    // `content: null`, and all major templates (Llama, Qwen, Mistral) handle this.
    if obj.get("role").and_then(|v| v.as_str()) == Some("assistant")
        && obj.contains_key("tool_calls")
        && !obj.contains_key("content")
    {
        obj.insert("content".to_string(), Value::Null);
    }
}
Safe because:
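The normalization in the proposed fix can be modeled in Python to show its effect on a message list. This is a sketch only: `ensure_assistant_content` is a hypothetical helper that mirrors the three conditions in the Rust snippet, and it says nothing about how the template itself renders.

```python
from __future__ import annotations


def ensure_assistant_content(messages: list[dict]) -> list[dict]:
    """Return copies of the messages where assistant tool-call turns carry content=None."""
    normalized = []
    for message in messages:
        message = dict(message)  # shallow copy so the caller's data is untouched
        if (
            message.get("role") == "assistant"
            and "tool_calls" in message
            and "content" not in message
        ):
            # An explicit None serializes as JSON null, so a template check
            # such as `message['content'] is none` sees a defined value.
            message["content"] = None
        normalized.append(message)
    return normalized
```

Messages that already have a `content` key, and messages from other roles, pass through unchanged.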
Description
Problem
DeepSeek V3.1 uses a different tool call format than V3: no `function` type prefix and no markdown code blocks around JSON arguments. The existing `deepseek` parser cannot parse V3.1 tool calls.
Solution
Add a new `deepseek31` tool call parser that handles the V3.1 format:
<|tool▁call▁begin|>{name}<|tool▁sep|>{json_args}<|tool▁call▁end|>
Changes
- `DeepSeek31Parser` with `parse_complete` and streaming `parse_incremental` support
- `deepseek31` parser in the factory with model mappings for `deepseek-v3.1*` and `deepseek-ai/DeepSeek-V3.1*`
Test Plan
E2E validated against live DeepSeek V3.1 (FP8) on 8x H200 via sglang gRPC + smg gateway. Both non-streaming and streaming tool calls work correctly.
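For readers unfamiliar with the V3.1 format given under Solution, a minimal Python sketch of complete (non-streaming) parsing follows. It is an illustration under assumptions, not a port of `DeepSeek31Parser`: this `parse_complete` is a hypothetical stand-in, the marker strings are copied from this description, and text after the tool call block is ignored for brevity.

```python
from __future__ import annotations

import json
import re

# Marker strings as written in this description (assumption: exact token text may differ).
CALLS_BEGIN = "<|tool▁calls▁begin|>"
CALL_PATTERN = re.compile(
    re.escape("<|tool▁call▁begin|>")
    + r"(.*?)"
    + re.escape("<|tool▁sep|>")
    + r"(.*?)"
    + re.escape("<|tool▁call▁end|>"),
    re.DOTALL,
)


def parse_complete(text: str) -> tuple[str, list[dict]]:
    """Split model output into leading normal text and a list of tool calls."""
    calls = []
    for match in CALL_PATTERN.finditer(text):
        try:
            arguments = json.loads(match.group(2))
        except json.JSONDecodeError:
            continue  # skip calls whose arguments are not valid JSON
        calls.append({"name": match.group(1).strip(), "arguments": arguments})
    marker_at = text.find(CALLS_BEGIN)
    normal_text = text if marker_at == -1 else text[:marker_at]
    return normal_text, calls
```

Because the arguments are raw JSON with no code fence around them, the non-greedy second group is what keeps each match from running into the next call.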
Checklist
- `cargo +nightly fmt` passes
- `cargo clippy --all-targets --all-features -- -D warnings` passes
Summary by CodeRabbit
New Features
Tests