Skip to content

Broken tokenization for tool calls using --tool-call-parser qwen3_xml #506

@shashwatj07

Description

@shashwatj07

EPP logs show the following error:

[pod/qwen-qwe-60b1c5b9-uct-2507-gaie-epp-6bdf64bf7c-kbs9v/epp] {"level":"error","ts":"2026-04-10T01:31:41Z","caller":"tokenization/pool.go:118","msg":"Dropping tokenization task after max retries","prompt":"","retries":3,"error":"gRPC RenderChatCompletion request failed: rpc error: code = Internal desc = Render failed: \"auto\" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set","stacktrace":"github.com/llm-d/llm-d-kv-cache/pkg/tokenization.(*Pool).workerLoop\n\t/go/pkg/mod/github.com/llm-d/llm-d-kv-cache@v0.7.1/pkg/tokenization/pool.go:118"}

The vLLM serve command includes --enable-auto-tool-choice and --tool-call-parser qwen3_xml

Scenario file: https://gist.github.com/shashwatj07/16cdda06e4e623587dbb046fe30728c2

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions