feat: Support reasoning_content in OpenAI chat completions streaming response by tc3oliver · Pull Request #100 · ray-project/llmperf

tc3oliver · 2025-11-28T02:35:34Z

Summary

Add support for reasoning_content field in streaming responses from OpenAI-compatible APIs.

Problem

Some LLM inference engines (e.g., vLLM with reasoning models like DeepSeek-R1 or QwQ)
return streaming content in the reasoning_content field instead of content.
This causes the benchmark to incorrectly report output_tokens = 1 because
the actual generated text is not captured.

Solution

Check both content and reasoning_content fields when processing streaming chunks.
This maintains backward compatibility while adding support for reasoning models.

Testing

Tested with:

vLLM serving a 120B reasoning model (uses reasoning_content)
Ollama serving llama3:70b (uses standard content)

Both scenarios now correctly capture output tokens and metrics.

feat: Support reasoning_content in streaming response

a68c8aa

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Support reasoning_content in OpenAI chat completions streaming response#100

feat: Support reasoning_content in OpenAI chat completions streaming response#100
tc3oliver wants to merge 1 commit intoray-project:mainfrom
tc3oliver:feat/support-reasoning-content

tc3oliver commented Nov 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tc3oliver commented Nov 28, 2025

Summary

Problem

Solution

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant