fix-openai-toolcall-after-thinking #20333#20725
fix-openai-toolcall-after-thinking #20333#20725martinalupini wants to merge 5 commits intorun-llama:mainfrom
Conversation
There was a problem hiding this comment.
I was taking a look at the OpenAI responses API reference for creating a response, and a ResponseReasoningItem is supported as an input. I would prefer that we adapt the store=True/False behavior and use the ResponseInputItem as input if store = True, rather than dropping the reasoning :)
|
Hi @AstraBert Thanks for the suggestion. After investigating further, the issue turns out not to be only related to Even when [
{ "type": "reasoning", ... },
{ "role": "assistant", "content": ... }
]However, our current implementation, when both reasoning and tool calls are present, returns: [
{ "type": "reasoning", ... },
{ "type": "function_call", ... }
]The problem is that a
So the root cause is the structure of the returned sequence, not just persistence. A structurally valid alternative would be to return an assistant message that contains the tool calls, instead of returning standalone elif tool_calls:
assistant_message = {
"role": "assistant",
"content": None,
"tool_calls": tool_calls
}
if reasoning:
return [*reasoning, assistant_message]
return assistant_messageinstead of the current: elif tool_calls:
return [*reasoning, *tool_calls]But this seems more extreme and invasive than dropping the reasoning, which is useless in conversational history. Let me know what you think about this alternative or if you have other suggestions. Thank you for your suggestions tho! :) |
|
The thing I am not so sure about is that reasoning items are useless to the conversation history. OpenAI API reference (linked above) describes the ReasoningItem as: "A description of the chain of thought used by a reasoning model while generating a response. Be sure to include these items in your input to the Responses API for subsequent turns of a conversation if you are manually managing context". My feeling (I did the implementation of the reasoning-to-thinking-block and vice versa in the first place) is that we are missing some critical pieces when we collect outputs from the Responses API, critical pieces that would solve this issue without having to completely drop the thinking: if you want to try and take a look at that, feel free, otherwise I am more than happy to take this on :) |
|
Thank you again for your feedback. Your point that we are missing some pieces when collecting the outputs from the Responses API definitely makes sense. I also understand your perspective on not removing the reasoning. In the meantime, I updated the code to propagate the Regarding the case where both reasoning and tool calls are present, I found an example in the official documentation: [
{
"id": "rs_6890ed2b6374819dbbff5353e6664ef103f4db9848be4829",
"type": "reasoning",
"content": [],
"summary": []
},
{
"id": "ctc_6890ed2f32e8819daa62bef772b8c15503f4db9848be4829",
"type": "custom_tool_call",
"status": "completed",
"call_id": "call_pmlLjmvG33KJdyVdC4MVdk5N",
"input": "4 + 4",
"name": "math_exp"
}
] |
Description
Fixes #20333
This PR fixes an issue in
OpenAIResponseswhere reasoning items were serializedas ID references inside
to_openai_responses_message_dict().When
store=False, reasoning items are not persisted server-side,causing subsequent tool calls referencing those IDs to fail with a 400 error.
When
store=True, reasoning items were not structured according to theResponses API requirements, leading to validation errors.
The fix omits reasoning items when converting a ChatMessage to an OpenAI message dict,
preventing invalid ID references and allowing tool calls to work correctly
after a reasoning step. The motivation behind this choice is that reasoning items represent internal model artifacts and are
not part of the conversational history. They should not be propagated
across requests or re-injected into the input history.
Behavior Change
This PR updates the serialization logic so that reasoning blocks are
completely ignored during conversion to the OpenAI message dict.
New Package?
Did I fill in the
tool.llamahubsection in thepyproject.tomland provide a detailed README.md for my new integration or package?Version Bump?
Did I bump the version in the
pyproject.tomlfile of the package I am updating? (Except for thellama-index-corepackage)Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.
Suggested Checklist:
uv run make format; uv run make lintto appease the lint gods