
fix-openai-toolcall-after-thinking #20333 #20725

Open
martinalupini wants to merge 5 commits into run-llama:main from martinalupini:main

Conversation

@martinalupini

Description

Fixes #20333

This PR fixes an issue in OpenAIResponses where reasoning items were serialized
as ID references inside to_openai_responses_message_dict().

  • When store=False, reasoning items are not persisted server-side,
    causing subsequent tool calls referencing those IDs to fail with a 400 error.

  • When store=True, reasoning items were not structured according to the
    Responses API requirements, leading to validation errors.

The fix omits reasoning items when converting a ChatMessage to an OpenAI message dict,
preventing invalid ID references and allowing tool calls to work correctly
after a reasoning step. The motivation behind this choice is that reasoning items represent internal model artifacts and are
not part of the conversational history. They should not be propagated
across requests or re-injected into the input history.

Behavior Change

This PR updates the serialization logic so that reasoning blocks are
completely ignored during conversion to the OpenAI message dict.
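
As an illustration, here is a minimal sketch of the omission logic (item shapes are assumed for the example; this is not the actual llama-index code):

def blocks_to_input_items(blocks):
    # Drop reasoning items: their IDs may reference server-side state
    # that does not exist when store=False, causing a 400 on reuse.
    return [b for b in blocks if b.get("type") != "reasoning"]

history = [
    {"type": "reasoning", "id": "rs_123"},
    {"type": "function_call", "call_id": "call_abc", "name": "add", "arguments": "{}"},
]
print(blocks_to_input_items(history))  # reasoning item omitted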

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Feb 17, 2026
@AstraBert (Member) left a comment

I was taking a look at the OpenAI Responses API reference for creating a response, and a ResponseReasoningItem is supported as an input. I would prefer that we adapt the behavior to store=True/False and pass the reasoning item back as a ResponseInputItem when store=True, rather than dropping the reasoning :)

@martinalupini (Author)

Hi @AstraBert

Thanks for the suggestion.

After investigating further, the issue turns out to be related not only to store=True/False, but also to the sequence requirements imposed by the Responses API when reasoning and tool calls are combined.

Even when store=True, the problem remains because a reasoning item must be immediately followed by the assistant item it refers to. The API expects a structure like:

[
  { "type": "reasoning", ... },
  { "role": "assistant", "content": ... }
]

However, our current implementation, when both reasoning and tool calls are present, returns:

[
  { "type": "reasoning", ... },
  { "type": "function_call", ... }
]

The problem is that a function_call item is not considered a valid assistant message following a reasoning item. As a result, even with store=True, the API raises:

"Item 'rs_...' of type 'reasoning' was provided without its required following item."

So the root cause is the structure of the returned sequence, not just persistence.
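
To make the constraint concrete, here is a small self-contained check (my own illustration, not part of the PR) that flags the shape the API rejects:

def first_orphaned_reasoning(items):
    """Return the first reasoning item not immediately followed by an
    assistant item, i.e. the shape that triggers the 400 above."""
    for i, item in enumerate(items):
        if item.get("type") != "reasoning":
            continue
        nxt = items[i + 1] if i + 1 < len(items) else None
        if nxt is None or nxt.get("role") != "assistant":
            return item
    return None

bad = [{"type": "reasoning", "id": "rs_1"}, {"type": "function_call"}]
ok = [{"type": "reasoning", "id": "rs_1"}, {"role": "assistant", "content": "8"}]
assert first_orphaned_reasoning(bad) is not None
assert first_orphaned_reasoning(ok) is None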

A structurally valid alternative would be to return an assistant message that contains the tool calls, instead of returning standalone function_call items. For example, in the function to_openai_responses_message_dicts() from llama_index.llms.openai.utils:

elif tool_calls:
    # Wrap the tool calls in an assistant message so a preceding
    # reasoning item is immediately followed by a valid assistant item.
    assistant_message = {
        "role": "assistant",
        "content": None,
        "tool_calls": tool_calls,
    }
    if reasoning:
        return [*reasoning, assistant_message]
    return [assistant_message]

instead of the current:

elif tool_calls:
    return [*reasoning, *tool_calls]

But this seems more invasive than dropping the reasoning, which in my view adds nothing to the conversational history.

Let me know what you think about this alternative or if you have other ideas. Thanks again for your suggestion though! :)

@AstraBert (Member) commented Feb 19, 2026

The thing I am not so sure about is that reasoning items are useless to the conversation history. The OpenAI API reference (linked above) describes the ReasoningItem as: "A description of the chain of thought used by a reasoning model while generating a response. Be sure to include these items in your input to the Responses API for subsequent turns of a conversation if you are manually managing context". My feeling (I did the implementation of the reasoning-to-thinking-block conversion and vice versa in the first place) is that we are missing some critical pieces when we collect outputs from the Responses API, pieces that would solve this issue without having to drop the thinking entirely. If you want to take a look at that, feel free; otherwise I am more than happy to take this on :)

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Feb 19, 2026
@martinalupini (Author) commented Feb 19, 2026

Thank you again for your feedback.

Your point that we are missing some pieces when collecting outputs from the Responses API definitely makes sense, and I understand your perspective on not removing the reasoning.
Another aspect worth considering is the structural issue in how items are returned and in what sequence, as I mentioned in my previous message. What do you think about that?

In the meantime, I updated the code to propagate the store information and to explicitly omit reasoning items when store=False, avoiding the 400 error caused by non-persisted IDs. The store=True path is unchanged.
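
Roughly, the idea looks like this (a sketch with assumed names and item shapes, not the exact diff):

def blocks_to_input_items_with_store(blocks, store):
    items = []
    for block in blocks:
        if block.get("type") == "reasoning" and not store:
            # store=False: the reasoning ID was never persisted, so
            # re-sending it would fail with a 400 on the next request.
            continue
        items.append(block)
    return items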

Regarding the case where both reasoning and tool calls are present, I found an example in the official documentation:

[
  {
    "id": "rs_6890ed2b6374819dbbff5353e6664ef103f4db9848be4829",
    "type": "reasoning",
    "content": [],
    "summary": []
  },
  {
    "id": "ctc_6890ed2f32e8819daa62bef772b8c15503f4db9848be4829",
    "type": "custom_tool_call",
    "status": "completed",
    "call_id": "call_pmlLjmvG33KJdyVdC4MVdk5N",
    "input": "4 + 4",
    "name": "math_exp"
  }
]
