
[Question]: Inconsistent thinking streaming pattern between Ollama and Anthropic integrations #20063

@RakeshReddyKondeti

Description

Question Validation

  • I have searched both the documentation and Discord for an answer.

Question

Hi,

While testing stream_chat, I noticed that thinking content is exposed very differently across providers:

# Ollama: the incremental thinking text arrives on each streamed chunk
resp_gen = ollama_llm.stream_chat(messages)
for r in resp_gen:
    # .get() used defensively, in case a chunk carries no thinking delta
    thinking_delta = r.additional_kwargs.get("thinking_delta")  # incremental
    if thinking_delta:
        print(thinking_delta)

# Anthropic: the incremental thinking text must be dug out of the raw event
from anthropic.types import ThinkingDelta

resp_gen = anthropic_llm.stream_chat(messages)
for r in resp_gen:
    # r.raw holds the raw Anthropic stream event for this chunk
    delta = r.raw.get("delta")
    if isinstance(delta, ThinkingDelta):  # isinstance handles delta being None
        print(delta.thinking)  # incremental
    # r.message.additional_kwargs["thinking"] -> full accumulated thinking so far

Key differences:

  • Ollama: the incremental thinking text is in r.additional_kwargs["thinking_delta"].
  • Anthropic: one must inspect r.raw["delta"] for a ThinkingDelta, while
    r.message.additional_kwargs["thinking"] holds the accumulated reasoning text.

Could the API be made consistent across providers, or could you clarify the intended pattern for streaming both text and thinking? A single pattern would simplify unified handling across LLMs; a sketch of the adapter this currently forces is below.
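For context, this is roughly the shape of the workaround today. iter_stream is my own hypothetical helper, not a LlamaIndex API; it just combines the two behaviors observed above, and it assumes the plain text delta (r.delta) already behaves the same for both providers:

from typing import Iterator, Tuple

def iter_stream(llm, messages) -> Iterator[Tuple[str, str]]:
    # Hypothetical helper: yields ("thinking", chunk) and ("text", chunk) pairs,
    # normalizing the two provider behaviors observed above.
    try:
        from anthropic.types import ThinkingDelta  # only present if the Anthropic SDK is installed
    except ImportError:
        ThinkingDelta = None

    for r in llm.stream_chat(messages):
        # Ollama-style: incremental thinking in ChatResponse.additional_kwargs
        thinking = r.additional_kwargs.get("thinking_delta")
        # Anthropic-style: incremental thinking inside the raw stream event
        if thinking is None and ThinkingDelta is not None and isinstance(r.raw, dict):
            raw_delta = r.raw.get("delta")
            if isinstance(raw_delta, ThinkingDelta):
                thinking = raw_delta.thinking
        if thinking:
            yield ("thinking", thinking)
        if r.delta:  # plain text delta, which appears consistent across providers
            yield ("text", r.delta)

# usage sketch:
# for kind, chunk in iter_stream(ollama_llm, messages):
#     print(f"[{kind}] {chunk}")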


llama-index version: 0.14.4
