Question Validation
- I have searched both the documentation and discord for an answer.
Question
Hi,

While testing `stream_chat`, I noticed that thinking is exposed very differently across providers.
```python
# Ollama
resp_gen = ollama_llm.stream_chat(messages)
for r in resp_gen:
    thinking_delta = r.additional_kwargs["thinking_delta"]  # incremental
    print(thinking_delta)
```
```python
# Anthropic
from anthropic.types import ThinkingDelta

resp_gen = anthropic_llm.stream_chat(messages)
for r in resp_gen:
    delta = r.raw.get("delta")
    if delta and isinstance(delta, ThinkingDelta):
        print(delta.thinking)  # incremental
    # r.message.additional_kwargs["thinking"] -> full accumulated thinking
```
Key differences:
- Ollama: incremental thinking is in `r.additional_kwargs["thinking_delta"]`.
- Anthropic: one must inspect `r.raw["delta"]` for a `ThinkingDelta`; `r.message.additional_kwargs["thinking"]` holds the accumulated reasoning text.
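For context, a provider-agnostic wrapper is what I end up writing by hand today. This is a minimal sketch of that workaround; the `Chunk` stand-in and the `extract_thinking_delta` helper are hypothetical (not part of LlamaIndex) and assume only the two access patterns shown above:

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical minimal stand-in for a streamed ChatResponse chunk.
@dataclass
class Chunk:
    additional_kwargs: dict = field(default_factory=dict)
    raw: dict = field(default_factory=dict)


def extract_thinking_delta(chunk: Chunk) -> Optional[str]:
    """Return the incremental thinking text, whichever provider produced it."""
    # Ollama-style: incremental text under additional_kwargs["thinking_delta"].
    if "thinking_delta" in chunk.additional_kwargs:
        return chunk.additional_kwargs["thinking_delta"]
    # Anthropic-style: a delta object in raw carrying a .thinking attribute.
    delta = chunk.raw.get("delta")
    if delta is not None and hasattr(delta, "thinking"):
        return delta.thinking
    return None
```

A single accessor like this is what I was hoping the library could provide directly, so callers would not need per-provider branches.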
Could the API be made consistent, or could you clarify the intended pattern for streaming both text and thinking? That would simplify unified handling across LLMs.
LlamaIndex version: 0.14.4