
[Question]: Inconsistent thinking streaming pattern between Ollama and Anthropic integrations #20063

@RakeshReddyKondeti

Description

Question Validation

  • I have searched both the documentation and Discord for an answer.

Question

Hi,

While testing stream_chat, I noticed that thinking content is exposed very differently across providers:

# Ollama: the incremental thinking text arrives on each streamed chunk
resp_gen = ollama_llm.stream_chat(messages)
for r in resp_gen:
    # .get() used defensively, in case a chunk carries no thinking delta
    thinking_delta = r.additional_kwargs.get("thinking_delta")  # incremental
    if thinking_delta:
        print(thinking_delta)

# Anthropic: the incremental thinking text must be dug out of the raw event
from anthropic.types import ThinkingDelta

resp_gen = anthropic_llm.stream_chat(messages)
for r in resp_gen:
    # r.raw holds the raw Anthropic stream event for this chunk
    delta = r.raw.get("delta")
    if isinstance(delta, ThinkingDelta):  # isinstance handles delta being None
        print(delta.thinking)  # incremental
    # r.message.additional_kwargs["thinking"] -> full accumulated thinking so far

Key differences:

  • Ollama: the incremental thinking text is in r.additional_kwargs["thinking_delta"].
  • Anthropic: one must inspect r.raw["delta"] for a ThinkingDelta, while
    r.message.additional_kwargs["thinking"] holds the accumulated reasoning text.

Could the API be made consistent across providers, or could you clarify the intended pattern for streaming both text and thinking? A single pattern would simplify unified handling across LLMs; a sketch of the adapter this currently forces is below.
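For context, this is roughly the shape of the workaround today. iter_stream is my own hypothetical helper, not a LlamaIndex API; it just combines the two behaviors observed above, and it assumes the plain text delta (r.delta) already behaves the same for both providers:

from typing import Iterator, Tuple

def iter_stream(llm, messages) -> Iterator[Tuple[str, str]]:
    # Hypothetical helper: yields ("thinking", chunk) and ("text", chunk) pairs,
    # normalizing the two provider behaviors observed above.
    try:
        from anthropic.types import ThinkingDelta  # only present if the Anthropic SDK is installed
    except ImportError:
        ThinkingDelta = None

    for r in llm.stream_chat(messages):
        # Ollama-style: incremental thinking in ChatResponse.additional_kwargs
        thinking = r.additional_kwargs.get("thinking_delta")
        # Anthropic-style: incremental thinking inside the raw stream event
        if thinking is None and ThinkingDelta is not None and isinstance(r.raw, dict):
            raw_delta = r.raw.get("delta")
            if isinstance(raw_delta, ThinkingDelta):
                thinking = raw_delta.thinking
        if thinking:
            yield ("thinking", thinking)
        if r.delta:  # plain text delta, which appears consistent across providers
            yield ("text", r.delta)

# usage sketch:
# for kind, chunk in iter_stream(ollama_llm, messages):
#     print(f"[{kind}] {chunk}")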


llama-index version: 0.14.4
