[improvement][Agent][stream thinking tokens through streaming_callback and arun_stream] #1621

## Summary

`Agent` currently swallows reasoning/thinking deltas from the LLM stream and only flushes them to the console as a single rich panel after the thinking phase ends. Integrators using `streaming_callback`, `arun_stream`, or `run_stream` cannot surface thinking tokens in real time — they only ever see content tokens.

We should stream thinking tokens through the same callback surface as content tokens, with a clear way to distinguish them.

## Today's behavior

`swarms/structs/agent.py:4036-4090` — `_yield_only_content_chunks`:

```python
reasoning = getattr(delta, "reasoning_content", None)
if reasoning:
    thinking_parts.append(reasoning)
    continue  # swallow the thinking chunk; don't pass to content stream

# First non-thinking chunk — flush accumulated thinking
if thinking_parts and not thinking_displayed:
    if self.print_on:
        formatter.print_thinking_panel("".join(thinking_parts), title=...)
    thinking_displayed = True
```

The reasoning deltas:
- Never reach `streaming_callback`.
- Never reach `arun_stream` / `run_stream` consumers.
- Are batched, so even console users see the thinking as one block at the end of the thinking phase, not as it's produced.

This means a dashboard, web UI, or terminal renderer integrating against `Agent` cannot show "thinking in progress" the way Claude.ai, the OpenAI playground, and the Anthropic Console do.

## Repro

```python
from swarms import Agent

agent = Agent(
    agent_name="Reasoner",
    model_name="claude-sonnet-4-6",
    thinking_tokens=2000,
    streaming_callback=lambda tok: print(repr(tok)),
)
agent.run("Solve: a chicken and a half lays an egg and a half in a day and a half.")
# Expected: callback fires for thinking tokens AND content tokens, distinguishably.
# Actual:   callback fires only for content tokens. Thinking is invisible to the callback.
```

Same gap exists for `arun_stream` / `run_stream` — they only yield content tokens.

## Proposed design

**Option A (preferred): tagged events.** Let `streaming_callback` optionally accept a structured event dict, and have `arun_stream` / `run_stream` yield events instead of bare content tokens when an opt-in flag is set:

```python
# Token event
{"type": "thinking", "token": "..."}
{"type": "content",  "token": "..."}

# Phase boundaries (optional but useful)
{"type": "thinking_start"}
{"type": "thinking_end",   "text": "<full thinking>"}
{"type": "content_start"}
{"type": "content_end",    "text": "<full content>"}
```

Preserve back-compat: if the callback signature is `Callable[[str], None]`, keep delivering only content tokens (today's behavior). If it's `Callable[[dict], None]` (detect via `inspect.signature`) or the user passes `streaming_events=True`, deliver tagged events.
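
A minimal sketch of how that detection could work, assuming the decision is made once when the callback is registered. `wants_events` and `deliver` are hypothetical helper names, not existing `swarms` functions. One caveat the sketch makes visible: an unannotated callback such as `lambda tok: ...` carries no type information, so `inspect.signature` can only route it to the legacy string path, and the explicit `streaming_events=True` flag is the reliable opt-in.

```python
import inspect
from typing import Any, Callable, Dict, get_origin


def wants_events(callback: Callable[..., Any], streaming_events: bool = False) -> bool:
    """Hypothetical helper: decide whether a callback should get event dicts."""
    if streaming_events:
        return True  # explicit opt-in always wins
    try:
        params = list(inspect.signature(callback).parameters.values())
    except (TypeError, ValueError):
        return False  # signature not introspectable: stay on the legacy path
    if not params:
        return False
    annotation = params[0].annotation
    # Matches `dict`, `dict[str, Any]` and `typing.Dict[str, Any]`.
    # Unannotated parameters (inspect.Parameter.empty) fall through to False.
    return annotation is dict or get_origin(annotation) is dict


def deliver(callback: Callable[..., Any], event: Dict[str, Any], as_events: bool) -> None:
    """Hypothetical helper: hand one event to the callback in the shape it expects."""
    if as_events:
        callback(event)
    elif event.get("type") == "content":
        callback(event["token"])  # legacy callbacks keep seeing plain content strings
```

Checking only the first parameter's annotation keeps the rule easy to explain in docs, but string annotations (for example under `from __future__ import annotations`) would not match here and would also fall back to the legacy path.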

**Option B: separate `thinking_callback`.** Add a second kwarg:

```python
agent = Agent(
    ...,
    streaming_callback=on_content_token,
    thinking_callback=on_thinking_token,
)
```

Simpler to add, no signature detection, but doesn't generalize to `arun_stream`/`run_stream` cleanly.

I lean toward **Option A** because it composes with the existing `arun_stream(with_events=True)` pattern already established in `AgentRearrange` (`swarms/structs/agent_rearrange.py:1105-1129`) — same event shape, just add `thinking` / `thinking_start` / `thinking_end` types.
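
To illustrate what a consumer would do with such a stream, here is a sketch that uses a stand-in async generator instead of a real `Agent`. `fake_event_stream` and `render` are hypothetical names, and the event sequence is made up for the example. The consumer accepts both the proposed `content` type and the `token` type that `AgentRearrange` is described as emitting today.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Tuple


async def fake_event_stream() -> AsyncIterator[Dict[str, Any]]:
    """Stand-in for an event-yielding stream; not the real arun_stream."""
    events = [
        {"type": "thinking_start"},
        {"type": "thinking", "token": "Let me "},
        {"type": "thinking", "token": "think."},
        {"type": "thinking_end", "text": "Let me think."},
        {"type": "content", "token": "42"},
    ]
    for event in events:
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield event


async def render(stream: AsyncIterator[Dict[str, Any]]) -> Tuple[str, str]:
    """Route thinking and content tokens into separate buffers."""
    thinking: List[str] = []
    content: List[str] = []
    async for event in stream:
        kind = event["type"]
        if kind == "thinking":
            thinking.append(event["token"])
        elif kind in ("content", "token"):
            content.append(event["token"])
        # Boundary events (thinking_start, thinking_end, ...) are ignored here.
    return "".join(thinking), "".join(content)
```

A UI would replace the two buffers with calls that update a "thinking" pane and an answer pane as each event arrives.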

## Acceptance criteria

- A reasoning model (`claude-sonnet-4-6` with `thinking_tokens=...`, or an OpenAI o-series model) streams thinking deltas to the registered callback in real time, one chunk at a time, before the first content token arrives.
- Thinking tokens are distinguishable from content tokens in the callback payload.
- `arun_stream(with_events=True)` yields `{"type": "thinking", "token": ...}` events for reasoning deltas alongside the existing content events.
- The console rich-panel UX for `print_on=True` is preserved (or rendered incrementally — bonus).
- Back-compat: existing `streaming_callback=lambda tok: ...` integrations that only care about content keep working without code changes.
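
The ordering and distinguishability criteria could be checked in a test with something like the following sketch. `check_stream_order` is a hypothetical helper that inspects a recorded list of events in the Option A shape; it assumes a single thinking phase that completes before content begins.

```python
from typing import Any, Dict, List


def check_stream_order(events: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable violations; an empty list means the stream passes."""
    problems: List[str] = []
    seen_content = False
    thinking_count = 0
    for index, event in enumerate(events):
        kind = event.get("type")
        if kind in ("thinking", "content") and not isinstance(event.get("token"), str):
            problems.append(f"event {index}: '{kind}' event has no string token")
        if kind == "content":
            seen_content = True
        elif kind == "thinking":
            thinking_count += 1
            if seen_content:
                problems.append(f"event {index}: thinking token after first content token")
    if thinking_count == 0:
        problems.append("no thinking events were delivered")
    return problems
```

A test would record every payload the callback receives, then assert that `check_stream_order(recorded)` returns an empty list.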

## Notes

- `_yield_only_content_chunks` (`agent.py:4036`) is the natural place to fire thinking events before swallowing the chunk. Pass the callback / event-sink through from `call_llm` (`agent.py:4092`).
- Reasoning content lives at `delta.reasoning_content` per LiteLLM; the same accessor is already used at `agent.py:4056`.
- `AgentRearrange.arun_stream(with_events=True)` already returns `agent_start` / `token` / `agent_end` events — extending the same shape with `thinking_start` / `thinking` / `thinking_end` keeps the multi-agent streaming layer consistent.
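
Putting the notes together, the change to the chunk filter might look roughly like the sketch below. It is not the actual `_yield_only_content_chunks` implementation: the function name, the `emit` parameter, and the assumption that each chunk exposes `chunk.choices[0].delta` (the OpenAI-style shape) are all illustrative, and `fake_chunk` exists only so the sketch runs without a model. It also assumes one thinking phase per response.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

Event = Dict[str, Any]


def yield_chunks_with_thinking_events(
    chunks: Iterable[Any],
    emit: Optional[Callable[[Event], None]] = None,
) -> Iterator[Any]:
    """Pass non-thinking chunks through; report thinking deltas via `emit`."""
    thinking_parts: List[str] = []
    thinking_closed = False

    def fire(event: Event) -> None:
        if emit is not None:
            emit(event)

    def close_thinking() -> None:
        nonlocal thinking_closed
        if thinking_parts and not thinking_closed:
            fire({"type": "thinking_end", "text": "".join(thinking_parts)})
            thinking_closed = True

    for chunk in chunks:
        delta = chunk.choices[0].delta  # assumed chunk shape
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            if not thinking_parts:
                fire({"type": "thinking_start"})
            thinking_parts.append(reasoning)
            fire({"type": "thinking", "token": reasoning})  # fire before swallowing
            continue  # still keep thinking out of the content stream
        close_thinking()  # first non-thinking chunk ends the thinking phase
        content = getattr(delta, "content", None)
        if content:
            fire({"type": "content", "token": content})
        yield chunk
    close_thinking()  # covers streams that end while still thinking


def fake_chunk(content: Optional[str] = None, reasoning: Optional[str] = None) -> Any:
    """Build a stand-in chunk with the assumed attribute layout."""
    delta = SimpleNamespace(content=content, reasoning_content=reasoning)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

Because the generator still yields only non-thinking chunks, existing consumers of the content stream would be unaffected; the console panel could be driven from the `thinking_end` event rather than from a separate accumulation.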
