
Conversation

@aantn (Collaborator) commented on Dec 21, 2025

Summary

  • add a dedicated AI reasoning stream event for chat SSE payloads
  • emit reasoning content from streaming LLM responses before returning chat results
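
As an illustration, a client consuming the chat SSE stream might see a payload shaped like the sketch below. The "ai_reasoning" event name is the one this PR adds; the data keys are assumptions, not the confirmed wire format.

# Hypothetical payload for the new event; only the event name is
# confirmed by this PR, the data keys are illustrative.
reasoning_event = {
    "event": "ai_reasoning",
    "data": {"reasoning": "Checking recent pod events before answering."},
}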

Testing

  • Not run (not requested)

Codex Task

Summary by CodeRabbit

  • New Features
    • Streaming now exposes the AI's intermediate reasoning in real time alongside final messages, so users can observe the model's thought process as responses generate.
    • Streaming order improved so reasoning appears earlier and is included with final answers for clearer, more coherent output during live responses.


@coderabbitai bot (Contributor) commented on Dec 21, 2025

Walkthrough

Adds extraction and streaming emission of LLM "reasoning" content: a new AI_REASONING stream event is introduced and call_stream now yields reasoning alongside or before AI messages and tool-call events, using extracted message content when emitting ANSWER_END.

Changes

  • Stream event type addition (holmes/utils/stream.py): Added AI_REASONING = "ai_reasoning" enum member to StreamEvents.
  • Reasoning extraction & streaming (holmes/core/tool_calling_llm.py): Updated call_stream to extract reasoning_content, emit the new AI_REASONING stream event with metadata, emit AI_MESSAGE including both content and reasoning when present, and use the extracted message for ANSWER_END when no tool calls occur.
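
For concreteness, the enum addition could be as small as the sketch below. Only the AI_REASONING member and its "ai_reasoning" value are confirmed by this summary; the base classes and the other member names are illustrative placeholders.

from enum import Enum

class StreamEvents(str, Enum):  # base classes assumed; see holmes/utils/stream.py
    AI_MESSAGE = "ai_message"      # placeholder for an existing member
    ANSWER_END = "answer_end"      # placeholder for an existing member
    AI_REASONING = "ai_reasoning"  # the new member added in this PR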

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ToolCallingLLM
  participant LLM
  participant StreamEmitter
  participant ToolRunner

  Client->>ToolCallingLLM: request call_stream(...)
  ToolCallingLLM->>LLM: send prompt / await streaming response
  LLM-->>ToolCallingLLM: stream chunks (content, reasoning_content, tool_call hints)
  alt reasoning_content present
    ToolCallingLLM->>StreamEmitter: emit AI_REASONING(reasoning_content, metadata)
  end
  alt content present
    ToolCallingLLM->>StreamEmitter: emit AI_MESSAGE(content + reasoning, metadata)
  end
  alt tool call required
    ToolCallingLLM->>StreamEmitter: emit TOOL_CALL(...)
    ToolCallingLLM->>ToolRunner: invoke tool(...)
    ToolRunner-->>ToolCallingLLM: tool result
    ToolCallingLLM->>StreamEmitter: emit TOOL_RESPONSE(...)
  else no tool call
    ToolCallingLLM->>StreamEmitter: emit ANSWER_END(final_message)
  end
  StreamEmitter-->>Client: forward stream events

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • moshemorad
  • arikalon1

Pre-merge checks

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title 'Add reasoning streaming event' directly and clearly summarizes the main change: adding a new AI_REASONING stream event for reasoning content in streaming LLM responses.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


@aantn marked this pull request as draft on December 25, 2025 at 05:33

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
holmes/core/tool_calling_llm.py (1)

938-946: LGTM: AI_MESSAGE event properly handles reasoning and content together.

The event correctly includes both content and reasoning fields, allowing the frontend to display them together when tool calls are present. The condition if reasoning or message: appropriately prevents emitting empty events.

Optional: Consider consistency with non-streaming version

The non-streaming version (Line 448) checks text_response and text_response.strip() to avoid logging whitespace-only content. For consistency, consider:

-if reasoning or message:
+if reasoning or (message and message.strip()):

This prevents emitting AI_MESSAGE events when message contains only whitespace. However, this is a minor edge case and the current implementation is acceptable.
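
The distinction is easy to check in a REPL: a whitespace-only string is truthy in Python, so only the stricter guard skips it.

message = "   "
bool(message)          # True: any non-empty string is truthy, even whitespace
bool(message.strip())  # False: strip() reduces it to "", which is falsy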

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between commits bd8ee21 and 425c09e.

📒 Files selected for processing (1)
  • holmes/core/tool_calling_llm.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods

Files:

  • holmes/core/tool_calling_llm.py
holmes/core/tool_calling_llm.py

📄 CodeRabbit inference engine (CLAUDE.md)

holmes/core/tool_calling_llm.py: Use LiteLLM for multi-provider support (OpenAI, Anthropic, Azure, etc.) in LLM integration
Implement structured tool calling with automatic retry and error handling

Files:

  • holmes/core/tool_calling_llm.py
🧠 Learnings (2)
📚 Learning: 2025-12-21T13:17:48.366Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/core/tool_calling_llm.py : Use LiteLLM for multi-provider support (OpenAI, Anthropic, Azure, etc.) in LLM integration

Applied to files:

  • holmes/core/tool_calling_llm.py
📚 Learning: 2025-12-21T13:17:48.366Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/core/tool_calling_llm.py : Implement structured tool calling with automatic retry and error handling

Applied to files:

  • holmes/core/tool_calling_llm.py
🧬 Code graph analysis (1)
holmes/core/tool_calling_llm.py (1)
holmes/utils/stream.py (2)
  • StreamMessage (28-30)
  • StreamEvents (16-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.10)
  • GitHub Check: llm_evals
  • GitHub Check: build
🔇 Additional comments (3)
holmes/core/tool_calling_llm.py (3)

915-916: LGTM: Clear variable extraction for reasoning and message content.

Extracting reasoning_content and message content into local variables improves readability and prepares them for use in multiple stream events. The use of getattr with a default value safely handles cases where reasoning_content may not be present.
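
A minimal sketch of that pattern, where response_message is the streamed response's message object from the surrounding code and the variable names follow this review:

# Defensive read: returns None instead of raising AttributeError when a
# provider's response message has no reasoning_content attribute.
reasoning = getattr(response_message, "reasoning_content", None)
message = response_message.content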


927-936: LGTM: Proper use of extracted message in ANSWER_END event.

Using the extracted message variable instead of accessing response_message.content directly improves code readability and is consistent with the non-streaming implementation.
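
Inside call_stream, the branch described here would look roughly like the sketch below; StreamMessage's exact constructor and the tool_calls attribute name are assumptions, as neither is shown in this diff.

if not response_message.tool_calls:  # attribute name assumed
    yield StreamMessage(
        event=StreamEvents.ANSWER_END,
        data={"content": message},  # the extracted message, not response_message.content
    )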


918-925: LGTM: Correctly implements AI reasoning stream event with cross-provider safety.

The AI_REASONING event emission uses defensive attribute access via getattr(response_message, "reasoning_content", None), ensuring compatibility across all LiteLLM providers. The event is emitted only when reasoning exists, gracefully degrading for providers that don't support reasoning content. This pattern aligns with the non-streaming implementation and requires no additional provider-specific handling.
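
Combined with the extraction sketched earlier, the guarded emission inside call_stream could amount to something like this (the data key is an assumption):

if reasoning:  # falsy for providers that return no reasoning content
    yield StreamMessage(
        event=StreamEvents.AI_REASONING,
        data={"reasoning": reasoning},
    )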

@github-actions (Contributor) commented:

Results of HolmesGPT evals

  • ask_holmes: 26/37 test cases were successful, 2 regressions, 6 setup failures, 3 mock failures
Test suite Test case Status
ask 01_how_many_pods 🚧
ask 02_what_is_wrong_with_pod 🚧
ask 04_related_k8s_events
ask 05_image_version
ask 09_crashpod
ask 10_image_pull_backoff
ask 110_k8s_events_image_pull
ask 11_init_containers
ask 13a_pending_node_selector_basic
ask 14_pending_resources
ask 15_failed_readiness_probe
ask 163_compaction_follow_up
ask 17_oom_kill 🚧
ask 18_oom_kill_from_issues_history
ask 19_detect_missing_app_details
ask 20_long_log_file_search
ask 24_misconfigured_pvc
ask 24a_misconfigured_pvc_basic
ask 28_permissions_error 🚧
ask 39_failed_toolset
ask 41_setup_argo
ask 42_dns_issues_steps_new_tools
ask 43_current_datetime_from_prompt
ask 45_fetch_deployment_logs_simple
ask 51_logs_summarize_errors
ask 53_logs_find_term
ask 54_not_truncated_when_getting_pods
ask 59_label_based_counting 🚧
ask 60_count_less_than
ask 61_exact_match_counting 🚧
ask 63_fetch_error_logs_no_errors
ask 79_configmap_mount_issue
ask 83_secret_not_found
ask 86_configmap_like_but_secret
ask 93_calling_datadog[0] 🔧
ask 93_calling_datadog[1] 🔧
ask 93_calling_datadog[2] 🔧

Legend

  • ✅ the test was successful
  • ➖ the test was skipped
  • ⚠️ the test failed but is known to be flaky or known to fail
  • 🚧 the test had a setup failure (not a code regression)
  • 🔧 the test failed due to mock data issues (not a code regression)
  • 🚫 the test was throttled by API rate limits/overload
  • ❌ the test failed and should be fixed before merging the PR

@github-actions (Contributor) commented:

Dev Docker images are ready for this commit. Use tag 425c09e to pull the image for testing.

⚠️ Temporary images are deleted after 30 days. Copy to a permanent registry before using them:

gcloud auth configure-docker us-central1-docker.pkg.dev
docker pull us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:425c09e
docker tag us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:425c09e me-west1-docker.pkg.dev/robusta-development/development/holmes-dev:425c09e
docker push me-west1-docker.pkg.dev/robusta-development/development/holmes-dev:425c09e

Patch Helm values in one line (choose the chart you use):

  • HolmesGPT chart:
helm upgrade --install holmesgpt ./helm/holmes \
  --set registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set image=holmes-dev:425c09e
  • Robusta wrapper chart:
helm upgrade --install robusta robusta/robusta \
  --reuse-values \
  --set holmes.registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set holmes.image=holmes-dev:425c09e
