Skip to content

feat(sdk): add ask_oracle tool#3673

Open
enyst wants to merge 3 commits into
mainfrom
feat/ask-oracle-tool
Open

feat(sdk): add ask_oracle tool#3673
enyst wants to merge 3 commits into
mainfrom
feat/ask-oracle-tool

Conversation

@enyst

@enyst enyst commented Jun 11, 2026

Copy link
Copy Markdown
Member

HUMAN:
This PR proposes an Oracle tool, for the agent to ask a more capable LLM when it encounters a difficulty, when it needs a second opinion, or when the user tells it to.

  • A human has tested these changes.

AGENT:

Why

Agents sometimes need a second opinion from a stronger or more specialized saved LLM profile without permanently switching the active conversation profile. This adds a minimal ask_oracle tool powered by an OpenHandsAgentSettings.oracle_llm_profile profile name so the agent can consult that Oracle profile statelessly and continue with its current LLM.

Summary

  • Add a built-in ask_oracle tool that asks a saved Oracle LLM profile for stateless second-opinion guidance.
  • Add OpenHandsAgentSettings.oracle_llm_profile so users can declare the saved profile that powers the Oracle tool.
  • Add unit coverage, settings schema coverage, an example, and temporary .pr/ evidence for reviewers.

Closes #3672.

Evidence

The .pr/ directory is intentional. Per this repository's PR artifact policy, temporary design notes, live-test evidence, JSON results, and reviewer-facing validation summaries that should not merge to main belong under .pr/ during review. The source for that policy is the repository guidance in AGENTS.md, section PR_ARTIFACTS: it says .pr/ is for PR-specific documents/scripts/artifacts, that reviewers are notified when it exists, and that the directory is automatically removed by workflow when the PR is approved. In other words, approving this PR will not merge the .pr/ artifacts; approval triggers the cleanup workflow to remove them before merge.

What the evidence proves:

  • .pr/ask_oracle_live_validation.json shows a live run where the regular profile used OpenAI direct openai/gpt-5-nano, the Oracle profile used litellm_proxy/openai/gpt-5-mini through https://llm-proxy.eval.all-hands.dev, and ask_oracle returned a successful non-error response from the Oracle.
  • .pr/ask_oracle_live_validation.py is the exact script used to create that JSON. It creates an isolated temporary profile store, saves an oracle profile, builds OpenHandsAgentSettings(oracle_llm_profile="oracle"), executes ask_oracle, records the response, and removes the temporary profile store in finally.
  • .pr/ask_oracle_test_results.json records the targeted pytest, example pytest, pre-commit, and live-validation commands/results.
  • .pr/ask_oracle_validation_summary.md summarizes the behavior for reviewers: the tool consults the saved Oracle profile statelessly, sends only the Oracle system prompt plus the agent's question/context (no conversation history or tools), and does not switch the active conversation LLM.

How to Test

  • uv run pre-commit run --files openhands-sdk/openhands/sdk/settings/model.py openhands-sdk/openhands/sdk/tool/builtins/__init__.py openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py tests/sdk/tool/test_ask_oracle.py tests/sdk/test_settings.py tests/examples/test_examples.py examples/01_standalone_sdk/54_ask_oracle_tool/main.py .pr/ask_oracle_live_validation.py .pr/ask_oracle_live_validation.json .pr/ask_oracle_test_results.json .pr/ask_oracle_validation_summary.md
  • uv run pytest tests/sdk/tool/test_ask_oracle.py tests/sdk/tool/test_builtins.py tests/sdk/test_settings.py::test_llm_agent_settings_export_schema_groups_sections tests/examples/test_examples.py::test_directory_example_is_discovered
  • uv run pytest tests/examples/test_examples.py --run-examples -k 54_ask_oracle_tool
  • CI=true uv run python -m pytest -q tests/sdk
  • Live validation in .pr/ask_oracle_live_validation.json using openai/gpt-5-nano as the regular profile and litellm_proxy/openai/gpt-5-mini through the eval proxy as the Oracle profile.

This PR was created by an AI agent (OpenHands) on behalf of the user.

@enyst can click here to continue refining the PR


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:ed274a8-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-ed274a8-python \
  ghcr.io/openhands/agent-server:ed274a8-python

All tags pushed for this build

ghcr.io/openhands/agent-server:ed274a8-golang-amd64
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-golang-amd64
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-golang-amd64
ghcr.io/openhands/agent-server:ed274a8-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:ed274a8-golang-arm64
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-golang-arm64
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-golang-arm64
ghcr.io/openhands/agent-server:ed274a8-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:ed274a8-java-amd64
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-java-amd64
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-java-amd64
ghcr.io/openhands/agent-server:ed274a8-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:ed274a8-java-arm64
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-java-arm64
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-java-arm64
ghcr.io/openhands/agent-server:ed274a8-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:ed274a8-python-amd64
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-python-amd64
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-python-amd64
ghcr.io/openhands/agent-server:ed274a8-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:ed274a8-python-arm64
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-python-arm64
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-python-arm64
ghcr.io/openhands/agent-server:ed274a8-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:ed274a8-golang
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-golang
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-golang
ghcr.io/openhands/agent-server:ed274a8-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:ed274a8-java
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-java
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-java
ghcr.io/openhands/agent-server:ed274a8-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:ed274a8-python
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-python
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-python
ghcr.io/openhands/agent-server:ed274a8-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., ed274a8-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., ed274a8-python-amd64) are also available if needed

@enyst enyst added the integration-test Runs the integration tests and comments the results label Jun 11, 2026 — with OpenHands AI
@github-actions

Copy link
Copy Markdown
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

Copy link
Copy Markdown
Contributor

📁 PR Artifacts Notice

This PR contains a .pr/ directory with PR-specific documents. This directory will be automatically removed when the PR is approved.

For fork PRs: Manual removal is required before merging.

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

@github-actions

Copy link
Copy Markdown
Contributor

🧪 Integration Tests Results

Overall Success Rate: 100.0%
Total Cost: $1.51
Models Tested: 4
Timestamp: 2026-06-11 20:49:27 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model Overall Tests Passed Skipped Total Cost Tokens
litellm_proxy_minimax_MiniMax_M2.7 100.0% 8/8 1 9 $0.00 355,592
litellm_proxy_openai_gpt_5.5 100.0% 9/9 0 9 $0.87 279,716
litellm_proxy_gemini_3.1_pro_preview 100.0% 9/9 0 9 $0.62 339,013
litellm_proxy_deepseek_deepseek_v4_flash 100.0% 8/8 1 9 $0.03 373,884

📋 Detailed Results

litellm_proxy_minimax_MiniMax_M2.7

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.00
  • Token Usage: prompt: 351,045, completion: 4,547, cache_read: 255,135, reasoning: 162
  • Run Suffix: litellm_proxy_minimax_MiniMax_M2.7_08d4edd_minimax_m2_7_run_N9_20260611_204738
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_openai_gpt_5.5

  • Success Rate: 100.0% (9/9)
  • Total Cost: $0.87
  • Token Usage: prompt: 275,027, completion: 4,689, cache_read: 144,384, reasoning: 1,674
  • Run Suffix: litellm_proxy_openai_gpt_5.5_08d4edd_gpt_5_5_run_N9_20260611_204729

litellm_proxy_gemini_3.1_pro_preview

  • Success Rate: 100.0% (9/9)
  • Total Cost: $0.62
  • Token Usage: prompt: 334,391, completion: 4,622, cache_read: 60,428, reasoning: 2,946
  • Run Suffix: litellm_proxy_gemini_3.1_pro_preview_08d4edd_gemini_3_1_pro_run_N9_20260611_204711

litellm_proxy_deepseek_deepseek_v4_flash

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.03
  • Token Usage: prompt: 369,912, completion: 3,972, cache_read: 194,560, reasoning: 914
  • Run Suffix: litellm_proxy_deepseek_deepseek_v4_flash_08d4edd_deepseek_v4_flash_run_N9_20260611_204714
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

@all-hands-bot all-hands-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ QA Report: PASS WITH ISSUES

Manual functional QA passes: the PR adds a usable ask_oracle SDK tool backed by a saved LLM profile, but CI is not currently green.

Does this PR achieve its stated goal?

Yes. Before the PR, the SDK did not expose oracle_llm_profile or AskOracleAction; on the PR branch, a user-style script saved an Oracle profile, configured OpenHandsAgentSettings(oracle_llm_profile=...), saw ask_oracle registered, executed it against a real LLM endpoint, and received the expected ORACLE_QA_OK response. A sentinel primary model remained unchanged after the tool call, confirming the Oracle consultation did not switch the active conversation LLM.

Phase Result
Environment Setup make build completed successfully; no tests, linters, or pre-commit hooks were run.
CI Status ⚠️ Current checks are not green: 3 failures (Validate PR description, check-examples, sdk-tests), 8 in progress, 1 queued, 28 successful, 12 skipped.
Functional Verification ✅ SDK user flow and the new example both executed successfully with real LLM credentials.
Functional Verification

Test 1: Baseline confirms the feature is new

Step 1 — Reproduce / establish baseline without the PR:
Ran git checkout --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_ask_oracle_user_flow.py:

{
  "settings_has_oracle_llm_profile": false,
  "ask_oracle_import": "unavailable: ImportError: cannot import name 'AskOracleAction' from 'openhands.sdk.tool.builtins' ..."
}

This shows the base branch does not provide the settings field or built-in action/tool API.

Step 2 — Apply the PR's changes:
Checked out feat/ask-oracle-tool at 08d4eddcb85cd18d6c59f3f5c4b2d18f1fc5430b.

Step 3 — Re-run with the PR in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_ask_oracle_user_flow.py with LLM_API_KEY, LLM_MODEL, and LLM_BASE_URL:

{
  "settings_has_oracle_llm_profile": true,
  "ask_oracle_import": "available",
  "saved_profiles": ["qa-oracle-profile.json"],
  "tool_without_profile_present": false,
  "configured_tools": ["ask_oracle", "finish", "think"],
  "observation_is_error": false,
  "observation_profile_name": "qa-oracle-profile",
  "observation_oracle_model": "litellm_proxy/openai/gpt-5.5",
  "observation_text": "ORACLE_QA_OK",
  "active_agent_model_after_tool": "qa-primary-model-not-called"
}

This confirms the settings field registers the tool only when configured, the tool loads the saved profile and calls the Oracle LLM successfully, and the active primary LLM remains unchanged after the consultation.

Test 2: New example runs as a user-facing entry point

Ran OPENHANDS_SUPPRESS_BANNER=1 OPENAI_API_KEY="$LLM_API_KEY" LITELLM_API_KEY="$LLM_API_KEY" ASK_ORACLE_MODEL="$LLM_MODEL" ASK_ORACLE_BASE_URL="$LLM_BASE_URL" uv run python examples/01_standalone_sdk/54_ask_oracle_tool/main.py:

Configured tools: ['ask_oracle', 'finish', 'switch_llm', 'think']
Oracle said:
Use one nullable string setting (where `None`/unset means the feature is disabled and a non-empty value enables it with that value), unless you need to distinguish “enabled with no value” from “disabled,” because it is simpler and more backwards-compatible than adding a separate boolean plus string.
EXAMPLE_COST: 0

This shows the committed example performs the saved-profile setup, exposes ask_oracle, receives an Oracle recommendation, and completes normally.

Issues Found

  • 🟡 CI status: Manual QA found no functional issues, but the PR currently has failing CI checks (Validate PR description, check-examples, and sdk-tests) that should be resolved or explained before merge.

This review was created by an AI agent (OpenHands) on behalf of the user.

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst enyst force-pushed the feat/ask-oracle-tool branch from 08d4edd to 9c2b227 Compare June 11, 2026 21:06
@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/settings
   model.py6984693%99, 398, 416, 596, 606–609, 612, 625, 629, 635, 645, 651, 656, 869, 894, 896, 898, 900, 902, 904, 906, 1198, 1200, 1614, 1634, 1797, 1926, 1965, 1991, 2127–2129, 2131, 2185, 2217, 2227, 2229, 2234, 2252, 2265, 2267, 2269, 2271, 2278
openhands-sdk/openhands/sdk/tool/builtins
   ask_oracle.py721677%42–48, 63, 106, 124–125, 129–130, 157–158, 195
TOTAL30600845372% 

@enyst enyst added the review-this This label triggers a PR review by OpenHands label Jun 11, 2026

all-hands-bot commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

@all-hands-bot all-hands-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: feat(sdk): add ask_oracle tool

🟢 Good taste — Elegant, simple solution that solves a real problem with minimal complexity.

Summary

This PR adds an ask_oracle tool that allows an agent to consult a saved LLM profile for a stateless second-opinion recommendation. The implementation:

  • Solves a genuine problem: agents benefit from higher-quality guidance when stuck
  • Uses existing infrastructure: leverages LLMProfileStore rather than reinventing profile management
  • Is purely additive: no breaking changes to existing APIs or behavior
  • Follows established patterns: matches the structure of SwitchLLMTool

Critical Issues

None.

Minor Observations

🟡 Cleanup before merge: The .pr/ directory contains temporary evidence files (.pr/ask_oracle_live_validation.json, .pr/ask_oracle_live_validation.py, etc.) that appear to be validation artifacts. Consider whether these should be removed before merging, or if they're intentionally kept as documentation.

Testing

Comprehensive test coverage with:

  • Unit tests for the AskOracleTool class
  • Integration tests verifying the tool works end-to-end
  • Settings schema tests
  • Example discovery tests

All relevant CI checks pass (sdk-tests, tools-tests, check-examples).

Risk Assessment

🟢 LOW — The feature is opt-in via oracle_llm_profile setting, has no breaking changes, and is thoroughly tested. The tool is correctly annotated as read-only.

Verdict

Worth merging — Solid implementation with good test coverage. The .pr/ directory is the only cleanup item worth considering.


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Updated the description of the Oracle to clarify its purpose and capabilities.
@enyst enyst added review-this This label triggers a PR review by OpenHands and removed review-this This label triggers a PR review by OpenHands labels Jun 11, 2026

all-hands-bot commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

@all-hands-bot all-hands-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: feat(sdk): add ask_oracle tool

🟡 Acceptable — Solid implementation with one functional inconsistency to address.


[CRITICAL ISSUES]

None identified.


[IMPROVEMENT OPPORTUNITIES]

[openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py, Line ~170] Bug: Profile name not embedded in description

The _DESCRIPTION_TEMPLATE is formatted with profile_name but contains no {profile_name} placeholder, so the call is a no-op:

_DESCRIPTION_TEMPLATE.format(profile_name=profile_display)  # does nothing

The test test_ask_oracle_tool_description_names_configured_profile() asserts "Configured Oracle profile: oracle" in tool.description, but the current template produces only the static text. Either the template is missing a placeholder like:

"...the Oracle is available when configured with profile **{profile_name}**."

or the test assertion needs adjustment.


[openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py, Line ~195] Defensive response content handling

The text extraction loops over all response content items expecting TextContent:

oracle_text = "".join(
    content.text
    for content in response.message.content
    if isinstance(content, TextContent)
).strip()

If the Oracle returns a non-TextContent item (unlikely given the prompt, but not impossible), accessing .text would raise AttributeError. Consider filtering explicitly:

oracle_text = "".join(
    getattr(content, 'text', '')  # or use .get() pattern
    for content in response.message.content
    if isinstance(content, TextContent)
).strip()

[STYLE NOTES]

Skipped — code quality is good overall.


[TESTING GAPS]

The test suite is comprehensive. The live validation evidence in .pr/ask_oracle_validation_summary.md demonstrates end-to-end functionality. Good practice.


[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW
    The feature adds a new read-only tool that doesn't modify state. Error paths are well-handled. No breaking changes to existing APIs.

[VERDICT]

Worth merging — Core logic is sound. Address the description template inconsistency before merging.


KEY INSIGHT:
The ask_oracle tool is a well-designed read-only consultation mechanism that doesn't affect the primary conversation LLM, keeping concerns cleanly separated.


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.
  2. Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
  3. When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

Was this review helpful? React with 👍 or 👎 to give feedback.


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Comment thread openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py Outdated
Comment thread openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py Outdated
Comment thread openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py Outdated
Comment thread openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py Outdated
Comment thread openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py Outdated
Comment thread openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py Outdated
Comment thread openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py Outdated
Co-authored-by: openhands <openhands@all-hands.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

integration-test Runs the integration tests and comments the results review-this This label triggers a PR review by OpenHands

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add ask_oracle tool backed by a configured LLM profile

3 participants