feat(sdk): add ask_oracle tool by enyst · Pull Request #3673 · OpenHands/software-agent-sdk

enyst · 2026-06-11T20:45:32Z

HUMAN:
This PR proposes an Oracle tool, for the agent to ask a more capable LLM when it encounters a difficulty, when it needs a second opinion, or when the user tells it to.

A human has tested these changes.

AGENT:

Why

Agents sometimes need a second opinion from a stronger or more specialized saved LLM profile without permanently switching the active conversation profile. This adds a minimal ask_oracle tool powered by an OpenHandsAgentSettings.oracle_llm_profile profile name so the agent can consult that Oracle profile statelessly and continue with its current LLM.

Summary

Add a built-in ask_oracle tool that asks a saved Oracle LLM profile for stateless second-opinion guidance.
Add OpenHandsAgentSettings.oracle_llm_profile so users can declare the saved profile that powers the Oracle tool.
Add unit coverage, settings schema coverage, an example, and temporary .pr/ evidence for reviewers.

Closes #3672.

Evidence

The .pr/ directory is intentional. Per this repository's PR artifact policy, temporary design notes, live-test evidence, JSON results, and reviewer-facing validation summaries that should not merge to main belong under .pr/ during review. The source for that policy is the repository guidance in AGENTS.md, section PR_ARTIFACTS: it says .pr/ is for PR-specific documents/scripts/artifacts, that reviewers are notified when it exists, and that the directory is automatically removed by workflow when the PR is approved. In other words, approving this PR will not merge the .pr/ artifacts; approval triggers the cleanup workflow to remove them before merge.

What the evidence proves:

.pr/ask_oracle_live_validation.json shows a live run where the regular profile used OpenAI direct openai/gpt-5-nano, the Oracle profile used litellm_proxy/openai/gpt-5-mini through https://llm-proxy.eval.all-hands.dev, and ask_oracle returned a successful non-error response from the Oracle.
.pr/ask_oracle_live_validation.py is the exact script used to create that JSON. It creates an isolated temporary profile store, saves an oracle profile, builds OpenHandsAgentSettings(oracle_llm_profile="oracle"), executes ask_oracle, records the response, and removes the temporary profile store in finally.
.pr/ask_oracle_test_results.json records the targeted pytest, example pytest, pre-commit, and live-validation commands/results.
.pr/ask_oracle_validation_summary.md summarizes the behavior for reviewers: the tool consults the saved Oracle profile statelessly, sends only the Oracle system prompt plus the agent's question/context (no conversation history or tools), and does not switch the active conversation LLM.

How to Test

uv run pre-commit run --files openhands-sdk/openhands/sdk/settings/model.py openhands-sdk/openhands/sdk/tool/builtins/__init__.py openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py tests/sdk/tool/test_ask_oracle.py tests/sdk/test_settings.py tests/examples/test_examples.py examples/01_standalone_sdk/54_ask_oracle_tool/main.py .pr/ask_oracle_live_validation.py .pr/ask_oracle_live_validation.json .pr/ask_oracle_test_results.json .pr/ask_oracle_validation_summary.md
uv run pytest tests/sdk/tool/test_ask_oracle.py tests/sdk/tool/test_builtins.py tests/sdk/test_settings.py::test_llm_agent_settings_export_schema_groups_sections tests/examples/test_examples.py::test_directory_example_is_discovered
uv run pytest tests/examples/test_examples.py --run-examples -k 54_ask_oracle_tool
CI=true uv run python -m pytest -q tests/sdk
Live validation in .pr/ask_oracle_live_validation.json using openai/gpt-5-nano as the regular profile and litellm_proxy/openai/gpt-5-mini through the eval proxy as the Oracle profile.

This PR was created by an AI agent (OpenHands) on behalf of the user.

@enyst can click here to continue refining the PR

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:ed274a8-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-ed274a8-python \
  ghcr.io/openhands/agent-server:ed274a8-python

All tags pushed for this build

ghcr.io/openhands/agent-server:ed274a8-golang-amd64
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-golang-amd64
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-golang-amd64
ghcr.io/openhands/agent-server:ed274a8-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:ed274a8-golang-arm64
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-golang-arm64
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-golang-arm64
ghcr.io/openhands/agent-server:ed274a8-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:ed274a8-java-amd64
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-java-amd64
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-java-amd64
ghcr.io/openhands/agent-server:ed274a8-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:ed274a8-java-arm64
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-java-arm64
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-java-arm64
ghcr.io/openhands/agent-server:ed274a8-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:ed274a8-python-amd64
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-python-amd64
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-python-amd64
ghcr.io/openhands/agent-server:ed274a8-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:ed274a8-python-arm64
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-python-arm64
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-python-arm64
ghcr.io/openhands/agent-server:ed274a8-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:ed274a8-golang
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-golang
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-golang
ghcr.io/openhands/agent-server:ed274a8-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:ed274a8-java
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-java
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-java
ghcr.io/openhands/agent-server:ed274a8-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:ed274a8-python
ghcr.io/openhands/agent-server:ed274a8d7fa1fdb02d267abfa31b665327ad2d87-python
ghcr.io/openhands/agent-server:feat-ask-oracle-tool-python
ghcr.io/openhands/agent-server:ed274a8-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., ed274a8-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., ed274a8-python-amd64) are also available if needed

github-actions · 2026-06-11T20:45:40Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2026-06-11T20:45:44Z

📁 PR Artifacts Notice

This PR contains a .pr/ directory with PR-specific documents. This directory will be automatically removed when the PR is approved.

For fork PRs: Manual removal is required before merging.

github-actions · 2026-06-11T20:46:05Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-11T20:46:09Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-11T20:49:36Z

🧪 Integration Tests Results

Overall Success Rate: 100.0%
Total Cost: $1.51
Models Tested: 4
Timestamp: 2026-06-11 20:49:27 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

litellm_proxy_minimax_MiniMax_M2.7: 📥 View & Download Logs
litellm_proxy_openai_gpt_5.5: 📥 View & Download Logs
litellm_proxy_gemini_3.1_pro_preview: 📥 View & Download Logs
litellm_proxy_deepseek_deepseek_v4_flash: 📥 View & Download Logs

📊 Summary

Model	Overall	Tests Passed	Skipped	Total	Cost	Tokens
litellm_proxy_minimax_MiniMax_M2.7	100.0%	8/8	1	9	$0.00	355,592
litellm_proxy_openai_gpt_5.5	100.0%	9/9	0	9	$0.87	279,716
litellm_proxy_gemini_3.1_pro_preview	100.0%	9/9	0	9	$0.62	339,013
litellm_proxy_deepseek_deepseek_v4_flash	100.0%	8/8	1	9	$0.03	373,884

📋 Detailed Results

litellm_proxy_minimax_MiniMax_M2.7

Success Rate: 100.0% (8/8)
Total Cost: $0.00
Token Usage: prompt: 351,045, completion: 4,547, cache_read: 255,135, reasoning: 162
Run Suffix: litellm_proxy_minimax_MiniMax_M2.7_08d4edd_minimax_m2_7_run_N9_20260611_204738
Skipped Tests: 1

Skipped Tests:

t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_openai_gpt_5.5

Success Rate: 100.0% (9/9)
Total Cost: $0.87
Token Usage: prompt: 275,027, completion: 4,689, cache_read: 144,384, reasoning: 1,674
Run Suffix: litellm_proxy_openai_gpt_5.5_08d4edd_gpt_5_5_run_N9_20260611_204729

litellm_proxy_gemini_3.1_pro_preview

Success Rate: 100.0% (9/9)
Total Cost: $0.62
Token Usage: prompt: 334,391, completion: 4,622, cache_read: 60,428, reasoning: 2,946
Run Suffix: litellm_proxy_gemini_3.1_pro_preview_08d4edd_gemini_3_1_pro_run_N9_20260611_204711

litellm_proxy_deepseek_deepseek_v4_flash

Success Rate: 100.0% (8/8)
Total Cost: $0.03
Token Usage: prompt: 369,912, completion: 3,972, cache_read: 194,560, reasoning: 914
Run Suffix: litellm_proxy_deepseek_deepseek_v4_flash_08d4edd_deepseek_v4_flash_run_N9_20260611_204714
Skipped Tests: 1

Skipped Tests:

t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

all-hands-bot

⚠️ QA Report: PASS WITH ISSUES

Manual functional QA passes: the PR adds a usable ask_oracle SDK tool backed by a saved LLM profile, but CI is not currently green.

Does this PR achieve its stated goal?

Yes. Before the PR, the SDK did not expose oracle_llm_profile or AskOracleAction; on the PR branch, a user-style script saved an Oracle profile, configured OpenHandsAgentSettings(oracle_llm_profile=...), saw ask_oracle registered, executed it against a real LLM endpoint, and received the expected ORACLE_QA_OK response. A sentinel primary model remained unchanged after the tool call, confirming the Oracle consultation did not switch the active conversation LLM.

Phase	Result
Environment Setup	✅ `make build` completed successfully; no tests, linters, or pre-commit hooks were run.
CI Status	⚠️ Current checks are not green: 3 failures (`Validate PR description`, `check-examples`, `sdk-tests`), 8 in progress, 1 queued, 28 successful, 12 skipped.
Functional Verification	✅ SDK user flow and the new example both executed successfully with real LLM credentials.

Functional Verification

Test 1: Baseline confirms the feature is new

Step 1 — Reproduce / establish baseline without the PR:
Ran git checkout --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_ask_oracle_user_flow.py:

{
  "settings_has_oracle_llm_profile": false,
  "ask_oracle_import": "unavailable: ImportError: cannot import name 'AskOracleAction' from 'openhands.sdk.tool.builtins' ..."
}

This shows the base branch does not provide the settings field or built-in action/tool API.

Step 2 — Apply the PR's changes:
Checked out feat/ask-oracle-tool at 08d4eddcb85cd18d6c59f3f5c4b2d18f1fc5430b.

Step 3 — Re-run with the PR in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_ask_oracle_user_flow.py with LLM_API_KEY, LLM_MODEL, and LLM_BASE_URL:

{
  "settings_has_oracle_llm_profile": true,
  "ask_oracle_import": "available",
  "saved_profiles": ["qa-oracle-profile.json"],
  "tool_without_profile_present": false,
  "configured_tools": ["ask_oracle", "finish", "think"],
  "observation_is_error": false,
  "observation_profile_name": "qa-oracle-profile",
  "observation_oracle_model": "litellm_proxy/openai/gpt-5.5",
  "observation_text": "ORACLE_QA_OK",
  "active_agent_model_after_tool": "qa-primary-model-not-called"
}

This confirms the settings field registers the tool only when configured, the tool loads the saved profile and calls the Oracle LLM successfully, and the active primary LLM remains unchanged after the consultation.

Test 2: New example runs as a user-facing entry point

Ran OPENHANDS_SUPPRESS_BANNER=1 OPENAI_API_KEY="$LLM_API_KEY" LITELLM_API_KEY="$LLM_API_KEY" ASK_ORACLE_MODEL="$LLM_MODEL" ASK_ORACLE_BASE_URL="$LLM_BASE_URL" uv run python examples/01_standalone_sdk/54_ask_oracle_tool/main.py:

Configured tools: ['ask_oracle', 'finish', 'switch_llm', 'think']
Oracle said:
Use one nullable string setting (where `None`/unset means the feature is disabled and a non-empty value enables it with that value), unless you need to distinguish “enabled with no value” from “disabled,” because it is simpler and more backwards-compatible than adding a separate boolean plus string.
EXAMPLE_COST: 0

This shows the committed example performs the saved-profile setup, exposes ask_oracle, receives an Oracle recommendation, and completes normally.

Issues Found

🟡 CI status: Manual QA found no functional issues, but the PR currently has failing CI checks (Validate PR description, check-examples, and sdk-tests) that should be resolved or explained before merge.

This review was created by an AI agent (OpenHands) on behalf of the user.

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-06-11T21:15:05Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/settings
model.py	698	46	93%	99, 398, 416, 596, 606–609, 612, 625, 629, 635, 645, 651, 656, 869, 894, 896, 898, 900, 902, 904, 906, 1198, 1200, 1614, 1634, 1797, 1926, 1965, 1991, 2127–2129, 2131, 2185, 2217, 2227, 2229, 2234, 2252, 2265, 2267, 2269, 2271, 2278
openhands-sdk/openhands/sdk/tool/builtins
ask_oracle.py	72	16	77%	42–48, 63, 106, 124–125, 129–130, 157–158, 195
TOTAL	30600	8453	72%

all-hands-bot · 2026-06-11T21:43:03Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

Code Review: feat(sdk): add ask_oracle tool

🟢 Good taste — Elegant, simple solution that solves a real problem with minimal complexity.

Summary

This PR adds an ask_oracle tool that allows an agent to consult a saved LLM profile for a stateless second-opinion recommendation. The implementation:

Solves a genuine problem: agents benefit from higher-quality guidance when stuck
Uses existing infrastructure: leverages LLMProfileStore rather than reinventing profile management
Is purely additive: no breaking changes to existing APIs or behavior
Follows established patterns: matches the structure of SwitchLLMTool

Critical Issues

None.

Minor Observations

🟡 Cleanup before merge: The .pr/ directory contains temporary evidence files (.pr/ask_oracle_live_validation.json, .pr/ask_oracle_live_validation.py, etc.) that appear to be validation artifacts. Consider whether these should be removed before merging, or if they're intentionally kept as documentation.

Testing

Comprehensive test coverage with:

Unit tests for the AskOracleTool class
Integration tests verifying the tool works end-to-end
Settings schema tests
Example discovery tests

All relevant CI checks pass (sdk-tests, tools-tests, check-examples).

Risk Assessment

🟢 LOW — The feature is opt-in via oracle_llm_profile setting, has no breaking changes, and is thoroughly tested. The tool is correctly annotated as read-only.

Verdict

✅ Worth merging — Solid implementation with good test coverage. The .pr/ directory is the only cleanup item worth considering.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Updated the description of the Oracle to clarify its purpose and capabilities.

all-hands-bot · 2026-06-11T23:24:42Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

Code Review: feat(sdk): add ask_oracle tool

🟡 Acceptable — Solid implementation with one functional inconsistency to address.

[CRITICAL ISSUES]

None identified.

[IMPROVEMENT OPPORTUNITIES]

[openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py, Line ~170] Bug: Profile name not embedded in description

The _DESCRIPTION_TEMPLATE is formatted with profile_name but contains no {profile_name} placeholder, so the call is a no-op:

_DESCRIPTION_TEMPLATE.format(profile_name=profile_display)  # does nothing

The test test_ask_oracle_tool_description_names_configured_profile() asserts "Configured Oracle profile: oracle" in tool.description, but the current template produces only the static text. Either the template is missing a placeholder like:

"...the Oracle is available when configured with profile **{profile_name}**."

or the test assertion needs adjustment.

[openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py, Line ~195] Defensive response content handling

The text extraction loops over all response content items expecting TextContent:

oracle_text = "".join(
    content.text
    for content in response.message.content
    if isinstance(content, TextContent)
).strip()

If the Oracle returns a non-TextContent item (unlikely given the prompt, but not impossible), accessing .text would raise AttributeError. Consider filtering explicitly:

oracle_text = "".join(
    getattr(content, 'text', '')  # or use .get() pattern
    for content in response.message.content
    if isinstance(content, TextContent)
).strip()

[STYLE NOTES]

Skipped — code quality is good overall.

[TESTING GAPS]

The test suite is comprehensive. The live validation evidence in .pr/ask_oracle_validation_summary.md demonstrates end-to-end functionality. Good practice.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟢 LOW
The feature adds a new read-only tool that doesn't modify state. Error paths are well-handled. No breaking changes to existing APIs.

[VERDICT]

✅ Worth merging — Core logic is sound. Address the description template inconsistency before merging.

KEY INSIGHT:
The ask_oracle tool is a well-designed read-only consultation mechanism that doesn't affect the primary conversation LLM, keeping concerns cleanly separated.

Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.

Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.

When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

Was this review helpful? React with 👍 or 👎 to give feedback.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Co-authored-by: openhands <openhands@all-hands.dev>

enyst added the integration-test Runs the integration tests and comments the results label Jun 11, 2026 — with OpenHands AI

all-hands-bot reviewed Jun 11, 2026

View reviewed changes

enyst mentioned this pull request Jun 11, 2026

docs(sdk): document ask_oracle tool OpenHands/docs#566

Open

feat(sdk): add ask oracle tool

9c2b227

Co-authored-by: openhands <openhands@all-hands.dev>

enyst force-pushed the feat/ask-oracle-tool branch from 08d4edd to 9c2b227 Compare June 11, 2026 21:06

enyst added the review-this This label triggers a PR review by OpenHands label Jun 11, 2026

all-hands-bot reviewed Jun 11, 2026

View reviewed changes

Revise Oracle description for clarity and intent

3daa1ce

Updated the description of the Oracle to clarify its purpose and capabilities.

enyst added review-this This label triggers a PR review by OpenHands and removed review-this This label triggers a PR review by OpenHands labels Jun 11, 2026

all-hands-bot reviewed Jun 11, 2026

View reviewed changes