feat(sdk): add ask_oracle tool #3673
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
📁 PR Artifacts Notice This PR contains a
Python API breakage checks: ✅ PASSED
REST API breakage checks (OpenAPI): ✅ PASSED
🧪 Integration Tests Results
Overall Success Rate: 100.0%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_minimax_MiniMax_M2.7
Skipped Tests:
litellm_proxy_openai_gpt_5.5
litellm_proxy_gemini_3.1_pro_preview
litellm_proxy_deepseek_deepseek_v4_flash
Skipped Tests:
all-hands-bot
left a comment
⚠️ QA Report: PASS WITH ISSUES
Manual functional QA passes: the PR adds a usable ask_oracle SDK tool backed by a saved LLM profile, but CI is not currently green.
Does this PR achieve its stated goal?
Yes. Before the PR, the SDK did not expose oracle_llm_profile or AskOracleAction; on the PR branch, a user-style script saved an Oracle profile, configured OpenHandsAgentSettings(oracle_llm_profile=...), saw ask_oracle registered, executed it against a real LLM endpoint, and received the expected ORACLE_QA_OK response. A sentinel primary model remained unchanged after the tool call, confirming the Oracle consultation did not switch the active conversation LLM.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully; no tests, linters, or pre-commit hooks were run. |
| CI Status | ⚠️ Failing: Validate PR description, check-examples, sdk-tests; 8 in progress, 1 queued, 28 successful, 12 skipped. |
| Functional Verification | ✅ SDK user flow and the new example both executed successfully with real LLM credentials. |
Functional Verification
Test 1: Baseline confirms the feature is new
Step 1 — Reproduce / establish baseline without the PR:
Ran git checkout --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_ask_oracle_user_flow.py:
{
"settings_has_oracle_llm_profile": false,
"ask_oracle_import": "unavailable: ImportError: cannot import name 'AskOracleAction' from 'openhands.sdk.tool.builtins' ..."
}
This shows the base branch does not provide the settings field or built-in action/tool API.
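The QA script itself (/tmp/qa_ask_oracle_user_flow.py) is not included in this PR, so the following is only a sketch of how such a baseline probe might detect whether a name is importable. The helper name probe_import is hypothetical; it relies only on importlib from the standard library.

```python
import importlib


def probe_import(module_name: str, name: str) -> str:
    """Report whether `name` can be loaded from `module_name` (hypothetical helper)."""
    try:
        module = importlib.import_module(module_name)
        getattr(module, name)  # raises AttributeError if the name is missing
    except (ImportError, AttributeError) as exc:
        # ModuleNotFoundError is a subclass of ImportError, so a missing package lands here too.
        return f"unavailable: {type(exc).__name__}: {exc}"
    return "available"


# Expected to report "unavailable" on main and "available" on the PR branch.
print(probe_import("openhands.sdk.tool.builtins", "AskOracleAction"))
```

Unlike a literal `from X import Y`, getattr does not fall back to importing a submodule named Y, which is fine for probing a class such as AskOracleAction.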
Step 2 — Apply the PR's changes:
Checked out feat/ask-oracle-tool at 08d4eddcb85cd18d6c59f3f5c4b2d18f1fc5430b.
Step 3 — Re-run with the PR in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_ask_oracle_user_flow.py with LLM_API_KEY, LLM_MODEL, and LLM_BASE_URL:
{
"settings_has_oracle_llm_profile": true,
"ask_oracle_import": "available",
"saved_profiles": ["qa-oracle-profile.json"],
"tool_without_profile_present": false,
"configured_tools": ["ask_oracle", "finish", "think"],
"observation_is_error": false,
"observation_profile_name": "qa-oracle-profile",
"observation_oracle_model": "litellm_proxy/openai/gpt-5.5",
"observation_text": "ORACLE_QA_OK",
"active_agent_model_after_tool": "qa-primary-model-not-called"
}
This confirms the settings field registers the tool only when configured, the tool loads the saved profile and calls the Oracle LLM successfully, and the active primary LLM remains unchanged after the consultation.
Test 2: New example runs as a user-facing entry point
Ran OPENHANDS_SUPPRESS_BANNER=1 OPENAI_API_KEY="$LLM_API_KEY" LITELLM_API_KEY="$LLM_API_KEY" ASK_ORACLE_MODEL="$LLM_MODEL" ASK_ORACLE_BASE_URL="$LLM_BASE_URL" uv run python examples/01_standalone_sdk/54_ask_oracle_tool/main.py:
Configured tools: ['ask_oracle', 'finish', 'switch_llm', 'think']
Oracle said:
Use one nullable string setting (where `None`/unset means the feature is disabled and a non-empty value enables it with that value), unless you need to distinguish “enabled with no value” from “disabled,” because it is simpler and more backwards-compatible than adding a separate boolean plus string.
EXAMPLE_COST: 0
This shows the committed example performs the saved-profile setup, exposes ask_oracle, receives an Oracle recommendation, and completes normally.
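The opt-in behavior verified in Test 1 (ask_oracle is registered only when oracle_llm_profile is set) follows the same pattern the Oracle recommended above: a single nullable string where None means disabled. Here is a minimal sketch of that pattern. AgentSettingsSketch and build_tool_names are hypothetical names, not the SDK's OpenHandsAgentSettings API, and the tool names only mirror the QA output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentSettingsSketch:
    # None (or an empty string) disables the Oracle; a profile name enables it.
    oracle_llm_profile: Optional[str] = None


def build_tool_names(settings: AgentSettingsSketch) -> list[str]:
    tools = ["finish", "think"]
    if settings.oracle_llm_profile:  # treats "" the same as None
        tools.append("ask_oracle")
    return sorted(tools)


print(build_tool_names(AgentSettingsSketch()))
print(build_tool_names(AgentSettingsSketch(oracle_llm_profile="qa-oracle-profile")))
```

One field carries both the on/off switch and the profile name, so there is no invalid "enabled but no profile" state to validate.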
Issues Found
- 🟡 CI status: Manual QA found no functional issues, but the PR currently has failing CI checks (Validate PR description, check-examples, and sdk-tests) that should be resolved or explained before merge.
This review was created by an AI agent (OpenHands) on behalf of the user.
Co-authored-by: openhands <openhands@all-hands.dev>
Force-pushed from 08d4edd to 9c2b227.
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.
all-hands-bot
left a comment
Code Review: feat(sdk): add ask_oracle tool
🟢 Good taste — Elegant, simple solution that solves a real problem with minimal complexity.
Summary
This PR adds an ask_oracle tool that allows an agent to consult a saved LLM profile for a stateless second-opinion recommendation. The implementation:
- Solves a genuine problem: agents benefit from higher-quality guidance when stuck
- Uses existing infrastructure: leverages LLMProfileStore rather than reinventing profile management
- Is purely additive: no breaking changes to existing APIs or behavior
- Follows established patterns: matches the structure of SwitchLLMTool
Critical Issues
None.
Minor Observations
🟡 Cleanup before merge: The .pr/ directory contains temporary evidence files (.pr/ask_oracle_live_validation.json, .pr/ask_oracle_live_validation.py, etc.) that appear to be validation artifacts. Consider whether these should be removed before merging, or if they're intentionally kept as documentation.
Testing
Comprehensive test coverage with:
- Unit tests for the AskOracleTool class
- Integration tests verifying the tool works end-to-end
- Settings schema tests
- Example discovery tests
All relevant CI checks pass (sdk-tests, tools-tests, check-examples).
Risk Assessment
🟢 LOW — The feature is opt-in via oracle_llm_profile setting, has no breaking changes, and is thoroughly tested. The tool is correctly annotated as read-only.
Verdict
✅ Worth merging — Solid implementation with good test coverage. The .pr/ directory is the only cleanup item worth considering.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
Updated the description of the Oracle to clarify its purpose and capabilities.
all-hands-bot
left a comment
Code Review: feat(sdk): add ask_oracle tool
🟡 Acceptable — Solid implementation with one functional inconsistency to address.
[CRITICAL ISSUES]
None identified.
[IMPROVEMENT OPPORTUNITIES]
[openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py, Line ~170] Bug: Profile name not embedded in description
The _DESCRIPTION_TEMPLATE is formatted with profile_name but contains no {profile_name} placeholder, so the call is a no-op:
_DESCRIPTION_TEMPLATE.format(profile_name=profile_display)  # does nothing
The test test_ask_oracle_tool_description_names_configured_profile() asserts "Configured Oracle profile: oracle" in tool.description, but the current template produces only the static text. Either the template is missing a placeholder like:
"...the Oracle is available when configured with profile **{profile_name}**."
or the test assertion needs adjustment. Note that this example wording would still not contain the exact string "Configured Oracle profile: oracle", so whichever side changes, the rendered placeholder text and the test assertion must match.
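A minimal, self-contained illustration of why the call is a silent no-op: Python's str.format ignores keyword arguments that have no matching placeholder instead of raising. The two template strings below are hypothetical stand-ins, not the SDK's actual _DESCRIPTION_TEMPLATE.

```python
# Hypothetical stand-ins for the tool's description template.
TEMPLATE_WITHOUT_PLACEHOLDER = "Consult the Oracle for a second opinion."
TEMPLATE_WITH_PLACEHOLDER = (
    "Consult the Oracle for a second opinion. "
    "Configured Oracle profile: {profile_name}"
)


def render(template: str, profile_name: str) -> str:
    # str.format ignores unused keyword arguments, so a missing
    # placeholder fails silently rather than raising KeyError.
    return template.format(profile_name=profile_name)


without = render(TEMPLATE_WITHOUT_PLACEHOLDER, "oracle")
with_placeholder = render(TEMPLATE_WITH_PLACEHOLDER, "oracle")

print(without == TEMPLATE_WITHOUT_PLACEHOLDER)  # True: the call changed nothing
print("Configured Oracle profile: oracle" in with_placeholder)  # True
```

If the real template ever contains literal braces (for example, a JSON snippet), they must be doubled as {{ and }} for str.format to accept it.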
[openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py, Line ~195] Defensive response content handling
The text extraction joins the .text of every TextContent item in the response:
oracle_text = "".join(
    content.text
    for content in response.message.content
    if isinstance(content, TextContent)
).strip()
The concern is a non-TextContent item in the Oracle's response (unlikely given the prompt, but not impossible). The isinstance(content, TextContent) filter already skips such items before .text is accessed, so the current code cannot raise AttributeError on them; the explicit getattr fallback below is belt-and-braces and behaves identically:
oracle_text = "".join(
    getattr(content, "text", "")  # defensive fallback; the isinstance filter already ensures .text exists
    for content in response.message.content
    if isinstance(content, TextContent)
).strip()
[STYLE NOTES]
Skipped — code quality is good overall.
[TESTING GAPS]
The test suite is comprehensive. The live validation evidence in .pr/ask_oracle_validation_summary.md demonstrates end-to-end functionality. Good practice.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
The feature adds a new read-only tool that doesn't modify state. Error paths are well-handled. No breaking changes to existing APIs.
[VERDICT]
✅ Worth merging — Core logic is sound. Address the description template inconsistency before merging.
KEY INSIGHT:
The ask_oracle tool is a well-designed read-only consultation mechanism that doesn't affect the primary conversation LLM, keeping concerns cleanly separated.
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.
- Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
- When your PR is merged, the guideline file goes through normal code review by repository maintainers.
Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.
Was this review helpful? React with 👍 or 👎 to give feedback.
Co-authored-by: openhands <openhands@all-hands.dev>
HUMAN:
This PR proposes an Oracle tool, for the agent to ask a more capable LLM when it encounters a difficulty, when it needs a second opinion, or when the user tells it to.
AGENT:
Why
Agents sometimes need a second opinion from a stronger or more specialized saved LLM profile without permanently switching the active conversation profile. This adds a minimal ask_oracle tool, powered by the profile named in OpenHandsAgentSettings.oracle_llm_profile, so the agent can consult that Oracle profile statelessly and continue with its current LLM.
Summary
- An ask_oracle tool that asks a saved Oracle LLM profile for stateless second-opinion guidance.
- OpenHandsAgentSettings.oracle_llm_profile, so users can declare the saved profile that powers the Oracle tool.
- .pr/ evidence for reviewers.
Closes #3672.
Evidence
The .pr/ directory is intentional. Per this repository's PR artifact policy, temporary design notes, live-test evidence, JSON results, and reviewer-facing validation summaries that should not merge to main belong under .pr/ during review. The source for that policy is the repository guidance in AGENTS.md, section PR_ARTIFACTS: it says .pr/ is for PR-specific documents/scripts/artifacts, that reviewers are notified when it exists, and that the directory is automatically removed by workflow when the PR is approved. In other words, approving this PR will not merge the .pr/ artifacts; approval triggers the cleanup workflow to remove them before merge.
What the evidence proves:
- .pr/ask_oracle_live_validation.json shows a live run where the regular profile used OpenAI direct openai/gpt-5-nano, the Oracle profile used litellm_proxy/openai/gpt-5-mini through https://llm-proxy.eval.all-hands.dev, and ask_oracle returned a successful non-error response from the Oracle.
- .pr/ask_oracle_live_validation.py is the exact script used to create that JSON. It creates an isolated temporary profile store, saves an oracle profile, builds OpenHandsAgentSettings(oracle_llm_profile="oracle"), executes ask_oracle, records the response, and removes the temporary profile store in a finally block.
- .pr/ask_oracle_test_results.json records the targeted pytest, example pytest, pre-commit, and live-validation commands/results.
- .pr/ask_oracle_validation_summary.md summarizes the behavior for reviewers: the tool consults the saved Oracle profile statelessly, sends only the Oracle system prompt plus the agent's question/context (no conversation history or tools), and does not switch the active conversation LLM.
How to Test
- uv run pre-commit run --files openhands-sdk/openhands/sdk/settings/model.py openhands-sdk/openhands/sdk/tool/builtins/__init__.py openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py tests/sdk/tool/test_ask_oracle.py tests/sdk/test_settings.py tests/examples/test_examples.py examples/01_standalone_sdk/54_ask_oracle_tool/main.py .pr/ask_oracle_live_validation.py .pr/ask_oracle_live_validation.json .pr/ask_oracle_test_results.json .pr/ask_oracle_validation_summary.md
- uv run pytest tests/sdk/tool/test_ask_oracle.py tests/sdk/tool/test_builtins.py tests/sdk/test_settings.py::test_llm_agent_settings_export_schema_groups_sections tests/examples/test_examples.py::test_directory_example_is_discovered
- uv run pytest tests/examples/test_examples.py --run-examples -k 54_ask_oracle_tool
- CI=true uv run python -m pytest -q tests/sdk
- Live validation: .pr/ask_oracle_live_validation.json, using openai/gpt-5-nano as the regular profile and litellm_proxy/openai/gpt-5-mini through the eval proxy as the Oracle profile.
This PR was created by an AI agent (OpenHands) on behalf of the user.
@enyst can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
Base images: eclipse-temurin:17-jdk, nikolaik/python-nodejs:python3.13-nodejs22-slim, golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:ed274a8-python
Run
All tags pushed for this build
About Multi-Architecture Support
- Each variant tag (e.g., ed274a8-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g., ed274a8-python-amd64) are also available if needed