Release v1.20.0 #3038
Co-authored-by: openhands <openhands@all-hands.dev>
Hi! I started running the behavior tests on your PR. You will receive a comment with the results shortly.
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
Python API breakage checks: ✅ PASSED
REST API breakage checks (OpenAPI): ✅ PASSED
all-hands-bot
left a comment
🟢 Good taste - Clean, mechanical release version bump.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
Pure version number updates for release v1.20.0. All packages consistently bumped from 1.19.1 to 1.20.0, eval workflow default updated, and lock file regenerated correctly. No code changes or behavioral modifications.
VERDICT:
✅ Worth merging: Standard release process, all version updates are consistent.
KEY INSIGHT:
Textbook release PR - mechanical version bumps with proper lock file maintenance.
Note: Cannot formally approve as this appears to be my own PR, but this is ready to merge from a technical review perspective.
all-hands-bot
left a comment
✅ QA Report: PASS
Version bump from 1.19.1 to 1.20.0 is complete and consistent across all packages. Built packages correctly report version 1.20.0 and all artifacts are ready for release.
Does this PR achieve its stated goal?
Yes. This PR successfully prepares the repository for version 1.20.0 release. All version strings have been updated from 1.19.1 to 1.20.0 across all four packages (openhands-sdk, openhands-tools, openhands-workspace, openhands-agent-server), the lockfile, and the workflow configuration. The packages build successfully and report the correct version through their metadata. The version bump is the only change required for this release PR, and it has been executed correctly.
| Phase | Result |
|---|---|
| Environment Setup | ✅ uv sync --dev completed, all packages installed |
| CI Status | |
| Functional Verification | ✅ All version strings updated consistently, packages build successfully |
Functional Verification
Test 1: Version Consistency Across Configuration Files
Step 1 — Establish baseline (main branch):
Checked version in all pyproject.toml files on main branch:
openhands-sdk/pyproject.toml: version = "1.19.1"
openhands-tools/pyproject.toml: version = "1.19.1"
openhands-workspace/pyproject.toml: version = "1.19.1"
openhands-agent-server/pyproject.toml: version = "1.19.1"
.github/workflows/run-eval.yml: default: v1.19.1
This confirms the baseline version is 1.19.1 before this PR.
Step 2 — Apply the PR's changes:
Checked out the rel-1.20.0 branch (commit f1621e9).
Step 3 — Verify version updates:
Ran grep '^version' */pyproject.toml:
openhands-sdk/pyproject.toml: version = "1.20.0"
openhands-tools/pyproject.toml: version = "1.20.0"
openhands-workspace/pyproject.toml: version = "1.20.0"
openhands-agent-server/pyproject.toml: version = "1.20.0"
Checked uv.lock:
name = "openhands-agent-server"
version = "1.20.0"
name = "openhands-sdk"
version = "1.20.0"
name = "openhands-tools"
version = "1.20.0"
name = "openhands-workspace"
version = "1.20.0"
Checked workflow default:
sdk_ref:
description: SDK commit/ref to evaluate...
required: true
default: v1.20.0
This confirms all configuration files have been updated consistently to 1.20.0.
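The consistency check above can also be scripted. The following is a minimal sketch, not the project's actual tooling: the helper names `read_version` and `mismatched_versions` are hypothetical, and the regular expression simply mirrors the `grep '^version'` command by assuming each `pyproject.toml` has one `version = "..."` line starting at column zero.

```python
import re
from pathlib import Path

# Mirrors `grep '^version'`: a version assignment at the start of a line.
VERSION_LINE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


def read_version(pyproject: Path) -> str:
    """Return the first top-level version string found in a pyproject.toml."""
    match = VERSION_LINE.search(pyproject.read_text(encoding="utf-8"))
    if match is None:
        raise ValueError(f"no version line found in {pyproject}")
    return match.group(1)


def mismatched_versions(repo_root: Path, packages, expected: str) -> dict:
    """Map each package whose version differs from `expected` to what was found."""
    found = {pkg: read_version(repo_root / pkg / "pyproject.toml") for pkg in packages}
    return {pkg: ver for pkg, ver in found.items() if ver != expected}
```

Called with the four package directories listed above and `expected="1.20.0"`, an empty result would correspond to the consistent state this test reports.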
Test 2: Package Metadata Reports Correct Version
Step 1 — Verify installed packages report version 1.20.0:
Ran uv sync --dev to install packages, then checked versions:
from importlib.metadata import version
for pkg in ['openhands-sdk', 'openhands-tools', 'openhands-workspace', 'openhands-agent-server']:
    print(f'{pkg}: {version(pkg)}')
Output:
openhands-sdk: 1.20.0
openhands-tools: 1.20.0
openhands-workspace: 1.20.0
openhands-agent-server: 1.20.0
Imported the SDK and verified the banner shows the new version:
from openhands.sdk import Agent, LLM, Conversation
Banner output:
+----------------------------------------------------------------------+
| OpenHands SDK v1.20.0 |
| |
| Report a bug: github.com/OpenHands/software-agent-sdk/issues |
| Get help: openhands.dev/joinslack |
| Scale up: openhands.dev/product/sdk |
+----------------------------------------------------------------------+
This confirms the installed packages correctly report version 1.20.0 through their metadata and user-facing version displays.
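For a repeatable version of the metadata check, a sketch along these lines may help. It uses only `importlib.metadata` from the standard library; the function names `installed_versions` and `wrong_versions` are illustrative, not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            # A missing distribution is reported rather than raised.
            found[pkg] = None
    return found


def wrong_versions(found, expected):
    """Return only the entries whose version is missing or differs from `expected`."""
    return {pkg: ver for pkg, ver in found.items() if ver != expected}
```

In an environment prepared with `uv sync --dev`, passing the four package names and `"1.20.0"` should yield an empty dictionary if the report above holds.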
Test 3: Packages Build Successfully with Correct Version
Step 1 — Build openhands-sdk package:
Ran uv build openhands-sdk/ --out-dir /tmp/dist-test
Output:
Successfully built /tmp/dist-test/openhands_sdk-1.20.0.tar.gz
Successfully built /tmp/dist-test/openhands_sdk-1.20.0-py3-none-any.whl
Verified wheel metadata:
Name: openhands-sdk
Version: 1.20.0
Step 2 — Build openhands-agent-server package:
Ran uv build openhands-agent-server/ --out-dir /tmp/dist-test-server
Output:
Successfully built /tmp/dist-test-server/openhands_agent_server-1.20.0.tar.gz
Successfully built /tmp/dist-test-server/openhands_agent_server-1.20.0-py3-none-any.whl
This confirms all packages build successfully and their build artifacts (wheels, tarballs) contain the correct version 1.20.0 in their filenames and metadata. These are the artifacts that will be published to PyPI.
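The report does not say how the wheel metadata was inspected. One possible approach, sketched here under that caveat, relies on the fact that a wheel is a zip archive containing a `*.dist-info/METADATA` file in email-header format. The function name `wheel_name_and_version` is hypothetical.

```python
import zipfile
from email.parser import Parser


def wheel_name_and_version(wheel_path):
    """Read the Name and Version headers from a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_path) as wheel:
        # Every wheel carries exactly one <name>-<version>.dist-info/METADATA entry.
        metadata_path = next(
            name for name in wheel.namelist() if name.endswith(".dist-info/METADATA")
        )
        text = wheel.read(metadata_path).decode("utf-8")
    headers = Parser().parsestr(text, headersonly=True)
    return headers["Name"], headers["Version"]
```

Pointed at `/tmp/dist-test/openhands_sdk-1.20.0-py3-none-any.whl`, this would be expected to return the name and version shown under "Verified wheel metadata".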
Test 4: No Unintended Version References
Searched for any remaining 1.19.1 references:
git grep -n "1\.19\.1" -- '*.py' '*.toml' '*.yml' '*.yaml' '*.md' '*.txt'
Found only:
openhands-sdk/openhands/sdk/tool/registry.py:165: deprecated_in="1.19.1",
Verified context:
warn_deprecated(
"register_tool(callable_factory)",
deprecated_in="1.19.1",
removed_in="1.24.0",
...
This is correct: it's deprecation metadata recording when a feature was deprecated, not the current version. No version strings were missed.
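The manual triage of grep hits can be approximated in code. This sketch assumes that deprecation metadata is recognisable by the `deprecated_in=` or `removed_in=` keyword arguments seen in the snippet above; that heuristic and the function name `classify_old_version_hits` are assumptions made for illustration.

```python
import re


def classify_old_version_hits(lines, old_version):
    """Split lines mentioning `old_version` into deprecation metadata and suspects."""
    pattern = re.compile(re.escape(old_version))  # treat dots literally
    deprecation, suspects = [], []
    for line in lines:
        if not pattern.search(line):
            continue
        # Assumption: these keyword arguments mark intentional historical versions.
        if "deprecated_in=" in line or "removed_in=" in line:
            deprecation.append(line.strip())
        else:
            suspects.append(line.strip())
    return deprecation, suspects
```

An empty `suspects` list would match the conclusion of this test, with the `registry.py` hit landing in `deprecation`.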
Issues Found
None related to the version bump. The PR achieves its stated goal.
Note on CI failures: Two checks are failing (agent-server-tests, check), but these appear to be pre-existing or unrelated to the version bump itself. The "Check package versions" CI check passes, confirming version consistency. The failing checks should be addressed as part of the release process per the PR checklist.
Remove due LLM and agent-server deprecated APIs while extending the context.skills import shim for downstream migration.
Co-authored-by: openhands <openhands@all-hands.dev>
Teach the REST API breakage check to allow OpenAPI schema property removals after their documented deprecation deadline, matching the release treatment for removed operations.
Co-authored-by: openhands <openhands@all-hands.dev>
xingyaoww
left a comment
Actually, we haven't done QA for test-examples/integration tests yet.
I might as well bring the latest updates from main in this release
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
🧪 Integration Tests Results
Overall Success Rate: 97.1%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
litellm_proxy_deepseek_deepseek_reasoner
Skipped Tests:
litellm_proxy_gemini_3.1_pro_preview
litellm_proxy_anthropic_claude_sonnet_4_6
Failed Tests:
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 31.0s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 22.2s | $0.03 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 12.1s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ❌ FAIL Exit code 1 | 40.1s | -- |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 14.0s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 45.0s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 43.2s | $0.04 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 10.1s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 37.1s | $0.02 |
| 01_standalone_sdk/14_context_condenser.py | ❌ FAIL Exit code 1 | 15.1s | -- |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 16.7s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 16.5s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 18.1s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 26.1s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 24.4s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 22.7s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 2m 45s | $0.02 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 3m 52s | $0.28 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 1m 23s | $0.07 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 21.2s | $0.03 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 40.3s | $0.03 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 42.0s | $0.02 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 9.4s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 4m 29s | $0.31 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 33.5s | $0.04 |
| 01_standalone_sdk/34_critic_example.py | ❌ FAIL Timed out after 600 seconds | 10m 0s | -- |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 17.6s | $0.01 |
| 01_standalone_sdk/37_llm_profile_store/main.py | ✅ PASS | 3.9s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ✅ PASS | 51.2s | $0.04 |
| 01_standalone_sdk/39_llm_fallback.py | ✅ PASS | 21.7s | $0.01 |
| 01_standalone_sdk/40_acp_agent_example.py | ✅ PASS | 33.1s | $0.14 |
| 01_standalone_sdk/41_task_tool_set.py | ✅ PASS | 35.2s | $0.03 |
| 01_standalone_sdk/42_file_based_subagents.py | ✅ PASS | 1m 39s | $0.09 |
| 01_standalone_sdk/43_mixed_marketplace_skills/main.py | ✅ PASS | 6.3s | $0.00 |
| 01_standalone_sdk/44_model_switching_in_convo.py | ✅ PASS | 16.8s | $0.01 |
| 01_standalone_sdk/45_parallel_tool_execution.py | ✅ PASS | 3m 1s | $0.31 |
| 01_standalone_sdk/46_agent_settings.py | ✅ PASS | 10.7s | $0.00 |
| 01_standalone_sdk/47_defense_in_depth_security.py | ✅ PASS | 3.1s | $0.00 |
| 01_standalone_sdk/48_conversation_fork.py | ✅ PASS | 15.6s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 46.2s | $0.03 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 2m 0s | $0.07 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 1m 21s | $0.06 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 50s | $0.03 |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 42.2s | $0.03 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ✅ PASS | 3m 38s | $0.02 |
| 02_remote_agent_server/09_acp_agent_with_remote_runtime.py | ✅ PASS | 1m 1s | $0.11 |
| 02_remote_agent_server/10_cloud_workspace_share_credentials.py | ✅ PASS | 29.0s | $0.05 |
| 02_remote_agent_server/11_conversation_fork.py | ✅ PASS | 1m 5s | $0.00 |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 30.0s | $0.02 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 49.7s | $0.08 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 17.9s | $0.02 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 20.7s | $0.02 |
❌ Some tests failed
Total: 52 | Passed: 49 | Failed: 3 | Total Cost: $2.26
Failed examples:
- examples/01_standalone_sdk/07_mcp_integration.py: Exit code 1
- examples/01_standalone_sdk/14_context_condenser.py: Exit code 1
- examples/01_standalone_sdk/34_critic_example.py: Timed out after 600 seconds
@OpenHands investigate the failure in #3038 (comment). Should I be worried? Can you re-run only those three failing examples and see if they work?
I'm on it! xingyaoww can track my progress at all-hands.dev
Investigation of Example Test Failures
This comment was created by an AI agent (OpenHands). I investigated the 3 failing examples from the workflow run. Here's the analysis: 1.
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 32.5s | $0.04 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 33.0s | $0.03 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 12.0s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 40.3s | $0.02 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 12.5s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 34.0s | $0.03 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 47.0s | $0.03 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 9.0s | $0.00 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 35.4s | $0.02 |
| 01_standalone_sdk/14_context_condenser.py | ❌ FAIL Exit code 1 | 3m 42s | -- |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 16.2s | $0.01 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 18.0s | $0.02 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 28.2s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 15.1s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 14.5s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 29.4s | $0.02 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 2m 4s | $0.02 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 5m 22s | $0.33 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 1m 15s | $0.07 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 17.1s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 38.5s | $0.02 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 47.5s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 8.6s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ❌ FAIL Exit code 1 | 7m 33s | -- |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 33.3s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ✅ PASS | 7m 59s | $0.68 |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 26.6s | $0.01 |
| 01_standalone_sdk/37_llm_profile_store/main.py | ✅ PASS | 7.9s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ✅ PASS | 31.3s | $0.03 |
| 01_standalone_sdk/39_llm_fallback.py | ✅ PASS | 10.7s | $0.00 |
| 01_standalone_sdk/40_acp_agent_example.py | ✅ PASS | 33.6s | $0.13 |
| 01_standalone_sdk/41_task_tool_set.py | ✅ PASS | 29.3s | $0.03 |
| 01_standalone_sdk/42_file_based_subagents.py | ✅ PASS | 1m 49s | $0.08 |
| 01_standalone_sdk/43_mixed_marketplace_skills/main.py | ✅ PASS | 4.9s | $0.00 |
| 01_standalone_sdk/44_model_switching_in_convo.py | ✅ PASS | 7.6s | $0.01 |
| 01_standalone_sdk/45_parallel_tool_execution.py | ✅ PASS | 3m 45s | $0.51 |
| 01_standalone_sdk/46_agent_settings.py | ✅ PASS | 12.3s | $0.01 |
| 01_standalone_sdk/47_defense_in_depth_security.py | ✅ PASS | 3.0s | $0.00 |
| 01_standalone_sdk/48_conversation_fork.py | ✅ PASS | 13.6s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 34.0s | $0.02 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 2m 2s | $0.07 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 1m 17s | $0.09 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 23s | $0.03 |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 29.9s | $0.03 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ✅ PASS | 3m 46s | $0.02 |
| 02_remote_agent_server/09_acp_agent_with_remote_runtime.py | ✅ PASS | 1m 8s | $0.11 |
| 02_remote_agent_server/10_cloud_workspace_share_credentials.py | ✅ PASS | 34.1s | $0.05 |
| 02_remote_agent_server/11_conversation_fork.py | ✅ PASS | 59.1s | $0.00 |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 27.0s | $0.02 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ❌ FAIL Exit code 1 | 32.2s | -- |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 24.6s | $0.02 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 22.9s | $0.02 |
❌ Some tests failed
Total: 52 | Passed: 49 | Failed: 3 | Total Cost: $2.79
Failed examples:
- examples/01_standalone_sdk/14_context_condenser.py: Exit code 1
- examples/01_standalone_sdk/31_iterative_refinement.py: Exit code 1
- examples/04_llm_specific_tools/02_gemini_file_tools.py: Exit code 1
Re-run Results Update
This comment was created by an AI agent (OpenHands). The re-run of the failed jobs has completed. Here's the comparison:
Original 3 failures → Re-run results:
New failures in re-run (also transient):
Bottom line: No need to worry ✅
All failures across both runs are transient LLM proxy authentication errors: a different subset of examples fails each time, confirming this is infrastructure flakiness, not a code regression. In each run, 49 of the 52 examples pass. The 3 failures in each run are random victims of proxy auth instability.
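The comparison between the two runs can be reproduced from the two "Failed examples" lists above with plain set operations. This is only a sketch of that bookkeeping; the function name `compare_runs` is hypothetical, and set overlap by itself says nothing about the cause of a failure.

```python
# Failed examples copied from the first and second workflow runs above.
RUN_1_FAILURES = {
    "01_standalone_sdk/07_mcp_integration.py",
    "01_standalone_sdk/14_context_condenser.py",
    "01_standalone_sdk/34_critic_example.py",
}
RUN_2_FAILURES = {
    "01_standalone_sdk/14_context_condenser.py",
    "01_standalone_sdk/31_iterative_refinement.py",
    "04_llm_specific_tools/02_gemini_file_tools.py",
}


def compare_runs(first, second):
    """Group failures by whether they recur, recover, or newly appear."""
    return {
        "failed_in_both": sorted(first & second),
        "recovered": sorted(first - second),
        "new_failures": sorted(second - first),
    }


result = compare_runs(RUN_1_FAILURES, RUN_2_FAILURES)
```

With these inputs, two examples recover, two fail for the first time, and `14_context_condenser.py` appears in both lists. Attributing that recurring failure to proxy authentication errors therefore rests on the error analysis in the comment rather than on the set comparison alone.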
Summary
The user asked me to investigate the 3 failing examples from the test-examples workflow on PR #3038 and determine if they should be worried, then re-run those three failing examples.
Findings
No need to worry: all failures are transient infrastructure issues, not code bugs.
Key evidence it's flaky: The re-run produced a different set of 3 failures: previously-passing examples (
Actions Taken
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Release v1.20.0
This PR prepares the release for version 1.20.0.
Release Checklist
integration-test)
behavior-test)
test-examples)
v1.20.0
rel-1.20.0
Next Steps
Once the release is published on GitHub, the PyPI packages will be automatically published via the pypi-release.yml workflow.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdk
nikolaik/python-nodejs:python3.13-nodejs22-slim
golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:1635059-python
Run
All tags pushed for this build
About Multi-Architecture Support
1635059-python) is a multi-arch manifest supporting both amd64 and arm64
1635059-python-amd64) are also available if needed