Release v1.20.1#3070
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
Hi! I started running the behavior tests on your PR. You will receive a comment with the results shortly.
Python API breakage checks: ✅ PASSED
REST API breakage checks (OpenAPI): ✅ PASSED
all-hands-bot
left a comment
Taste Rating: 🟢 Good taste
Clean release PR with consistent version bumps across all packages. The uv.lock changes appear to be from a uv version update that shifts from absolute timestamps to relative span-based exclusions while maintaining the 7-day guardrail.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
Version-only changes with no code logic modifications. Standard release process. The lockfile update maintains the workspace 7-day freshness guardrail via exclude-newer-span = "P7D".
VERDICT:
✅ Worth merging: Version bumps are correct and consistent. Complete the release checklist (tests, eval) before merging.
KEY INSIGHT:
The uv.lock exclude-newer change to epoch-zero is a uv implementation detail; the 7-day supply-chain guardrail remains enforced via the span field.
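To make the span-based guardrail concrete, here is a minimal sketch of how a relative span such as `P7D` could be turned into a cutoff timestamp. The helper names `cutoff_from_span` and `is_allowed`, the example dates, and the restriction to whole-day ISO 8601 durations are assumptions for illustration; this is not uv's implementation.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper: only whole-day ISO 8601 durations such as "P7D" are handled.
_DAY_SPAN = re.compile(r"^P(\d+)D$")


def cutoff_from_span(span: str, now: datetime) -> datetime:
    """Return the newest upload time still allowed by a relative span."""
    match = _DAY_SPAN.match(span)
    if match is None:
        raise ValueError(f"unsupported span: {span!r}")
    return now - timedelta(days=int(match.group(1)))


def is_allowed(uploaded_at: datetime, span: str, now: datetime) -> bool:
    """A release passes the guardrail if it was uploaded at or before the cutoff."""
    return uploaded_at <= cutoff_from_span(span, now)


if __name__ == "__main__":
    now = datetime(2025, 1, 15, tzinfo=timezone.utc)  # example date, not from the PR
    print(cutoff_from_span("P7D", now).isoformat())
```

Because the cutoff is derived from the current time on every resolution, the lockfile no longer needs to record an absolute timestamp for the guardrail to stay meaningful.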
all-hands-bot
left a comment
✅ QA Report: PASS
Version bump from 1.20.0 to 1.20.1 is correctly applied across all packages, builds succeed, and packages are functional.
Does this PR achieve its stated goal?
Yes. This PR successfully prepares version 1.20.1 for release. All four packages (openhands-sdk, openhands-tools, openhands-workspace, openhands-agent-server) have been updated to version 1.20.1 in their pyproject.toml files, the GitHub workflow default value is updated, the lockfile reflects the new versions, and all packages build successfully into distributable wheels and source distributions. The packages can be imported and function correctly with the new version.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully, all dependencies installed |
| CI Status | ⏳ 22 passing, 12 pending, 0 failing — core checks (pre-commit, API breakage, tests) all green |
| Functional Verification | ✅ All version updates verified, packages build and import correctly |
Functional Verification
Test 1: Version Number Updates
Step 1 — Establish baseline (main branch):
Checked version on main branch:
git show origin/main:openhands-sdk/pyproject.toml | grep "^version"
Output:
version = "1.20.0"
This confirms the current release version is 1.20.0.
Step 2 — Verify PR changes:
On the PR branch (rel-1.20.1), checked all package versions:
grep -n "^version" openhands-sdk/pyproject.toml openhands-tools/pyproject.toml \
  openhands-workspace/pyproject.toml openhands-agent-server/pyproject.toml
Output:
openhands-sdk/pyproject.toml:3:version = "1.20.1"
openhands-tools/pyproject.toml:3:version = "1.20.1"
openhands-workspace/pyproject.toml:3:version = "1.20.1"
openhands-agent-server/pyproject.toml:3:version = "1.20.1"
✅ Verdict: All four packages correctly updated from 1.20.0 to 1.20.1
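The same consistency check can be sketched in Python. This is a hedged illustration rather than part of the QA run: the function names are hypothetical, and the regular expression simply mirrors the `grep "^version"` pattern used above instead of parsing TOML properly.

```python
import re

# Mirrors `grep "^version"`: a `version = "..."` line at the start of a line.
_VERSION_LINE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


def read_version(pyproject_text: str) -> str:
    """Extract the version string from the text of a pyproject.toml file."""
    match = _VERSION_LINE.search(pyproject_text)
    if match is None:
        raise ValueError("no version line found")
    return match.group(1)


def mismatched_packages(pyprojects: dict[str, str], expected: str) -> list[str]:
    """Return the names of packages whose version differs from `expected`."""
    return [
        name for name, text in pyprojects.items() if read_version(text) != expected
    ]


if __name__ == "__main__":
    # In-memory stand-ins for two of the pyproject.toml files.
    sample = {
        "openhands-sdk": '[project]\nname = "openhands-sdk"\nversion = "1.20.1"\n',
        "openhands-tools": '[project]\nname = "openhands-tools"\nversion = "1.20.1"\n',
    }
    print(mismatched_packages(sample, "1.20.1"))
```

An empty list means every package agrees with the expected release version.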
Step 3 — Verify workflow default update:
grep -A3 "sdk_ref:" .github/workflows/run-eval.yml | grep -E "(default:|sdk_ref:)"
Output:
sdk_ref:
default: v1.20.1
✅ Verdict: GitHub workflow default correctly updated to v1.20.1
Test 2: Package Build Verification
Step 1 — Set up development environment:
make build
Output (excerpt):
Resolved 402 packages in 1ms
Built openhands-workspace @ file:///.../openhands-workspace
Built openhands-agent-server @ file:///.../openhands-agent-server
Built openhands-sdk @ file:///.../openhands-sdk
Built openhands-tools @ file:///.../openhands-tools
...
Installed 233 packages in 454ms
+ openhands-agent-server==1.20.1
+ openhands-sdk==1.20.1
+ openhands-tools==1.20.1
+ openhands-workspace==1.20.1
...
Build complete! Development environment is ready.
✅ Verdict: All packages install successfully with version 1.20.1
Step 2 — Build distribution packages:
uv build --all-packages -o /tmp/dist
Output:
[openhands-agent-server] Building source distribution...
[openhands-sdk] Building source distribution...
[openhands-tools] Building source distribution...
[openhands-workspace] Building source distribution...
...
Successfully built /tmp/dist/openhands_agent_server-1.20.1.tar.gz
Successfully built /tmp/dist/openhands_agent_server-1.20.1-py3-none-any.whl
Successfully built /tmp/dist/openhands_sdk-1.20.1.tar.gz
Successfully built /tmp/dist/openhands_sdk-1.20.1-py3-none-any.whl
Successfully built /tmp/dist/openhands_tools-1.20.1.tar.gz
Successfully built /tmp/dist/openhands_tools-1.20.1-py3-none-any.whl
Successfully built /tmp/dist/openhands_workspace-1.20.1.tar.gz
Successfully built /tmp/dist/openhands_workspace-1.20.1-py3-none-any.whl
Verified artifacts:
ls -lh /tmp/dist/
Output:
-rw-r--r-- 104K openhands_agent_server-1.20.1-py3-none-any.whl
-rw-r--r-- 88K openhands_agent_server-1.20.1.tar.gz
-rw-r--r-- 508K openhands_sdk-1.20.1-py3-none-any.whl
-rw-r--r-- 406K openhands_sdk-1.20.1.tar.gz
-rw-r--r-- 173K openhands_tools-1.20.1-py3-none-any.whl
-rw-r--r-- 130K openhands_tools-1.20.1.tar.gz
-rw-r--r-- 35K openhands_workspace-1.20.1-py3-none-any.whl
-rw-r--r-- 31K openhands_workspace-1.20.1.tar.gz
✅ Verdict: All 8 distribution files (4 wheels + 4 source distributions) built successfully with version 1.20.1
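A check like this can also be automated. The sketch below is an assumption-laden illustration (hypothetical function names, and it assumes every package builds a pure-Python `py3-none-any` wheel, as the listing above shows): it derives the expected file names and reports any that are missing.

```python
def expected_artifacts(packages: list[str], version: str) -> set[str]:
    """Wheel and sdist file names expected for each package at `version`."""
    names = set()
    for package in packages:
        stem = package.replace("-", "_")  # file names use underscores, as in the listing
        names.add(f"{stem}-{version}-py3-none-any.whl")
        names.add(f"{stem}-{version}.tar.gz")
    return names


def missing_artifacts(found: list[str], packages: list[str], version: str) -> set[str]:
    """Return expected artifact names that are absent from `found`."""
    return expected_artifacts(packages, version) - set(found)


if __name__ == "__main__":
    packages = [
        "openhands-sdk",
        "openhands-tools",
        "openhands-workspace",
        "openhands-agent-server",
    ]
    # Pretend only the SDK artifacts were built.
    found = ["openhands_sdk-1.20.1.tar.gz", "openhands_sdk-1.20.1-py3-none-any.whl"]
    print(sorted(missing_artifacts(found, packages, "1.20.1")))
```

In practice `found` would come from listing the output directory, for example with `os.listdir("/tmp/dist")`.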
Test 3: Package Import and Version Verification
Step 1 — Verify packages are importable and report correct versions:
Created and ran a verification script:
import sys

# Packages that expose a __version__ attribute
versioned_packages = {
    "openhands-sdk": "openhands.sdk",
    "openhands-tools": "openhands.tools",
}

# Packages that are only checked for importability
importable_packages = {
    "openhands-workspace": "openhands.workspace",
    "openhands-agent-server": "openhands.agent_server",
}

for package_name, module_name in versioned_packages.items():
    module = __import__(module_name, fromlist=["__version__"])
    version = module.__version__
    assert version == "1.20.1", f"{package_name} version mismatch"
    print(f"✓ {package_name}: {version}")

for package_name, module_name in importable_packages.items():
    __import__(module_name)
    print(f"✓ {package_name}: Successfully imported")
Output:
✓ openhands-sdk: 1.20.1
✓ openhands-tools: 1.20.1
✓ openhands-workspace: Successfully imported
✓ openhands-agent-server: Successfully imported
✓ All packages installed and functional
✅ Verdict: All packages import successfully and versioned packages report 1.20.1
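The script above only checks `__version__` on two of the four packages. As a possible complement, the installed distribution metadata can be queried for all four with the standard library's `importlib.metadata.version`. This is a sketch, not what the QA run did; the function name `version_mismatches` and its injectable `get_version` parameter are hypothetical.

```python
from importlib import metadata
from typing import Callable, Iterable


def version_mismatches(
    distributions: Iterable[str],
    expected: str,
    get_version: Callable[[str], str] = metadata.version,
) -> dict[str, str]:
    """Map each distribution whose installed version differs to the version found.

    `get_version` defaults to importlib.metadata.version, which raises
    PackageNotFoundError when a distribution is not installed.
    """
    found = {name: get_version(name) for name in distributions}
    return {name: version for name, version in found.items() if version != expected}
```

In an environment where the packages are installed, `version_mismatches(["openhands-sdk", "openhands-tools", "openhands-workspace", "openhands-agent-server"], "1.20.1")` would return an empty dict when every distribution reports the expected version.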
Issues Found
None.
Summary: This release PR correctly updates all version numbers from 1.20.0 to 1.20.1, builds successfully, and all packages remain functional. The PR is ready for the automated release workflow to create the GitHub release and publish to PyPI upon merge.
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 25.8s | $0.04 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 21.7s | $0.03 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 12.1s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 33.9s | $0.02 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 14.4s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 40.9s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 45.4s | $0.05 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 9.3s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 29.6s | $0.02 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 2m 48s | $0.19 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 22.9s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 22.2s | $0.02 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 15.0s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 14.7s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 9.2s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 13.3s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 6s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 48.5s | $0.05 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 59.7s | $0.07 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 22.8s | $0.03 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 31.2s | $0.04 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 35.0s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 12.9s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 9m 43s | $0.75 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 17.1s | $0.01 |
| 01_standalone_sdk/34_critic_example.py | ✅ PASS | 2m 38s | $0.22 |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 10.4s | $0.00 |
| 01_standalone_sdk/37_llm_profile_store/main.py | ✅ PASS | 3.8s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ✅ PASS | 33.3s | $0.03 |
| 01_standalone_sdk/39_llm_fallback.py | ✅ PASS | 10.4s | $0.01 |
| 01_standalone_sdk/40_acp_agent_example.py | ✅ PASS | 29.6s | $0.13 |
| 01_standalone_sdk/41_task_tool_set.py | ✅ PASS | 25.4s | $0.03 |
| 01_standalone_sdk/42_file_based_subagents.py | ✅ PASS | 2m 4s | $0.10 |
| 01_standalone_sdk/43_mixed_marketplace_skills/main.py | ✅ PASS | 3.4s | $0.00 |
| 01_standalone_sdk/44_model_switching_in_convo.py | ✅ PASS | 7.5s | $0.01 |
| 01_standalone_sdk/45_parallel_tool_execution.py | ✅ PASS | 3m 39s | $0.54 |
| 01_standalone_sdk/46_agent_settings.py | ✅ PASS | 10.9s | $0.01 |
| 01_standalone_sdk/47_defense_in_depth_security.py | ✅ PASS | 3.3s | $0.00 |
| 01_standalone_sdk/48_conversation_fork.py | ✅ PASS | 13.1s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 38.7s | $0.02 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 1m 13s | $0.07 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 1m 45s | -- |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 41s | $0.03 |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 31.4s | $0.04 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ✅ PASS | 3m 7s | $0.02 |
| 02_remote_agent_server/09_acp_agent_with_remote_runtime.py | ✅ PASS | 46.3s | $0.11 |
| 02_remote_agent_server/10_cloud_workspace_share_credentials.py | ✅ PASS | 30.7s | $0.06 |
| 02_remote_agent_server/11_conversation_fork.py | ✅ PASS | 35.0s | $0.00 |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 17.7s | $0.02 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 35.5s | $0.10 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 13.6s | $0.02 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 16.7s | $0.02 |
✅ All tests passed!
Total: 52 | Passed: 52 | Failed: 0 | Total Cost: $3.11
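For readers who want to re-derive totals from a results table like the one above, here is a small sketch that parses the rows. The function names are hypothetical, and a cost shown as `--` is treated as zero, which is an assumption. Because per-row costs are rounded to cents, a sum computed this way may differ slightly from the reported total.

```python
import re

# Durations appear as "25.8s" or "2m 48s".
_DURATION = re.compile(r"^(?:(\d+)m\s*)?(?:(\d+(?:\.\d+)?)s)?$")


def parse_duration(text: str) -> float:
    """Convert strings like '25.8s' or '2m 48s' to seconds."""
    match = _DURATION.match(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + float(seconds or 0)


def parse_cost(text: str) -> float:
    """Convert '$0.04' to 0.04; a missing cost ('--') counts as 0 (assumption)."""
    text = text.strip()
    return 0.0 if text == "--" else float(text.lstrip("$"))


def summarize(rows: list[str]) -> tuple[int, float, float]:
    """Return (row count, total seconds, total cost) for markdown table rows."""
    count, seconds, cost = 0, 0.0, 0.0
    for row in rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        if len(cells) != 4 or cells[0] in {"Example", "---"}:
            continue  # skip header, separator, and non-table lines
        count += 1
        seconds += parse_duration(cells[2])
        cost += parse_cost(cells[3])
    return count, seconds, cost


if __name__ == "__main__":
    rows = [
        "| Example | Status | Duration | Cost |",
        "|---|---|---|---|",
        "| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 25.8s | $0.04 |",
        "| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 2m 48s | $0.19 |",
    ]
    print(summarize(rows))
```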
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 27.8s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 20.9s | $0.02 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 11.6s | $0.00 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 30.4s | $0.01 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 12.5s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 34.1s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 27.7s | $0.03 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 10.6s | $0.00 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 36.2s | $0.02 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 2m 25s | $0.17 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 22.6s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 25.7s | $0.02 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 13.7s | $0.01 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 21.7s | $0.01 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 9.7s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 13.2s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 1s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 3m 46s | $0.26 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 54.0s | $0.06 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 25.5s | $0.03 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 32.9s | $0.03 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 47.6s | $0.02 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 9.5s | $0.00 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 4m 30s | $0.31 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 21.9s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ✅ PASS | 9m 42s | $0.93 |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 17.2s | $0.01 |
| 01_standalone_sdk/37_llm_profile_store/main.py | ✅ PASS | 8.3s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ✅ PASS | 27.6s | $0.02 |
| 01_standalone_sdk/39_llm_fallback.py | ✅ PASS | 10.5s | $0.00 |
| 01_standalone_sdk/40_acp_agent_example.py | ✅ PASS | 34.8s | $0.06 |
| 01_standalone_sdk/41_task_tool_set.py | ✅ PASS | 37.5s | $0.03 |
| 01_standalone_sdk/42_file_based_subagents.py | ✅ PASS | 1m 49s | $0.09 |
| 01_standalone_sdk/43_mixed_marketplace_skills/main.py | ✅ PASS | 7.2s | $0.00 |
| 01_standalone_sdk/44_model_switching_in_convo.py | ✅ PASS | 8.1s | $0.01 |
| 01_standalone_sdk/45_parallel_tool_execution.py | ✅ PASS | 3m 48s | $0.48 |
| 01_standalone_sdk/46_agent_settings.py | ✅ PASS | 12.3s | $0.01 |
| 01_standalone_sdk/47_defense_in_depth_security.py | ✅ PASS | 3.4s | $0.00 |
| 01_standalone_sdk/48_conversation_fork.py | ✅ PASS | 13.8s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 49.0s | $0.02 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 1m 32s | $0.04 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 1m 12s | $0.09 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 11s | $0.04 |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 27.9s | $0.03 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ✅ PASS | 3m 50s | $0.05 |
| 02_remote_agent_server/09_acp_agent_with_remote_runtime.py | ✅ PASS | 46.2s | $0.11 |
| 02_remote_agent_server/10_cloud_workspace_share_credentials.py | ✅ PASS | 27.2s | $0.01 |
| 02_remote_agent_server/11_conversation_fork.py | ✅ PASS | 41.7s | $0.00 |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 23.4s | $0.02 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 32.7s | $0.07 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 19.2s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 23.9s | $0.02 |
✅ All tests passed!
Total: 52 | Passed: 52 | Failed: 0 | Total Cost: $3.29
🧪 Integration Tests Results
Overall Success Rate: 90.0%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_moonshot_kimi_k2_thinking
Failed Tests:
However, the agent VIOLATED the explicit evaluation criteria by creating TWO markdown documentation files instead of the allowed ONE:
The evaluation criteria explicitly state: "Only one README.md file is acceptable if it pertains to the new training script." and "Avoid creating any additional files that were not explicitly requested." The agent created TRAINING_SCRIPT_SUMMARY.md, a redundant file that duplicates information already present in README.md. While the intent was helpful (providing comprehensive documentation), it violates the stated constraint against creating files beyond what was requested.
Strengths:
Violations:
litellm_proxy_deepseek_deepseek_reasoner
litellm_proxy_gemini_3.1_pro_preview
Failed Tests:
However, the agent violated a critical evaluation criterion: do NOT create new files or edit existing files. The user explicitly framed their request as seeking advice before implementation ("Before I start implementing, can you first explore the codebase and tell me..."). Despite this clear signal, the agent created multiple test files. While these test files were created in a temporary directory and not in the actual codebase, they still represent implementation artifacts that went beyond the scope of "exploration and advice." The agent could have provided the identical high-quality analysis and recommendations without creating and running test code.
What went well:
What violated criteria:
litellm_proxy_anthropic_claude_sonnet_4_6
🧪 Integration Tests Results
Overall Success Rate: 97.1%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
litellm_proxy_deepseek_deepseek_reasoner
Skipped Tests:
litellm_proxy_gemini_3.1_pro_preview
litellm_proxy_anthropic_claude_sonnet_4_6
Failed Tests:
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: openhands <openhands@all-hands.dev>
Release v1.20.1
This PR prepares the release for version 1.20.1.
Release Checklist
integration-test)
behavior-test)
test-examples)
What happens on merge
When this PR is merged, the create-release.yml workflow will automatically:
- Create a GitHub release with tag v1.20.1 and auto-generated notes
- Trigger pypi-release.yml to publish all packages to PyPI
- Trigger version-bump-prs.yml to create downstream version bump PRs
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22-slim
- golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:cee855a-python
Run
All tags pushed for this build
About Multi-Architecture Support
- Each variant tag (e.g., cee855a-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g., cee855a-python-amd64) are also available if needed
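The tag naming described above can be sketched as follows. Only `cee855a-python` and `cee855a-python-amd64` appear in this PR; the `arm64` suffix is assumed to follow the same pattern, and the function name is hypothetical.

```python
ARCHITECTURES = ("amd64", "arm64")  # the two architectures named above


def image_tags(short_sha: str, variant: str) -> dict[str, list[str]]:
    """Return the multi-arch tag and the per-architecture tags for one variant."""
    manifest = f"{short_sha}-{variant}"
    return {
        "manifest": [manifest],
        # Assumption: every architecture uses the same "-<arch>" suffix as amd64.
        "per_arch": [f"{manifest}-{arch}" for arch in ARCHITECTURES],
    }


if __name__ == "__main__":
    print(image_tags("cee855a", "python"))
```

Pulling the manifest tag lets Docker select the matching architecture automatically, while the per-architecture tags pin a specific one.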