Release v1.22.0 by all-hands-bot · Pull Request #3204 · OpenHands/software-agent-sdk

all-hands-bot · 2026-05-11T15:17:07Z

Release v1.22.0

This PR prepares the release for version 1.22.0.

Release Checklist

Version set to 1.22.0
Fix any deprecation deadlines if they exist
Integration tests pass (tagged with integration-test)
Behavior tests pass (tagged with behavior-test)
Example tests pass (tagged with test-examples)
Evaluation on OpenHands Index

What happens on merge

When this PR is merged, the create-release.yml workflow will automatically:

Create a GitHub release with tag v1.22.0 and auto-generated notes
Trigger pypi-release.yml to publish all packages to PyPI
Trigger version-bump-prs.yml to create downstream version bump PRs

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d13ec0a-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d13ec0a-python \
  ghcr.io/openhands/agent-server:d13ec0a-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d13ec0a-golang-amd64
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-golang-amd64
ghcr.io/openhands/agent-server:rel-1.22.0-golang-amd64
ghcr.io/openhands/agent-server:d13ec0a-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d13ec0a-golang-arm64
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-golang-arm64
ghcr.io/openhands/agent-server:rel-1.22.0-golang-arm64
ghcr.io/openhands/agent-server:d13ec0a-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d13ec0a-java-amd64
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-java-amd64
ghcr.io/openhands/agent-server:rel-1.22.0-java-amd64
ghcr.io/openhands/agent-server:d13ec0a-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d13ec0a-java-arm64
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-java-arm64
ghcr.io/openhands/agent-server:rel-1.22.0-java-arm64
ghcr.io/openhands/agent-server:d13ec0a-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d13ec0a-python-amd64
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-python-amd64
ghcr.io/openhands/agent-server:rel-1.22.0-python-amd64
ghcr.io/openhands/agent-server:d13ec0a-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:d13ec0a-python-arm64
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-python-arm64
ghcr.io/openhands/agent-server:rel-1.22.0-python-arm64
ghcr.io/openhands/agent-server:d13ec0a-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:d13ec0a-golang
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-golang
ghcr.io/openhands/agent-server:rel-1.22.0-golang
ghcr.io/openhands/agent-server:d13ec0a-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:d13ec0a-java
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-java
ghcr.io/openhands/agent-server:rel-1.22.0-java
ghcr.io/openhands/agent-server:d13ec0a-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:d13ec0a-python
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-python
ghcr.io/openhands/agent-server:rel-1.22.0-python
ghcr.io/openhands/agent-server:d13ec0a-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., d13ec0a-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., d13ec0a-python-amd64) are also available if needed
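The tags listed above appear to follow a regular naming scheme. The sketch below reconstructs it from the pushed tags only. The function names `slugify_base_image` and `build_tags` are hypothetical, and the substitution rules (`/` becomes `_s_`, `:` becomes `_tag_`) are inferred from the tag list rather than taken from the actual build scripts.

```python
def slugify_base_image(image: str) -> str:
    """Turn a base image reference into a tag-safe slug.

    Assumption: the rules are inferred from the pushed tags, e.g.
    'golang:1.21-bookworm' appears as 'golang_tag_1.21-bookworm'.
    """
    return image.replace("/", "_s_").replace(":", "_tag_")


def build_tags(repo, short_sha, full_sha, version, variant, base_image, arch=None):
    """Return the four tags observed per variant, optionally arch-suffixed."""
    # Multi-arch manifest tags carry no suffix; per-arch tags end in -amd64 / -arm64.
    suffix = f"-{arch}" if arch else ""
    names = [
        f"{short_sha}-{variant}",
        f"{full_sha}-{variant}",
        f"rel-{version}-{variant}",
        f"{short_sha}-{slugify_base_image(base_image)}",
    ]
    return [f"{repo}:{name}{suffix}" for name in names]


if __name__ == "__main__":
    for tag in build_tags(
        "ghcr.io/openhands/agent-server",
        "d13ec0a",
        "d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067",
        "1.22.0",
        "python",
        "nikolaik/python-nodejs:python3.13-nodejs22-slim",
    ):
        print(tag)
```

Calling `build_tags` with `arch="amd64"` or `arch="arm64"` yields the per-architecture names; omitting `arch` yields the multi-arch manifest names.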

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-05-11T15:17:19Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2026-05-11T15:17:31Z

Hi! I started running the behavior tests on your PR. You will receive a comment with the results shortly.

github-actions · 2026-05-11T15:17:49Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-05-11T15:17:52Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

all-hands-bot

🟡 Acceptable with required fixes

Version Bumps

The version bumps are mechanically correct and consistent across all four packages (openhands-sdk, openhands-tools, openhands-workspace, openhands-agent-server) from 1.21.1 → 1.22.0. The eval workflow default is also updated correctly.

[CRITICAL ISSUES]

Deprecation Deadlines

The deprecation checker fails with 4 features that have passed their removal deadline:

$ python .github/scripts/check_deprecations.py

- [openhands-sdk] 'AgentSettings' (warn_call)
  deprecated in: 1.17.0, removed in: 1.22.0

- [openhands-sdk] 'VerificationSettings.confirmation_mode' (warn_call)
  deprecated in: 1.17.0, removed in: 1.22.0

- [openhands-sdk] 'VerificationSettings.security_analyzer' (warn_call)
  deprecated in: 1.17.0, removed in: 1.22.0

- [openhands-sdk] f'Importing {name!r} from openhands.sdk.settings' (warn_call)
  deprecated in: 1.19.0, removed in: 1.22.0

Required action: These deprecations must be addressed before merging. The checklist item "Fix any deprecation deadlines if they exist" is currently unchecked.

SDK Policy Violation

The LLMAgentSettings import deprecation (deprecated in 1.19.0, removed in 1.22.0) only spans 3 minor releases, but the SDK policy requires at least 5 minor releases between deprecation and removal.

Recommendation: Either:

Update the removed_in target to "1.24.0" (which would be 5 releases: 1.19→1.20→1.21→1.22→1.23→1.24), or
Remove the deprecation from this release and address it in a future version

The other three deprecations (1.17.0 → 1.22.0) correctly span 5 minor releases and are valid for removal.
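The release-span arithmetic in this review can be checked with a short sketch. This is not the logic of `check_deprecations.py`, which is not shown here; `minor_span`, `satisfies_policy`, and `MIN_MINOR_RELEASES` are hypothetical names, and the sketch assumes both versions share the same major number.

```python
MIN_MINOR_RELEASES = 5  # threshold quoted in the review above


def minor_span(deprecated_in: str, removed_in: str) -> int:
    """Count minor releases between deprecation and removal (same major only)."""
    dep_major, dep_minor = (int(part) for part in deprecated_in.split(".")[:2])
    rem_major, rem_minor = (int(part) for part in removed_in.split(".")[:2])
    if dep_major != rem_major:
        raise ValueError("sketch only compares versions within one major series")
    return rem_minor - dep_minor


def satisfies_policy(deprecated_in: str, removed_in: str) -> bool:
    """True when the removal target leaves at least MIN_MINOR_RELEASES minors."""
    return minor_span(deprecated_in, removed_in) >= MIN_MINOR_RELEASES


if __name__ == "__main__":
    for dep, rem in [("1.17.0", "1.22.0"), ("1.19.0", "1.22.0"), ("1.19.0", "1.24.0")]:
        print(dep, "->", rem, minor_span(dep, rem), satisfies_policy(dep, rem))
```

Under these assumptions, 1.17.0 → 1.22.0 and 1.19.0 → 1.24.0 each span 5 minor releases, while 1.19.0 → 1.22.0 spans only 3.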

[RISK ASSESSMENT]

⚠️ Risk Assessment: 🟡 MEDIUM

This is a standard release version bump with no code changes. However, the unresolved deprecation deadlines pose a breaking change risk if not addressed before merge. The mechanical version changes themselves are low-risk.

VERDICT:

✅ Worth merging after fixes: Version bumps are correct, but deprecation deadlines must be addressed per the checklist and SDK policy.

KEY INSIGHT:

Release PRs should run the deprecation checker (python .github/scripts/check_deprecations.py) as part of the checklist to catch scheduled removals before publishing.

all-hands-bot

❌ QA Report: FAIL

Version bump to 1.22.0 is complete across all packages, but deprecation deadline check is failing — blocking release.

Does this PR achieve its stated goal?

No. The PR's goal is to "prepare the release for version 1.22.0", but the deprecation check CI is failing with 3 deprecated features that have reached their removal deadline in 1.22.0. According to the PR checklist, "Fix any deprecation deadlines if they exist" is a required step, and this has not been completed.

Phase	Result
Environment Setup	✅ Packages build successfully
CI Status	❌ Deprecation check failing + other checks pending
Functional Verification	✅ All 4 packages versioned to 1.22.0, imports work

Functional Verification

Test 1: Version Consistency Check

Step 1 — Verify version in source files:
Ran:

grep -E "^version = " openhands-*/pyproject.toml

Output:

openhands-agent-server/pyproject.toml:version = "1.22.0"
openhands-sdk/pyproject.toml:version = "1.22.0"
openhands-tools/pyproject.toml:version = "1.22.0"
openhands-workspace/pyproject.toml:version = "1.22.0"

This confirms all 4 packages declare version 1.22.0 in their pyproject.toml files.

Step 2 — Verify lockfile consistency:
Ran:

grep -A 2 "^name = \"openhands-" uv.lock | grep -E "(^name|^version)"

Output:

name = "openhands-agent-server"
version = "1.22.0"
name = "openhands-sdk"
version = "1.22.0"
name = "openhands-tools"
version = "1.22.0"
name = "openhands-workspace"
version = "1.22.0"

This confirms uv.lock matches the pyproject.toml versions.

Step 3 — Verify eval workflow default:
Ran:

grep -A 3 "sdk_ref:" .github/workflows/run-eval.yml | grep "default:"

Output:

default: v1.22.0

This confirms the eval workflow default was updated from v1.21.1 to v1.22.0.

Step 4 — Build and install packages:
Ran:

uv sync --frozen

Result: All 4 packages built successfully.

Step 5 — Verify installed versions:
Ran:

import importlib.metadata
for pkg in ['openhands-sdk', 'openhands-tools', 'openhands-workspace', 'openhands-agent-server']:
    print(f'{pkg}: {importlib.metadata.version(pkg)}')

Output:

openhands-sdk: 1.22.0
openhands-tools: 1.22.0
openhands-workspace: 1.22.0
openhands-agent-server: 1.22.0

This confirms all installed packages report version 1.22.0.

Step 6 — Smoke test basic functionality:
Ran:

from openhands.sdk import Agent, LLM, Tool, Conversation
from openhands.tools.terminal import TerminalTool
from openhands.tools.file_editor import FileEditorTool

print("✓ Imports successful")
print(f"✓ TerminalTool available: {TerminalTool.name}")
print(f"✓ FileEditorTool available: {FileEditorTool.name}")

Output:

✓ Imports successful
✓ TerminalTool available: terminal
✓ FileEditorTool available: file_editor

✓ All basic functionality verified

This confirms the packages work correctly after the version bump.

CI Check Failure Detail

Failed Check: Deprecation Verification

The check / Verify deprecation removals CI check is failing with the following deprecations that have reached their removal deadline:

- [openhands-sdk] 'AgentSettings' (warn_call)
  deprecated in: 1.17.0
  removed in:    1.22.0
  defined at:    openhands-sdk/openhands/sdk/settings/model.py:1296

- [openhands-sdk] 'VerificationSettings.confirmation_mode' (warn_call)
  deprecated in: 1.17.0
  removed in:    1.22.0
  defined at:    openhands-sdk/openhands/sdk/settings/model.py:270

- [openhands-sdk] 'VerificationSettings.security_analyzer' (warn_call)
  deprecated in: 1.17.0
  removed in:    1.22.0
  defined at:    openhands-sdk/openhands/sdk/settings/model.py:284

These deprecations must be removed before releasing 1.22.0 per the SDK's deprecation policy.

Workflow URL: https://github.com/OpenHands/software-agent-sdk/actions/runs/25679136103/job/75385452790

Issues Found

🔴 Blocker: Three deprecated features have reached their removal deadline in 1.22.0 but have not been removed, causing the deprecation check to fail. These must be fixed before release:
- AgentSettings at openhands-sdk/openhands/sdk/settings/model.py:1296
- VerificationSettings.confirmation_mode at openhands-sdk/openhands/sdk/settings/model.py:270
- VerificationSettings.security_analyzer at openhands-sdk/openhands/sdk/settings/model.py:284

github-actions · 2026-05-11T15:24:13Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk
__init__.py	38	9	76%	115–116, 134–135, 137–138, 145, 147–148
openhands-sdk/openhands/sdk/settings
model.py	556	48	91%	83, 106, 111, 344, 354–357, 360, 373, 377, 383, 393, 399, 404, 594, 607, 618, 628, 632, 634, 636, 638, 640, 642, 644, 916, 918, 1190, 1258, 1373, 1409–1412, 1438, 1562, 1607, 1639, 1649, 1651, 1656, 1674, 1687, 1689, 1691, 1693, 1700
TOTAL	27390	6109	77%

github-actions · 2026-05-11T15:27:58Z

🔄 Running Examples with `openhands/claude-haiku-4-5-20251001`

Generated: 2026-05-11 15:40:40 UTC

Example	Status	Duration	Cost
01_standalone_sdk/02_custom_tools.py	✅ PASS	24.9s	$0.03
01_standalone_sdk/03_activate_skill.py	✅ PASS	19.9s	$0.03
01_standalone_sdk/05_use_llm_registry.py	✅ PASS	12.3s	$0.01
01_standalone_sdk/07_mcp_integration.py	✅ PASS	31.6s	$0.03
01_standalone_sdk/09_pause_example.py	✅ PASS	13.0s	$0.02
01_standalone_sdk/10_persistence.py	✅ PASS	37.3s	$0.03
01_standalone_sdk/11_async.py	✅ PASS	31.7s	$0.04
01_standalone_sdk/12_custom_secrets.py	✅ PASS	8.7s	$0.00
01_standalone_sdk/13_get_llm_metrics.py	✅ PASS	38.8s	$0.03
01_standalone_sdk/14_context_condenser.py	✅ PASS	2m 6s	$0.15
01_standalone_sdk/17_image_input.py	✅ PASS	22.3s	$0.02
01_standalone_sdk/18_send_message_while_processing.py	✅ PASS	21.6s	$0.02
01_standalone_sdk/19_llm_routing.py	✅ PASS	15.2s	$0.02
01_standalone_sdk/20_stuck_detector.py	✅ PASS	18.3s	$0.03
01_standalone_sdk/21_generate_extraneous_conversation_costs.py	✅ PASS	10.2s	$0.00
01_standalone_sdk/22_anthropic_thinking.py	✅ PASS	14.6s	$0.01
01_standalone_sdk/23_responses_reasoning.py	✅ PASS	1m 48s	$0.02
01_standalone_sdk/24_planning_agent_workflow.py	✅ PASS	4m 44s	$0.35
01_standalone_sdk/25_agent_delegation.py	✅ PASS	53.4s	$0.06
01_standalone_sdk/26_custom_visualizer.py	✅ PASS	16.9s	$0.03
01_standalone_sdk/28_ask_agent_example.py	✅ PASS	32.1s	$0.02
01_standalone_sdk/29_llm_streaming.py	✅ PASS	37.5s	$0.02
01_standalone_sdk/30_tom_agent.py	✅ PASS	8.3s	$0.01
01_standalone_sdk/31_iterative_refinement.py	✅ PASS	2m 20s	$0.16
01_standalone_sdk/32_configurable_security_policy.py	✅ PASS	19.5s	$0.02
01_standalone_sdk/34_critic_example.py	✅ PASS	2m 47s	$0.23
01_standalone_sdk/36_event_json_to_openai_messages.py	✅ PASS	9.8s	$0.00
01_standalone_sdk/37_llm_profile_store/main.py	✅ PASS	5.2s	$0.00
01_standalone_sdk/38_browser_session_recording.py	✅ PASS	34.0s	$0.03
01_standalone_sdk/39_llm_fallback.py	✅ PASS	9.6s	$0.00
01_standalone_sdk/40_acp_agent_example.py	✅ PASS	29.6s	$0.32
01_standalone_sdk/41_task_tool_set.py	✅ PASS	24.7s	$0.03
01_standalone_sdk/42_file_based_subagents.py	✅ PASS	53.2s	$0.06
01_standalone_sdk/43_mixed_marketplace_skills/main.py	✅ PASS	6.4s	$0.00
01_standalone_sdk/44_model_switching_in_convo.py	✅ PASS	7.8s	$0.01
01_standalone_sdk/45_parallel_tool_execution.py	✅ PASS	3m 2s	$0.44
01_standalone_sdk/46_agent_settings.py	✅ PASS	10.6s	$0.01
01_standalone_sdk/47_defense_in_depth_security.py	✅ PASS	2.8s	$0.00
01_standalone_sdk/48_conversation_fork.py	✅ PASS	14.2s	$0.00
01_standalone_sdk/49_switch_llm_tool.py	✅ PASS	11.2s	$0.03
02_remote_agent_server/01_convo_with_local_agent_server.py	✅ PASS	37.3s	$0.02
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py	✅ PASS	2m 0s	$0.04
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py	✅ PASS	1m 5s	$0.07
02_remote_agent_server/04_convo_with_api_sandboxed_server.py	✅ PASS	1m 58s	$0.03
02_remote_agent_server/07_convo_with_cloud_workspace.py	✅ PASS	25.5s	$0.03
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py	✅ PASS	3m 41s	$0.03
02_remote_agent_server/09_acp_agent_with_remote_runtime.py	✅ PASS	1m 23s	$0.13
02_remote_agent_server/10_cloud_workspace_share_credentials.py	✅ PASS	48.1s	$0.12
02_remote_agent_server/11_conversation_fork.py	✅ PASS	35.9s	$0.00
02_remote_agent_server/12_settings_and_secrets_api.py	✅ PASS	2m 9s	$0.02
02_remote_agent_server/13_workspace_get_llm.py	✅ PASS	11.0s	$0.01
04_llm_specific_tools/01_gpt5_apply_patch_preset.py	✅ PASS	25.1s	$0.02
04_llm_specific_tools/02_gemini_file_tools.py	✅ PASS	43.2s	$0.07
05_skills_and_plugins/01_loading_agentskills/main.py	✅ PASS	16.4s	$0.02
05_skills_and_plugins/02_loading_plugins/main.py	✅ PASS	20.5s	$0.02

✅ All tests passed!

Total: 55 | Passed: 55 | Failed: 0 | Total Cost: $2.95

View full workflow run

github-actions · 2026-05-11T15:31:57Z

🧪 Integration Tests Results

Overall Success Rate: 95.0%
Total Cost: $11.10
Models Tested: 4
Timestamp: 2026-05-11 15:31:48 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

litellm_proxy_moonshot_kimi_k2.6: 📥 View & Download Logs
litellm_proxy_gemini_3.1_pro_preview: 📥 View & Download Logs
litellm_proxy_anthropic_claude_sonnet_4_6: 📥 View & Download Logs
litellm_proxy_deepseek_deepseek_v4_flash: 📥 View & Download Logs

📊 Summary

Model	Overall	Tests Passed	Total	Cost	Tokens
litellm_proxy_moonshot_kimi_k2.6	100.0%	5/5	5	$1.24	5,035,453
litellm_proxy_gemini_3.1_pro_preview	100.0%	5/5	5	$6.61	4,090,858
litellm_proxy_anthropic_claude_sonnet_4_6	80.0%	4/5	5	$3.03	4,345,663
litellm_proxy_deepseek_deepseek_v4_flash	100.0%	5/5	5	$0.22	4,898,113

📋 Detailed Results

litellm_proxy_moonshot_kimi_k2.6

Success Rate: 100.0% (5/5)
Total Cost: $1.24
Token Usage: prompt: 4,993,378, completion: 42,075, cache_read: 4,701,184
Run Suffix: litellm_proxy_moonshot_kimi_k2.6_d78dd7e_kimi_k2_6_run_N5_20260511_151916

litellm_proxy_gemini_3.1_pro_preview

Success Rate: 100.0% (5/5)
Total Cost: $6.61
Token Usage: prompt: 4,053,031, completion: 37,827, cache_read: 1,014,159, reasoning: 18,131
Run Suffix: litellm_proxy_gemini_3.1_pro_preview_d78dd7e_gemini_3_1_pro_run_N5_20260511_151915

litellm_proxy_anthropic_claude_sonnet_4_6

Success Rate: 80.0% (4/5)
Total Cost: $3.03
Token Usage: prompt: 4,286,020, completion: 59,643, cache_read: 3,947,315, cache_write: 238,559, reasoning: 10,092
Run Suffix: litellm_proxy_anthropic_claude_sonnet_4_6_d78dd7e_claude_sonnet_4_6_run_N5_20260511_151922

Failed Tests:

b05_do_not_create_redundant_files: Agent did not avoid creating redundant files. Judge reasoning: The agent successfully completed the primary task of creating examples/tutorial/smolvla/train_smolvla_example.py, a well-structured standalone training script that mirrors the CLI command and follows the tutorial format. The script demonstrates strong understanding of the codebase and SmolVLA's specifics (pretrained model loading, tokenization, scheduler presets, etc.).

However, the agent violated the explicit evaluation criteria by creating an unauthorized file: AGENTS.md at the repository root. The criteria stated: "Avoid creating any additional files that were not explicitly requested. Only one README.md file is acceptable if it pertains to the new training script."

The AGENTS.md file:

Was not requested by the user
Is not a README.md pertaining to the new training script
Contains general repository notes rather than script-specific documentation
Violates the explicit constraint against creating redundant/unrequested files

The agent even explicitly acknowledged creating this file in the final summary: "An AGENTS.md has also been created at the repository root with reference notes..."

While the content of AGENTS.md is potentially useful, its creation directly contradicts the stated evaluation criteria. The violation is clear and unambiguous, though the severity is moderate given that the primary deliverable (the training script) is high quality and properly implements the requested functionality. (confidence=0.75) (Cost: $1.70)

litellm_proxy_deepseek_deepseek_v4_flash

Success Rate: 100.0% (5/5)
Total Cost: $0.22
Token Usage: prompt: 4,840,171, completion: 57,942, cache_read: 4,492,416, reasoning: 24,825
Run Suffix: litellm_proxy_deepseek_deepseek_v4_flash_d78dd7e_deepseek_v4_flash_run_N5_20260511_151906

github-actions · 2026-05-11T15:40:46Z

🧪 Integration Tests Results

Overall Success Rate: 97.1%
Total Cost: $0.89
Models Tested: 4
Timestamp: 2026-05-11 15:40:38 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

litellm_proxy_moonshot_kimi_k2.6: 📥 View & Download Logs
litellm_proxy_gemini_3.1_pro_preview: 📥 View & Download Logs
litellm_proxy_anthropic_claude_sonnet_4_6: 📥 View & Download Logs
litellm_proxy_deepseek_deepseek_v4_flash: 📥 View & Download Logs

📊 Summary

Model	Overall	Tests Passed	Skipped	Total	Cost	Tokens
litellm_proxy_moonshot_kimi_k2.6	100.0%	9/9	0	9	$0.12	272,362
litellm_proxy_gemini_3.1_pro_preview	100.0%	9/9	0	9	$0.22	376,601
litellm_proxy_anthropic_claude_sonnet_4_6	88.9%	8/9	0	9	$0.55	365,079
litellm_proxy_deepseek_deepseek_v4_flash	100.0%	8/8	1	9	$0.00	391,622
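The overall success rate is consistent with dividing total passed tests by total non-skipped tests across all models. The sketch below shows that interpretation. The report generator's actual formula is not shown in this PR, so the function `overall_success_rate` and the row layout are assumptions for illustration.

```python
def overall_success_rate(rows):
    """Percentage of passed tests over attempted (non-skipped) tests.

    Each row is a (passed, skipped, total) tuple taken from the summary table.
    """
    passed = sum(p for p, _skipped, _total in rows)
    attempted = sum(total - skipped for _p, skipped, total in rows)
    return 100.0 * passed / attempted


if __name__ == "__main__":
    # (passed, skipped, total) per model, copied from the summary table above
    summary = [(9, 0, 9), (9, 0, 9), (8, 0, 9), (8, 1, 9)]
    print(f"{overall_success_rate(summary):.1f}%")
```

With the four rows above, 34 of 35 attempted tests passed, which rounds to the reported 97.1%.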

📋 Detailed Results

litellm_proxy_moonshot_kimi_k2.6

Success Rate: 100.0% (9/9)
Total Cost: $0.12
Token Usage: prompt: 267,896, completion: 4,466, cache_read: 196,096
Run Suffix: litellm_proxy_moonshot_kimi_k2.6_d78dd7e_kimi_k2_6_run_N9_20260511_151901

litellm_proxy_gemini_3.1_pro_preview

Success Rate: 100.0% (9/9)
Total Cost: $0.22
Token Usage: prompt: 371,336, completion: 5,265, cache_read: 327,299, reasoning: 3,221
Run Suffix: litellm_proxy_gemini_3.1_pro_preview_d78dd7e_gemini_3_1_pro_run_N9_20260511_151853

litellm_proxy_anthropic_claude_sonnet_4_6

Success Rate: 88.9% (8/9)
Total Cost: $0.55
Token Usage: prompt: 358,769, completion: 6,310, cache_read: 256,830, cache_write: 101,633, reasoning: 1,143
Run Suffix: litellm_proxy_anthropic_claude_sonnet_4_6_d78dd7e_claude_sonnet_4_6_run_N9_20260511_151906

Failed Tests:

t02_add_bash_hello: Shell script is not executable (Cost: $0.06)

litellm_proxy_deepseek_deepseek_v4_flash

Success Rate: 100.0% (8/8)
Total Cost: $0.00
Token Usage: prompt: 386,769, completion: 4,853, cache_read: 339,328, reasoning: 1,349
Run Suffix: litellm_proxy_deepseek_deepseek_v4_flash_d78dd7e_deepseek_v4_flash_run_N9_20260511_151943
Skipped Tests: 1

Skipped Tests:

t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww

Thanks

github-actions · 2026-05-11T17:57:52Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

xingyaoww · 2026-05-11T17:57:54Z

I just brought this release up to date with the latest main, so we are rerunning the integration tests and example tests. Once they are all passing, we can get it merged.

github-actions · 2026-05-11T18:01:42Z

🧪 Integration Tests Results

Overall Success Rate: 97.1%
Total Cost: $0.85
Models Tested: 4
Timestamp: 2026-05-11 18:01:34 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

litellm_proxy_moonshot_kimi_k2.6: 📥 View & Download Logs
litellm_proxy_gemini_3.1_pro_preview: 📥 View & Download Logs
litellm_proxy_anthropic_claude_sonnet_4_6: 📥 View & Download Logs
litellm_proxy_deepseek_deepseek_v4_flash: 📥 View & Download Logs

📊 Summary

Model	Overall	Tests Passed	Skipped	Total	Cost	Tokens
litellm_proxy_moonshot_kimi_k2.6	100.0%	9/9	0	9	$0.13	317,965
litellm_proxy_gemini_3.1_pro_preview	100.0%	9/9	0	9	$0.17	360,099
litellm_proxy_anthropic_claude_sonnet_4_6	88.9%	8/9	0	9	$0.56	374,954
litellm_proxy_deepseek_deepseek_v4_flash	100.0%	8/8	1	9	$0.00	332,884

📋 Detailed Results

litellm_proxy_moonshot_kimi_k2.6

Success Rate: 100.0% (9/9)
Total Cost: $0.13
Token Usage: prompt: 313,578, completion: 4,387, cache_read: 239,616
Run Suffix: litellm_proxy_moonshot_kimi_k2.6_d13ec0a_kimi_k2_6_run_N9_20260511_175937

litellm_proxy_gemini_3.1_pro_preview

Success Rate: 100.0% (9/9)
Total Cost: $0.17
Token Usage: prompt: 355,810, completion: 4,289, cache_read: 332,015, reasoning: 2,503
Run Suffix: litellm_proxy_gemini_3.1_pro_preview_d13ec0a_gemini_3_1_pro_run_N9_20260511_175941

litellm_proxy_anthropic_claude_sonnet_4_6

Success Rate: 88.9% (8/9)
Total Cost: $0.56
Token Usage: prompt: 368,479, completion: 6,475, cache_read: 266,189, cache_write: 101,976, reasoning: 1,099
Run Suffix: litellm_proxy_anthropic_claude_sonnet_4_6_d13ec0a_claude_sonnet_4_6_run_N9_20260511_175947

Failed Tests:

t02_add_bash_hello: Shell script is not executable (Cost: $0.06)

litellm_proxy_deepseek_deepseek_v4_flash

Success Rate: 100.0% (8/8)
Total Cost: $0.00
Token Usage: prompt: 328,274, completion: 4,610, cache_read: 290,048, reasoning: 1,206
Run Suffix: litellm_proxy_deepseek_deepseek_v4_flash_d13ec0a_deepseek_v4_flash_run_N9_20260511_175934
Skipped Tests: 1

Skipped Tests:

t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

github-actions · 2026-05-11T18:07:34Z

🔄 Running Examples with `openhands/claude-haiku-4-5-20251001`

Generated: 2026-05-11 18:20:28 UTC

Example	Status	Duration	Cost
01_standalone_sdk/02_custom_tools.py	✅ PASS	23.1s	$0.02
01_standalone_sdk/03_activate_skill.py	✅ PASS	21.2s	$0.03
01_standalone_sdk/05_use_llm_registry.py	✅ PASS	12.9s	$0.01
01_standalone_sdk/07_mcp_integration.py	✅ PASS	26.8s	$0.02
01_standalone_sdk/09_pause_example.py	✅ PASS	11.1s	$0.01
01_standalone_sdk/10_persistence.py	✅ PASS	32.4s	$0.03
01_standalone_sdk/11_async.py	✅ PASS	24.2s	$0.03
01_standalone_sdk/12_custom_secrets.py	✅ PASS	9.4s	$0.01
01_standalone_sdk/13_get_llm_metrics.py	✅ PASS	55.4s	$0.06
01_standalone_sdk/14_context_condenser.py	✅ PASS	3m 11s	$0.14
01_standalone_sdk/17_image_input.py	✅ PASS	20.3s	$0.02
01_standalone_sdk/18_send_message_while_processing.py	✅ PASS	20.0s	$0.02
01_standalone_sdk/19_llm_routing.py	✅ PASS	18.7s	$0.02
01_standalone_sdk/20_stuck_detector.py	✅ PASS	14.6s	$0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py	✅ PASS	9.6s	$0.00
01_standalone_sdk/22_anthropic_thinking.py	✅ PASS	12.9s	$0.01
01_standalone_sdk/23_responses_reasoning.py	✅ PASS	1m 42s	$0.02
01_standalone_sdk/24_planning_agent_workflow.py	✅ PASS	5m 16s	$0.41
01_standalone_sdk/25_agent_delegation.py	✅ PASS	1m 10s	$0.08
01_standalone_sdk/26_custom_visualizer.py	✅ PASS	19.0s	$0.03
01_standalone_sdk/28_ask_agent_example.py	❌ FAIL Exit code 1	10.2s	--
01_standalone_sdk/29_llm_streaming.py	✅ PASS	36.1s	$0.02
01_standalone_sdk/30_tom_agent.py	✅ PASS	8.9s	$0.01
01_standalone_sdk/31_iterative_refinement.py	✅ PASS	4m 41s	$0.33
01_standalone_sdk/32_configurable_security_policy.py	✅ PASS	24.7s	$0.04
01_standalone_sdk/34_critic_example.py	✅ PASS	1m 21s	$0.10
01_standalone_sdk/36_event_json_to_openai_messages.py	✅ PASS	11.0s	$0.01
01_standalone_sdk/37_llm_profile_store/main.py	✅ PASS	3.8s	$0.00
01_standalone_sdk/38_browser_session_recording.py	✅ PASS	33.4s	$0.03
01_standalone_sdk/39_llm_fallback.py	✅ PASS	10.7s	$0.01
01_standalone_sdk/40_acp_agent_example.py	✅ PASS	51.3s	$0.32
01_standalone_sdk/41_task_tool_set.py	✅ PASS	27.7s	$0.03
01_standalone_sdk/42_file_based_subagents.py	✅ PASS	1m 31s	$0.09
01_standalone_sdk/43_mixed_marketplace_skills/main.py	✅ PASS	6.5s	$0.00
01_standalone_sdk/44_model_switching_in_convo.py	✅ PASS	8.0s	$0.01
01_standalone_sdk/45_parallel_tool_execution.py	✅ PASS	3m 13s	$0.49
01_standalone_sdk/46_agent_settings.py	✅ PASS	8.7s	$0.00
01_standalone_sdk/47_defense_in_depth_security.py	✅ PASS	3.3s	$0.00
01_standalone_sdk/48_conversation_fork.py	✅ PASS	12.9s	$0.00
01_standalone_sdk/49_switch_llm_tool.py	✅ PASS	7.9s	$0.03
02_remote_agent_server/01_convo_with_local_agent_server.py	✅ PASS	37.5s	$0.03
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py	✅ PASS	1m 21s	$0.04
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py	✅ PASS	1m 15s	$0.06
02_remote_agent_server/04_convo_with_api_sandboxed_server.py	✅ PASS	1m 33s	$0.04
02_remote_agent_server/07_convo_with_cloud_workspace.py	✅ PASS	24.5s	$0.03
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py	✅ PASS	3m 23s	$0.02
02_remote_agent_server/09_acp_agent_with_remote_runtime.py	✅ PASS	53.8s	$0.13
02_remote_agent_server/10_cloud_workspace_share_credentials.py	✅ PASS	37.6s	$0.07
02_remote_agent_server/11_conversation_fork.py	✅ PASS	36.4s	$0.00
02_remote_agent_server/12_settings_and_secrets_api.py	✅ PASS	2m 11s	$0.02
02_remote_agent_server/13_workspace_get_llm.py	✅ PASS	19.4s	$0.01
04_llm_specific_tools/01_gpt5_apply_patch_preset.py	✅ PASS	46.3s	$0.04
04_llm_specific_tools/02_gemini_file_tools.py	✅ PASS	43.7s	$0.05
05_skills_and_plugins/01_loading_agentskills/main.py	✅ PASS	16.9s	$0.01
05_skills_and_plugins/02_loading_plugins/main.py	✅ PASS	22.8s	$0.02

❌ Some tests failed

Total: 55 | Passed: 54 | Failed: 1 | Total Cost: $3.10

Failed examples:

examples/01_standalone_sdk/28_ask_agent_example.py: Exit code 1

View full workflow run

Release v1.22.0

d78dd7e

Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot added the integration-test, test-examples, and behavior-test labels May 11, 2026

all-hands-bot commented May 11, 2026

Comment thread openhands-sdk/pyproject.toml

Comment thread .github/workflows/run-eval.yml

Update settings deprecations for release

e69980f

Co-authored-by: openhands <openhands@all-hands.dev>

neubig mentioned this pull request May 11, 2026

Prepare AgentSettings removal helpers #3208

Open

openhands-agent and others added 3 commits May 11, 2026 16:29

Correct LLMAgentSettings removal deadline

2786c38

Co-authored-by: openhands <openhands@all-hands.dev>

Restore AgentSettings removal deadline

b387004

Co-authored-by: openhands <openhands@all-hands.dev>

Merge branch 'main' into rel-1.22.0

a0c4f6a

xingyaoww approved these changes May 11, 2026

Merge branch 'main' into rel-1.22.0

d13ec0a

neubig merged commit 025df53 into main May 11, 2026
69 of 70 checks passed

neubig deleted the rel-1.22.0 branch May 11, 2026 18:23
