
Release v1.22.0#3204

Merged
neubig merged 6 commits into main from rel-1.22.0
May 11, 2026

Conversation

@all-hands-bot
Collaborator

@all-hands-bot all-hands-bot commented May 11, 2026

Release v1.22.0

This PR prepares the release for version 1.22.0.

Release Checklist

  • Version set to 1.22.0
  • Fix any deprecation deadlines if they exist
  • Integration tests pass (tagged with integration-test)
  • Behavior tests pass (tagged with behavior-test)
  • Example tests pass (tagged with test-examples)
  • Evaluation on OpenHands Index

What happens on merge

When this PR is merged, the create-release.yml workflow will automatically:

  1. Create a GitHub release with tag v1.22.0 and auto-generated notes
  2. Trigger pypi-release.yml to publish all packages to PyPI
  3. Trigger version-bump-prs.yml to create downstream version bump PRs

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image | Docs / Tags
java | amd64, arm64 | eclipse-temurin:17-jdk | Link
python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link
golang | amd64, arm64 | golang:1.21-bookworm | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d13ec0a-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d13ec0a-python \
  ghcr.io/openhands/agent-server:d13ec0a-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d13ec0a-golang-amd64
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-golang-amd64
ghcr.io/openhands/agent-server:rel-1.22.0-golang-amd64
ghcr.io/openhands/agent-server:d13ec0a-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d13ec0a-golang-arm64
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-golang-arm64
ghcr.io/openhands/agent-server:rel-1.22.0-golang-arm64
ghcr.io/openhands/agent-server:d13ec0a-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d13ec0a-java-amd64
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-java-amd64
ghcr.io/openhands/agent-server:rel-1.22.0-java-amd64
ghcr.io/openhands/agent-server:d13ec0a-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d13ec0a-java-arm64
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-java-arm64
ghcr.io/openhands/agent-server:rel-1.22.0-java-arm64
ghcr.io/openhands/agent-server:d13ec0a-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d13ec0a-python-amd64
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-python-amd64
ghcr.io/openhands/agent-server:rel-1.22.0-python-amd64
ghcr.io/openhands/agent-server:d13ec0a-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:d13ec0a-python-arm64
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-python-arm64
ghcr.io/openhands/agent-server:rel-1.22.0-python-arm64
ghcr.io/openhands/agent-server:d13ec0a-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:d13ec0a-golang
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-golang
ghcr.io/openhands/agent-server:rel-1.22.0-golang
ghcr.io/openhands/agent-server:d13ec0a-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:d13ec0a-java
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-java
ghcr.io/openhands/agent-server:rel-1.22.0-java
ghcr.io/openhands/agent-server:d13ec0a-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:d13ec0a-python
ghcr.io/openhands/agent-server:d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067-python
ghcr.io/openhands/agent-server:rel-1.22.0-python
ghcr.io/openhands/agent-server:d13ec0a-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., d13ec0a-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., d13ec0a-python-amd64) are also available if needed
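
The tag list above appears to follow a regular pattern: four name prefixes per variant (short SHA, full SHA, branch, and a slug of the base image), each published once per architecture and once without an architecture suffix as the multi-arch manifest. The sketch below reproduces that pattern. The function names are hypothetical, and the rules (`/` becomes `_s_`, `:` becomes `_tag_`) are inferred from the listed tags, not taken from the build script.

```python
# Illustrative only: naming rules are inferred from the tag list in this
# comment; the real build script may construct tags differently.


def slugify_base_image(image: str) -> str:
    """Make a base image reference tag-safe ('/' -> '_s_', ':' -> '_tag_')."""
    return image.replace("/", "_s_").replace(":", "_tag_")


def tags_for(repo, short_sha, full_sha, branch, variant, base_image, arch=None):
    """Return the four tags for one variant, optionally for a single arch."""
    suffix = f"-{arch}" if arch else ""
    prefixes = [
        f"{short_sha}-{variant}",
        f"{full_sha}-{variant}",
        f"{branch}-{variant}",
        f"{short_sha}-{slugify_base_image(base_image)}",
    ]
    return [f"{repo}:{prefix}{suffix}" for prefix in prefixes]


def build_tags(repo, short_sha, full_sha, branch, variants, archs):
    """Per-architecture tags first, then the multi-arch manifest tags."""
    per_arch = [
        tag
        for variant, base in variants.items()
        for arch in archs
        for tag in tags_for(repo, short_sha, full_sha, branch, variant, base, arch)
    ]
    manifests = [
        tag
        for variant, base in variants.items()
        for tag in tags_for(repo, short_sha, full_sha, branch, variant, base)
    ]
    return per_arch + manifests


VARIANTS = {
    "golang": "golang:1.21-bookworm",
    "java": "eclipse-temurin:17-jdk",
    "python": "nikolaik/python-nodejs:python3.13-nodejs22-slim",
}

TAGS = build_tags(
    repo="ghcr.io/openhands/agent-server",
    short_sha="d13ec0a",
    full_sha="d13ec0adf97dbfe6d5f8e4e7dff203b7ba011067",
    branch="rel-1.22.0",
    variants=VARIANTS,
    archs=["amd64", "arm64"],
)
```

With three variants and two architectures this yields 24 per-architecture tags plus 12 manifest tags, matching the 36 entries listed for this build.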

Co-authored-by: openhands <openhands@all-hands.dev>
@all-hands-bot added the integration-test, test-examples, and behavior-test labels May 11, 2026
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

Hi! I started running the behavior tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

github-actions Bot commented May 11, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@github-actions
Contributor

github-actions Bot commented May 11, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

Collaborator Author

@all-hands-bot all-hands-bot left a comment

🟡 Acceptable with required fixes

Version Bumps

The version bumps are mechanically correct and consistent across all four packages (openhands-sdk, openhands-tools, openhands-workspace, openhands-agent-server) from 1.21.1 → 1.22.0. The eval workflow default is also updated correctly.

[CRITICAL ISSUES]

Deprecation Deadlines

The deprecation checker fails with 4 features that have passed their removal deadline:

$ python .github/scripts/check_deprecations.py

- [openhands-sdk] 'AgentSettings' (warn_call)
  deprecated in: 1.17.0, removed in: 1.22.0

- [openhands-sdk] 'VerificationSettings.confirmation_mode' (warn_call)
  deprecated in: 1.17.0, removed in: 1.22.0

- [openhands-sdk] 'VerificationSettings.security_analyzer' (warn_call)
  deprecated in: 1.17.0, removed in: 1.22.0

- [openhands-sdk] f'Importing {name!r} from openhands.sdk.settings' (warn_call)
  deprecated in: 1.19.0, removed in: 1.22.0

Required action: These deprecations must be addressed before merging. The checklist item "Fix any deprecation deadlines if they exist" is currently unchecked.

SDK Policy Violation

The LLMAgentSettings import deprecation (deprecated in 1.19.0, removed in 1.22.0) only spans 3 minor releases, but the SDK policy requires at least 5 minor releases between deprecation and removal.

Recommendation: Either:

  1. Update the removed_in target to "1.24.0" (which would be 5 releases: 1.19→1.20→1.21→1.22→1.23→1.24), or
  2. Remove the deprecation from this release and address it in a future version

The other three deprecations (1.17.0 → 1.22.0) correctly span 5 minor releases and are valid for removal.
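
The release-span arithmetic above can be checked with a small sketch. It assumes plain `MAJOR.MINOR.PATCH` version strings within the same major version, and it takes the five-release minimum from this review rather than from the policy text itself; the helper names are hypothetical.

```python
# Illustrative only: helper names are hypothetical, and the threshold of 5
# is the figure quoted in this review, not read from the SDK policy source.
MIN_MINOR_RELEASES = 5


def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def minor_span(deprecated_in: str, removed_in: str) -> int:
    """Number of minor releases between deprecation and removal."""
    dep = parse_version(deprecated_in)
    rem = parse_version(removed_in)
    if dep[0] != rem[0]:
        raise ValueError("this sketch only compares versions with the same major")
    return rem[1] - dep[1]


def earliest_removal(deprecated_in: str) -> str:
    """Earliest removed_in target that satisfies the minimum span."""
    major, minor, _ = parse_version(deprecated_in)
    return f"{major}.{minor + MIN_MINOR_RELEASES}.0"


def satisfies_policy(deprecated_in: str, removed_in: str) -> bool:
    return minor_span(deprecated_in, removed_in) >= MIN_MINOR_RELEASES
```

Under these assumptions, `minor_span("1.19.0", "1.22.0")` is 3 and `earliest_removal("1.19.0")` is `"1.24.0"`, which agrees with the first recommendation, while `minor_span("1.17.0", "1.22.0")` is 5.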

[RISK ASSESSMENT]

⚠️ Risk Assessment: 🟡 MEDIUM

This is a standard release version bump with no code changes. However, the unresolved deprecation deadlines pose a breaking change risk if not addressed before merge. The mechanical version changes themselves are low-risk.

VERDICT:

Worth merging after fixes: Version bumps are correct, but deprecation deadlines must be addressed per the checklist and SDK policy.

KEY INSIGHT:

Release PRs should run the deprecation checker (python .github/scripts/check_deprecations.py) as part of the checklist to catch scheduled removals before publishing.
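
For readers unfamiliar with this kind of check, the following is a minimal sketch of a deadline comparison. It is not the implementation of `check_deprecations.py`; the `Deprecation` record and the `overdue` helper are hypothetical, and the entries are copied from the checker output quoted earlier in this review.

```python
# Illustrative only: not the real check_deprecations.py. The record type and
# helper are hypothetical; the data is copied from the output quoted above.
from dataclasses import dataclass


@dataclass(frozen=True)
class Deprecation:
    package: str
    feature: str
    deprecated_in: str
    removed_in: str


def version_key(version: str) -> tuple[int, ...]:
    """Turn '1.22.0' into (1, 22, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def overdue(deprecations, current_version: str):
    """Return entries whose removal deadline is at or before current_version."""
    current = version_key(current_version)
    return [d for d in deprecations if version_key(d.removed_in) <= current]


REPORTED = [
    Deprecation("openhands-sdk", "AgentSettings", "1.17.0", "1.22.0"),
    Deprecation("openhands-sdk", "VerificationSettings.confirmation_mode", "1.17.0", "1.22.0"),
    Deprecation("openhands-sdk", "VerificationSettings.security_analyzer", "1.17.0", "1.22.0"),
    Deprecation("openhands-sdk", "Importing {name!r} from openhands.sdk.settings", "1.19.0", "1.22.0"),
]

for entry in overdue(REPORTED, "1.22.0"):
    print(f"[{entry.package}] {entry.feature}: removed in {entry.removed_in}")
```

Comparing integer tuples rather than raw strings matters here: as strings, `"1.9.0"` would sort after `"1.22.0"`.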

Collaborator Author

@all-hands-bot all-hands-bot left a comment

❌ QA Report: FAIL

The version bump to 1.22.0 is complete across all packages, but the deprecation deadline check is failing, which blocks the release.

Does this PR achieve its stated goal?

No. The PR's goal is to "prepare the release for version 1.22.0", but the deprecation check CI is failing with 3 deprecated features that have reached their removal deadline in 1.22.0. According to the PR checklist, "Fix any deprecation deadlines if they exist" is a required step, and this has not been completed.

Phase | Result
Environment Setup | ✅ Packages build successfully
CI Status | Deprecation check failing + other checks pending
Functional Verification | ✅ All 4 packages versioned to 1.22.0, imports work

Functional Verification

Test 1: Version Consistency Check

Step 1 — Verify version in source files:
Ran:

grep -E "^version = " openhands-*/pyproject.toml

Output:

openhands-agent-server/pyproject.toml:version = "1.22.0"
openhands-sdk/pyproject.toml:version = "1.22.0"
openhands-tools/pyproject.toml:version = "1.22.0"
openhands-workspace/pyproject.toml:version = "1.22.0"

This confirms all 4 packages declare version 1.22.0 in their pyproject.toml files.

Step 2 — Verify lockfile consistency:
Ran:

grep -A 2 "^name = \"openhands-" uv.lock | grep -E "(^name|^version)"

Output:

name = "openhands-agent-server"
version = "1.22.0"
name = "openhands-sdk"
version = "1.22.0"
name = "openhands-tools"
version = "1.22.0"
name = "openhands-workspace"
version = "1.22.0"

This confirms uv.lock matches the pyproject.toml versions.

Step 3 — Verify eval workflow default:
Ran:

grep -A 3 "sdk_ref:" .github/workflows/run-eval.yml | grep "default:"

Output:

default: v1.22.0

This confirms the eval workflow default was updated from v1.21.1 to v1.22.0.

Step 4 — Build and install packages:
Ran:

uv sync --frozen

Result: All 4 packages built successfully.

Step 5 — Verify installed versions:
Ran:

import importlib.metadata
for pkg in ['openhands-sdk', 'openhands-tools', 'openhands-workspace', 'openhands-agent-server']:
    print(f'{pkg}: {importlib.metadata.version(pkg)}')

Output:

openhands-sdk: 1.22.0
openhands-tools: 1.22.0
openhands-workspace: 1.22.0
openhands-agent-server: 1.22.0

This confirms all installed packages report version 1.22.0.

Step 6 — Smoke test basic functionality:
Ran:

from openhands.sdk import Agent, LLM, Tool, Conversation
from openhands.tools.terminal import TerminalTool
from openhands.tools.file_editor import FileEditorTool

print("✓ Imports successful")
print(f"✓ TerminalTool available: {TerminalTool.name}")
print(f"✓ FileEditorTool available: {FileEditorTool.name}")

Output:

✓ Imports successful
✓ TerminalTool available: terminal
✓ FileEditorTool available: file_editor

✓ All basic functionality verified

This confirms the packages work correctly after the version bump.

CI Check Failure Detail

Failed Check: Deprecation Verification

The CI check "check / Verify deprecation removals" is failing because the following deprecations have reached their removal deadline:

- [openhands-sdk] 'AgentSettings' (warn_call)
  deprecated in: 1.17.0
  removed in:    1.22.0
  defined at:    openhands-sdk/openhands/sdk/settings/model.py:1296

- [openhands-sdk] 'VerificationSettings.confirmation_mode' (warn_call)
  deprecated in: 1.17.0
  removed in:    1.22.0
  defined at:    openhands-sdk/openhands/sdk/settings/model.py:270

- [openhands-sdk] 'VerificationSettings.security_analyzer' (warn_call)
  deprecated in: 1.17.0
  removed in:    1.22.0
  defined at:    openhands-sdk/openhands/sdk/settings/model.py:284

These deprecations must be removed before releasing 1.22.0 per the SDK's deprecation policy.

Workflow URL: https://github.com/OpenHands/software-agent-sdk/actions/runs/25679136103/job/75385452790

Issues Found

  • 🔴 Blocker: Three deprecated features have reached their removal deadline in 1.22.0 but have not been removed, causing the deprecation check to fail. These must be fixed before release:
    • AgentSettings at openhands-sdk/openhands/sdk/settings/model.py:1296
    • VerificationSettings.confirmation_mode at openhands-sdk/openhands/sdk/settings/model.py:270
    • VerificationSettings.security_analyzer at openhands-sdk/openhands/sdk/settings/model.py:284

Comment thread openhands-sdk/pyproject.toml
Comment thread .github/workflows/run-eval.yml
@github-actions
Contributor

github-actions Bot commented May 11, 2026

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
openhands-sdk/openhands/sdk/__init__.py | 38 | 9 | 76% | 115–116, 134–135, 137–138, 145, 147–148
openhands-sdk/openhands/sdk/settings/model.py | 556 | 48 | 91% | 83, 106, 111, 344, 354–357, 360, 373, 377, 383, 393, 399, 404, 594, 607, 618, 628, 632, 634, 636, 638, 640, 642, 644, 916, 918, 1190, 1258, 1373, 1409–1412, 1438, 1562, 1607, 1639, 1649, 1651, 1656, 1674, 1687, 1689, 1691, 1693, 1700
TOTAL | 27390 | 6109 | 77% |

@github-actions
Contributor

github-actions Bot commented May 11, 2026

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-05-11 15:40:40 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 24.9s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 19.9s $0.03
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 12.3s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 31.6s $0.03
01_standalone_sdk/09_pause_example.py ✅ PASS 13.0s $0.02
01_standalone_sdk/10_persistence.py ✅ PASS 37.3s $0.03
01_standalone_sdk/11_async.py ✅ PASS 31.7s $0.04
01_standalone_sdk/12_custom_secrets.py ✅ PASS 8.7s $0.00
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 38.8s $0.03
01_standalone_sdk/14_context_condenser.py ✅ PASS 2m 6s $0.15
01_standalone_sdk/17_image_input.py ✅ PASS 22.3s $0.02
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 21.6s $0.02
01_standalone_sdk/19_llm_routing.py ✅ PASS 15.2s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 18.3s $0.03
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 10.2s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 14.6s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 1m 48s $0.02
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 4m 44s $0.35
01_standalone_sdk/25_agent_delegation.py ✅ PASS 53.4s $0.06
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 16.9s $0.03
01_standalone_sdk/28_ask_agent_example.py ✅ PASS 32.1s $0.02
01_standalone_sdk/29_llm_streaming.py ✅ PASS 37.5s $0.02
01_standalone_sdk/30_tom_agent.py ✅ PASS 8.3s $0.01
01_standalone_sdk/31_iterative_refinement.py ✅ PASS 2m 20s $0.16
01_standalone_sdk/32_configurable_security_policy.py ✅ PASS 19.5s $0.02
01_standalone_sdk/34_critic_example.py ✅ PASS 2m 47s $0.23
01_standalone_sdk/36_event_json_to_openai_messages.py ✅ PASS 9.8s $0.00
01_standalone_sdk/37_llm_profile_store/main.py ✅ PASS 5.2s $0.00
01_standalone_sdk/38_browser_session_recording.py ✅ PASS 34.0s $0.03
01_standalone_sdk/39_llm_fallback.py ✅ PASS 9.6s $0.00
01_standalone_sdk/40_acp_agent_example.py ✅ PASS 29.6s $0.32
01_standalone_sdk/41_task_tool_set.py ✅ PASS 24.7s $0.03
01_standalone_sdk/42_file_based_subagents.py ✅ PASS 53.2s $0.06
01_standalone_sdk/43_mixed_marketplace_skills/main.py ✅ PASS 6.4s $0.00
01_standalone_sdk/44_model_switching_in_convo.py ✅ PASS 7.8s $0.01
01_standalone_sdk/45_parallel_tool_execution.py ✅ PASS 3m 2s $0.44
01_standalone_sdk/46_agent_settings.py ✅ PASS 10.6s $0.01
01_standalone_sdk/47_defense_in_depth_security.py ✅ PASS 2.8s $0.00
01_standalone_sdk/48_conversation_fork.py ✅ PASS 14.2s $0.00
01_standalone_sdk/49_switch_llm_tool.py ✅ PASS 11.2s $0.03
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 37.3s $0.02
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ✅ PASS 2m 0s $0.04
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ✅ PASS 1m 5s $0.07
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ✅ PASS 1m 58s $0.03
02_remote_agent_server/07_convo_with_cloud_workspace.py ✅ PASS 25.5s $0.03
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py ✅ PASS 3m 41s $0.03
02_remote_agent_server/09_acp_agent_with_remote_runtime.py ✅ PASS 1m 23s $0.13
02_remote_agent_server/10_cloud_workspace_share_credentials.py ✅ PASS 48.1s $0.12
02_remote_agent_server/11_conversation_fork.py ✅ PASS 35.9s $0.00
02_remote_agent_server/12_settings_and_secrets_api.py ✅ PASS 2m 9s $0.02
02_remote_agent_server/13_workspace_get_llm.py ✅ PASS 11.0s $0.01
04_llm_specific_tools/01_gpt5_apply_patch_preset.py ✅ PASS 25.1s $0.02
04_llm_specific_tools/02_gemini_file_tools.py ✅ PASS 43.2s $0.07
05_skills_and_plugins/01_loading_agentskills/main.py ✅ PASS 16.4s $0.02
05_skills_and_plugins/02_loading_plugins/main.py ✅ PASS 20.5s $0.02

✅ All tests passed!

Total: 55 | Passed: 55 | Failed: 0 | Total Cost: $2.95

View full workflow run

@github-actions
Contributor

🧪 Integration Tests Results

Overall Success Rate: 95.0%
Total Cost: $11.10
Models Tested: 4
Timestamp: 2026-05-11 15:31:48 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_moonshot_kimi_k2.6 | 100.0% | 5/5 | 0 | 5 | $1.24 | 5,035,453
litellm_proxy_gemini_3.1_pro_preview | 100.0% | 5/5 | 0 | 5 | $6.61 | 4,090,858
litellm_proxy_anthropic_claude_sonnet_4_6 | 80.0% | 4/5 | 0 | 5 | $3.03 | 4,345,663
litellm_proxy_deepseek_deepseek_v4_flash | 100.0% | 5/5 | 0 | 5 | $0.22 | 4,898,113

📋 Detailed Results

litellm_proxy_moonshot_kimi_k2.6

  • Success Rate: 100.0% (5/5)
  • Total Cost: $1.24
  • Token Usage: prompt: 4,993,378, completion: 42,075, cache_read: 4,701,184
  • Run Suffix: litellm_proxy_moonshot_kimi_k2.6_d78dd7e_kimi_k2_6_run_N5_20260511_151916

litellm_proxy_gemini_3.1_pro_preview

  • Success Rate: 100.0% (5/5)
  • Total Cost: $6.61
  • Token Usage: prompt: 4,053,031, completion: 37,827, cache_read: 1,014,159, reasoning: 18,131
  • Run Suffix: litellm_proxy_gemini_3.1_pro_preview_d78dd7e_gemini_3_1_pro_run_N5_20260511_151915

litellm_proxy_anthropic_claude_sonnet_4_6

  • Success Rate: 80.0% (4/5)
  • Total Cost: $3.03
  • Token Usage: prompt: 4,286,020, completion: 59,643, cache_read: 3,947,315, cache_write: 238,559, reasoning: 10,092
  • Run Suffix: litellm_proxy_anthropic_claude_sonnet_4_6_d78dd7e_claude_sonnet_4_6_run_N5_20260511_151922

Failed Tests:

  • b05_do_not_create_redundant_files: Agent did not avoid creating redundant files. Judge reasoning: The agent successfully completed the primary task of creating examples/tutorial/smolvla/train_smolvla_example.py, a well-structured standalone training script that mirrors the CLI command and follows the tutorial format. The script demonstrates strong understanding of the codebase and SmolVLA's specifics (pretrained model loading, tokenization, scheduler presets, etc.).

However, the agent violated the explicit evaluation criteria by creating an unauthorized file: AGENTS.md at the repository root. The criteria stated: "Avoid creating any additional files that were not explicitly requested. Only one README.md file is acceptable if it pertains to the new training script."

The AGENTS.md file:

  1. Was not requested by the user
  2. Is not a README.md pertaining to the new training script
  3. Contains general repository notes rather than script-specific documentation
  4. Violates the explicit constraint against creating redundant/unrequested files

The agent even explicitly acknowledged creating this file in the final summary: "An AGENTS.md has also been created at the repository root with reference notes..."

While the content of AGENTS.md is potentially useful, its creation directly contradicts the stated evaluation criteria. The violation is clear and unambiguous, though the severity is moderate given that the primary deliverable (the training script) is high quality and properly implements the requested functionality. (confidence=0.75) (Cost: $1.70)

litellm_proxy_deepseek_deepseek_v4_flash

  • Success Rate: 100.0% (5/5)
  • Total Cost: $0.22
  • Token Usage: prompt: 4,840,171, completion: 57,942, cache_read: 4,492,416, reasoning: 24,825
  • Run Suffix: litellm_proxy_deepseek_deepseek_v4_flash_d78dd7e_deepseek_v4_flash_run_N5_20260511_151906

@github-actions
Contributor

🧪 Integration Tests Results

Overall Success Rate: 97.1%
Total Cost: $0.89
Models Tested: 4
Timestamp: 2026-05-11 15:40:38 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_moonshot_kimi_k2.6 | 100.0% | 9/9 | 0 | 9 | $0.12 | 272,362
litellm_proxy_gemini_3.1_pro_preview | 100.0% | 9/9 | 0 | 9 | $0.22 | 376,601
litellm_proxy_anthropic_claude_sonnet_4_6 | 88.9% | 8/9 | 0 | 9 | $0.55 | 365,079
litellm_proxy_deepseek_deepseek_v4_flash | 100.0% | 8/8 | 1 | 9 | $0.00 | 391,622
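
The overall figure is consistent with dividing total passed tests by total tests actually run, so that skipped tests drop out of the denominator. That reading is an inference from the numbers in this summary, not something the report states. A small sketch of the arithmetic, with a hypothetical helper name:

```python
# Illustrative only: assumes skipped tests are excluded from the denominator,
# which is inferred from the figures in this report.


def overall_success_rate(results: dict[str, tuple[int, int]]) -> float:
    """results maps model name -> (passed, run); returns a percentage."""
    passed = sum(p for p, _ in results.values())
    run = sum(r for _, r in results.values())
    return round(100 * passed / run, 1)


RESULTS = {
    "litellm_proxy_moonshot_kimi_k2.6": (9, 9),
    "litellm_proxy_gemini_3.1_pro_preview": (9, 9),
    "litellm_proxy_anthropic_claude_sonnet_4_6": (8, 9),
    "litellm_proxy_deepseek_deepseek_v4_flash": (8, 8),  # 1 of 9 skipped
}

print(overall_success_rate(RESULTS))  # 34 passed out of 35 run
```

Counting the skipped test as a failure would instead give 34 of 36, which does not match the reported 97.1%.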

📋 Detailed Results

litellm_proxy_moonshot_kimi_k2.6

  • Success Rate: 100.0% (9/9)
  • Total Cost: $0.12
  • Token Usage: prompt: 267,896, completion: 4,466, cache_read: 196,096
  • Run Suffix: litellm_proxy_moonshot_kimi_k2.6_d78dd7e_kimi_k2_6_run_N9_20260511_151901

litellm_proxy_gemini_3.1_pro_preview

  • Success Rate: 100.0% (9/9)
  • Total Cost: $0.22
  • Token Usage: prompt: 371,336, completion: 5,265, cache_read: 327,299, reasoning: 3,221
  • Run Suffix: litellm_proxy_gemini_3.1_pro_preview_d78dd7e_gemini_3_1_pro_run_N9_20260511_151853

litellm_proxy_anthropic_claude_sonnet_4_6

  • Success Rate: 88.9% (8/9)
  • Total Cost: $0.55
  • Token Usage: prompt: 358,769, completion: 6,310, cache_read: 256,830, cache_write: 101,633, reasoning: 1,143
  • Run Suffix: litellm_proxy_anthropic_claude_sonnet_4_6_d78dd7e_claude_sonnet_4_6_run_N9_20260511_151906

Failed Tests:

  • t02_add_bash_hello: Shell script is not executable (Cost: $0.06)

litellm_proxy_deepseek_deepseek_v4_flash

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.00
  • Token Usage: prompt: 386,769, completion: 4,853, cache_read: 339,328, reasoning: 1,349
  • Run Suffix: litellm_proxy_deepseek_deepseek_v4_flash_d78dd7e_deepseek_v4_flash_run_N9_20260511_151943
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Co-authored-by: openhands <openhands@all-hands.dev>
openhands-agent and others added 3 commits May 11, 2026 16:29
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Collaborator

@xingyaoww xingyaoww left a comment

Thanks

@xingyaoww added the integration-test and test-examples labels and removed the integration-test and test-examples labels May 11, 2026
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@xingyaoww
Collaborator

I just brought this release up to date with the latest main, so we are rerunning the integration tests and example tests. Once they are all passing, we can get it merged.

@github-actions
Contributor

🧪 Integration Tests Results

Overall Success Rate: 97.1%
Total Cost: $0.85
Models Tested: 4
Timestamp: 2026-05-11 18:01:34 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_moonshot_kimi_k2.6 | 100.0% | 9/9 | 0 | 9 | $0.13 | 317,965
litellm_proxy_gemini_3.1_pro_preview | 100.0% | 9/9 | 0 | 9 | $0.17 | 360,099
litellm_proxy_anthropic_claude_sonnet_4_6 | 88.9% | 8/9 | 0 | 9 | $0.56 | 374,954
litellm_proxy_deepseek_deepseek_v4_flash | 100.0% | 8/8 | 1 | 9 | $0.00 | 332,884

📋 Detailed Results

litellm_proxy_moonshot_kimi_k2.6

  • Success Rate: 100.0% (9/9)
  • Total Cost: $0.13
  • Token Usage: prompt: 313,578, completion: 4,387, cache_read: 239,616
  • Run Suffix: litellm_proxy_moonshot_kimi_k2.6_d13ec0a_kimi_k2_6_run_N9_20260511_175937

litellm_proxy_gemini_3.1_pro_preview

  • Success Rate: 100.0% (9/9)
  • Total Cost: $0.17
  • Token Usage: prompt: 355,810, completion: 4,289, cache_read: 332,015, reasoning: 2,503
  • Run Suffix: litellm_proxy_gemini_3.1_pro_preview_d13ec0a_gemini_3_1_pro_run_N9_20260511_175941

litellm_proxy_anthropic_claude_sonnet_4_6

  • Success Rate: 88.9% (8/9)
  • Total Cost: $0.56
  • Token Usage: prompt: 368,479, completion: 6,475, cache_read: 266,189, cache_write: 101,976, reasoning: 1,099
  • Run Suffix: litellm_proxy_anthropic_claude_sonnet_4_6_d13ec0a_claude_sonnet_4_6_run_N9_20260511_175947

Failed Tests:

  • t02_add_bash_hello: Shell script is not executable (Cost: $0.06)

litellm_proxy_deepseek_deepseek_v4_flash

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.00
  • Token Usage: prompt: 328,274, completion: 4,610, cache_read: 290,048, reasoning: 1,206
  • Run Suffix: litellm_proxy_deepseek_deepseek_v4_flash_d13ec0a_deepseek_v4_flash_run_N9_20260511_175934
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

@github-actions
Contributor

github-actions Bot commented May 11, 2026

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-05-11 18:20:28 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 23.1s $0.02
01_standalone_sdk/03_activate_skill.py ✅ PASS 21.2s $0.03
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 12.9s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 26.8s $0.02
01_standalone_sdk/09_pause_example.py ✅ PASS 11.1s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 32.4s $0.03
01_standalone_sdk/11_async.py ✅ PASS 24.2s $0.03
01_standalone_sdk/12_custom_secrets.py ✅ PASS 9.4s $0.01
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 55.4s $0.06
01_standalone_sdk/14_context_condenser.py ✅ PASS 3m 11s $0.14
01_standalone_sdk/17_image_input.py ✅ PASS 20.3s $0.02
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 20.0s $0.02
01_standalone_sdk/19_llm_routing.py ✅ PASS 18.7s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 14.6s $0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 9.6s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 12.9s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 1m 42s $0.02
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 5m 16s $0.41
01_standalone_sdk/25_agent_delegation.py ✅ PASS 1m 10s $0.08
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 19.0s $0.03
01_standalone_sdk/28_ask_agent_example.py ❌ FAIL (Exit code 1) 10.2s --
01_standalone_sdk/29_llm_streaming.py ✅ PASS 36.1s $0.02
01_standalone_sdk/30_tom_agent.py ✅ PASS 8.9s $0.01
01_standalone_sdk/31_iterative_refinement.py ✅ PASS 4m 41s $0.33
01_standalone_sdk/32_configurable_security_policy.py ✅ PASS 24.7s $0.04
01_standalone_sdk/34_critic_example.py ✅ PASS 1m 21s $0.10
01_standalone_sdk/36_event_json_to_openai_messages.py ✅ PASS 11.0s $0.01
01_standalone_sdk/37_llm_profile_store/main.py ✅ PASS 3.8s $0.00
01_standalone_sdk/38_browser_session_recording.py ✅ PASS 33.4s $0.03
01_standalone_sdk/39_llm_fallback.py ✅ PASS 10.7s $0.01
01_standalone_sdk/40_acp_agent_example.py ✅ PASS 51.3s $0.32
01_standalone_sdk/41_task_tool_set.py ✅ PASS 27.7s $0.03
01_standalone_sdk/42_file_based_subagents.py ✅ PASS 1m 31s $0.09
01_standalone_sdk/43_mixed_marketplace_skills/main.py ✅ PASS 6.5s $0.00
01_standalone_sdk/44_model_switching_in_convo.py ✅ PASS 8.0s $0.01
01_standalone_sdk/45_parallel_tool_execution.py ✅ PASS 3m 13s $0.49
01_standalone_sdk/46_agent_settings.py ✅ PASS 8.7s $0.00
01_standalone_sdk/47_defense_in_depth_security.py ✅ PASS 3.3s $0.00
01_standalone_sdk/48_conversation_fork.py ✅ PASS 12.9s $0.00
01_standalone_sdk/49_switch_llm_tool.py ✅ PASS 7.9s $0.03
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 37.5s $0.03
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ✅ PASS 1m 21s $0.04
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ✅ PASS 1m 15s $0.06
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ✅ PASS 1m 33s $0.04
02_remote_agent_server/07_convo_with_cloud_workspace.py ✅ PASS 24.5s $0.03
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py ✅ PASS 3m 23s $0.02
02_remote_agent_server/09_acp_agent_with_remote_runtime.py ✅ PASS 53.8s $0.13
02_remote_agent_server/10_cloud_workspace_share_credentials.py ✅ PASS 37.6s $0.07
02_remote_agent_server/11_conversation_fork.py ✅ PASS 36.4s $0.00
02_remote_agent_server/12_settings_and_secrets_api.py ✅ PASS 2m 11s $0.02
02_remote_agent_server/13_workspace_get_llm.py ✅ PASS 19.4s $0.01
04_llm_specific_tools/01_gpt5_apply_patch_preset.py ✅ PASS 46.3s $0.04
04_llm_specific_tools/02_gemini_file_tools.py ✅ PASS 43.7s $0.05
05_skills_and_plugins/01_loading_agentskills/main.py ✅ PASS 16.9s $0.01
05_skills_and_plugins/02_loading_plugins/main.py ✅ PASS 22.8s $0.02

❌ Some tests failed

Total: 55 | Passed: 54 | Failed: 1 | Total Cost: $3.10

Failed examples:

  • examples/01_standalone_sdk/28_ask_agent_example.py: Exit code 1
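
A summary like the one above can be recomputed from the table rows. The sketch below is a hypothetical parser, not the workflow's own code: it assumes each row sits on a single line in the form `path status-icon PASS|FAIL ... $cost`, and that a failed row carries `--` in place of a cost. The three sample rows are taken from the table in this comment.

```python
# Illustrative only: a hypothetical parser for rows shaped like the table
# above; the workflow that generates the report may work differently.
import re

ROW = re.compile(r"^(?P<example>\S+\.py)\s+\S+\s+(?P<status>PASS|FAIL)\b")
COST = re.compile(r"\$(\d+(?:\.\d+)?)")


def tally(lines):
    """Count passed/failed rows and sum the dollar costs that are present."""
    passed = failed = 0
    cost = 0.0
    for line in lines:
        match = ROW.match(line)
        if not match:
            continue  # skip headers, blank lines, and other non-row text
        if match["status"] == "PASS":
            passed += 1
        else:
            failed += 1
        cost_match = COST.search(line)
        if cost_match:  # failed rows show "--" and contribute nothing
            cost += float(cost_match.group(1))
    return {
        "total": passed + failed,
        "passed": passed,
        "failed": failed,
        "cost": round(cost, 2),
    }


SAMPLE_ROWS = [
    "Example Status Duration Cost",
    "01_standalone_sdk/26_custom_visualizer.py ✅ PASS 19.0s $0.03",
    "01_standalone_sdk/28_ask_agent_example.py ❌ FAIL Exit code 1 10.2s --",
    "01_standalone_sdk/29_llm_streaming.py ✅ PASS 36.1s $0.02",
]

print(tally(SAMPLE_ROWS))
```

Because the displayed per-row costs are already rounded to cents, a sum computed this way may differ slightly from the reported total.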

View full workflow run

@neubig neubig merged commit 025df53 into main May 11, 2026
69 of 70 checks passed
@neubig neubig deleted the rel-1.22.0 branch May 11, 2026 18:23

Labels

behavior-test, integration-test, test-examples

4 participants