fix(view): prevent KeyError on duplicate observation-like events#3051
Conversation
|
✅ PR Artifacts Cleaned Up The |
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
Clean bug fix for the duplicate observation-like events crash. The two-layer defense (enforce detects duplicates + manipulation_indices uses discard) is pragmatic and well-tested.
[RISK ASSESSMENT]
Bug fix with straightforward logic addressing a real production issue (crash recovery duplicates). Changes are isolated to duplicate detection and don't affect normal operation. Good test coverage for the primary fix.
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Clean bug fix with appropriate two-layer defense.
The duplicate detection logic in enforce() is straightforward and the defensive discard() fallback in manipulation_indices() is pragmatic. Test coverage directly validates the crash recovery scenario.
[RISK ASSESSMENT]
Isolated bug fix addressing a real crash scenario. Changes are defensive and well-tested. No impact on normal operation.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Bug fix verified - duplicate observation events are now properly detected and removed, preventing KeyError crashes in condensation.
Does this PR achieve its stated goal?
Yes. The PR successfully fixes issue #3045 where crash recovery could create duplicate observation-like events (AgentErrorEvent + late ObservationEvent) with the same tool_call_id, causing a KeyError in condensation. I reproduced the exact crash scenario on main (KeyError raised), then verified the fix eliminates the crash by: (1) making enforce() cardinality-aware to drop duplicates, and (2) using discard() in manipulation_indices() as a defensive layer. The reproduction script demonstrates the fix working correctly, and the new regression test passes.
| Phase | Result |
|---|---|
| Environment Setup | ✅ Dependencies installed, uv environment ready |
| CI Status | ✅ All 29 checks passing (sdk-tests, agent-server-tests, pre-commit, coverage, etc.) |
| Functional Verification | ✅ Bug reproduced on main, fix verified on PR branch |
Functional Verification
Test 1: Reproduce the KeyError crash on main branch
Step 1 — Reproduce the bug (without the fix):
Checked out origin/main and ran the reproduction script that simulates crash recovery creating duplicate observation-like events:
cd /home/runner/work/software-agent-sdk/software-agent-sdk/pr-repo
git checkout origin/main
OPENHANDS_SUPPRESS_BANNER=1 uv run --frozen python /tmp/reproduce_issue_3045.pyOutput:
==============================================================================
Event sequence (1 ActionEvent + 2 observation-like events for the SAME tool_call_id):
[0] MessageEvent tool_call_id=
[1] ActionEvent tool_call_id=toolu_01WK8m53DJP6KkuqVCBSbYzm
[2] AgentErrorEvent tool_call_id=toolu_01WK8m53DJP6KkuqVCBSbYzm
[3] ObservationEvent tool_call_id=toolu_01WK8m53DJP6KkuqVCBSbYzm
ToolCallMatchingProperty.enforce() => events_to_remove = set()
Calling ToolCallMatchingProperty.manipulation_indices(events)...
-> KeyError: KeyError('toolu_01WK8m53DJP6KkuqVCBSbYzm') (BUG present)
Building View.from_events() and accessing view.manipulation_indices...
view len = 4
-> KeyError: KeyError('toolu_01WK8m53DJP6KkuqVCBSbYzm') (BUG present)
Interpretation: This confirms the bug exists. The scenario has 1 ActionEvent followed by 2 observation-like events (AgentErrorEvent from crash recovery + late ObservationEvent) with the same tool_call_id. The old enforce() returns events_to_remove = set() (fails to detect the duplicate), and both manipulation_indices() calls crash with KeyError when trying to remove the same tool_call_id twice from the pending set.
Step 2 — Apply the PR's changes:
Checked out the PR branch:
git checkout vasco/crash-recovery-can-create-duplicate-tool-results-and-crash-condensation-with-KeyErrorStep 3 — Re-run with the fix in place:
Ran the same reproduction script:
OPENHANDS_SUPPRESS_BANNER=1 uv run --frozen python /tmp/reproduce_issue_3045.pyOutput:
==============================================================================
Event sequence (1 ActionEvent + 2 observation-like events for the SAME tool_call_id):
[0] MessageEvent tool_call_id=
[1] ActionEvent tool_call_id=toolu_01WK8m53DJP6KkuqVCBSbYzm
[2] AgentErrorEvent tool_call_id=toolu_01WK8m53DJP6KkuqVCBSbYzm
[3] ObservationEvent tool_call_id=toolu_01WK8m53DJP6KkuqVCBSbYzm
ToolCallMatchingProperty.enforce() => events_to_remove = {'obs-late'}
Calling ToolCallMatchingProperty.manipulation_indices(events)...
{"message": "Duplicate observation-like event for tool_call_id=toolu_01WK8m53DJP6KkuqVCBSbYzm"}
-> no KeyError (BUG fixed)
Building View.from_events() and accessing view.manipulation_indices...
{"message": "Property <class 'openhands.sdk.context.view.properties.tool_call_matching.ToolCallMatchingProperty'> enforced, 1 events dropped."}
view len = 3
-> no KeyError (BUG fixed)
Interpretation: The fix works correctly. Now enforce() detects the duplicate and returns events_to_remove = {'obs-late'}, preventing it from reaching manipulation_indices(). The defensive discard() in manipulation_indices() logs a warning for any duplicate that might slip through but does not crash. The View correctly drops 1 event (view len = 3 instead of 4), and no KeyError is raised.
Test 2: Verify regression test passes
Ran the new test case that covers this scenario:
OPENHANDS_SUPPRESS_BANNER=1 uv run --frozen pytest tests/sdk/context/view/properties/test_tool_call_matching.py -k "duplicate" -vOutput:
tests/sdk/context/view/properties/test_tool_call_matching.py::TestToolCallMatchingPropertyEnforcement::test_duplicate_observation_like_events_for_same_tool_call_id PASSED [100%]
======================== 1 passed, 14 deselected, 5 warnings in 0.05s ========================
The regression test passes, confirming the fix handles the crash recovery duplicate scenario correctly.
Test 3: Verify no regressions in existing tests
Ran the entire tool_call_matching test suite:
OPENHANDS_SUPPRESS_BANNER=1 uv run --frozen pytest tests/sdk/context/view/properties/test_tool_call_matching.py -vOutput:
======================== 15 passed, 5 warnings in 0.72s ========================
All 15 tests pass (including 10 existing tests for enforcement and manipulation indices), confirming no regressions.
Issues Found
None.
csmith49
left a comment
There was a problem hiding this comment.
Recommend we turn this detection into a separate property implementation. Will help keep the duplicate item dropping logic easy-to-find (in case we need to change if we take first or last instance), and if we put it first in the property list it guarantees it's true for all downstream .enforce() calls.
One note there is that there's no way to ensure this property via manipulation indices, but that system is built with the assumption that the input view is already well-formed. For this property we can just return a complete set of indices and throw a warning if somehow the duplicate surfaces at that point.
…onUniquenessProperty
…ol-results-and-crash-condensation-with-KeyError
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Clean bug fix that addresses previous review feedback.
Previous review concerns fully resolved:
- ✅ Separate property implementation (
ObservationUniquenessProperty) created - ✅ Property correctly ordered first in property list to run before
ToolCallMatchingProperty - ✅ Good defensive programming:
manipulation_indices()logs warnings instead of crashing
The two-layer defense (enforce drops duplicates + manipulation_indices warns) is pragmatic and handles the crash recovery scenario cleanly.
[RISK ASSESSMENT]
Isolated bug fix for duplicate observations during crash recovery. Changes are defensive and well-tested. No impact on normal operation (non-duplicate case).
VERDICT:
✅ Worth merging
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
The PR successfully prevents KeyError crashes when duplicate observation-like events exist for the same tool_call_id during crash recovery scenarios.
Does this PR achieve its stated goal?
Yes. The PR fixes issue #3045 by preventing KeyError during condensation when crash recovery creates duplicate observation-like events (e.g., an AgentErrorEvent + a late ObservationEvent with the same tool_call_id). I verified this end-to-end by reproducing the exact crash scenario on main and confirming it's fixed on this PR branch. The solution is architecturally sound: a new ObservationUniquenessProperty runs before ToolCallMatchingProperty and removes duplicates, preventing them from reaching the strict pairing logic that would crash.
| Phase | Result |
|---|---|
| Environment Setup | ✅ Built successfully with make build |
| CI Status | ✅ All critical checks passing (sdk-tests, pre-commit, Python API, cross-tests) |
| Functional Verification | ✅ Crash scenario reproduced on main, confirmed fixed on PR branch |
Functional Verification
Test 1: Reproduce the KeyError on main branch
Step 1 — Establish baseline (main branch without fix):
Checked out main branch and created a reproduction scenario with duplicate observation-like events:
- ActionEvent with
tool_call_id="call_1" - AgentErrorEvent (crash recovery) with
tool_call_id="call_1" - Late ObservationEvent with
tool_call_id="call_1"
Ran ToolCallMatchingProperty.manipulation_indices() directly on these events:
Testing scenario with duplicate observation-like events on main branch...
Action: action_1 (tool_call_id=call_1)
AgentError: 6f62c620-de8a-4f7d-b8fd-f1703d7111dd (tool_call_id=call_1)
Late Observation: obs_late (tool_call_id=call_1)
Calling ToolCallMatchingProperty.manipulation_indices()...
✓ KeyError occurred as expected: 'call_1'
This is the bug that the PR fixes!
This confirms the bug exists on main: the second observation-like event tries to remove() the same tool_call_id from the pending set, causing a KeyError.
Step 2 — Apply the PR's changes:
Checked out PR branch vasco/crash-recovery-can-create-duplicate-tool-results-and-crash-condensation-with-KeyError
Step 3 — Re-run with the fix in place:
Ran the same scenario through the property pipeline (ObservationUniquenessProperty → ToolCallMatchingProperty):
Testing ObservationUniquenessProperty → ToolCallMatchingProperty pipeline...
Initial events: ['action_1', '0a4047c9-5cb5-4430-a094-655b9553030a', 'obs_late']
Step 1: Applying ObservationUniquenessProperty.enforce()...
Events to remove: {'obs_late'}
Remaining events: ['action_1', '0a4047c9-5cb5-4430-a094-655b9553030a']
Step 2: Applying ToolCallMatchingProperty...
Calling enforce()...
✓ enforce() succeeded, events to remove: set()
Calling manipulation_indices()...
✓ manipulation_indices() succeeded: ManipulationIndices({0, 2})
✓ Pipeline completed without KeyError!
Final events: ['action_1', '0a4047c9-5cb5-4430-a094-655b9553030a']
This shows the fix works: ObservationUniquenessProperty removes the duplicate observation before ToolCallMatchingProperty sees it, preventing the KeyError.
Test 2: Unit tests verification
Ran the new test suite:
uv run pytest tests/sdk/context/view/properties/test_observation_uniqueness.py -vAll 4 tests passed:
- ✅
test_enforce_drops_late_observation_after_agent_error - ✅
test_enforce_no_duplicates_returns_empty - ✅
test_manipulation_indices_returns_complete_for_well_formed_view - ✅
test_manipulation_indices_warns_but_does_not_crash_on_duplicates
Test 3: Property registration order
Verified that ObservationUniquenessProperty is registered before ToolCallMatchingProperty in ALL_PROPERTIES:
ALL_PROPERTIES: list[ViewPropertyBase] = [
ObservationUniquenessProperty(), # index 0
BatchAtomicityProperty(), # index 1
ToolCallMatchingProperty(), # index 2
ToolLoopAtomicityProperty(), # index 3
]This order ensures duplicates are removed before the pairing logic runs.
Issues Found
None.
Summary
ToolCallMatchingProperty.enforce()cardinality-aware so duplicate observation-like events for the sametool_call_id(e.g. anAgentErrorEventfrom crash recovery + a lateObservationEvent) get dropped instead ofslipping past membership-only checks.
set.remove()withset.discard()+logger.warninginmanipulation_indices()so a duplicate slipping pastenforce()no longer crashes condensation with a rawKeyError..pr/and a property-level regression test.Issue Number
#3045.
Type
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:58cdeb8-pythonRun
All tags pushed for this build
About Multi-Architecture Support
58cdeb8-python) is a multi-arch manifest supporting both amd64 and arm6458cdeb8-python-amd64) are also available if needed