feat(workflow): drop whole elements when truncating reduce input #3700
reduce_agent serialized its fan-in then sliced it at a hard 12k character boundary, which can cut mid-JSON-token and silently drop findings in the exact codebase-audit path the tool advertises. For list/dict inputs, drop whole trailing elements (keeping valid JSON) and report how many were omitted, keeping the leading elements; scalars/strings and a single oversized element still fall back to character truncation.
Python API breakage checks: ✅ PASSED
REST API breakage checks (OpenAPI): ✅ PASSED
all-hands-bot
left a comment
✅ QA Report: PASS
The reducer input-shaping behavior works as described: large list/dict fan-in payloads now keep valid leading JSON and report omitted trailing items, while existing fallback truncation still works.
Does this PR achieve its stated goal?
Yes. I reproduced the old behavior on origin/main: large list/dict reduce inputs were character-truncated and the JSON head failed to parse with an unterminated string. With the PR commit, the same realistic 60-item fan-in payloads kept 23 leading entries, preserved prefix order, stayed under the 12,000-character limit, and exposed the "items omitted to fit" marker; small and scalar fallback paths still behaved as expected.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully |
| CI Status | 🟡 Latest snapshot had no failing checks; several Agent Server/tools/QA checks were still pending, and PR Description Check was skipped |
| Functional Verification | ✅ Before/after probe verified the changed reduce input behavior with realistic list/dict/scalar inputs |
Functional Verification
Test 1: Large list/dict reduce fan-in payloads drop whole trailing elements
Step 1 — Reproduce / establish baseline without the fix:
Created /tmp/qa-pr3700-base from origin/main, then ran:
timeout 30s env OPENHANDS_SUPPRESS_BANNER=1 PYTHONPATH=/tmp/qa-pr3700-base/openhands-tools .venv/bin/python /tmp/qa_reduce_summary.py /tmp/qa-pr3700-base BASE
Relevant output:
BASE large_list: len=12046 limit=12000 marker=char-truncation head_json_valid=False kept=n/a/60 prefix_preserved=n/a parse_error=Unterminated string starting at
BASE large_dict: len=12046 limit=12000 marker=char-truncation head_json_valid=False kept=n/a/60 prefix_preserved=n/a parse_error=Unterminated string starting at
BASE small_passthrough=True
BASE single_oversized_char_truncated=True
BASE long_string_char_truncated=True
This confirms the pre-PR problem: structured list/dict reduce inputs were cut by character count, leaving invalid JSON heads for the reducer to consume.
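The failure mode is easy to reproduce outside the repository. The following is a minimal sketch, not the QA script itself: the findings payload is hypothetical, and only the 12,000-character limit comes from this report.

```python
import json

LIMIT = 12_000  # the reduce input limit quoted in this report

# Hypothetical findings; the real QA payload is not shown in this report.
findings = [
    {"file": f"src/module_{i}.py", "issue": "unused import " * 20}
    for i in range(60)
]

serialized = json.dumps(findings, indent=2)
head = serialized[:LIMIT]  # plain character slice, as on origin/main


def parses(text):
    """Return True if text is a complete JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The full payload is valid JSON; its character-sliced head is not.
print(len(serialized) > LIMIT, parses(serialized), parses(head))
```

Any strict prefix of a serialized list is missing at least its closing bracket, so the sliced head can never parse, regardless of where the cut lands.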
Step 2 — Apply the PR's changes:
Created /tmp/qa-pr3700-head at commit d08988d7282d8c97cfda1c5c52f7b3bd1b99506f.
Step 3 — Re-run with the fix in place:
Ran:
timeout 30s env OPENHANDS_SUPPRESS_BANNER=1 PYTHONPATH=/tmp/qa-pr3700-head/openhands-tools .venv/bin/python /tmp/qa_reduce_summary.py /tmp/qa-pr3700-head PR
Relevant output:
PR large_list: len=11965 limit=12000 marker=omission head_json_valid=True kept=23/60 prefix_preserved=True parse_error=none
PR large_dict: len=11850 limit=12000 marker=omission head_json_valid=True kept=23/60 prefix_preserved=True parse_error=none
PR small_passthrough=True
PR single_oversized_char_truncated=True
PR long_string_char_truncated=True
This shows the PR fixes the stated issue: large structured payloads now drop whole trailing entries, keep parseable leading JSON, preserve ordering, and include an explicit omission marker. The unchanged passthrough/fallback cases still work.
Issues Found
None.
This review was created by an AI agent (OpenHands) on behalf of the user.
all-hands-bot
left a comment
Code Review Summary
🟢 Good taste — Elegant solution that eliminates a real problem.
What Changed
The _format_value function (used to serialize reduce input for the reducer prompt) now drops whole trailing elements from lists/dicts when exceeding the character limit, ensuring valid JSON output. Previously, it truncated at arbitrary character boundaries, which could split JSON tokens mid-way.
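For the dict case, "dropping whole trailing elements" can be read as keeping the leading key/value pairs in insertion order. The sketch below illustrates that reading only; keep_leading_pairs is a hypothetical helper, not the PR's code, and it re-serializes on every step for clarity rather than speed.

```python
import json


def keep_leading_pairs(mapping, limit):
    """Keep leading key/value pairs while the indented JSON fits within limit.

    Hypothetical helper: always keeps the first pair, even if it is oversized,
    and leaves any fallback for that case to the caller.
    """
    kept = {}
    for key, value in mapping.items():
        candidate = {**kept, key: value}
        if kept and len(json.dumps(candidate, indent=2)) > limit:
            break
        kept = candidate
    return kept


mapping = {f"k{i}": "v" * 20 for i in range(50)}
kept = keep_leading_pairs(mapping, 300)
print(len(kept), len(mapping))
```

Because Python dicts preserve insertion order, the kept pairs are always a prefix of the original, which matches the "leading items preserved" checks in the QA reports.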
Assessment
[CRITICAL ISSUES] — None
[IMPROVEMENT OPPORTUNITIES] — None
[TESTING GAPS] — None
The tests are comprehensive and cover:
- Small values pass through unchanged
- Large lists/dicts drop whole elements and stay valid JSON
- Single oversized elements fall back to character truncation
- Long strings fall back to character truncation
Algorithm Design
The greedy fill + corrective loop approach is sound:
- Estimates per-element size, adds elements until budget exceeded
- Corrects downward if render(kept) exceeds budget (accounts for JSON indentation overhead)
- Safety net for single oversized elements via _truncate_text fallback
Always keeps at least one element (while len(kept) > 1), which is correct defensive behavior.
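The shape of that algorithm can be sketched for the list case as follows. This is an illustration under stated assumptions, not the PR's implementation: shape_list, render, and truncate_text_stub are hypothetical names, the "[truncated]" suffix is invented, and the budget here covers only the JSON head, not the marker line. The omission marker wording follows the PR description.

```python
import json


def render(items):
    """Serialize the kept elements the way the reducer prompt would see them."""
    return json.dumps(items, indent=2)


def truncate_text_stub(text, limit):
    # Hypothetical stand-in for the _truncate_text fallback; suffix is assumed.
    return text if len(text) <= limit else text[:limit] + "\n[truncated]"


def shape_list(items, limit):
    full = render(items)
    if len(full) <= limit:
        return full  # small inputs pass through unchanged

    # Greedy fill: add elements until the estimated size exceeds the budget.
    kept, used = [], 0
    for item in items:
        size = len(json.dumps(item, indent=2))
        if kept and used + size > limit:
            break
        kept.append(item)
        used += size

    # Corrective loop: the estimate ignores list indentation and separators.
    while len(kept) > 1 and len(render(kept)) > limit:
        kept.pop()

    head = render(kept)
    if len(head) > limit:
        # A single oversized element: fall back to character truncation.
        return truncate_text_stub(full, limit)

    omitted = len(items) - len(kept)
    marker = f"[{omitted} of {len(items)} items omitted to fit the reduce input limit]"
    return f"{head}\n{marker}"


items = [{"id": i, "note": "x" * 50} for i in range(100)]
print(shape_list(items, 1000).splitlines()[-1])
```

The corrective loop matters because per-element estimates undercount: nesting an element inside a list adds two spaces of indentation per line plus a separator, so the greedy pass can overshoot by a few elements.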
[RISK ASSESSMENT]
- [Overall PR] ⚠️ Risk Assessment: 🟢 LOW
This is a contained internal change with comprehensive test coverage. The worst-case scenario (malformed JSON) is guarded against by the safety net.
✅ Worth merging: Clean implementation that solves a real problem with good test coverage.
KEY INSIGHT:
Dropping whole JSON elements instead of slicing mid-token is the right approach — it transforms a potential correctness bug into a well-documented truncation with clear user feedback.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.
all-hands-bot
left a comment
✅ QA Report: PASS
Verified oversized workflow reduce fan-in input now stays valid JSON by dropping whole trailing elements instead of slicing mid-token.
Does this PR achieve its stated goal?
Yes. On origin/main, the same realistic oversized list/dict reducer inputs produced truncated JSON fragments that failed json.loads with JSONDecodeError. On PR commit d08988d7282d8c97cfda1c5c52f7b3bd1b99506f, the formatted reducer input parsed successfully, preserved leading items in order, included the explicit omission marker, and stayed within the reducer input budget.
| Phase | Result |
|---|---|
| Environment Setup | ✅ uv run python created .venv, built local monorepo packages, and imported openhands.tools.workflow.impl successfully. |
| CI Status | 🟡 35 successful, 2 skipped, 1 pending (QA Changes by OpenHands/qa-changes) at review time. |
| Functional Verification | ✅ Reproduced the old invalid-JSON truncation and verified the PR behavior through _format_value and the public WorkflowContext.reduce_agent prompt-shaping path. |
Functional Verification
Test 1: Oversized list/dict reducer fan-in stays valid JSON
Step 1 — Reproduce / establish baseline (without the fix):
Checked out origin/main and ran a Python script that imports the workflow implementation, creates 60 realistic repository findings as both a list and dict, formats them as reducer input, and attempts to parse the JSON head before the truncation marker.
Relevant output:
large_list.json_error=JSONDecodeError: Unterminated string starting at: line 104 column 5 (char 11994)
large_list.limit=12000
large_list.output_len=12046
large_list.omission_marker=False
large_list.trunc_marker=True
large_list.json_valid=False
large_list.parsed_count=None
large_dict.json_error=JSONDecodeError: Expecting ',' delimiter: line 81 column 4 (char 12000)
large_dict.limit=12000
large_dict.output_len=12046
large_dict.omission_marker=False
large_dict.trunc_marker=True
large_dict.json_valid=False
large_dict.parsed_count=None
This confirms the described bug: list/dict reduce input was sliced at a character boundary, leaving invalid JSON for the reducer.
Step 2 — Apply the PR's changes:
Checked out d08988d7282d8c97cfda1c5c52f7b3bd1b99506f.
Step 3 — Re-run with the fix in place:
Ran the same script against the PR checkout.
Relevant output:
large_list.limit=12000
large_list.output_len=11432
large_list.omission_marker=True
large_list.trunc_marker=False
large_list.json_valid=True
large_list.parsed_count=19
large_list.leading_items_preserved=True
large_dict.limit=12000
large_dict.output_len=11461
large_dict.omission_marker=True
large_dict.trunc_marker=False
large_dict.json_valid=True
large_dict.parsed_count=19
large_dict.leading_items_preserved=True
This shows the PR drops whole trailing elements, keeps valid JSON, preserves leading elements, and reports omitted items.
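A probe of the kind described in this test could look roughly like the sketch below. It is not the script that produced the output above: probe is a hypothetical helper, it assumes the marker sits on the final line of the formatted input, it handles only list originals, and the sample strings (including the "[truncated]" suffix) are hand-built for illustration. The result keys mirror the field names in the output.

```python
import json


def probe(formatted, original):
    """Split reducer input into a JSON head and a trailing marker line, then check the head."""
    head, _, marker = formatted.rpartition("\n")
    result = {
        "output_len": len(formatted),
        "omission_marker": "items omitted" in marker,
    }
    try:
        parsed = json.loads(head)
    except json.JSONDecodeError as exc:
        result.update(json_valid=False, json_error=exc.msg, parsed_count=None)
        return result
    result.update(
        json_valid=True,
        parsed_count=len(parsed),
        leading_items_preserved=parsed == original[: len(parsed)],
    )
    return result


items = [{"id": i} for i in range(5)]
# Hand-built examples of the two shapes seen in this report.
element_dropped = json.dumps(items[:3], indent=2) + "\n[2 of 5 items omitted to fit the reduce input limit]"
char_sliced = json.dumps(items, indent=2)[:30] + "\n[truncated]"

print(probe(element_dropped, items))
print(probe(char_sliced, items))
```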
Test 2: Public reduce_agent path passes valid structured input to the reducer
Step 1 — Reproduce / establish baseline (without the fix):
Checked out origin/main and exercised WorkflowContext.reduce_agent(...) with a capturing context so the reducer prompt could be inspected without making an external LLM call.
Relevant output:
reduce_agent.result=captured reducer prompt
reduce_agent.prompt_len=12091
reduce_agent.omission_marker=False
reduce_agent.json_valid=False
reduce_agent.json_error=JSONDecodeError: Unterminated string starting at: line 84 column 16 (char 11485)
This confirms the user-facing workflow reducer path received invalid JSON before the fix.
Step 2 — Apply the PR's changes:
Checked out d08988d7282d8c97cfda1c5c52f7b3bd1b99506f.
Step 3 — Re-run with the fix in place:
Ran the same public-path verification against the PR checkout.
Relevant output:
reduce_agent.result=captured reducer prompt
reduce_agent.prompt_len=11556
reduce_agent.omission_marker=True
reduce_agent.json_valid=True
reduce_agent.parsed_count=20
reduce_agent.leading_items_preserved=True
This confirms the actual reducer prompt-shaping path now provides valid structured input and an omission marker.
Issues Found
None.
This review was created by an AI agent (OpenHands) on behalf of the user.
HUMAN:
The workflow reducer sliced its fan-in at a hard 12k-char boundary, cutting mid-JSON and silently dropping findings. This drops whole elements instead so the reducer gets valid JSON. Reviewed the change and the new tests.
AGENT:
Why
reduce_agent serialized its fan-in input and then sliced it at a hard 12,000-character boundary. That cut can land mid-JSON-token and silently drop findings. In the exact repo-wide-audit path the workflow tool advertises (map findings → reduce), the reducer would receive truncated or invalid intermediate results with no clear signal of what was lost.
Summary
For list/dict inputs, drop whole trailing elements (keeping valid JSON) and append an [N of M items omitted to fit the reduce input limit] marker, keeping the leading elements.
How to Test
Self-verifiable with pytest only (no LLM, no OHE):
New tests cover: small input passthrough; large list/dict dropping whole elements with the head remaining valid JSON (the whole point); a single oversized element falling back to char truncation; and long-string char truncation. The pre-existing test_format_value_truncates_large_intermediate_results still passes (back-compat for the scalar path).
Type
Notes
Purely the reducer's input-shaping; no change to run_agent/map_agents/pipeline behavior or to the workflow tool's public surface.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22-slim
- golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d08988d-python
Run
All tags pushed for this build
About Multi-Architecture Support
- Each variant tag (e.g. d08988d-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. d08988d-python-amd64) are also available if needed