feat(workflow): drop whole elements when truncating reduce input by ak684 · Pull Request #3700 · OpenHands/software-agent-sdk

ak684 · 2026-06-14T19:30:26Z

HUMAN:

The workflow reducer sliced its fan-in at a hard 12k-char boundary, cutting mid-JSON and silently dropping findings. This drops whole elements instead so the reducer gets valid JSON. Reviewed the change and the new tests.

A human has tested these changes.

AGENT:

Why

reduce_agent serialized its fan-in input and then sliced it at a hard 12,000-character boundary. That cut can land mid-JSON-token and silently drop findings. In the exact repo-wide-audit path the workflow tool advertises (map findings → reduce), the reducer would receive truncated or invalid intermediate results with no clear signal of what was lost.
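The failure mode described above can be reproduced with the standard library alone. This is a minimal sketch, not the SDK's code: the `findings` payload and the `is_valid_json` helper are hypothetical, and only the 12,000-character limit comes from this PR.

```python
import json

LIMIT = 12_000  # the hard character boundary described above

# Hypothetical fan-in payload; the real finding shape is not shown in this PR.
findings = [
    {"file": f"src/module_{i}.py", "issue": "unused import " * 5}
    for i in range(200)
]

serialized = json.dumps(findings, indent=2)
sliced = serialized[:LIMIT]  # naive character slice, as in the old behavior


def is_valid_json(text: str) -> bool:
    """Return True if text parses as a complete JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(len(serialized) > LIMIT)    # True: the payload exceeds the limit
print(is_valid_json(serialized))  # True: the full payload is valid
print(is_valid_json(sliced))      # False: the slice lost the closing bracket
```

Any strict prefix of a serialized JSON array is missing at least its closing bracket, so a character slice of an oversized list can never parse on its own.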

Summary

For list/dict reduce inputs, drop whole trailing elements (keeping valid JSON) and append a clear [N of M items omitted to fit the reduce input limit] marker, keeping the leading elements.
Scalars/strings, and the degenerate case of a single element larger than the budget, still fall back to character truncation as a safety net.
Greedy fill by per-element estimate, then correct against the actual combined serialization so the result always fits without slicing mid-token.
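The three points above can be sketched as follows. This is an illustration under stated assumptions, not the code merged in this PR: `render`, `omission_marker`, and `shape_reduce_input` are hypothetical names, the `indent=2` serialization is assumed, and the character fallback is a plain slice rather than the SDK's own truncation helper.

```python
import json
from typing import Any

LIMIT = 12_000  # reduce input limit quoted in this PR


def render(value: Any) -> str:
    """Serialize the way this sketch assumes the reducer prompt does."""
    return json.dumps(value, indent=2, default=str)


def omission_marker(omitted: int, total: int) -> str:
    return f"[{omitted} of {total} items omitted to fit the reduce input limit]"


def shape_reduce_input(value: Any, limit: int = LIMIT) -> str:
    text = render(value)
    if len(text) <= limit:
        return text  # small input passes through unchanged
    if isinstance(value, list):
        items, build = list(value), list
    elif isinstance(value, dict):
        items, build = list(value.items()), dict
    else:
        return text[:limit]  # scalars/strings: character truncation

    total = len(items)
    # Reserve room for the newline and the widest possible marker.
    budget = limit - len(omission_marker(total, total)) - 1

    # Greedy fill using a cheap per-element estimate (compact form).
    kept, used = [], 2
    for item in items:
        cost = len(json.dumps(item, default=str)) + 2
        if kept and used + cost > budget:
            break
        kept.append(item)
        used += cost

    # Correct against the real combined serialization, which is larger
    # than the estimate because of indentation and newlines.
    while len(kept) > 1 and len(render(build(kept))) > budget:
        kept.pop()

    body = render(build(kept))
    if len(body) > budget:
        return text[:limit]  # one element alone is too big: safety net
    return body + "\n" + omission_marker(total - len(kept), total)
```

Because the marker sits on its own final line, a consumer can split on the last newline and parse everything before it as JSON.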

How to Test

Self-verifiable with pytest only (no LLM, no OHE):

uv run pytest tests/tools/workflow      # 32 passed (5 new + existing)
uv run pytest tests/tools               # 912 passed, 9 skipped, 0 failed
uv run ruff check && uv run ruff format # clean

New tests cover: small input passthrough; large list/dict dropping whole elements with the head remaining valid JSON (the whole point); a single oversized element falling back to char truncation; and long-string char truncation. The pre-existing test_format_value_truncates_large_intermediate_results still passes
(back-compat for the scalar path).

Type

Feature

Notes

Purely the reducer's input-shaping; no change to run_agent/map_agents/pipeline
behavior or to the workflow tool's public surface.

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d08988d-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d08988d-python \
  ghcr.io/openhands/agent-server:d08988d-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d08988d-golang-amd64
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-golang-amd64
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-golang-amd64
ghcr.io/openhands/agent-server:d08988d-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d08988d-golang-arm64
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-golang-arm64
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-golang-arm64
ghcr.io/openhands/agent-server:d08988d-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d08988d-java-amd64
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-java-amd64
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-java-amd64
ghcr.io/openhands/agent-server:d08988d-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d08988d-java-arm64
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-java-arm64
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-java-arm64
ghcr.io/openhands/agent-server:d08988d-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d08988d-python-amd64
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-python-amd64
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-python-amd64
ghcr.io/openhands/agent-server:d08988d-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:d08988d-python-arm64
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-python-arm64
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-python-arm64
ghcr.io/openhands/agent-server:d08988d-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:d08988d-golang
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-golang
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-golang
ghcr.io/openhands/agent-server:d08988d-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:d08988d-java
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-java
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-java
ghcr.io/openhands/agent-server:d08988d-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:d08988d-python
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-python
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-python
ghcr.io/openhands/agent-server:d08988d-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., d08988d-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., d08988d-python-amd64) are also available if needed

reduce_agent serialized its fan-in then sliced it at a hard 12k character boundary, which can cut mid-JSON-token and silently drop findings in the exact codebase-audit path the tool advertises. For list/dict inputs, drop whole trailing elements (keeping valid JSON) and report how many were omitted, keeping the leading elements; scalars/strings and a single oversized element still fall back to character truncation.

github-actions · 2026-06-14T19:30:55Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-14T19:31:04Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-14T19:37:10Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands-tools/openhands/tools/workflow
impl.py	212	166	21%	84–91, 93–95, 99–101, 110–111, 126–128, 135–137, 152–154, 160–165, 170–171, 173, 177–179, 185, 189, 207–208, 210–219, 221, 238, 246–249, 251–252, 255–258, 262–263, 266–267, 273–274, 280–282, 290–293, 297–299, 306–308, 310–312, 314–319, 327–336, 339, 350–351, 355–358, 360, 365–366, 370–371, 379, 381–387, 390, 395, 399, 403–404, 409, 413–416, 421–424, 426, 430–435, 437–439, 441–444, 450–451, 455–456, 460, 498, 509–510, 517, 521–523, 528–531, 538
TOTAL	31166	15695	49%

all-hands-bot

✅ QA Report: PASS

The reducer input-shaping behavior works as described: large list/dict fan-in payloads now keep valid leading JSON and report omitted trailing items, while existing fallback truncation still works.

Does this PR achieve its stated goal?

Yes. I reproduced the old behavior on origin/main: large list/dict reduce inputs were character-truncated and the JSON head failed to parse with an unterminated string. With the PR commit, the same realistic 60-item fan-in payloads kept 23 leading entries, preserved prefix order, stayed under the 12,000-character limit, and exposed the "items omitted to fit" marker; small and scalar fallback paths still behaved as expected.

Phase	Result
Environment Setup	✅ `make build` completed successfully
CI Status	🟡 Latest snapshot had no failing checks; several Agent Server/tools/QA checks were still pending, and PR Description Check was skipped
Functional Verification	✅ Before/after probe verified the changed reduce input behavior with realistic list/dict/scalar inputs

Functional Verification

Test 1: Large list/dict reduce fan-in payloads drop whole trailing elements

Step 1 — Reproduce / establish baseline without the fix:
Created /tmp/qa-pr3700-base from origin/main, then ran:

timeout 30s env OPENHANDS_SUPPRESS_BANNER=1 PYTHONPATH=/tmp/qa-pr3700-base/openhands-tools .venv/bin/python /tmp/qa_reduce_summary.py /tmp/qa-pr3700-base BASE

Relevant output:

BASE large_list: len=12046 limit=12000 marker=char-truncation head_json_valid=False kept=n/a/60 prefix_preserved=n/a parse_error=Unterminated string starting at
BASE large_dict: len=12046 limit=12000 marker=char-truncation head_json_valid=False kept=n/a/60 prefix_preserved=n/a parse_error=Unterminated string starting at
BASE small_passthrough=True
BASE single_oversized_char_truncated=True
BASE long_string_char_truncated=True

This confirms the pre-PR problem: structured list/dict reduce inputs were cut by character count, leaving invalid JSON heads for the reducer to consume.

Step 2 — Apply the PR's changes:
Created /tmp/qa-pr3700-head at commit d08988d7282d8c97cfda1c5c52f7b3bd1b99506f.

Step 3 — Re-run with the fix in place:
Ran:

timeout 30s env OPENHANDS_SUPPRESS_BANNER=1 PYTHONPATH=/tmp/qa-pr3700-head/openhands-tools .venv/bin/python /tmp/qa_reduce_summary.py /tmp/qa-pr3700-head PR

Relevant output:

PR large_list: len=11965 limit=12000 marker=omission head_json_valid=True kept=23/60 prefix_preserved=True parse_error=none
PR large_dict: len=11850 limit=12000 marker=omission head_json_valid=True kept=23/60 prefix_preserved=True parse_error=none
PR small_passthrough=True
PR single_oversized_char_truncated=True
PR long_string_char_truncated=True

This shows the PR fixes the stated issue: large structured payloads now drop whole trailing entries, keep parseable leading JSON, preserve ordering, and include an explicit omission marker. The unchanged passthrough/fallback cases still work.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot · 2026-06-14T19:39:16Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

Code Review Summary

🟢 Good taste — Elegant solution that eliminates a real problem.

What Changed

The _format_value function (used to serialize reduce input for the reducer prompt) now drops whole trailing elements from lists/dicts when exceeding the character limit, ensuring valid JSON output. Previously, it truncated at arbitrary character boundaries, which could split JSON tokens mid-way.

Assessment

[CRITICAL ISSUES] — None

[IMPROVEMENT OPPORTUNITIES] — None

[TESTING GAPS] — None

The tests are comprehensive and cover:

Small values pass through unchanged
Large lists/dicts drop whole elements and stay valid JSON
Single oversized elements fall back to character truncation
Long strings fall back to character truncation

Algorithm Design

The greedy fill + corrective loop approach is sound:

Estimates per-element size, adds elements until budget exceeded
Corrects downward if render(kept) exceeds budget (accounts for JSON indentation overhead)
Safety net for single oversized elements via _truncate_text fallback

Always keeps at least one element (while len(kept) > 1), which is correct defensive behavior.
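The need for the corrective step can be seen in a small experiment. This is a sketch under assumptions: it takes compact `json.dumps` length as the per-element estimate and `indent=2` as the real rendering, and `fit_by_correction` is a hypothetical stand-in for the loop described above, not the function in `impl.py`.

```python
import json

# Hypothetical elements with a nested list, so indentation adds real overhead.
items = [{"file": f"a_{i}.py", "notes": ["x", "y"]} for i in range(10)]

# Per-element estimate: compact length of each element plus the brackets.
estimate = sum(len(json.dumps(item)) for item in items) + 2

# Actual size once all elements are rendered together with indentation.
actual = len(json.dumps(items, indent=2))


def fit_by_correction(kept: list, budget: int) -> list:
    """Drop trailing elements until the real rendering fits the budget."""
    kept = list(kept)
    # Never go below one element; a single oversized element is left
    # for a separate character-truncation fallback.
    while len(kept) > 1 and len(json.dumps(kept, indent=2)) > budget:
        kept.pop()
    return kept


# A budget that the estimate says is enough, but the real rendering exceeds.
budget = estimate
trimmed = fit_by_correction(items, budget)
print(actual > estimate)          # the estimate undershoots
print(len(trimmed) < len(items))  # so the correction has to drop elements
```

Popping from the tail keeps the leading elements in their original order, which matches the prefix-preserving behavior the QA reports check for.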

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟢 LOW

This is a contained internal change with comprehensive test coverage. The worst-case scenario (malformed JSON) is guarded against by the safety net.

✅ Worth merging: Clean implementation that solves a real problem with good test coverage.

KEY INSIGHT:
Dropping whole JSON elements instead of slicing mid-token is the right approach — it transforms a potential correctness bug into a well-documented truncation with clear user feedback.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.

all-hands-bot

✅ QA Report: PASS

Verified oversized workflow reduce fan-in input now stays valid JSON by dropping whole trailing elements instead of slicing mid-token.

Does this PR achieve its stated goal?

Yes. On origin/main, the same realistic oversized list/dict reducer inputs produced truncated JSON fragments that failed json.loads with JSONDecodeError. On PR commit d08988d7282d8c97cfda1c5c52f7b3bd1b99506f, the formatted reducer input parsed successfully, preserved leading items in order, included the explicit omission marker, and stayed within the reducer input budget.

Phase	Result
Environment Setup	✅ `uv run python` created `.venv`, built local monorepo packages, and imported `openhands.tools.workflow.impl` successfully.
CI Status	🟡 35 successful, 2 skipped, 1 pending (`QA Changes by OpenHands/qa-changes`) at review time.
Functional Verification	✅ Reproduced the old invalid-JSON truncation and verified the PR behavior through `_format_value` and the public `WorkflowContext.reduce_agent` prompt-shaping path.

Functional Verification

Test 1: Oversized list/dict reducer fan-in stays valid JSON

Step 1 — Reproduce / establish baseline (without the fix):
Checked out origin/main and ran a Python script that imports the workflow implementation, creates 60 realistic repository findings as both a list and dict, formats them as reducer input, and attempts to parse the JSON head before the truncation marker.

Relevant output:

large_list.json_error=JSONDecodeError: Unterminated string starting at: line 104 column 5 (char 11994)
large_list.limit=12000
large_list.output_len=12046
large_list.omission_marker=False
large_list.trunc_marker=True
large_list.json_valid=False
large_list.parsed_count=None

large_dict.json_error=JSONDecodeError: Expecting ',' delimiter: line 81 column 4 (char 12000)
large_dict.limit=12000
large_dict.output_len=12046
large_dict.omission_marker=False
large_dict.trunc_marker=True
large_dict.json_valid=False
large_dict.parsed_count=None

This confirms the described bug: list/dict reduce input was sliced at a character boundary, leaving invalid JSON for the reducer.

Step 2 — Apply the PR's changes:
Checked out d08988d7282d8c97cfda1c5c52f7b3bd1b99506f.

Step 3 — Re-run with the fix in place:
Ran the same script against the PR checkout.

Relevant output:

large_list.limit=12000
large_list.output_len=11432
large_list.omission_marker=True
large_list.trunc_marker=False
large_list.json_valid=True
large_list.parsed_count=19
large_list.leading_items_preserved=True

large_dict.limit=12000
large_dict.output_len=11461
large_dict.omission_marker=True
large_dict.trunc_marker=False
large_dict.json_valid=True
large_dict.parsed_count=19
large_dict.leading_items_preserved=True

This shows the PR drops whole trailing elements, keeps valid JSON, preserves leading elements, and reports omitted items.
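The probe script used for this test is not included in the report. A hypothetical check in the same spirit might look like the sketch below; the `probe` function and its result keys are assumptions modeled on the output fields above, and it assumes the omission marker is the final line of the formatted text.

```python
import json

MARKER_TEXT = "items omitted to fit the reduce input limit"


def probe(formatted: str) -> dict:
    """Parse the JSON head of a formatted reduce input and report on it."""
    head, _, last_line = formatted.rpartition("\n")
    has_marker = MARKER_TEXT in last_line
    # With a marker, only the text before the final line should be JSON.
    candidate = head if has_marker else formatted
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError as exc:
        return {
            "omission_marker": has_marker,
            "json_valid": False,
            "parsed_count": None,
            "json_error": f"JSONDecodeError: {exc}",
        }
    count = len(parsed) if isinstance(parsed, (list, dict)) else None
    return {
        "omission_marker": has_marker,
        "json_valid": True,
        "parsed_count": count,
        "json_error": None,
    }


# Element-dropped output: valid head followed by the marker line.
good = json.dumps([1, 2, 3], indent=2) + f"\n[57 of 60 {MARKER_TEXT}]"
# Character-sliced output: the head is cut mid-document.
bad = json.dumps(list(range(100)), indent=2)[:50]

print(probe(good)["json_valid"], probe(good)["parsed_count"])
print(probe(bad)["json_valid"])
```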

Test 2: Public `reduce_agent` path passes valid structured input to the reducer

Step 1 — Reproduce / establish baseline (without the fix):
Checked out origin/main and exercised WorkflowContext.reduce_agent(...) with a capturing context so the reducer prompt could be inspected without making an external LLM call.

Relevant output:

reduce_agent.result=captured reducer prompt
reduce_agent.prompt_len=12091
reduce_agent.omission_marker=False
reduce_agent.json_valid=False
reduce_agent.json_error=JSONDecodeError: Unterminated string starting at: line 84 column 16 (char 11485)

This confirms the user-facing workflow reducer path received invalid JSON before the fix.

Step 2 — Apply the PR's changes:
Checked out d08988d7282d8c97cfda1c5c52f7b3bd1b99506f.

Step 3 — Re-run with the fix in place:
Ran the same public-path verification against the PR checkout.

Relevant output:

reduce_agent.result=captured reducer prompt
reduce_agent.prompt_len=11556
reduce_agent.omission_marker=True
reduce_agent.json_valid=True
reduce_agent.parsed_count=20
reduce_agent.leading_items_preserved=True

This confirms the actual reducer prompt-shaping path now provides valid structured input and an omission marker.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

ak684 marked this pull request as draft June 14, 2026 19:33

all-hands-bot reviewed Jun 14, 2026

ak684 marked this pull request as ready for review June 14, 2026 19:38

ak684 requested a review from all-hands-bot June 14, 2026 19:38

all-hands-bot approved these changes Jun 14, 2026

all-hands-bot reviewed Jun 14, 2026

ak684 merged commit 6bf874e into main Jun 14, 2026
50 of 52 checks passed

ak684 deleted the alona/sdk-workflow-reduce-truncation branch June 14, 2026 19:43
