
feat(workflow): drop whole elements when truncating reduce input#3700

Merged
ak684 merged 1 commit into main from alona/sdk-workflow-reduce-truncation
Jun 14, 2026
@ak684 ak684 commented Jun 14, 2026

Contributor

HUMAN:

The workflow reducer sliced its fan-in at a hard 12k-char boundary, cutting mid-JSON and silently dropping findings. This drops whole elements instead so the reducer gets valid JSON. Reviewed the change and the new tests.

  • A human has tested these changes.

AGENT:


Why

reduce_agent serialized its fan-in input and then sliced it at a hard 12,000-character boundary. That cut can land mid-JSON-token and silently drop findings. In the exact repo-wide-audit path the workflow tool advertises (map findings → reduce), the reducer would receive truncated or invalid intermediate results with no clear signal of what was lost.

Summary

  • For list/dict reduce inputs, drop whole trailing elements (keeping valid JSON) and append a clear [N of M items omitted to fit the reduce input limit] marker, keeping the leading elements.
  • Scalars/strings, and the degenerate case of a single element larger than the budget, still fall back to character truncation as a safety net.
  • Greedy fill by per-element estimate, then correct against the actual combined serialization so the result always fits without slicing mid-token.
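
The three bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `shape_reduce_input`, `render`, and `LIMIT` are hypothetical names, the indented `json.dumps` rendering is an assumption, and the character-truncation fallback here is a bare slice without any truncation marker.

```python
import json

LIMIT = 12_000  # assumed budget, mirroring the 12,000-character limit described above


def render(kept, total):
    """Serialize the kept elements; append an omission marker if any were dropped."""
    body = json.dumps(kept, indent=2)
    omitted = total - len(kept)
    if omitted:
        body += f"\n[{omitted} of {total} items omitted to fit the reduce input limit]"
    return body


def shape_reduce_input(value, limit=LIMIT):
    """Hypothetical stand-in for the reducer's input shaping."""
    if not isinstance(value, (list, dict)):
        text = value if isinstance(value, str) else json.dumps(value)
        return text[:limit]  # scalar/string path: plain character truncation

    is_dict = isinstance(value, dict)
    items = list(value.items()) if is_dict else list(value)
    total = len(items)

    def build(n):
        head = items[:n]
        return render(dict(head) if is_dict else head, total)

    # Greedy fill: keep leading elements while their estimated size fits.
    kept, used = 0, 0
    for item in items:
        used += len(json.dumps(item)) + 2  # rough per-element estimate
        if used > limit and kept >= 1:
            break
        kept += 1

    # Correct against the actual combined serialization (indentation, marker).
    while kept > 1 and len(build(kept)) > limit:
        kept -= 1

    text = build(kept)
    # Degenerate case: one element larger than the whole budget.
    return text if len(text) <= limit else text[:limit]
```

The head before the marker line stays parseable because only whole elements are removed; the slice on the last line is reached only when a single element cannot fit on its own.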

How to Test

Self-verifiable with pytest only (no LLM, no OHE):

uv run pytest tests/tools/workflow      # 32 passed (5 new + existing)
uv run pytest tests/tools               # 912 passed, 9 skipped, 0 failed
uv run ruff check && uv run ruff format # clean

New tests cover: small input passthrough; large list/dict dropping whole elements with the head remaining valid JSON (the whole point); a single oversized element falling back to char truncation; and long-string char truncation. The pre-existing test_format_value_truncates_large_intermediate_results still passes (back-compat for the scalar path).

Type

  • Feature

Notes

Purely the reducer's input-shaping; no change to run_agent/map_agents/pipeline
behavior or to the workflow tool's public surface.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
| --- | --- | --- | --- |
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d08988d-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d08988d-python \
  ghcr.io/openhands/agent-server:d08988d-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d08988d-golang-amd64
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-golang-amd64
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-golang-amd64
ghcr.io/openhands/agent-server:d08988d-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d08988d-golang-arm64
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-golang-arm64
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-golang-arm64
ghcr.io/openhands/agent-server:d08988d-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d08988d-java-amd64
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-java-amd64
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-java-amd64
ghcr.io/openhands/agent-server:d08988d-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d08988d-java-arm64
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-java-arm64
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-java-arm64
ghcr.io/openhands/agent-server:d08988d-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d08988d-python-amd64
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-python-amd64
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-python-amd64
ghcr.io/openhands/agent-server:d08988d-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:d08988d-python-arm64
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-python-arm64
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-python-arm64
ghcr.io/openhands/agent-server:d08988d-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:d08988d-golang
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-golang
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-golang
ghcr.io/openhands/agent-server:d08988d-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:d08988d-java
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-java
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-java
ghcr.io/openhands/agent-server:d08988d-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:d08988d-python
ghcr.io/openhands/agent-server:d08988d7282d8c97cfda1c5c52f7b3bd1b99506f-python
ghcr.io/openhands/agent-server:alona-sdk-workflow-reduce-truncation-python
ghcr.io/openhands/agent-server:d08988d-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., d08988d-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., d08988d-python-amd64) are also available if needed

reduce_agent serialized its fan-in then sliced it at a hard 12k character
boundary, which can cut mid-JSON-token and silently drop findings in the exact
codebase-audit path the tool advertises. For list/dict inputs, drop whole
trailing elements (keeping valid JSON) and report how many were omitted, keeping
the leading elements; scalars/strings and a single oversized element still fall
back to character truncation.
@github-actions

Contributor

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@github-actions

Contributor

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

@ak684 ak684 marked this pull request as draft June 14, 2026 19:33
@github-actions

Contributor

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| openhands-tools/openhands/tools/workflow/impl.py | 212 | 166 | 21% | 84–91, 93–95, 99–101, 110–111, 126–128, 135–137, 152–154, 160–165, 170–171, 173, 177–179, 185, 189, 207–208, 210–219, 221, 238, 246–249, 251–252, 255–258, 262–263, 266–267, 273–274, 280–282, 290–293, 297–299, 306–308, 310–312, 314–319, 327–336, 339, 350–351, 355–358, 360, 365–366, 370–371, 379, 381–387, 390, 395, 399, 403–404, 409, 413–416, 421–424, 426, 430–435, 437–439, 441–444, 450–451, 455–456, 460, 498, 509–510, 517, 521–523, 528–531, 538 |
| TOTAL | 31166 | 15695 | 49% | |

@all-hands-bot all-hands-bot left a comment

Collaborator


✅ QA Report: PASS

The reducer input-shaping behavior works as described: large list/dict fan-in payloads now keep valid leading JSON and report omitted trailing items, while existing fallback truncation still works.

Does this PR achieve its stated goal?

Yes. I reproduced the old behavior on origin/main: large list/dict reduce inputs were character-truncated and the JSON head failed to parse with an unterminated string. With the PR commit, the same realistic 60-item fan-in payloads kept 23 leading entries, preserved prefix order, stayed under the 12,000-character limit, and exposed the items omitted to fit marker; small and scalar fallback paths still behaved as expected.

| Phase | Result |
| --- | --- |
| Environment Setup | make build completed successfully |
| CI Status | 🟡 Latest snapshot had no failing checks; several Agent Server/tools/QA checks were still pending, and PR Description Check was skipped |
| Functional Verification | ✅ Before/after probe verified the changed reduce input behavior with realistic list/dict/scalar inputs |

Functional Verification

Test 1: Large list/dict reduce fan-in payloads drop whole trailing elements

Step 1 — Reproduce / establish baseline without the fix:
Created /tmp/qa-pr3700-base from origin/main, then ran:

timeout 30s env OPENHANDS_SUPPRESS_BANNER=1 PYTHONPATH=/tmp/qa-pr3700-base/openhands-tools .venv/bin/python /tmp/qa_reduce_summary.py /tmp/qa-pr3700-base BASE

Relevant output:

BASE large_list: len=12046 limit=12000 marker=char-truncation head_json_valid=False kept=n/a/60 prefix_preserved=n/a parse_error=Unterminated string starting at
BASE large_dict: len=12046 limit=12000 marker=char-truncation head_json_valid=False kept=n/a/60 prefix_preserved=n/a parse_error=Unterminated string starting at
BASE small_passthrough=True
BASE single_oversized_char_truncated=True
BASE long_string_char_truncated=True

This confirms the pre-PR problem: structured list/dict reduce inputs were cut by character count, leaving invalid JSON heads for the reducer to consume.
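
The failure mode is easy to reproduce outside the SDK. The snippet below is a minimal sketch using only the standard library; the `findings` payload and the `is_valid_json` helper are made up for illustration and are not taken from the QA script.

```python
import json

# Hypothetical fan-in payload: 60 findings, well over 12,000 characters once serialized.
findings = [
    {"file": f"src/module_{i}.py", "issue": "unused import " * 20}
    for i in range(60)
]

serialized = json.dumps(findings, indent=2)
sliced = serialized[:12_000]  # hard character cut, as in the pre-PR behavior


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(len(serialized) > 12_000)   # the payload exceeds the limit
print(is_valid_json(serialized))  # the full serialization parses
print(is_valid_json(sliced))      # the sliced prefix does not
```

Any proper prefix of a serialized array is missing at least its closing bracket, so a character cut can never leave a parseable head, regardless of where it lands.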

Step 2 — Apply the PR's changes:
Created /tmp/qa-pr3700-head at commit d08988d7282d8c97cfda1c5c52f7b3bd1b99506f.

Step 3 — Re-run with the fix in place:
Ran:

timeout 30s env OPENHANDS_SUPPRESS_BANNER=1 PYTHONPATH=/tmp/qa-pr3700-head/openhands-tools .venv/bin/python /tmp/qa_reduce_summary.py /tmp/qa-pr3700-head PR

Relevant output:

PR large_list: len=11965 limit=12000 marker=omission head_json_valid=True kept=23/60 prefix_preserved=True parse_error=none
PR large_dict: len=11850 limit=12000 marker=omission head_json_valid=True kept=23/60 prefix_preserved=True parse_error=none
PR small_passthrough=True
PR single_oversized_char_truncated=True
PR long_string_char_truncated=True

This shows the PR fixes the stated issue: large structured payloads now drop whole trailing entries, keep parseable leading JSON, preserve ordering, and include an explicit omission marker. The unchanged passthrough/fallback cases still work.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

@ak684 ak684 marked this pull request as ready for review June 14, 2026 19:38
@ak684 ak684 requested a review from all-hands-bot June 14, 2026 19:38

all-hands-bot commented Jun 14, 2026

Collaborator

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

@all-hands-bot all-hands-bot left a comment

Collaborator


Code Review Summary

🟢 Good taste — Elegant solution that eliminates a real problem.

What Changed

The _format_value function (used to serialize reduce input for the reducer prompt) now drops whole trailing elements from lists/dicts when exceeding the character limit, ensuring valid JSON output. Previously, it truncated at arbitrary character boundaries, which could split JSON tokens mid-way.

Assessment

[CRITICAL ISSUES] — None

[IMPROVEMENT OPPORTUNITIES] — None

[TESTING GAPS] — None

The tests are comprehensive and cover:

  • Small values pass through unchanged
  • Large lists/dicts drop whole elements and stay valid JSON
  • Single oversized elements fall back to character truncation
  • Long strings fall back to character truncation

Algorithm Design

The greedy fill + corrective loop approach is sound:

  1. Estimates per-element size, adds elements until budget exceeded
  2. Corrects downward if render(kept) exceeds budget (accounts for JSON indentation overhead)
  3. Safety net for single oversized elements via _truncate_text fallback

Always keeps at least one element (while len(kept) > 1), which is correct defensive behavior.
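
To see why the corrective step is needed, here is a small sketch of the gap between a per-element estimate and the combined rendering. It assumes the estimate is the compact `json.dumps` length of each element and the final rendering uses `indent=2`; the `items` data, `estimate` helper, and `budget` value are illustrative only.

```python
import json

items = [{"path": f"pkg/file_{i}.py", "notes": ["a", "b"]} for i in range(10)]


def estimate(item):
    """Compact per-element size; ignores indentation and separators."""
    return len(json.dumps(item))


estimate_total = sum(estimate(item) for item in items)
actual = len(json.dumps(items, indent=2))  # indented rendering is larger

budget = 400

# Step 1: greedy fill by estimate, always keeping at least one element.
kept, used = [], 0
for item in items:
    if kept and used + estimate(item) > budget:
        break
    kept.append(item)
    used += estimate(item)
greedy_count = len(kept)

# Step 2: correct downward against the real rendering, never below one element.
while len(kept) > 1 and len(json.dumps(kept, indent=2)) > budget:
    kept.pop()

print(estimate_total, actual)
print(greedy_count, len(kept))
```

The greedy pass alone overshoots because nested indentation and newlines are not counted in the estimate; the loop trims trailing elements until the measured length fits.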

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW

This is a contained internal change with comprehensive test coverage. The worst-case scenario (malformed JSON) is guarded against by the safety net.


Worth merging: Clean implementation that solves a real problem with good test coverage.

KEY INSIGHT:
Dropping whole JSON elements instead of slicing mid-token is the right approach — it transforms a potential correctness bug into a well-documented truncation with clear user feedback.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

@all-hands-bot all-hands-bot left a comment

Collaborator


✅ QA Report: PASS

Verified oversized workflow reduce fan-in input now stays valid JSON by dropping whole trailing elements instead of slicing mid-token.

Does this PR achieve its stated goal?

Yes. On origin/main, the same realistic oversized list/dict reducer inputs produced truncated JSON fragments that failed json.loads with JSONDecodeError. On PR commit d08988d7282d8c97cfda1c5c52f7b3bd1b99506f, the formatted reducer input parsed successfully, preserved leading items in order, included the explicit omission marker, and stayed within the reducer input budget.

| Phase | Result |
| --- | --- |
| Environment Setup | uv run python created .venv, built local monorepo packages, and imported openhands.tools.workflow.impl successfully. |
| CI Status | 🟡 35 successful, 2 skipped, 1 pending (QA Changes by OpenHands/qa-changes) at review time. |
| Functional Verification | ✅ Reproduced the old invalid-JSON truncation and verified the PR behavior through _format_value and the public WorkflowContext.reduce_agent prompt-shaping path. |

Functional Verification

Test 1: Oversized list/dict reducer fan-in stays valid JSON

Step 1 — Reproduce / establish baseline (without the fix):
Checked out origin/main and ran a Python script that imports the workflow implementation, creates 60 realistic repository findings as both a list and dict, formats them as reducer input, and attempts to parse the JSON head before the truncation marker.

Relevant output:

large_list.json_error=JSONDecodeError: Unterminated string starting at: line 104 column 5 (char 11994)
large_list.limit=12000
large_list.output_len=12046
large_list.omission_marker=False
large_list.trunc_marker=True
large_list.json_valid=False
large_list.parsed_count=None

large_dict.json_error=JSONDecodeError: Expecting ',' delimiter: line 81 column 4 (char 12000)
large_dict.limit=12000
large_dict.output_len=12046
large_dict.omission_marker=False
large_dict.trunc_marker=True
large_dict.json_valid=False
large_dict.parsed_count=None

This confirms the described bug: list/dict reduce input was sliced at a character boundary, leaving invalid JSON for the reducer.

Step 2 — Apply the PR's changes:
Checked out d08988d7282d8c97cfda1c5c52f7b3bd1b99506f.

Step 3 — Re-run with the fix in place:
Ran the same script against the PR checkout.

Relevant output:

large_list.limit=12000
large_list.output_len=11432
large_list.omission_marker=True
large_list.trunc_marker=False
large_list.json_valid=True
large_list.parsed_count=19
large_list.leading_items_preserved=True

large_dict.limit=12000
large_dict.output_len=11461
large_dict.omission_marker=True
large_dict.trunc_marker=False
large_dict.json_valid=True
large_dict.parsed_count=19
large_dict.leading_items_preserved=True

This shows the PR drops whole trailing elements, keeps valid JSON, preserves leading elements, and reports omitted items.
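
For reference, checks of the kind reported above (omission_marker, json_valid, parsed_count, leading_items_preserved) could be computed along these lines. This is a hedged sketch, not the script used in this report: `inspect_reducer_input` and `MARKER_RE` are hypothetical, the marker wording is assumed from the PR summary, and only list inputs are handled.

```python
import json
import re

# Assumed marker shape: "[N of M items omitted to fit the reduce input limit]"
MARKER_RE = re.compile(r"\n\[(\d+) of (\d+) items omitted[^\]]*\]\s*$")


def inspect_reducer_input(text, original):
    """Split off a trailing omission marker and validate the JSON head."""
    match = MARKER_RE.search(text)
    head = text[: match.start()] if match else text
    report = {
        "omission_marker": bool(match),
        "json_valid": False,
        "parsed_count": None,
        "leading_items_preserved": None,
    }
    try:
        parsed = json.loads(head)
    except json.JSONDecodeError:
        return report
    report["json_valid"] = True
    report["parsed_count"] = len(parsed)
    # Leading elements must match the original prefix, in order.
    report["leading_items_preserved"] = parsed == original[: len(parsed)]
    return report


original = [{"id": i} for i in range(5)]
shaped = (
    json.dumps(original[:2], indent=2)
    + "\n[3 of 5 items omitted to fit the reduce input limit]"
)
print(inspect_reducer_input(shaped, original))
```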

Test 2: Public reduce_agent path passes valid structured input to the reducer

Step 1 — Reproduce / establish baseline (without the fix):
Checked out origin/main and exercised WorkflowContext.reduce_agent(...) with a capturing context so the reducer prompt could be inspected without making an external LLM call.

Relevant output:

reduce_agent.result=captured reducer prompt
reduce_agent.prompt_len=12091
reduce_agent.omission_marker=False
reduce_agent.json_valid=False
reduce_agent.json_error=JSONDecodeError: Unterminated string starting at: line 84 column 16 (char 11485)

This confirms the user-facing workflow reducer path received invalid JSON before the fix.

Step 2 — Apply the PR's changes:
Checked out d08988d7282d8c97cfda1c5c52f7b3bd1b99506f.

Step 3 — Re-run with the fix in place:
Ran the same public-path verification against the PR checkout.

Relevant output:

reduce_agent.result=captured reducer prompt
reduce_agent.prompt_len=11556
reduce_agent.omission_marker=True
reduce_agent.json_valid=True
reduce_agent.parsed_count=20
reduce_agent.leading_items_preserved=True

This confirms the actual reducer prompt-shaping path now provides valid structured input and an omission marker.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

@ak684 ak684 merged commit 6bf874e into main Jun 14, 2026
50 of 52 checks passed
@ak684 ak684 deleted the alona/sdk-workflow-reduce-truncation branch June 14, 2026 19:43