fix(swarm): surface errors, output contract, Windows-safe store, redact paths by Teerapat-Vatpitak · Pull Request #119 · HKUDS/Vibe-Trading

Teerapat-Vatpitak · 2026-05-16T05:00:14Z

Summary

Surface swarm task/run errors at every MCP read boundary (incl. the event stream) instead of folding timeout/empty into a completed status.
Enforce a worker output contract (reject plan-only / unparsed-tool-markup / fabricated deliverables).
Make SwarmStore atomic write/read Windows-safe (WinError-scoped bounded retry; POSIX behavior unchanged).
Add src/tools/redaction.py (anchored, idempotent, no regex) and redact internal absolute paths from user-facing errors and events (CWE-209/497).

Why

A run could report completed with no grounded deliverable; raised errors and task_failed/run_error event payloads leaked the OS user and venv/install topology; on Windows a concurrent os.replace race crashed the swarm.

Changes

swarm/{serialization,runtime,store,worker,models}, tools/{redaction,read_file,write_file,edit_file}
redaction wired at swarm read boundaries, file-tool errors, and event payloads

Test Plan

pytest --ignore=agent/tests/e2e_backtest -q -> 1080 passed, 1 skipped
New regression tests; fail-before/pass-after verified per fix

Checklist

No changes to protected areas (src/agent|session|providers)
No hardcoded values
Follows CONTRIBUTING.md

Note: follow-up will harden _classify_deliverable/_is_error_result (substring -> JSON parse); out of scope here.

run_swarm / get_swarm_status / get_run_result and the in-process run_swarm tool each hand-maintained a field allowlist that omitted SwarmTask.error. A misconfigured provider returned status="failed" with no diagnosable reason, though the error was captured on disk the whole time. Add src/swarm/serialization.py as the single source of truth for the per-task projection (now includes error + iterations) plus a run-level error summary; route all three boundaries through it so the allowlist cannot silently drift again. Additive JSON change; the only shape normalization is _format_result missing-summary "" -> null. (cherry picked from commit 95f2717ac32bd02829f650bc971e3cd6dd7fffa6)

A worker that emitted only its plan ("Phase 1 — Plan"), mock data, unparsed tool markup, or (for a data agent) made no tool call and wrote no report was returned status="completed". runtime.py also folded timeout/token_limit into completed. The run reported success with a stub as final_report (P01); degraded workers were invisible (P03). Add WorkerStatus.incomplete and a hybrid output contract in worker.py: content-sanity for every agent, tool-evidence only for data agents so tool-less synthesis/editor roles are not false-rejected. Fail-fast to incomplete (no corrective retry — left for a follow-up). runtime.py only maps a true completed to TaskStatus.completed; anything else fails the run, so final_report can no longer be a non-deliverable. (cherry picked from commit 774b26824509678c953cd0200ef7a56c03f23201)

The MCP run_swarm wrapper called start_run() without include_shell_tools, so stdio swarm workers silently lost bash and could not run the scripts their own execution rules mandate (P03-B). build_filtered_registry also dropped a requested-but-unavailable tool with no log, hiding the contradiction. Pass the server's shell-tool policy into start_run and warn on a dropped tool. Add swarm_runs_root() as the single source of truth for the run store location, shared by mcp_server and the path_utils run-dir allow-list so the two can no longer drift (P03-A: an installed-wheel layout had put every worker run_dir outside it). (cherry picked from commit 2975fda09b5ad44d15580d9330890f72f43c00ce)

SwarmStore._atomic_write did tmp.replace(target) with no retry. On Windows os.replace raises PermissionError WinError 5/32 when the target is concurrently open by load_run on the poll path, crashing the worker thread and leaving the run stuck pending (P13). POSIX os.replace is atomic and never raises these. Add a WinError-scoped, bounded retry around the rename and a matching transient read/parse retry in load_run (the other side of the same race). Non-transient errors re-raise immediately; off-Windows the loop runs once, so POSIX behavior is unchanged. Pre-existing, not introduced by the P01/P03/P04 fixes — surfaced by E2E dogfooding. (cherry picked from commit 46a779cf436353591e7840879f8c32f763dde826)

Remove an unused `from pathlib import Path` in the PR03 test added by 2975fda, and the pre-existing unused `from src.swarm.mailbox import Mailbox` in runtime.py (file already in this branch's diff). Net ruff delta for PR1-PR4 is now -1. The remaining mcp_server.py:41 E402 is pre-existing on upstream and structural — left out of scope. (cherry picked from commit e96f56af1ead9b5b7058c90ec0d4df738b45e473)

Unknown preset / workspace-escape / missing run-or-task dir errors embedded the full install path (OS username, .venvs/site-packages topology) — CWE-209/497, reachable via MCP/API/agent/CLI (P10). Drop the absolute path while keeping the actionable bits: preset name + Available list, the workspace-root boundary, and the logical run/task id (basename only). Two test_path_safety assertions that matched the old leaking message are updated to the redacted wording (behavior unchanged — still raises and rejects). Verified-clean sites (caller's own input / env-var / names list) left untouched. (cherry picked from commit 69ea45295410cd25ea3079c25b7e3c4c73390b80)

Add agent/src/tools/redaction.py: anchored prefix redaction (no regex, idempotent, None-safe) that hides only known-internal root prefixes while keeping the relative tail for diagnosability. Wire it through the swarm error read boundaries (serialization, runtime, store) and the file tools' final broad except so a leaked absolute path cannot reach a user-facing string (CWE-209/497, P10). ValueError branches and the protected providers/llm.py are left untouched. (cherry picked from commit 21dc9cdfacf17cccd31cacfbe34481f62f1390af)

warren618 · 2026-05-16T06:58:20Z

Thanks — will review this week.

# Conflicts: # agent/mcp_server.py

warren618 · 2026-05-16T15:53:18Z

Merged after a quick origin/main merge into the branch (purely style conflicts in mcp_server.py from #120's reformatting — your serialize_task version preserved end-to-end).

Three things worth calling out:

The serialization.py single-source extraction is a textbook fix for P04 — three read boundaries × hand-maintained field allowlists × one omitted error field is exactly the bug class that hides forever. Centralizing the projection means a future field can't be silently dropped by one of the three.
_classify_deliverable's Hybrid policy is the right shape — content sanity for everyone, tool-evidence only for data agents — and the explicit synthesis/editor false-reject guard test (test_synthesis_agent_prose_is_accepted) shows the policy was designed around the failure mode, not just around the bug. The substring heuristics for plan-only / fabrication markers are pragmatic; I'll harden _classify_deliverable / _is_error_result toward JSON parse as you flagged (out of scope here was the right call).
redaction.py is correct in every axis: anchored to known internal roots only (no over-redaction of caller paths — the test_no_over_redaction matrix proves it), idempotent (sentinel is not itself a root), no regex (zero ReDoS surface), None-safe.

The Windows-vs-POSIX split in _replace_with_retry (getattr(exc, "winerror", None) in {5, 32} → POSIX OSError never matches → loop runs exactly once off-Windows) is the right shape too. Nice work overall.

_internal_roots resolves 8 candidate paths + builds 3 separator variants + sorts on every redact_internal_paths call. On the worker hot path this runs once per task error / event / file-tool failure. Result is process- stable (paths don't change without a restart), so @cache it. Follow-up to #119.

The substring head-match (a) false-positived on nested status fields and (b) false-negated when the error envelope sat past the 160-char head. Parse the result as JSON and check only the top-level status field; fall back to the original substring scan when the payload is truncated / unparseable so the classifier never raises on the worker hot path. Adds 7 unit tests pinning the new behavior — nested-status non-match, past-head match, truncated fallback, non-error statuses (degenerate / warning) correctly classified as non-error. Follow-up to #119.

…ct paths (HKUDS#119) * fix(swarm): surface task error at read boundaries run_swarm / get_swarm_status / get_run_result and the in-process run_swarm tool each hand-maintained a field allowlist that omitted SwarmTask.error. A misconfigured provider returned status="failed" with no diagnosable reason, though the error was captured on disk the whole time. Add src/swarm/serialization.py as the single source of truth for the per-task projection (now includes error + iterations) plus a run-level error summary; route all three boundaries through it so the allowlist cannot silently drift again. Additive JSON change; the only shape normalization is _format_result missing-summary "" -> null. (cherry picked from commit 95f2717ac32bd02829f650bc971e3cd6dd7fffa6) * fix(swarm): enforce worker output contract A worker that emitted only its plan ("Phase 1 — Plan"), mock data, unparsed tool markup, or (for a data agent) made no tool call and wrote no report was returned status="completed". runtime.py also folded timeout/token_limit into completed. The run reported success with a stub as final_report (P01); degraded workers were invisible (P03). Add WorkerStatus.incomplete and a hybrid output contract in worker.py: content-sanity for every agent, tool-evidence only for data agents so tool-less synthesis/editor roles are not false-rejected. Fail-fast to incomplete (no corrective retry — left for a follow-up). runtime.py only maps a true completed to TaskStatus.completed; anything else fails the run, so final_report can no longer be a non-deliverable. (cherry picked from commit 774b26824509678c953cd0200ef7a56c03f23201) * fix(swarm): thread shell tools + unify runs root The MCP run_swarm wrapper called start_run() without include_shell_tools, so stdio swarm workers silently lost bash and could not run the scripts their own execution rules mandate (P03-B). build_filtered_registry also dropped a requested-but-unavailable tool with no log, hiding the contradiction. Pass the server's shell-tool policy into start_run and warn on a dropped tool. Add swarm_runs_root() as the single source of truth for the run store location, shared by mcp_server and the path_utils run-dir allow-list so the two can no longer drift (P03-A: an installed-wheel layout had put every worker run_dir outside it). (cherry picked from commit 2975fda09b5ad44d15580d9330890f72f43c00ce) * fix(swarm): make store atomic write Windows-safe SwarmStore._atomic_write did tmp.replace(target) with no retry. On Windows os.replace raises PermissionError WinError 5/32 when the target is concurrently open by load_run on the poll path, crashing the worker thread and leaving the run stuck pending (P13). POSIX os.replace is atomic and never raises these. Add a WinError-scoped, bounded retry around the rename and a matching transient read/parse retry in load_run (the other side of the same race). Non-transient errors re-raise immediately; off-Windows the loop runs once, so POSIX behavior is unchanged. Pre-existing, not introduced by the P01/P03/P04 fixes — surfaced by E2E dogfooding. (cherry picked from commit 46a779cf436353591e7840879f8c32f763dde826) * style(swarm): drop unused imports (ruff F401) Remove an unused `from pathlib import Path` in the PR03 test added by 2975fda, and the pre-existing unused `from src.swarm.mailbox import Mailbox` in runtime.py (file already in this branch's diff). Net ruff delta for PR1-PR4 is now -1. The remaining mcp_server.py:41 E402 is pre-existing on upstream and structural — left out of scope. (cherry picked from commit e96f56af1ead9b5b7058c90ec0d4df738b45e473) * fix: redact internal absolute paths in errors Unknown preset / workspace-escape / missing run-or-task dir errors embedded the full install path (OS username, .venvs/site-packages topology) — CWE-209/497, reachable via MCP/API/agent/CLI (P10). Drop the absolute path while keeping the actionable bits: preset name + Available list, the workspace-root boundary, and the logical run/task id (basename only). Two test_path_safety assertions that matched the old leaking message are updated to the redacted wording (behavior unchanged — still raises and rejects). Verified-clean sites (caller's own input / env-var / names list) left untouched. (cherry picked from commit 69ea45295410cd25ea3079c25b7e3c4c73390b80) * fix(security): redact internal paths in errors Add agent/src/tools/redaction.py: anchored prefix redaction (no regex, idempotent, None-safe) that hides only known-internal root prefixes while keeping the relative tail for diagnosability. Wire it through the swarm error read boundaries (serialization, runtime, store) and the file tools' final broad except so a leaked absolute path cannot reach a user-facing string (CWE-209/497, P10). ValueError branches and the protected providers/llm.py are left untouched. (cherry picked from commit 21dc9cdfacf17cccd31cacfbe34481f62f1390af) * fix(security): redact swarm event error payloads --------- Co-authored-by: Haozhe Wu <haozhe_wu@connect.hku.hk>

_internal_roots resolves 8 candidate paths + builds 3 separator variants + sorts on every redact_internal_paths call. On the worker hot path this runs once per task error / event / file-tool failure. Result is process- stable (paths don't change without a restart), so @cache it. Follow-up to HKUDS#119.

The substring head-match (a) false-positived on nested status fields and (b) false-negated when the error envelope sat past the 160-char head. Parse the result as JSON and check only the top-level status field; fall back to the original substring scan when the payload is truncated / unparseable so the classifier never raises on the worker hot path. Adds 7 unit tests pinning the new behavior — nested-status non-match, past-head match, truncated fallback, non-error statuses (degenerate / warning) correctly classified as non-error. Follow-up to HKUDS#119.

Teerapat-Vatpitak added 8 commits May 16, 2026 11:06

fix(security): redact swarm event error payloads

689afe4

Teerapat-Vatpitak marked this pull request as ready for review May 16, 2026 07:00

warren618 mentioned this pull request May 16, 2026

Discuss: .env resolution observability in providers/llm.py (protected module) #123

Closed

Merge remote-tracking branch 'origin/main' into pr/swarm-redaction

0e3f982

# Conflicts: # agent/mcp_server.py

warren618 merged commit 0a254c0 into HKUDS:main May 16, 2026
1 check passed

Teerapat-Vatpitak deleted the pr/swarm-redaction branch May 17, 2026 17:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(swarm): surface errors, output contract, Windows-safe store, redact paths#119

fix(swarm): surface errors, output contract, Windows-safe store, redact paths#119
warren618 merged 9 commits into
HKUDS:mainfrom
Teerapat-Vatpitak:pr/swarm-redaction

Teerapat-Vatpitak commented May 16, 2026

Uh oh!

warren618 commented May 16, 2026

Uh oh!

Uh oh!

warren618 commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Teerapat-Vatpitak commented May 16, 2026

Summary

Why

Changes

Test Plan

Checklist

Uh oh!

warren618 commented May 16, 2026

Uh oh!

Uh oh!

warren618 commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants