Skip to content

fix(swarm): surface errors, output contract, Windows-safe store, redact paths#119

Merged
warren618 merged 9 commits into
HKUDS:mainfrom
Teerapat-Vatpitak:pr/swarm-redaction
May 16, 2026
Merged

fix(swarm): surface errors, output contract, Windows-safe store, redact paths#119
warren618 merged 9 commits into
HKUDS:mainfrom
Teerapat-Vatpitak:pr/swarm-redaction

Conversation

@Teerapat-Vatpitak
Copy link
Copy Markdown
Contributor

Summary

  • Surface swarm task/run errors at every MCP read boundary (incl. the event stream) instead of folding timeout/empty into a completed status.
  • Enforce a worker output contract (reject plan-only / unparsed-tool-markup / fabricated deliverables).
  • Make SwarmStore atomic write/read Windows-safe (WinError-scoped bounded retry; POSIX behavior unchanged).
  • Add src/tools/redaction.py (anchored, idempotent, no regex) and redact internal absolute paths from user-facing errors and events (CWE-209/497).

Why

A run could report completed with no grounded deliverable; raised errors and task_failed/run_error event payloads leaked the OS user and venv/install topology; on Windows a concurrent os.replace race crashed the swarm.

Changes

  • swarm/{serialization,runtime,store,worker,models}, tools/{redaction,read_file,write_file,edit_file}
  • redaction wired at swarm read boundaries, file-tool errors, and event payloads

Test Plan

  • pytest --ignore=agent/tests/e2e_backtest -q -> 1080 passed, 1 skipped
  • New regression tests; fail-before/pass-after verified per fix

Checklist

  • No changes to protected areas (src/agent|session|providers)
  • No hardcoded values
  • Follows CONTRIBUTING.md

Note: follow-up will harden _classify_deliverable/_is_error_result (substring -> JSON parse); out of scope here.

run_swarm / get_swarm_status / get_run_result and the in-process
run_swarm tool each hand-maintained a field allowlist that omitted
SwarmTask.error. A misconfigured provider returned status="failed"
with no diagnosable reason, though the error was captured on disk
the whole time.

Add src/swarm/serialization.py as the single source of truth for the
per-task projection (now includes error + iterations) plus a run-level
error summary; route all three boundaries through it so the allowlist
cannot silently drift again. Additive JSON change; the only shape
normalization is _format_result missing-summary "" -> null.

(cherry picked from commit 95f2717ac32bd02829f650bc971e3cd6dd7fffa6)
A worker that emitted only its plan ("Phase 1 — Plan"), mock data,
unparsed tool markup, or (for a data agent) made no tool call and
wrote no report was returned status="completed". runtime.py also
folded timeout/token_limit into completed. The run reported success
with a stub as final_report (P01); degraded workers were invisible
(P03).

Add WorkerStatus.incomplete and a hybrid output contract in worker.py:
content-sanity for every agent, tool-evidence only for data agents so
tool-less synthesis/editor roles are not false-rejected. Fail-fast to
incomplete (no corrective retry — left for a follow-up). runtime.py
only maps a true completed to TaskStatus.completed; anything else
fails the run, so final_report can no longer be a non-deliverable.

(cherry picked from commit 774b26824509678c953cd0200ef7a56c03f23201)
The MCP run_swarm wrapper called start_run() without
include_shell_tools, so stdio swarm workers silently lost bash and
could not run the scripts their own execution rules mandate (P03-B).
build_filtered_registry also dropped a requested-but-unavailable
tool with no log, hiding the contradiction.

Pass the server's shell-tool policy into start_run and warn on a
dropped tool. Add swarm_runs_root() as the single source of truth
for the run store location, shared by mcp_server and the path_utils
run-dir allow-list so the two can no longer drift (P03-A: an
installed-wheel layout had put every worker run_dir outside it).

(cherry picked from commit 2975fda09b5ad44d15580d9330890f72f43c00ce)
SwarmStore._atomic_write did tmp.replace(target) with no retry. On
Windows os.replace raises PermissionError WinError 5/32 when the
target is concurrently open by load_run on the poll path, crashing
the worker thread and leaving the run stuck pending (P13). POSIX
os.replace is atomic and never raises these.

Add a WinError-scoped, bounded retry around the rename and a matching
transient read/parse retry in load_run (the other side of the same
race). Non-transient errors re-raise immediately; off-Windows the
loop runs once, so POSIX behavior is unchanged. Pre-existing, not
introduced by the P01/P03/P04 fixes — surfaced by E2E dogfooding.

(cherry picked from commit 46a779cf436353591e7840879f8c32f763dde826)
Remove an unused `from pathlib import Path` in the PR03 test added by
2975fda, and the pre-existing unused `from src.swarm.mailbox import
Mailbox` in runtime.py (file already in this branch's diff). Net ruff
delta for PR1-PR4 is now -1. The remaining mcp_server.py:41 E402 is
pre-existing on upstream and structural — left out of scope.

(cherry picked from commit e96f56af1ead9b5b7058c90ec0d4df738b45e473)
Unknown preset / workspace-escape / missing run-or-task dir errors
embedded the full install path (OS username, .venvs/site-packages
topology) — CWE-209/497, reachable via MCP/API/agent/CLI (P10).

Drop the absolute path while keeping the actionable bits: preset
name + Available list, the workspace-root boundary, and the logical
run/task id (basename only). Two test_path_safety assertions that
matched the old leaking message are updated to the redacted wording
(behavior unchanged — still raises and rejects). Verified-clean
sites (caller's own input / env-var / names list) left untouched.

(cherry picked from commit 69ea45295410cd25ea3079c25b7e3c4c73390b80)
Add agent/src/tools/redaction.py: anchored prefix redaction (no
regex, idempotent, None-safe) that hides only known-internal root
prefixes while keeping the relative tail for diagnosability.

Wire it through the swarm error read boundaries (serialization,
runtime, store) and the file tools' final broad except so a leaked
absolute path cannot reach a user-facing string (CWE-209/497, P10).
ValueError branches and the protected providers/llm.py are left
untouched.

(cherry picked from commit 21dc9cdfacf17cccd31cacfbe34481f62f1390af)
@warren618
Copy link
Copy Markdown
Collaborator

Thanks — will review this week.

@warren618 warren618 merged commit 0a254c0 into HKUDS:main May 16, 2026
1 check passed
@warren618
Copy link
Copy Markdown
Collaborator

Merged after a quick origin/main merge into the branch (purely style conflicts in mcp_server.py from #120's reformatting — your serialize_task version preserved end-to-end).

Three things worth calling out:

  1. The serialization.py single-source extraction is a textbook fix for P04 — three read boundaries × hand-maintained field allowlists × one omitted error field is exactly the bug class that hides forever. Centralizing the projection means a future field can't be silently dropped by one of the three.

  2. _classify_deliverable's Hybrid policy is the right shape — content sanity for everyone, tool-evidence only for data agents — and the explicit synthesis/editor false-reject guard test (test_synthesis_agent_prose_is_accepted) shows the policy was designed around the failure mode, not just around the bug. The substring heuristics for plan-only / fabrication markers are pragmatic; I'll harden _classify_deliverable / _is_error_result toward JSON parse as you flagged (out of scope here was the right call).

  3. redaction.py is correct in every axis: anchored to known internal roots only (no over-redaction of caller paths — the test_no_over_redaction matrix proves it), idempotent (sentinel is not itself a root), no regex (zero ReDoS surface), None-safe.

The Windows-vs-POSIX split in _replace_with_retry (getattr(exc, "winerror", None) in {5, 32} → POSIX OSError never matches → loop runs exactly once off-Windows) is the right shape too. Nice work overall.

warren618 added a commit that referenced this pull request May 16, 2026
_internal_roots resolves 8 candidate paths + builds 3 separator variants
+ sorts on every redact_internal_paths call. On the worker hot path this
runs once per task error / event / file-tool failure. Result is process-
stable (paths don't change without a restart), so @cache it.

Follow-up to #119.
warren618 added a commit that referenced this pull request May 16, 2026
The substring head-match (a) false-positived on nested status fields and
(b) false-negated when the error envelope sat past the 160-char head.
Parse the result as JSON and check only the top-level status field; fall
back to the original substring scan when the payload is truncated /
unparseable so the classifier never raises on the worker hot path.

Adds 7 unit tests pinning the new behavior — nested-status non-match,
past-head match, truncated fallback, non-error statuses (degenerate /
warning) correctly classified as non-error.

Follow-up to #119.
@Teerapat-Vatpitak Teerapat-Vatpitak deleted the pr/swarm-redaction branch May 17, 2026 17:36
ykykj pushed a commit to ykykj/Vibe-Trading that referenced this pull request May 23, 2026
…ct paths (HKUDS#119)

* fix(swarm): surface task error at read boundaries

run_swarm / get_swarm_status / get_run_result and the in-process
run_swarm tool each hand-maintained a field allowlist that omitted
SwarmTask.error. A misconfigured provider returned status="failed"
with no diagnosable reason, though the error was captured on disk
the whole time.

Add src/swarm/serialization.py as the single source of truth for the
per-task projection (now includes error + iterations) plus a run-level
error summary; route all three boundaries through it so the allowlist
cannot silently drift again. Additive JSON change; the only shape
normalization is _format_result missing-summary "" -> null.

(cherry picked from commit 95f2717ac32bd02829f650bc971e3cd6dd7fffa6)

* fix(swarm): enforce worker output contract

A worker that emitted only its plan ("Phase 1 — Plan"), mock data,
unparsed tool markup, or (for a data agent) made no tool call and
wrote no report was returned status="completed". runtime.py also
folded timeout/token_limit into completed. The run reported success
with a stub as final_report (P01); degraded workers were invisible
(P03).

Add WorkerStatus.incomplete and a hybrid output contract in worker.py:
content-sanity for every agent, tool-evidence only for data agents so
tool-less synthesis/editor roles are not false-rejected. Fail-fast to
incomplete (no corrective retry — left for a follow-up). runtime.py
only maps a true completed to TaskStatus.completed; anything else
fails the run, so final_report can no longer be a non-deliverable.

(cherry picked from commit 774b26824509678c953cd0200ef7a56c03f23201)

* fix(swarm): thread shell tools + unify runs root

The MCP run_swarm wrapper called start_run() without
include_shell_tools, so stdio swarm workers silently lost bash and
could not run the scripts their own execution rules mandate (P03-B).
build_filtered_registry also dropped a requested-but-unavailable
tool with no log, hiding the contradiction.

Pass the server's shell-tool policy into start_run and warn on a
dropped tool. Add swarm_runs_root() as the single source of truth
for the run store location, shared by mcp_server and the path_utils
run-dir allow-list so the two can no longer drift (P03-A: an
installed-wheel layout had put every worker run_dir outside it).

(cherry picked from commit 2975fda09b5ad44d15580d9330890f72f43c00ce)

* fix(swarm): make store atomic write Windows-safe

SwarmStore._atomic_write did tmp.replace(target) with no retry. On
Windows os.replace raises PermissionError WinError 5/32 when the
target is concurrently open by load_run on the poll path, crashing
the worker thread and leaving the run stuck pending (P13). POSIX
os.replace is atomic and never raises these.

Add a WinError-scoped, bounded retry around the rename and a matching
transient read/parse retry in load_run (the other side of the same
race). Non-transient errors re-raise immediately; off-Windows the
loop runs once, so POSIX behavior is unchanged. Pre-existing, not
introduced by the P01/P03/P04 fixes — surfaced by E2E dogfooding.

(cherry picked from commit 46a779cf436353591e7840879f8c32f763dde826)

* style(swarm): drop unused imports (ruff F401)

Remove an unused `from pathlib import Path` in the PR03 test added by
2975fda, and the pre-existing unused `from src.swarm.mailbox import
Mailbox` in runtime.py (file already in this branch's diff). Net ruff
delta for PR1-PR4 is now -1. The remaining mcp_server.py:41 E402 is
pre-existing on upstream and structural — left out of scope.

(cherry picked from commit e96f56af1ead9b5b7058c90ec0d4df738b45e473)

* fix: redact internal absolute paths in errors

Unknown preset / workspace-escape / missing run-or-task dir errors
embedded the full install path (OS username, .venvs/site-packages
topology) — CWE-209/497, reachable via MCP/API/agent/CLI (P10).

Drop the absolute path while keeping the actionable bits: preset
name + Available list, the workspace-root boundary, and the logical
run/task id (basename only). Two test_path_safety assertions that
matched the old leaking message are updated to the redacted wording
(behavior unchanged — still raises and rejects). Verified-clean
sites (caller's own input / env-var / names list) left untouched.

(cherry picked from commit 69ea45295410cd25ea3079c25b7e3c4c73390b80)

* fix(security): redact internal paths in errors

Add agent/src/tools/redaction.py: anchored prefix redaction (no
regex, idempotent, None-safe) that hides only known-internal root
prefixes while keeping the relative tail for diagnosability.

Wire it through the swarm error read boundaries (serialization,
runtime, store) and the file tools' final broad except so a leaked
absolute path cannot reach a user-facing string (CWE-209/497, P10).
ValueError branches and the protected providers/llm.py are left
untouched.

(cherry picked from commit 21dc9cdfacf17cccd31cacfbe34481f62f1390af)

* fix(security): redact swarm event error payloads

---------

Co-authored-by: Haozhe Wu <haozhe_wu@connect.hku.hk>
ykykj pushed a commit to ykykj/Vibe-Trading that referenced this pull request May 23, 2026
_internal_roots resolves 8 candidate paths + builds 3 separator variants
+ sorts on every redact_internal_paths call. On the worker hot path this
runs once per task error / event / file-tool failure. Result is process-
stable (paths don't change without a restart), so @cache it.

Follow-up to HKUDS#119.
ykykj pushed a commit to ykykj/Vibe-Trading that referenced this pull request May 23, 2026
The substring head-match (a) false-positived on nested status fields and
(b) false-negated when the error envelope sat past the 160-char head.
Parse the result as JSON and check only the top-level status field; fall
back to the original substring scan when the payload is truncated /
unparseable so the classifier never raises on the worker hot path.

Adds 7 unit tests pinning the new behavior — nested-status non-match,
past-head match, truncated fallback, non-error statuses (degenerate /
warning) correctly classified as non-error.

Follow-up to HKUDS#119.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants