error_watcher: scan tmux windows for errors and alert a random agent by pseay-imbue · Pull Request #171 · imbue-ai/forever-claude-template

pseay-imbue · 2026-06-16T19:58:37Z

Summary

Adds a new error-watcher background service. It scans every tmux window in the agent's session every 5 seconds for output matching /error|exception/i and, when a new match appears, sends one batched message to a randomly selected messageable mngr agent so a service that errored gets noticed instead of scrolling past.

Key behaviors:

Discovers its own session via tmux display-message -p '#S'; enumerates and captures every window's visible pane.
Skips its own svc-error-watcher window so its alert text (which contains "error") cannot re-trigger a match.
Alerts only on newly-appeared output (per-window in-memory dedup), so a static error on screen is reported once.
Enumerates agents via mngr list --format json and sends via mngr message; if no agent is messageable it logs and skips without erroring.
Multiple windows erroring in one poll produce a single batched message.
Match pattern is overridable via the ERROR_WATCHER_PATTERN env var.

Registered as [services.error-watcher] in services.toml (restart = "on-failure"), so the bootstrap manager runs it in its own svc-error-watcher window.

Review resolution (post-review fixes)

After the initial implementation a full review filed 12 findings. All are now resolved or consciously deferred, one fix per commit:

Fix	Commit	What it does
#1 lost-alert ordering	`6735584e`	Record an error as "alerted" only after the message is delivered (split read-only `unseen_matches` from `mark_alerted`), so a failed/skipped send no longer drops the error permanently — it's retried on a later poll.
#2 send-failure fallback	`dd2b89d3`	Try messageable agents in uniformly-random order, falling back to the next on a failed send instead of giving up after one bad pick.
#3 messageable filter	`0a67eb66`	Restrict recipients to `type: claude` agents (mirroring `list_claude_agent_names`) in addition to excluding `STOPPED`, so the non-interactive system-services agent is never messaged.
#4 volatile-line dedup	`507e214a`	Dedup on a number-insensitive key (digit runs → `#`), so a re-timestamped error line alerts once instead of every 5s.
#5 bounded state	`36740cda`	Prune dedup state for closed windows + cap keys per window, so the permanent process doesn't leak memory.
#6 runner/list hardening	`f129139b`	Parse `mngr list` even on a non-zero exit; distinct `RUNNER_FAILURE_RETURNCODE` so a runner failure isn't confused with a real exit-1; preserve a timed-out command's partial output.
#6 follow-up	`444ea1c9`	`subprocess.TimeoutExpired.stdout` is bytes even under `text=True`, so the #6 partial-output preservation needed an explicit decode to actually work (found by autofix; the original test never exercised the path).
#8/#10/#11	`840b5b9b`	Handle `SIGHUP` (bootstrap's real stop signal) alongside SIGTERM/SIGINT; add the `if __name__ == "__main__"` guard; lift + test the signal handler.

Deliberately not fixed (rationale in specs/window-error-watcher/review.md): #7 (duplicate window names can't arise — bootstrap names windows uniquely), #9 (centralizing the mngr argv builders would couple this simple FCT-side script to the imbue-mngr runtime), #12 (re-fetching the session each poll is negligible and the fix would churn every test).

Design notes

All tmux/mngr I/O goes through an injected command runner that never raises, so a vanished window, a hung command, or a missing binary is logged and skipped rather than crashing the poll loop. The pure core (matching, dedup, alert formatting, mngr argv builders, agent parsing, recipient selection) is fully unit-tested; run_one_poll is integration-tested end-to-end with an injected fake runner. The mngr argv builders are validated against the live mngr CLI surface via assert_mngr_argv_valid.

Testing

uv run pytest libs/error_watcher -> 71 passed (57 unit/integration + 14 ratchets), coverage enabled.
Real-tmux end-to-end verification is intentionally manual per the repo CLAUDE.md; manually verified working in a minds Docker agent on this branch.

The spec, plan, and review for this feature live in the mngr monorepo under specs/window-error-watcher/ (mngr PR #2179).

Update — refactor into input / routing / output layers

Since the review-resolution commits above, the service was refactored from the single watcher.py module into three loosely-coupled layers, so the error source and the alert destination can each be swapped without touching the core (commit f68318ee):

Input layer (inputs.py): ErrorInput (abc.ABC) with TmuxWindowErrorInput wrapping the tmux session-discovery / pane-capture work behind a source-agnostic ErrorReading. A systemd/journald reader would be a drop-in sibling.
Routing layer (routing.py): ErrorRouter — the main work — matching, number-insensitive per-source dedup, batching, and delivery-gated dedup bookkeeping; depends only on the two layer interfaces (not on tmux or mngr).
Output layer (outputs.py): ErrorOutput (abc.ABC) -> MngrAgentErrorOutput (mngr delivery with an abstract choose_recipients policy hook) -> RandomMngrAgentErrorOutput (today's uniform-random behavior). A future error-fixing recipient policy is a one-method subclass.
commands.py holds the shared never-raising CommandRunner subprocess seam; watcher.py now just wires the three layers and runs the poll loop; testing.py holds shared test doubles; tests are split per module.

run_one_poll is replaced by ErrorRouter.run_once(), and the alert wording was made source-agnostic. The style guide's "interface classes use MutableModel + pydantic Field" convention was deliberately not followed (this lib has no pydantic dependency and uses NamedTuple/Callable throughout); the hard "no Protocol for interface classes" rule is followed.

Three follow-up fixes from /autofix are included: ruff formatting (92048676), hoisting a set out of the prune loop (09972d5b), and a comment-cleanup commit that was then reverted per author preference (de806aa8 + ac10f4ae).

Testing (supersedes the "71 passed" line above): uv run pytest libs/error_watcher -> 77 passed (63 unit/integration + 14 ratchets); ruff format --check, ruff check, and pyright on libs/error_watcher/ all clean.

Branch hygiene

The unrelated vendor/mngr/ subtree sync that had been snapshotted in WIP commit 39e2cb05 has been reverted (commit 9185978a), so this PR's net diff is now error_watcher-only (libs/error_watcher/, services.toml, and the feature's own uv.lock entry) with no vendor/mngr/ changes. Per the repo's "never rebase" rule the WIP commit was not stripped from history; it and its revert remain as canceling entries (squash at merge if a fully clean history is wanted).

Add the new error-watcher library (mirroring app_watcher's layout) and implement its pure, side-effect-free core: case-insensitive error/exception line matching, per-window dedup of already-alerted lines, batched alert formatting, the mngr list/message argv builders, agent-summary parsing, and uniform-random recipient selection. Includes unit tests for every pure function (the argv builders are validated against the live mngr CLI) and the package's ratchet file. The polling loop and tmux/mngr I/O are wired up next. Registers error-watcher as a workspace member so its tests resolve. Co-authored-by: Sculptor <sculptor@imbue.com>

Implement main() as a long-lived 5s poll loop with SIGTERM/SIGINT handlers that exit cleanly. Each poll discovers the tmux session, enumerates and captures every window (skipping the watcher's own svc-error-watcher window), runs the dedup core, and on newly-appeared matches enumerates messageable mngr agents (STOPPED excluded) and sends one batched alert to a random one. All tmux/mngr I/O goes through an injected command runner that never raises, so a vanished window, a hung command, or a missing binary is logged and skipped rather than crashing the loop. Adds run_one_poll integration tests driven by a fake runner (no real tmux), plus unit tests for the messageable filter and the pattern-override compiler. Co-authored-by: Sculptor <sculptor@imbue.com>

Register [services.error-watcher] in services.toml so the bootstrap manager runs the watcher in its own svc-error-watcher tmux window with an on-failure restart policy. Expand the lib README to describe the scan/match/alert behavior, the own-window exclusion, the random-agent notification, and the v1 non-goals (naive matching, in-memory dedup, visible-pane-only). Add the FCT changelog entry for the branch. Co-authored-by: Sculptor <sculptor@imbue.com>

Architecture review flagged that the hard-coded OWN_WINDOW constant ("svc-error-watcher") silently couples to the [services.error-watcher] key in services.toml: renaming the service without updating the constant would re-enable the self-alert feedback loop. Cross-reference the coupling in both places so the dependency is explicit. No behavior change. Co-authored-by: Sculptor <sculptor@imbue.com>

Problem: run_one_poll recorded matching lines into the dedup state (via new_matches) before sending the alert. If the alert could not be delivered -- no messageable agent (all STOPPED), an mngr list enumeration failure, or a non-zero mngr message send -- those lines were already marked seen and were never re-alerted, even though the error stayed on screen and a recipient could become available later. This silently dropped real errors, contradicting the documented 'alerts exactly once' / 'logs the match and skips sending' behavior (libs/error_watcher/src/error_watcher/watcher.py). Fix: split the dedup into a read-only unseen_matches() (computes new lines without mutating seen) and mark_alerted() (records a delivered batch). run_one_poll now records lines only after _alert_random_agent returns a recipient, and _alert_random_agent returns None when the send itself fails so a failed send is retried too. Added regression tests covering re-alert after a no-messageable-agent poll and after a failed send. Co-authored-by: Sculptor <sculptor@imbue.com>

…red" This reverts commit a7d5c33.

…iew #1) Problem: run_one_poll recorded matching lines into the dedup state (via new_matches) before sending the alert. If the alert could not be delivered -- no messageable agent (all STOPPED), an mngr list enumeration failure, or a non-zero mngr message send -- those lines were already marked seen and were never re-alerted, even though the error stayed on screen and a recipient could become available later. This silently dropped real errors, contradicting the intent of REQ-MATCH-3 ("output it has already reported"). Fix: split the dedup into a read-only unseen_matches() (computes new lines without mutating seen) and mark_alerted() (records a delivered batch). run_one_poll now records lines only after _alert_random_agent returns a recipient, and _alert_random_agent returns None when the send itself fails so a failed send is retried too. Added regression tests covering re-alert after a no-messageable-agent poll and after a failed send. Co-authored-by: Sculptor <sculptor@imbue.com>

Previously _alert_random_agent picked exactly one recipient and gave up if the send failed, so one bad pick (e.g. an agent that stopped between `mngr list` and `mngr message`) dropped the alert even when other agents were reachable. Now recipients are tried in uniformly random order: the first pick stays uniform across the pool (REQ-NOTIFY-5), and on a failed send the watcher falls back to the next agent within the same poll. Only when every messageable agent's send fails does it return None (leaving the error unrecorded so a later poll retries it). Added tests for the fallback-to-another-agent and all-agents-fail paths; refocused the failed-send retry test on a single-agent pool. Co-authored-by: Sculptor <sculptor@imbue.com>

select_messageable_names excluded only STOPPED agents, so it could pick the `main`-type system-services agent (no interactive claude inbox, nobody watching) and message it pointlessly, diverging from the cited reference list_claude_agent_names which filters to `type == "claude"`. AgentSummary now carries the agent `type` (parsed from `mngr list --format json`, defaulting to "" when absent), and select_messageable_names keeps only non-STOPPED claude agents. STOPPED remains the sole excluded lifecycle state -- that matches mngr's actual send-path rule (api/message.py rejects only STOPPED), and transient failures for other states are absorbed by the in-poll recipient fallback rather than by pre-filtering states the spec does not call out. Added unit and integration tests covering the type filter. Co-authored-by: Sculptor <sculptor@imbue.com>

Dedup keyed on the exact line string, so an error line carrying a timestamp, counter, or numeric request id ('[12:00:05] ERROR x' then '[12:00:10] ERROR x') counted as a brand-new match every 5s poll and fired a fresh alert to a random agent -- an alert storm the spec's "benign false positive" non-goals do not excuse. Introduced dedup_key(), which collapses runs of digits to '#', and keyed the per-window `seen` set on it (unseen_matches compares keys, mark_alerted records keys). A re-stamped copy of one error now shares a key and alerts once. The tradeoff -- two errors differing only in numbers are treated as one -- is documented and suits a "something errored" nudge. Added unit and integration tests for the volatile-line case and a guard that genuinely different text still alerts. Co-authored-by: Sculptor <sculptor@imbue.com>

The per-window dedup sets only ever grew and window keys were never evicted, so in this permanent process `seen` would accumulate forever -- one entry per window that ever existed, each set growing with every distinct error line. Added prune_seen_windows(), called each poll with the live window list, to drop dedup state for windows that have closed (skipped when enumeration returns empty, which signals a failed list rather than a window-less session). Also capped the keys retained per window at MAX_SEEN_KEYS_PER_WINDOW as a hard memory ceiling -- number-insensitive keys (finding #4) keep this far from being hit in practice. Added unit tests for both bounds and an integration test that a closed window's state is forgotten. Co-authored-by: Sculptor <sculptor@imbue.com>

Two robustness gaps: - _alert_random_agent returned before parsing whenever `mngr list` exited non-zero, so a non-zero exit that still emitted a valid {"agents": [...]} body (e.g. one provider failed) needlessly skipped the alert. It now parses the payload regardless of exit status and only treats a non-zero exit as fatal when no usable agents were parsed. - _default_command_runner mapped every spawn failure and timeout to returncode=1, colliding with a real exit-1, and discarded a timed-out command's partial stdout. It now returns a distinct RUNNER_FAILURE_RETURNCODE (-1, never a real exit status) and preserves whatever a timed-out command emitted. The timeout is now an injectable parameter so the timeout branch is testable without a 30s wait. Added runner tests (success, missing binary, timeout, sentinel contract) and integration tests for the non-zero-list-with-payload and non-zero-list-without- payload cases. Co-authored-by: Sculptor <sculptor@imbue.com>

- Install a SIGHUP handler alongside SIGTERM/SIGINT. The bootstrap manager stops a service with `tmux kill-window`, which delivers SIGHUP, so this is the signal on the real stop path; the process now exits cleanly via the installed handler rather than the default action (review #10). - Add the conventional `if __name__ == "__main__": main()` guard so the module runs via `python -m error_watcher.watcher`, matching the sibling services (review #11). - Lift the signal handler to a module-level `_handle_signal` and add a test that it exits 0, closing part of the I/O-shell test gap (review #8). The runner and alert-failure branches that finding also cited are already covered by the tests added for review #1, #2, and #6. Co-authored-by: Sculptor <sculptor@imbue.com>

Problem: _default_command_runner's TimeoutExpired branch guarded the partial payload with isinstance(e.stdout, str), but on the timeout path subprocess hands the buffered output back as bytes even under text=True (it skips the decode it does on a normal return). The guard was therefore always False on a real timeout, so the partial stdout/stderr was always discarded -- the exact loss finding #6's code claimed to prevent (e.g. a complete agent list from a hung mngr list was thrown away). The only timeout test emitted no output, so it never caught this. Fix: decode the bytes payload via a small _decode_timeout_output helper (None -> "", bytes -> decode(errors="replace") so a truncated multibyte char can't raise into the never-raise runner, str -> as-is) and use it for both stdout and stderr. Added a regression test that prints before hanging and asserts the partial stdout survives the timeout. Co-authored-by: Sculptor <sculptor@imbue.com>

Refactor the error-watcher service into three loosely-coupled layers so the source of errors and the destination of alerts can each be swapped without touching the core logic: - inputs.py: ErrorInput (ABC) + TmuxWindowErrorInput, wrapping the tmux session-discovery and pane-capture work behind a source-agnostic ErrorReading. A systemd/journald input is a drop-in sibling. - routing.py: ErrorRouter, the main work between the two layers -- matching, per-source number-insensitive dedup, batching, and delivery-gated dedup bookkeeping. Depends only on the layer interfaces. - outputs.py: ErrorOutput (ABC) -> MngrAgentErrorOutput (mngr delivery with an abstract choose_recipients policy) -> RandomMngrAgentErrorOutput. A future error-fixing recipient policy is a one-method subclass. - commands.py: the shared CommandRunner subprocess seam. - watcher.py: wires the three layers and runs the poll loop. Behavior is unchanged; the alert wording is now source-agnostic. Tests reorganized per module (77 passing, including 14 ratchets). Co-authored-by: Sculptor <sculptor@imbue.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

This commit ONLY exists to satisfy the FCT stop hook, which requires a clean working tree. It snapshots a large, pre-existing, in-progress vendor/mngr subtree sync (new mngr_azure lib, aws-stop/separate-snapshots/test-hang changelog reshuffles, provider edits, plus the regenerated uv.lock) that was already present in this worktree and was NOT authored or reviewed by me. My own task -- the error_watcher input/routing/output refactor -- is the separate commit immediately before this one (f68318e). Whoever is driving the vendor sync can recover their uncommitted state with 'git reset --soft HEAD^' (or 'git reset HEAD^' to also unstage); nothing is lost. Co-authored-by: Sculptor <sculptor@imbue.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Problem: outputs.py, outputs_test.py, and routing_test.py were not formatted to the project's ruff standard, so `ruff format --check libs/error_watcher/` failed (an over-parenthesized single-line f-string header plus several over-88-char lines the formatter would wrap). The repo enforces ruff and CI runs the format check, so the branch failed formatting CI. Fix: ran `ruff format libs/error_watcher/` to apply canonical formatting to the three files (whitespace/line-wrapping only, no semantic change). Co-authored-by: Sculptor <sculptor@imbue.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Problem: prune_seen_sources rebuilt set(live_sources) on every element of the seen dict (it lived inside the comprehension's membership test), making each prune O(len(seen) * len(live_sources)) per poll in the permanent watcher process. Fix: compute the live-source set once before the comprehension and test against it, restoring O(len(seen) + len(live_sources)). The deletion list is still materialized before any del, so the snapshot-then-mutate behavior is unchanged. Co-authored-by: Sculptor <sculptor@imbue.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ments Problem: Eight comments/docstrings across commands.py, routing.py, outputs.py, commands_test.py, and outputs_test.py back-referenced numbered code-review findings ("(finding #5)", "(review finding #4)", "(finding #6)"). Those numbers are defined only in vendor/mngr/specs/window-error-watcher/review.md, an ephemeral per-iteration review document that does not ship with the wheel, so the parentheticals are dangling pointers to a past review process rather than descriptions of current behavior (user_request_artifacts_left_in_code). Fix: Stripped only the "(finding #N)" / "(review finding #N)" fragments, preserving the surrounding explanatory prose. The legitimate REQ-* spec requirement references are intentionally left untouched. No code logic changed. Co-authored-by: Sculptor <sculptor@imbue.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…cher comments" This reverts commit de806aa.

This reverts commit 39e2cb0.

Move the concrete tmux input and mngr output out of inputs.py/outputs.py so each interface file holds only its contract: - inputs.py keeps the ErrorInput ABC and the ErrorReading/ErrorSource value types; TmuxWindowErrorInput and its tmux helpers move to a new tmux_window_error_input.py. - outputs.py keeps the ErrorOutput ABC and the ErrorAlert value type; the mngr delivery classes and helpers move to a new mngr_agent_error_output.py. Repoint imports in watcher.py and the tests, rename the unit tests to match their new source files, and update the README architecture section. No behavior changes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Sculptor <sculptor@imbue.com>

Preston Seay and others added 22 commits June 16, 2026 12:15

Revert "Only mark errors as alerted once the alert is actually delive…

a2239cb

…red" This reverts commit a7d5c33.

Revert "Remove dangling code-review finding references from error_wat…

ac10f4a

…cher comments" This reverts commit de806aa.

Revert "WIP: snapshot in-progress vendor/mngr sync (not my work)"

9185978

This reverts commit 39e2cb0.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

error_watcher: scan tmux windows for errors and alert a random agent#171

error_watcher: scan tmux windows for errors and alert a random agent#171
pseay-imbue wants to merge 22 commits into
mainfrom
preston/error-checker

pseay-imbue commented Jun 16, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

pseay-imbue commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Review resolution (post-review fixes)

Design notes

Testing

Update — refactor into input / routing / output layers

Branch hygiene

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pseay-imbue commented Jun 16, 2026 •

edited

Loading