Fix deferred-cleanup drain: tolerate can't find session + run reaper on 4h health check #560
… 4h cadence reaper
Defect 1 completion — the benign-tmux-no-op matcher in processDeferredCleanups
only tolerated "no server running" / "session not found", so on a host with a
*running* tmux server `tmux kill-session -t <missing>` returns
`can't find session: <name>`, which flipped retryableFailure and pinned an
otherwise-complete entry in cleanup-pending.json forever (green in CI with no
tmux server, red on a dev box). Extract the three benign "session already gone"
shapes into an exported `isBenignTmuxKillStderr` helper and tolerate all three
as a class. Refactor the spawn import to a namespace import so the non-benign
retain path is testable (single call site).
Defect 2 completion — the 4h health-check reaper call already existed in
resolveQueueRequestCommand; add the AC(c) tests proving the executed path ticks
processDeferredCleanups exactly once, and that a gate-skipped tick and the peek
path do NOT (reaping rides the executed cadence only).
Tests: isBenignTmuxKillStderr (all 3 benign + non-benign/empty); non-benign
tmux failure retains; explicit harnessDir threads into serverStatus({harnessDir});
AC(c) positive + gate-skip + peek negatives. Full suite 2870 pass / 0 fail
(was 2862/1). No data-shape change; backlog self-drains (no manual purge).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codex review Focus on bugs, correctness issues, and edge cases. Do not check adherence to a spec or plan.
Codex Review: Didn't find any major issues. Delightful!
Suggest Refactor: task-703d0553 (deferred-cleanup drain)
What I'd do differently next time, and follow-ups worth considering.
1. A shared "benign external-CLI stderr" classifier instead of ad-hoc
Summary
Fixes the deferred-cleanup queue (`mag/cleanup-pending.json`) never draining: orphan tmux sessions, worktrees, and branches from long-merged tasks piled up (360-entry backlog going back to 2026-05-14). Two defects:

1. Drain resilience: the reaper re-queued the whole entry whenever any sub-step failed. The dominant driver was the t3code arm marking failure while integration is paused (server intentionally down, gh-ludics-539). Replaced the all-or-nothing `failed` flag with a per-step `retryableFailure`, and made the paused-t3code arm a skip, not a failure (consults `t3codeIntegrationEnabled()` before probing the server).
2. Cadence: the reaper ran from only one place (briefing precompute, ~1×/day). Added a second trigger on the executed 4h health-check path so reaping is evenly spaced; the briefing call is preserved.
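A minimal sketch of the drain-resilience idea above, for readers who have not opened the diff. The type and function names here (`StepResult`, `CleanupStep`, `hasRetryableFailure`, `t3codeStep`) are hypothetical and are not the repo's actual code; only the `retryableFailure` flag and the skip-when-paused behaviour come from this description.

```typescript
// Hypothetical shapes: the real CleanupEntry and step runners are not shown in this PR text.
type StepResult = "done" | "skipped" | "retry";

interface CleanupStep {
  name: string;
  run: () => StepResult;
}

// Returns true when the entry should stay queued for the next reaper tick.
function hasRetryableFailure(steps: CleanupStep[]): boolean {
  let retryableFailure = false;
  for (const step of steps) {
    // Every step runs; one retryable failure does not abort the remaining steps.
    if (step.run() === "retry") retryableFailure = true;
  }
  return retryableFailure;
}

// Paused-integration arm: report a skip instead of a failure, so it cannot pin the entry.
// `integrationEnabled` stands in for the result of t3codeIntegrationEnabled().
function t3codeStep(integrationEnabled: boolean, probeServer: () => boolean): CleanupStep {
  return {
    name: "t3code",
    run: () => {
      if (!integrationEnabled) return "skipped"; // server intentionally down, never probed
      return probeServer() ? "done" : "retry";
    },
  };
}
```

Under these assumptions, an entry whose only unfinished arm is the paused t3code step reports no retryable failure and can drain, while a genuinely unreachable server with integration enabled still keeps the entry queued.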
The round-1 completion (this PR's delta)
The above shipped in the first commit but left a host-dependent gap: `tmux kill-session -t <missing>` returns `can't find session: <name>` when a tmux server is running (dev/worker box), which the benign matcher (`no server running` / `session not found`) did not tolerate, so a benign no-op flipped `retryableFailure` and pinned otherwise-complete entries forever (green in CI, red on a live host). This PR:

- Extracts an exported `isBenignTmuxKillStderr` helper tolerating all three "session already gone" shapes as a class.
- Refactors the `safeSyncOutput` call in `deferred-cleanup.ts` to a namespace import so the non-benign retain path is spy-testable.

Tests
- `isBenignTmuxKillStderr`: all three benign shapes + non-benign/empty negatives
- `harnessDir` threads into `serverStatus({ harnessDir })`; within-grace not reaped early

Full suite 2870 pass / 0 fail (was 2862 / 1). Typecheck, all 15 CI lints, and `bun run build` clean.

Scope
No manual edit of `mag/cleanup-pending.json`; the backlog self-drains once the fix ships. No change to `cleanup_delay_hours` / 25h default / 72h cap. t3code stays paused. No data-shape change (`CleanupEntry` unchanged).

Proposal: `docs/proposals/deferred-cleanup-drain-and-cadence.md`

🤖 Generated with Claude Code
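
For reference, the benign-stderr classifier described under the round-1 completion might look roughly like the sketch below. Only the helper name and the three stderr shapes come from this PR; the regex-based matching, the `BENIGN_TMUX_KILL_PATTERNS` constant, and the trimming are assumptions, not the shipped implementation.

```typescript
// Assumed pattern list: the three "session already gone" shapes named in this PR.
const BENIGN_TMUX_KILL_PATTERNS: readonly RegExp[] = [
  /no server running/i,  // no tmux server on the host (the CI case)
  /session not found/i,  // second shape tolerated by the original matcher
  /can't find session/i, // running server, but the target session is missing
];

// Sketch of the exported helper: true only when stderr shows the session is already gone.
export function isBenignTmuxKillStderr(stderr: string): boolean {
  const text = stderr.trim();
  // Empty stderr is treated as non-benign, matching the "non-benign/empty negatives" tests.
  if (text === "") return false;
  return BENIGN_TMUX_KILL_PATTERNS.some((pattern) => pattern.test(text));
}
```

Treating the three shapes as one class means a `kill-session` no-op is classified the same way whether or not a tmux server happens to be running on the host, while any other stderr still marks the step as retryable.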