
fix: daemon idle reaper checks work bead assignee before killing polecats #3294

Merged
steveyegge merged 2 commits into gastownhall:main from
dannomayernotabot:polecat/daemon-reaper
Mar 28, 2026

Conversation

@dannomayernotabot
Contributor

Summary

  • The idle polecat reaper checked the agent bead's hook_bead field to determine if a polecat had active work, but updateAgentHookBead() was made a no-op (declaring work bead assignee as authoritative). This caused the reaper to kill working polecats whose agent bead hook_bead pointed to stale/closed beads from previous swarms.
  • Adds hasAssignedOpenWork() fallback: before killing a polecat as "working-no-hook", queries bd list --rig=<rig> --assignee=<polecat> --status=hooked (and in_progress, open) to check the authoritative source. If any assigned open work bead exists, the polecat is protected.
  • Validated: 20-minute test with 13 polecats — zero reaper kills (was 4+ before fix).
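The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `--json` flag, the JSON-array output shape, the `listRunner` injection point, the `fakeBD` helper, and the choice to return an error to the caller are all assumptions made for the example. Only the `bd list --rig=… --assignee=… --status=…` invocation and the three statuses come from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
	"strings"
)

// listRunner runs `bd` with the given arguments and returns its stdout.
// Injecting it (an assumption of this sketch) avoids needing a real bd binary.
type listRunner func(args ...string) ([]byte, error)

// execBD shells out to a `bd` binary on PATH.
func execBD(args ...string) ([]byte, error) {
	return exec.Command("bd", args...).Output()
}

// openStatuses are the statuses the PR description treats as open work.
var openStatuses = []string{"hooked", "in_progress", "open"}

// hasAssignedOpenWork reports whether any bead in an open status is assigned
// to the polecat. It assumes `bd list --json` prints a JSON array (hypothetical).
func hasAssignedOpenWork(run listRunner, rig, polecat string) (bool, error) {
	for _, status := range openStatuses {
		out, err := run("list", "--rig="+rig, "--assignee="+polecat,
			"--status="+status, "--json")
		if err != nil {
			return false, fmt.Errorf("bd list (status=%s): %w", status, err)
		}
		trimmed := strings.TrimSpace(string(out))
		if trimmed == "" {
			continue // no output is treated as no beads for this status
		}
		var beads []json.RawMessage
		if err := json.Unmarshal([]byte(trimmed), &beads); err != nil {
			return false, fmt.Errorf("parse bd list output (status=%s): %w", status, err)
		}
		if len(beads) > 0 {
			return true, nil // at least one assigned open bead: protect the polecat
		}
	}
	return false, nil
}

// fakeBD returns a runner that answers each --status query from a map and
// falls back to an empty array. It exists only to exercise the sketch.
func fakeBD(byStatus map[string]string) listRunner {
	return func(args ...string) ([]byte, error) {
		for _, a := range args {
			if strings.HasPrefix(a, "--status=") {
				if out, ok := byStatus[strings.TrimPrefix(a, "--status=")]; ok {
					return []byte(out), nil
				}
			}
		}
		return []byte("[]"), nil
	}
}

func main() {
	// "gt-example", "myrig" and "polecat-1" are made-up names.
	run := fakeBD(map[string]string{"in_progress": `[{"id":"gt-example"}]`})
	busy, err := hasAssignedOpenWork(run, "myrig", "polecat-1")
	fmt.Println(busy, err) // prints: true <nil>
}
```

Returning the error instead of swallowing it leaves the fail-open or fail-closed decision to the reaper; which of the two the merged code chose is not stated in this PR description.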

Context

The sling code and daemon reaper disagree about source of truth:

  • sling_helpers.go:559: updateAgentHookBead is a no-op, declares "work bead assignee is authoritative"
  • daemon.go:2299: idle reaper reads agent bead hook_bead to decide whether to kill

When polecats are reused (same name, new work), the stale hook_bead on the agent bead points to a closed bead from a previous dispatch, causing the reaper to kill the polecat after 15 minutes.
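To make the disagreement concrete, here is a small sketch of the decision order with and without the fallback. The `polecatView` type, its fields, and `shouldReap` are hypothetical names invented for this example; the real logic in daemon.go also includes guards (such as the process-level check mentioned later in this thread) that are omitted here. Only the 15-minute threshold and the "working-no-hook" reason come from the PR text.

```go
package main

import "fmt"

// polecatView is a simplified, hypothetical snapshot of what the reaper sees.
type polecatView struct {
	HookBead     string // agent bead's hook_bead field (may be stale)
	HookBeadOpen bool   // whether the bead that hook_bead points to is still open
	IdleMinutes  int    // minutes since the polecat last showed activity
}

// reapThresholdMinutes mirrors the 15-minute threshold from the PR text.
const reapThresholdMinutes = 15

// shouldReap decides whether to kill a polecat. hasAssignedOpenWork stands in
// for the authoritative work bead assignee lookup and is only consulted when
// the agent bead alone would classify the polecat as "working-no-hook".
func shouldReap(p polecatView, hasAssignedOpenWork func() bool) (bool, string) {
	if p.IdleMinutes < reapThresholdMinutes {
		return false, "" // not idle long enough
	}
	if p.HookBead != "" && p.HookBeadOpen {
		return false, "" // agent bead still points at live work
	}
	// hook_bead is empty or stale: check the authoritative source first.
	if hasAssignedOpenWork() {
		return false, ""
	}
	return true, "working-no-hook"
}

func main() {
	// A reused polecat whose hook_bead still points at a closed bead.
	stale := polecatView{HookBead: "gt-old", HookBeadOpen: false, IdleMinutes: 20}

	// Before the fix there was no fallback, modelled here as "never has work".
	reap, reason := shouldReap(stale, func() bool { return false })
	fmt.Println(reap, reason) // prints: true working-no-hook

	// After the fix the assignee lookup finds the new work bead.
	reap, reason = shouldReap(stale, func() bool { return true })
	fmt.Printf("%v %q\n", reap, reason) // prints: false ""
}
```

Passing the lookup as a function keeps the expensive query off the common path: it runs only for polecats that the agent bead alone would have condemned.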

Test plan

  • Validated live: slung 13 beads to reused polecat names, waited 20 min past 15-min reaper threshold, zero reaper kills
  • Before fix: daemon log showed Reaping idle polecat ... (state=working-no-hook) at 15 min
  • After fix: daemon reaper log completely empty
  • Build passes (go build ./...)

🤖 Generated with Claude Code

@github-actions github-actions bot added the status/needs-triage label Mar 26, 2026
@codecov-commenter

codecov-commenter commented Mar 26, 2026

❌ 1 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 9395 | 1 | 9394 | 48 |
View the top 1 failed test(s) by shortest run time
github.com/steveyegge/gastown/internal/daemon::TestReapIdlePolecat_ReapsIdleNoHook
Stack Traces | 0s run time
=== RUN   TestReapIdlePolecat_ReapsIdleNoHook
    polecat_health_test.go:492: expected idle polecat with no agent to be reaped, got: ""
    polecat_health_test.go:495: expected working-no-hook reason, got: ""
--- FAIL: TestReapIdlePolecat_ReapsIdleNoHook (0.00s)


dannomayernotabot and others added 2 commits March 27, 2026 19:16
… binaries

- Add nolint:unparam to writeTemplate for unused hooksDir parameter
- Update TestBuildRefineryPatrolVars_FullConfig to expect judgment_enabled
  and review_depth vars added to buildRefineryPatrolVars
- Add dummy opencode binary to PATH in TestRunHooksSyncNonClaudeAgent so
  agent resolution doesn't fall back to claude
- Add dummy agent binaries to scaffoldWorkspace in doctor tests so
  template agent sync checks can actually detect non-Claude agents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cats

The idle reaper checked agent bead hook_bead to determine if a polecat
had active work, but updateAgentHookBead was made a no-op (declaring
work bead assignee as authoritative). This caused the reaper to kill
working polecats whose agent bead hook_bead pointed to stale/closed
beads from previous swarms.

Add hasAssignedOpenWork() fallback: before killing a polecat as
"working-no-hook", query bd list for beads assigned to this polecat
with hooked/in_progress/open status. If any exist, the polecat has
active work and should not be reaped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
steveyegge added a commit that referenced this pull request Mar 27, 2026
…lling polecats

Resolve conflicts and merge dannomayernotabot's fix (PR #3294):
- Adds hasAssignedOpenWork() to query authoritative work bead assignee
  before reaper kills polecats with stale hook_bead fields
- Keeps existing IsAgentRunning process-level guard from GH#3342
- Test fixes: dummy binaries on PATH for agent resolution fallback
- patrol_helpers_test: add judgment_enabled/review_depth expected vars

Co-authored-by: Danno <dannomayernotabot@users.noreply.github.com>

(gt-2fr)
steveyegge added a commit that referenced this pull request Mar 27, 2026
Three test infrastructure issues in polecat_health_test.go:
- writeFakeTestBD: return [] for 'list' commands so hasAssignedOpenWork
  doesn't false-positive on agent bead JSON
- writeFakeTmuxWithAgent: use $* glob matching instead of $1, since
  tmux.run() prepends -u flag before the subcommand
- TestReapIdlePolecat_ReapsIdleNoHook: write heartbeat to .runtime/heartbeats/
  (not heartbeats/) and use NewTmuxWithSocket to avoid socket mismatch

Also register 'myr' test prefix in both reap tests so session name
resolves correctly (PrefixFor fallback returns 'gt', not 'myr').

(gt-2fr)
htristan pushed a commit to htristan/gastown that referenced this pull request Mar 27, 2026
…e before killing polecats

htristan pushed a commit to htristan/gastown that referenced this pull request Mar 27, 2026
…merge

@steveyegge steveyegge merged commit a0f192c into gastownhall:main Mar 28, 2026
6 of 7 checks passed

Labels

status/needs-triage Inbox — we haven't looked at it yet


3 participants