Skip to content

fix(heartbeat): prevent false auto-block after successful continuation retry#4440

Open
DeadlySilent wants to merge 5 commits intopaperclipai:masterfrom
DeadlySilent:fix/watchdog-succeeded-continuation-guard
Open

fix(heartbeat): prevent false auto-block after successful continuation retry#4440
DeadlySilent wants to merge 5 commits intopaperclipai:masterfrom
DeadlySilent:fix/watchdog-succeeded-continuation-guard

Conversation

@DeadlySilent
Copy link
Copy Markdown

Summary

  • prevent stranded-work reconciliation from auto-blocking in_progress issues when the latest continuation retry already succeeded
  • add regression test to lock the behavior

Why

reconcileStrandedAssignedIssues() could escalate to blocked based only on a previous retryReason=issue_continuation_needed, even when the latest run had already succeeded. This caused false-positive auto-block churn in live issue lanes.

Changes

  • server/src/services/heartbeat.ts
    • add guard: if latest run status is succeeded, skip continuation escalation path
  • server/src/__tests__/heartbeat-process-recovery.test.ts
    • new test: does not auto-block in-progress work when the latest continuation retry already succeeded

Validation

  • targeted test run:
    • pnpm --filter server test -- heartbeat-process-recovery.test.ts (pass)

Operational context

This patch was applied in hotfix runtime and validated against recurrence conditions observed in IOS lane (IOS-601).

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR fixes a false auto-block regression in heartbeat reconciliation by skipping the continuation escalation path when the latest run already succeeded, and bundles related improvements: stale execution lock repair in routines.ts, a new partial unique index enforcing one open issue per routine, and a new assertCanCreateIssues permission gate on the issues POST route.

  • P1 — isOpenRoutineExecutionConflict misses the new constraint name: the helper only matches "issues_open_routine_execution_uq", so violations of the new "issues_open_routine_origin_uq" index (added in this same PR) will propagate as unhandled errors instead of triggering the coalescing/repair logic.
  • The PR description does not follow the required template from CONTRIBUTING.md — it's missing the Thinking Path, Model Used, Risks, and Checklist sections. Please fill out the full PR template before merging.

Confidence Score: 4/5

Not safe to merge until the constraint name mismatch in isOpenRoutineExecutionConflict is resolved.

The heartbeat guard and test coverage are correct, but the new issues_open_routine_origin_uq constraint added in this PR isn't matched by the conflict-detection helper, meaning live duplicate-key errors from that index will bubble up as unhandled failures. That's a present defect on the changed code path.

server/src/services/routines.ts — isOpenRoutineExecutionConflict needs to match both issues_open_routine_execution_uq and issues_open_routine_origin_uq.

Important Files Changed

Filename Overview
server/src/services/heartbeat.ts Adds a guard to skip continuation escalation when the latest run already succeeded — minimal, targeted fix for the false auto-block bug.
server/src/services/routines.ts Adds stale execution lock repair path and isOpenRoutineExecutionConflict helper, but the helper only matches the old constraint name (issues_open_routine_execution_uq) and will not catch conflicts from the new issues_open_routine_origin_uq index added in this same PR, causing unhandled errors.
server/src/routes/issues.ts Adds assertCanCreateIssues permission gate to POST /issues; redundant assertCompanyAccess call and open question about non-board/agent actor coverage.
packages/db/src/migrations/0057_open_routine_origin_uniqueness.sql Adds partial unique index issues_open_routine_origin_uq on (company_id, origin_kind, origin_id) for open routine execution issues — migration looks correct.
server/src/tests/heartbeat-process-recovery.test.ts New regression test correctly seeds a succeeded continuation-retry run and asserts the heartbeat reconciler skips escalation — good coverage for the primary fix.
server/src/tests/routines-service.test.ts New test covers stale execution lock repair after a duplicate-key conflict, verifying a fresh issue is created and the old lock is cleared.
packages/db/src/migrations/meta/_journal.json Journal entry for migration 0057 added correctly; trailing newline added.
packages/db/src/schema/issues.ts Schema definition for the new openRoutineOriginIdx unique index matches the migration SQL.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: server/src/services/routines.ts
Line: 63-79

Comment:
**New constraint name not handled**

`isOpenRoutineExecutionConflict` only matches `"issues_open_routine_execution_uq"`, but the migration added in this PR introduces a *second* partial unique index named `"issues_open_routine_origin_uq"` (on `company_id, origin_kind, origin_id`). When `issueSvc.create` violates the new index, Postgres returns `code: "23505"` with `constraint_name: "issues_open_routine_origin_uq"`, which won't match either branch of the check, so `!isOpenExecutionConflict` is `true` and the error is re-thrown unhandled — the run is marked `failed` and the recovery paths are never reached.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: server/src/routes/issues.ts
Line: 1348-1350

Comment:
**Double `assertCompanyAccess` call**

`assertCanCreateIssues` already calls `assertCompanyAccess(req, companyId)` internally (line 405), so the call on line 1348 is redundant. This is harmless today but creates a maintenance hazard if the access logic ever differs between the two callsites.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: server/src/routes/issues.ts
Line: 397-413

Comment:
**`user` actor type falls through to `unauthorized()`**

`assertCanCreateIssues` only explicitly handles `board` and `agent` actors; any other `req.actor.type` (e.g. a `user` actor, if that type exists in the system) hits the terminal `throw unauthorized()`. If `user` actors are ever expected to call `POST /companies/:companyId/issues`, they would receive a 401 instead of a 403. Please confirm this fall-through is intentional and all valid actor types are covered.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(heartbeat): skip continuation auto-b..." | Re-trigger Greptile

Comment thread server/src/services/routines.ts
Comment thread server/src/routes/issues.ts Outdated
Comment thread server/src/routes/issues.ts
DeadlySilent pushed a commit to DeadlySilent/paperclip that referenced this pull request Apr 24, 2026
Fix P1 review comment on PR paperclipai#4440:
- Update isOpenRoutineExecutionConflict to also handle the new
  issues_open_routine_origin_uq constraint added in migration 0057
- Without this fix, violation of the new constraint would be
  re-thrown unhandled, causing runs to be marked failed instead
  of reaching the recovery path

Also remove redundant assertCompanyAccess call (P2):
- assertCanCreateIssues already calls assertCompanyAccess internally
- The call at line 1350 was redundant

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Fix P1 review comment on PR paperclipai#4440:
- Update isOpenRoutineExecutionConflict to also handle the new
  issues_open_routine_origin_uq constraint added in migration 0057
- Without this fix, violation of the new constraint would be
  re-thrown unhandled, causing runs to be marked failed instead
  of reaching the recovery path

Also remove redundant assertCompanyAccess call (P2):
- assertCanCreateIssues already calls assertCompanyAccess internally
- The call at line 1350 was redundant

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant