fix(heartbeat): escalate stranded issue when recovery retry succeeds without execution path #4459
…without execution path

The reconciler queues an `issue_continuation_needed` (or `assignment_recovery`) retry when an assigned issue has no live execution path. The escalation gate previously tripped only on the `failed`, `cancelled`, and `timed_out` terminal statuses. A recovery run that exited successfully (e.g., it posted a comment and returned) without re-establishing a real execution path therefore left the issue in the same state. The reconciler then re-queued another recovery run on every tick (default 30s), producing an indefinite loop that lasted until manual intervention.

The `hasActiveExecutionPath` check earlier in the same branch already guarantees that this guard is reached only while the issue is still stranded. Any terminal status of the recovery retry, including `succeeded`, should therefore escalate the issue to `blocked`.

Rename `didAutomaticRecoveryFail` to `didAutomaticRecoveryExhaust` to reflect that succeeded retries are now also considered exhausted.
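The guard change can be sketched as below. This is a hedged approximation, not the code in `recovery/service.ts`. The status tuples and the two function names come from the PR. The `RecoveryRun` shape, the flattened `retryReason` field, and the `isRecoveryRetry` helper are assumptions made for illustration.

```typescript
// Hypothetical sketch: status names and function names follow the PR, but the
// RecoveryRun shape and isRecoveryRetry helper are illustrative assumptions.
type HeartbeatRunStatus =
  | "queued"
  | "running"
  | "scheduled_retry"
  | "succeeded"
  | "failed"
  | "cancelled"
  | "timed_out";

const HEARTBEAT_RUN_TERMINAL_STATUSES = ["succeeded", "failed", "cancelled", "timed_out"] as const;
// Pre-fix set, kept here only to contrast the two guards.
const UNSUCCESSFUL_HEARTBEAT_RUN_TERMINAL_STATUSES = ["failed", "cancelled", "timed_out"] as const;

interface RecoveryRun {
  status: HeartbeatRunStatus;
  retryReason: string | null; // assumed flattening of contextSnapshot.retryReason
}

const RECOVERY_RETRY_REASONS = new Set(["issue_continuation_needed", "assignment_recovery"]);

function isRecoveryRetry(run: RecoveryRun): boolean {
  return run.retryReason !== null && RECOVERY_RETRY_REASONS.has(run.retryReason);
}

// Old guard: only unsuccessful terminal statuses count, so a clean exit never escalates.
function didAutomaticRecoveryFail(run: RecoveryRun | null): boolean {
  if (!run || !isRecoveryRetry(run)) return false;
  return (UNSUCCESSFUL_HEARTBEAT_RUN_TERMINAL_STATUSES as readonly string[]).includes(run.status);
}

// New guard: the caller has already confirmed the issue is stranded, so any
// terminal status (including "succeeded") means the retry is exhausted.
function didAutomaticRecoveryExhaust(run: RecoveryRun | null): boolean {
  if (!run || !isRecoveryRetry(run)) return false;
  return (HEARTBEAT_RUN_TERMINAL_STATUSES as readonly string[]).includes(run.status);
}

const cleanExit: RecoveryRun = { status: "succeeded", retryReason: "issue_continuation_needed" };
console.log(didAutomaticRecoveryFail(cleanExit)); // false: the reconciler re-queues
console.log(didAutomaticRecoveryExhaust(cleanExit)); // true: escalate to blocked
```

The two guards differ only in which tuple they test against. Treating `succeeded` as exhausted is safe only because the real call site runs `hasActiveExecutionPath` first.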
Greptile Summary
This PR fixes an infinite recovery loop in the stranded-issue reconciler: when a heartbeat recovery run exits with …

The PR description is detailed and technically correct, but it does not follow the required PR template from …

Confidence Score: 4/5
Safe to merge after cleaning up unused constants and filling in the PR template; the fix is correct and well-tested.

Only P2 findings: one dead constant left in two files and a missing PR template. The core logic change is correct, symmetrically applied, and the updated test explicitly verifies the new escalation path. Minor cleanup is needed in `server/src/services/recovery/service.ts` and `server/src/services/heartbeat.ts` to remove the now-unused `UNSUCCESSFUL_HEARTBEAT_RUN_TERMINAL_STATUSES` constant.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: server/src/services/recovery/service.ts
Line: 37
Comment:
**Unused constant left behind**
`UNSUCCESSFUL_HEARTBEAT_RUN_TERMINAL_STATUSES` is declared here but is no longer referenced anywhere in `recovery/service.ts` after the swap to `HEARTBEAT_RUN_TERMINAL_STATUSES`. The same dead constant also remains in `heartbeat.ts` (line 154). Both can be removed to avoid confusion about which set of statuses represents an "unsuccessful" terminal state.
```suggestion
const HEARTBEAT_RUN_TERMINAL_STATUSES = ["succeeded", "failed", "cancelled", "timed_out"] as const;
```
How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(heartbeat): escalate stranded issue ..."
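The review simply suggests deleting the dead constant. If a separate "unsuccessful" subset is ever needed again, one option is to derive it from the single remaining tuple so the two lists cannot drift apart. This is an assumption, not something the PR does. `HeartbeatRunTerminalStatus`, `isTerminalHeartbeatRunStatus`, and `unsuccessfulTerminalStatuses` are hypothetical names.

```typescript
// Single source of truth: the constant the review keeps.
const HEARTBEAT_RUN_TERMINAL_STATUSES = ["succeeded", "failed", "cancelled", "timed_out"] as const;

// Hypothetical union type derived from the tuple, so the list is written once.
type HeartbeatRunTerminalStatus = (typeof HEARTBEAT_RUN_TERMINAL_STATUSES)[number];

// Hypothetical type guard built on the tuple; callers never re-declare the list.
function isTerminalHeartbeatRunStatus(status: string): status is HeartbeatRunTerminalStatus {
  return (HEARTBEAT_RUN_TERMINAL_STATUSES as readonly string[]).includes(status);
}

// If an "unsuccessful" subset is needed, derive it instead of declaring a
// second constant that can silently diverge.
const unsuccessfulTerminalStatuses = HEARTBEAT_RUN_TERMINAL_STATUSES.filter(
  (s): s is Exclude<HeartbeatRunTerminalStatus, "succeeded"> => s !== "succeeded",
);

console.log(isTerminalHeartbeatRunStatus("succeeded")); // true
console.log(isTerminalHeartbeatRunStatus("running")); // false
console.log(unsuccessfulTerminalStatuses.join(",")); // failed,cancelled,timed_out
```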
```diff
@@ -35,6 +35,7 @@ import { isAutomaticRecoverySuppressedByPauseHold } from "./pause-hold-guard.js"
 const EXECUTION_PATH_HEARTBEAT_RUN_STATUSES = ["queued", "running", "scheduled_retry"] as const;
 const UNSUCCESSFUL_HEARTBEAT_RUN_TERMINAL_STATUSES = ["failed", "cancelled", "timed_out"] as const;
```
…overy service

Remove the parallel copy of `didAutomaticRecoveryExhaust` from `heartbeat.ts` and export the canonical implementation from `recovery/service.ts`. The execution-path recovery caller in `heartbeat.ts` now imports it directly, completing the consolidation started when `reconcileStrandedAssignedIssues` was moved into `recovery/service.ts`.
Problem
When an assigned `in_progress` issue has no live execution path, the stranded-issue reconciler queues a recovery retry with `retryReason: issue_continuation_needed`. The escalation guard, `didAutomaticRecoveryFail`, was supposed to detect that the retry was exhausted and escalate the issue to `blocked`. However, the guard only checked for unsuccessful terminal statuses (`failed`, `cancelled`, `timed_out`). If the recovery run exited with `succeeded` (e.g., the agent posted a comment and returned cleanly without re-establishing an execution path), the guard returned `false`. The issue remained `in_progress` with no execution path, and the reconciler re-queued another recovery run on the next scheduler tick (default 30 s). This produced an indefinite loop that could only be stopped by manual intervention.

The same logic applies to the `assignment_recovery` path for `todo` issues.

Root cause
The call site checks `hasActiveExecutionPath` before calling this guard, so by the time we reach the escalation check, the issue is already known to be stranded. A succeeded recovery run that left the issue stranded is semantically exhausted: it should escalate, not re-queue.

Fix
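The call-site ordering that the root cause relies on can be modeled in a few lines. This is a minimal sketch. The `RunRef` shape with its `isRecoveryRetry` flag is hypothetical. So is `reconcileOne`, which returns a decision instead of writing to the database. The constant names mirror the PR but are redeclared here as plain `RunStatus` arrays for simplicity.

```typescript
type RunStatus = "queued" | "running" | "scheduled_retry" | "succeeded" | "failed" | "cancelled" | "timed_out";

// Hypothetical view of a heartbeat run attached to the issue.
interface RunRef {
  status: RunStatus;
  isRecoveryRetry: boolean; // e.g. retryReason === "issue_continuation_needed"
}

type Decision = "leave_alone" | "escalate_to_blocked" | "queue_recovery_retry";

const EXECUTION_PATH_HEARTBEAT_RUN_STATUSES: readonly RunStatus[] = ["queued", "running", "scheduled_retry"];
const HEARTBEAT_RUN_TERMINAL_STATUSES: readonly RunStatus[] = ["succeeded", "failed", "cancelled", "timed_out"];

function hasActiveExecutionPath(runs: readonly RunRef[]): boolean {
  return runs.some((run) => EXECUTION_PATH_HEARTBEAT_RUN_STATUSES.includes(run.status));
}

function reconcileOne(runs: readonly RunRef[]): Decision {
  // Checked first: any live run means the issue is not stranded.
  if (hasActiveExecutionPath(runs)) return "leave_alone";

  // From here on the issue is known to be stranded, so a recovery retry in any
  // terminal status, "succeeded" included, did not repair it.
  const recovery = runs.find((run) => run.isRecoveryRetry);
  if (recovery && HEARTBEAT_RUN_TERMINAL_STATUSES.includes(recovery.status)) {
    return "escalate_to_blocked";
  }

  // No recovery retry yet: queue one.
  return "queue_recovery_retry";
}

console.log(reconcileOne([])); // queue_recovery_retry
console.log(reconcileOne([{ status: "running", isRecoveryRetry: true }])); // leave_alone
console.log(reconcileOne([{ status: "succeeded", isRecoveryRetry: true }])); // escalate_to_blocked
```

Because the active-path check runs first, the escalation branch never sees a queued or running retry. That is why widening the guard to every terminal status cannot escalate an issue that is still making progress.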
Two commits:
- Commit 1 (`f9f382ec`): Replace `UNSUCCESSFUL_HEARTBEAT_RUN_TERMINAL_STATUSES` with `HEARTBEAT_RUN_TERMINAL_STATUSES` (which includes `succeeded`) in `recovery/service.ts` and `heartbeat.ts`. Rename `didAutomaticRecoveryFail` → `didAutomaticRecoveryExhaust` to reflect that succeeded retries are now also considered exhausted.
- Commit 2 (`14604ce8`): Remove the parallel copy of `didAutomaticRecoveryExhaust` from `heartbeat.ts`. Export the canonical implementation from `recovery/service.ts` and import it at the one remaining call site in `heartbeat.ts`. This completes the consolidation that was started when `reconcileStrandedAssignedIssues` was moved into `recovery/service.ts`.

Reproducer
1. … `in_progress` state with no active `execution_run_id`.
2. … `heartbeat_run` for that issue with `status = succeeded` and `contextSnapshot.retryReason = "issue_continuation_needed"`.
3. … `reconcileStrandedAssignedIssues()`.
4. … `blocked`.

Testing
- … `blocked`).
- … `heartbeat-process-recovery.test.ts` pass.
- `pnpm --filter @paperclipai/server typecheck` passes.
- … `expectStrandedRecoveryArtifacts` helper exists in the test file at line 572, outside the diff window; (2) P2 false positive: the `todo`-branch early exit at `recovery/service.ts:1456` (`latestRun.status === "succeeded"` → skip) fires before `didAutomaticRecoveryExhaust` is called, so the concern about `todo` escalation does not apply. No actionable findings remained after analysis.
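The reproducer can be approximated with an in-memory simulation. Everything below is hypothetical. The real reconciler reads issues and `heartbeat_run` rows from the database. This model also collapses the queueing and completion of each recovery retry into a single tick and ignores `retryReason`. It only illustrates why the old guard re-queues on every tick while the new one escalates on the first.

```typescript
type RunStatus = "queued" | "running" | "scheduled_retry" | "succeeded" | "failed" | "cancelled" | "timed_out";

interface Issue {
  status: "in_progress" | "blocked";
  recoveryRunStatuses: RunStatus[]; // hypothetical: final status of each recovery retry so far
}

const EXECUTION_PATH_STATUSES: readonly RunStatus[] = ["queued", "running", "scheduled_retry"];
const OLD_GUARD_STATUSES: readonly RunStatus[] = ["failed", "cancelled", "timed_out"];
const NEW_GUARD_STATUSES: readonly RunStatus[] = ["succeeded", "failed", "cancelled", "timed_out"];

// Runs the reconciler for `ticks` scheduler ticks against one stranded issue whose
// recovery retry already finished with "succeeded" (reproducer step 2).
function simulate(guardStatuses: readonly RunStatus[], ticks: number): Issue {
  const issue: Issue = { status: "in_progress", recoveryRunStatuses: ["succeeded"] };
  for (let tick = 0; tick < ticks; tick++) {
    const latest: RunStatus | undefined = issue.recoveryRunStatuses[issue.recoveryRunStatuses.length - 1];
    // A queued/running/retrying run would count as a live execution path.
    if (latest !== undefined && EXECUTION_PATH_STATUSES.includes(latest)) continue;
    if (latest !== undefined && guardStatuses.includes(latest)) {
      issue.status = "blocked"; // escalation
      break;
    }
    // Otherwise another recovery retry is queued; assume it, too, exits cleanly
    // without re-establishing an execution path.
    issue.recoveryRunStatuses.push("succeeded");
  }
  return issue;
}

const withOldGuard = simulate(OLD_GUARD_STATUSES, 10);
const withNewGuard = simulate(NEW_GUARD_STATUSES, 10);
console.log(withOldGuard.status, withOldGuard.recoveryRunStatuses.length); // in_progress 11
console.log(withNewGuard.status, withNewGuard.recoveryRunStatuses.length); // blocked 1
```

With the default 30 s tick, the ten simulated ticks correspond to about five minutes in which the old guard would have queued ten more recovery runs without ever escalating.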