Make routine dispatch failures visible #3599
qazsero wants to merge 7 commits into paperclipai:master
Greptile Summary
This PR adds fail-safe dispatch handling for scheduled and webhook routines: when an automated run fails to dispatch, it creates a visible blocked issue assigned to the routine assignee and links it to the run record. It also adds two regression tests (stale issue-counter correction and failure-blocker linkage) and documents the behavior in the API docs.
Confidence Score: 4/5
Not safe to merge until the transactional atomicity bug in
One P1 finding: the blocker issue is written to the DB outside the enclosing transaction, which can produce an orphaned visible issue if the transaction rolls back. Additionally, the PR description is missing required template sections (Thinking Path, Risks, Model Used, Checklist).
server/src/services/routines.ts, specifically the
Important Files Changed
Prompt To Fix All With AI
This is a comment left during a code review.
Path: server/src/services/routines.ts
Line: 698-708
Comment:
**Blocker issue created outside the enclosing transaction**
`issueSvc.create` is called with the outer `db` connection, not `txDb`, so the INSERT is committed immediately and independently of the surrounding `db.transaction()`. If either of the subsequent `finalizeRun(…, txDb)` or `updateRoutineTouchedState(…, txDb)` calls throw (e.g. a constraint violation on `routine_runs`), the transaction rolls back and `createdRun` is gone — but the blocker issue already exists in the DB, referencing a non-existent run ID in its description. The fix is to thread `txDb` through to `createAutomatedFailureIssue` and pass it to `issueSvc.create`:
```typescript
async function createAutomatedFailureIssue(
  input: { ... },
  dbConn: Db,
) {
  if (input.source !== "schedule" && input.source !== "webhook") return null;
  return issueSvc.create(input.routine.companyId, { ... }, dbConn);
}
```
and at the call site inside the catch block:
```typescript
failureIssue = await createAutomatedFailureIssue({...}, txDb);
```
How can I resolve this? If you propose a fix, please make it concise.
Reviews (1): Last reviewed commit: "fix: make routine dispatch failures visi..."
    return issueSvc.create(input.routine.companyId, {
      projectId: input.routine.projectId,
      goalId: input.routine.goalId,
      parentId: input.routine.parentIssueId,
      title: `Blocked routine: ${input.routine.title}`,
      description: formatRoutineFailureDescription(input),
      status: "blocked",
      priority: input.routine.priority,
      assigneeAgentId: input.routine.assigneeAgentId,
      originKind: "manual",
    });
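The atomicity concern raised in the review comment can be illustrated with a small in-memory model. This is a minimal sketch, not the project's code: `MemoryDb`, `Conn`, and `dispatchFailing` are hypothetical names invented for illustration, and the staging-copy transaction only approximates how a real database rolls back. It shows that a write made through the outer connection survives a rollback, while the same write made through the transaction handle does not.

```typescript
// Hypothetical in-memory stand-ins; none of these names come from the PR.
type Issue = { id: number; title: string };
type Run = { id: number; status: string };

interface Conn {
  insertIssue(title: string): Issue;
  insertRun(status: string): Run;
}

class MemoryDb implements Conn {
  issues: Issue[] = [];
  runs: Run[] = [];
  private nextId = 1;

  insertIssue(title: string): Issue {
    const issue = { id: this.nextId++, title };
    this.issues.push(issue);
    return issue;
  }

  insertRun(status: string): Run {
    const run = { id: this.nextId++, status };
    this.runs.push(run);
    return run;
  }

  // Runs fn against a staging copy and commits only if fn does not throw.
  transaction<T>(fn: (tx: Conn) => T): T {
    const staged = new MemoryDb();
    staged.issues = [...this.issues];
    staged.runs = [...this.runs];
    staged.nextId = this.nextId;
    const result = fn(staged); // a throw here discards the staged copy
    this.issues = staged.issues;
    this.runs = staged.runs;
    this.nextId = staged.nextId;
    return result;
  }
}

// Simulates a dispatch whose finalization throws after the blocker is created.
function dispatchFailing(db: MemoryDb, useTx: boolean): void {
  try {
    db.transaction((tx) => {
      tx.insertRun("failed");
      // Buggy variant writes through the outer connection; fixed variant uses tx.
      const conn: Conn = useTx ? tx : db;
      conn.insertIssue("Blocked routine: example");
      throw new Error("finalization failed");
    });
  } catch {
    // The transaction was rolled back; the error is swallowed for the demo.
  }
}

const buggy = new MemoryDb();
dispatchFailing(buggy, false);
const fixed = new MemoryDb();
dispatchFailing(fixed, true);

console.log(buggy.issues.length, buggy.runs.length); // 1 0 (orphaned blocker issue)
console.log(fixed.issues.length, fixed.runs.length); // 0 0 (nothing left behind)
```

In the buggy variant the run record is gone but the issue remains, which matches the orphan described in the comment; threading the transaction handle through removes it.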
Staff review: changes requested before QA.
Local verification from Staff passed:
Note: GitHub would not allow this token to submit a formal request-changes review because it is treated as the PR author, so I am leaving the blocker as a PR comment and routing the Paperclip issue back to Builder.
Staff review: changes requested before QA.
The routine fail-safe changes look materially improved from the previous review; I am holding this at Staff only because of the unrelated release-note deletion.
Staff review: approved for QA.
Next gate is QA validation for the screenshot-backed routine/run behavior requested by VEN-314.
🌱 gardener · ✅
Context fit
Tree nodes referenced
No concerns at the tree-alignment layer. The @qazsero staff review flags a legitimate edge case (stored assignee no longer assignable → failure-blocker creation fails too → no visible blocker), but that's a code-correctness question within the already-aligned VEN-314 intent. The fix path the reviewer recommends (retry unassigned / fallback owner / guarantee a visible error issue) stays consistent with decision #6; whichever mechanism Builder picks, the tree-level decision doesn't move.
Reviewed commit:
🌱 Posted by repo-gardener, an open-source context-aware review bot built on First-Tree. Reviews this repo against serenakeyitan/paperclip-tree, a user-maintained context tree. Not affiliated with this project's maintainers.
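The "retry unassigned" option mentioned above could look roughly like the following. This is a sketch under stated assumptions: `createBlockerWithFallback`, `CreateIssue`, `IssueInput`, and `fakeCreate` are hypothetical names, and the real issue service's signature and error types are not shown in this thread. It only illustrates the idea of retrying blocker creation without an assignee when the stored assignee is rejected.

```typescript
// Hypothetical shapes; the real issue service API is not shown in this thread.
type IssueInput = {
  title: string;
  status: "blocked";
  assigneeAgentId: string | null;
};
type CreatedIssue = IssueInput & { id: number };
type CreateIssue = (input: IssueInput) => Promise<CreatedIssue>;

// Try to create the blocker with the stored assignee; if that fails and an
// assignee was set, retry once unassigned so a visible issue still exists.
async function createBlockerWithFallback(
  create: CreateIssue,
  input: IssueInput,
): Promise<CreatedIssue> {
  try {
    return await create(input);
  } catch (err) {
    if (input.assigneeAgentId === null) throw err; // nothing left to fall back to
    return create({ ...input, assigneeAgentId: null });
  }
}

// Fake issue service for the demo: rejects agents that are not assignable.
const assignableAgents = new Set(["agent-active"]);
let nextIssueId = 1;
const fakeCreate: CreateIssue = async (input) => {
  if (input.assigneeAgentId !== null && !assignableAgents.has(input.assigneeAgentId)) {
    throw new Error(`agent ${input.assigneeAgentId} is not assignable`);
  }
  return { ...input, id: nextIssueId++ };
};

createBlockerWithFallback(fakeCreate, {
  title: "Blocked routine: nightly sync",
  status: "blocked",
  assigneeAgentId: "agent-removed",
}).then((issue) => console.log(issue.assigneeAgentId)); // logs null
```

A fallback owner would follow the same shape, substituting a known-assignable agent ID for `null` in the retry. If the retry also fails, the error propagates to the caller.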
Staff review: changes requested
This is close, but I do not think it is safe for QA yet.
Blocking issues:
Please either make those side effects atomic, or add compensating cleanup/retry behavior for both the wakeup artifacts and schedule claim, with regression coverage that forces the post-wakeup finalization failure path.
Verification I ran on a merge-result worktree against current
I did not run broader gates after finding the blocking review issue.
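The compensating-cleanup alternative requested in this review can be sketched as follows. This is an illustrative pattern only, not the PR's implementation: `Compensations`, `dispatchWithCleanup`, and the `state` flags are hypothetical stand-ins for the schedule claim and wakeup artifacts, whose real representation is not shown here. Each side effect registers an undo step, and a finalization failure runs the undo steps in reverse order.

```typescript
type Undo = () => void;

// Collects undo steps and runs them in reverse order of registration.
class Compensations {
  private undos: Undo[] = [];

  register(undo: Undo): void {
    this.undos.push(undo);
  }

  rollback(): void {
    while (this.undos.length > 0) {
      const undo = this.undos.pop()!;
      undo();
    }
  }
}

// Hypothetical stand-ins for the schedule claim and the wakeup artifacts.
const state = { scheduleClaimed: false, wakeupQueued: false };

// Performs both side effects, then finalizes; undoes both if finalize throws.
function dispatchWithCleanup(finalize: () => void): boolean {
  const comp = new Compensations();
  try {
    state.scheduleClaimed = true;
    comp.register(() => { state.scheduleClaimed = false; });

    state.wakeupQueued = true;
    comp.register(() => { state.wakeupQueued = false; });

    finalize(); // the post-wakeup finalization step
    return true;
  } catch {
    comp.rollback();
    return false;
  }
}

// Force the post-wakeup finalization failure path.
const ok = dispatchWithCleanup(() => {
  throw new Error("post-wakeup finalization failed");
});
console.log(ok, state.scheduleClaimed, state.wakeupQueued); // false false false
```

A regression test for this path would inject a failing `finalize` in the same way and assert that neither the claim nor the wakeup artifact is left behind.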
Staff Review
Not ready for QA yet. The latest patch still does not cover the production wakeup lifecycle for the post-wakeup finalization failure case.
Blocker:
What I would expect before QA:
Verification I ran:
Staff Review
Not ready for QA yet. The latest patch improves the steady queued and steady running cases, but there is still a race in the cleanup path.
Blocking issue:
Expected before QA:
Verification I ran:
Staff Review
Approved for QA.
Verification run locally on
GitHub checks are green at review time:
Next owner: QA Engineer for the screenshot-backed routine/run validation gate on VEN-314.
Staff Review
Changes requested before QA.
Blocking issue:
Expected before QA:
Verification I ran:
GitHub checks are green, but this fail-safe edge is still inside the VEN-314 acceptance criteria, so I am routing the Paperclip issue back to Builder.
d2b0ef1 to 8134189
Staff review approved for QA.
Reviewed refreshed head
Checked:
Verification:
Residual QA gate: validate the live visual routine path with browser/screenshot evidence or a precise blocked visual-review issue. The code change cannot by itself prove the live Product Designer/Growth routine prompts in production.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
8134189 to 3329965
Thinking Path

What Changed
- `failed` runs linked to blocked issues.
- `issueService.create()` if the enclosing dispatch transaction later rolls back.

Verification
- `pnpm exec vitest run server/src/__tests__/routines-service.test.ts` passed: 22 tests.
- `pnpm exec tsc --noEmit --pretty false --project server/tsconfig.json` passed.
- `pnpm run typecheck` passed.
- `pnpm build` passed.
- `env -u DATABASE_URL PATH="/Users/fabiannitka/.nvm/versions/node/v24.12.0/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/bin:/bin:/usr/sbin:/sbin" pnpm test:run` passed: 229 files, 1259 passed, 1 skipped.

Risks
- `issueService.create()` owns its own transaction; regression coverage forces rollback after blocker creation and asserts no orphan issues remain.

Model Used

Checklist