fix(orchestrator): resume interactive workflows on chat platforms#1756
fix(orchestrator): resume interactive workflows on chat platforms#1756Wirasm wants to merge 3 commits into
Conversation
) Interactive approval-gate and interactive-loop workflows started from Slack, Telegram, Discord, or GitHub never resumed after the user provided their answer — each approval response triggered a brand-new workflow run from node 0 in a fresh worktree, re-asking the same questions indefinitely. The cause was a `platform.getPlatformType() === 'web'` gate that wrapped the entire resume-detection block in `dispatchOrchestratorWorkflow`, leaving all chat platforms to unconditionally fall through to a fresh `executeWorkflow`. The chat-side `resumeRun` mechanism that previously handled this was removed in #915 (natural-language approval routing) without lifting the resume lookup out of the web branch. Changes: - Restructure dispatchOrchestratorWorkflow so resume detection (findResumableRunByParentConversation + hydrateResumableRun) runs for every platform; only the background-dispatch branch remains web-only - Add codebaseId parameter to findResumableRunByParentConversation so persistent chat conversation IDs (Telegram chat_id, Slack thread) cannot resume a stale run from a different project - Add tests for chat resume, codebase scoping, and fresh-run fallback Fixes #1741
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Comprehensive PR ReviewPR: #1756 — fix(orchestrator): resume interactive workflows on chat platforms SummaryThe PR cleanly lifts the interactive-workflow resume block out of the web-only gate and applies it to all platforms, with correct Verdict:
🟡 Medium Issues (Needs Decision)Missing test: web non-interactive + resumable run dispatch priority📍 The refactor moved resume detection before the View recommended test (LOW effort — copy-paste of existing pattern)test('web non-interactive workflow with resumable run resumes foreground (not background)', async () => {
mockGetOrCreateConversation.mockReturnValueOnce(Promise.resolve(makeDispatchConversation()));
mockGetCodebase.mockReturnValueOnce(Promise.resolve(makeDispatchCodebase()));
mockHandleCommand.mockReturnValueOnce(Promise.resolve(makeWorkflowResult(undefined))); // non-interactive
mockFindResumableRunByParentConversation.mockReturnValueOnce(
Promise.resolve({
id: 'web-noninteractive-resume-1',
workflow_name: 'test-workflow',
working_path: '/repos/test-repo/worktrees/web-feature',
parent_conversation_id: 'conv-1',
status: 'paused',
})
);
const platform = makePlatform(); // getPlatformType returns 'web'
await handleMessage(platform, 'conv-1', '/workflow run test-workflow');
expect(mockHydrateResumableRun).toHaveBeenCalled();
expect(mockExecuteWorkflow).toHaveBeenCalled();
expect(mockDispatchBackgroundWorkflow).not.toHaveBeenCalled();
const callArgs = mockExecuteWorkflow.mock.calls[0] as unknown[];
expect(callArgs[3]).toBe('/repos/test-repo/worktrees/web-feature');
});🟢 Low IssuesView 4 low-priority observationsL1 —
Fix: Add L2 — DB resume lookup failure now blocks all platforms (behavioral scope expansion, not a bug) Previously a transient DB error only affected web dispatches. After the fix it blocks all platforms. This is correct per the fail-fast principle — launching fresh when a resumable run might exist risks duplicate worktrees. Flagged for awareness only; leave as-is. L3 — "…starting fresh in the same worktree" message now shown on chat platforms (cosmetic) Pre-existing message, technically accurate. "Worktree" is opaque to chat users but not misleading. Out of scope for this PR. L4 — GitHub platform not explicitly tested for chat resume path Telegram/Slack/Discord are exercised by the 3 new tests. GitHub shares the same What's Good
Reviewed by Archon |
…ive resume test - Add hydrateResumableRun to executor mock in orchestrator.test.ts to mirror the real module exports and prevent opaque TypeErrors for future test contributors - Add test asserting that a web non-interactive workflow with a resumable run resumes foreground rather than dispatching a fresh background run, pinning the priority order of the if/else if dispatch block
⚡ Self-Fix Report (Aggressive)Status: COMPLETE Fixes Applied (2 total)
View all fixes
Tests Added
Skipped (3)
Suggested Follow-up Issues(none) Validation✅ Type check | ✅ Lint | ✅ Tests (all packages, 0 failures) Self-fix by Archon · aggressive mode · fixes pushed to |
Review SummaryVerdict: minor-fixes-needed This PR fixes a long-standing bug where chat platforms (Slack, Telegram, Discord, GitHub) always started a fresh workflow run instead of resuming a paused one after an approval gate. The implementation is clean, the Blocking issues
Suggested fixes
Minor / nice-to-have
Compliments
Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality, docs-impact. |
Review SummaryVerdict: ready-to-merge This PR lifts the Blocking issuesNone. Suggested fixes
Minor / nice-to-have
Compliments
Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality. |
Summary
findResumableRunByParentConversation→hydrateResumableRun→ resume path) out of theif (platform === 'web')gate indispatchOrchestratorWorkflowso it runs for all platforms. Addedcodebase_idscoping to the resume query to prevent cross-project resume on persistent chat conversation IDs.hydrateResumableRunis unchanged.getPausedWorkflowRun(natural-language approval interceptor) is unchanged. Issue C from the reporter (codebase name resolution) is out of scope.UX Journey
Before
After
Architecture Diagram
Before
After
Connection inventory:
orchestrator-agent.ts:dispatchOrchestratorWorkflowworkflowDb.findResumableRunByParentConversationcodebaseIdas 3rd argworkflows.ts:findResumableRunByParentConversationAND codebase_id = $3orchestrator-agent.ts:dispatchOrchestratorWorkflowexecuteWorkflowworking_path; fresh path: called withcwdorchestrator-agent.ts:dispatchOrchestratorWorkflowdispatchBackgroundWorkfloworchestrator-agent.ts:dispatchOrchestratorWorkflowhydrateResumableRunLabel Snapshot
risk: lowsize: Scorecore:orchestrator,core:dbChange Metadata
bugcoreLinked Issue
Validation Evidence (required)
All six checks passed:
check:bundledcheck:bundled-skilltype-checklintformat:checktestNew tests added to
orchestrator-agent.test.ts:chat resume: resumes a paused run on chat platform when one existschat resume: scopes resume query to (workflow, conversation, codebase)chat resume: starts fresh run when no resumable run exists on chat platformEvidence provided: All automated checks passed as listed above.
Intentionally skipped: None.
Security Impact (required)
Compatibility / Migration
codebase_idfilter; all callers already havecodebase.idavailable.codebase_idis an existing column onremote_agent_workflow_runs; no schema changes.Human Verification (required)
Automated CI covers the logic paths via the three new unit tests. Manual end-to-end verification requires a live Slack/Telegram bot with an approval-gate workflow, which was not available in the worktree environment.
working_pathSide Effects / Blast Radius (required)
dispatchOrchestratorWorkflow(all dispatch paths now run the resume lookup),findResumableRunByParentConversation(new requiredcodebaseIdparameter)pausedrun pointing to a deleted worktree will be picked up and fail with a clear error; user canbun run cli workflow abandon <id>to clear it. This was already the behavior on web.hydrateResumableRunreturnsnullif no completed nodes exist, causing a graceful fall-through to a fresh run on the same worktree.Rollback Plan (required)
git revert de5808f2— single commit, no migration to undo.Risks and Mitigations
findResumableRunByParentConversation; if the DB query is slow, chat dispatch latency increases slightly.(workflow_name, parent_conversation_id, codebase_id, status)via existing indexes; expected sub-millisecond latency. The query was already executed on every web dispatch.