fix(orchestrator): resume interactive workflows on chat platforms by Wirasm · Pull Request #1756 · coleam00/Archon

Wirasm · 2026-05-25T09:53:30Z

Summary

Problem: Approval-gate and interactive-loop workflows launched from Slack, Telegram, Discord, or GitHub never resumed after a user response — each reply triggered a brand-new run from node 0 in a fresh worktree, discarding all completed work and re-asking the same questions indefinitely.
Why it matters: Every chat-platform user running any interactive or approval-gate workflow was fully broken; only Web worked correctly.
What changed: Lifted the resume-detection block (findResumableRunByParentConversation → hydrateResumableRun → resume path) out of the if (platform === 'web') gate in dispatchOrchestratorWorkflow so it runs for all platforms. Added codebase_id scoping to the resume query to prevent cross-project resume on persistent chat conversation IDs.
What did not change: The background-dispatch path (web + non-interactive, no resumable run) is unchanged. hydrateResumableRun is unchanged. getPausedWorkflowRun (natural-language approval interceptor) is unchanged. Issue C from the reporter (codebase name resolution) is out of scope.

UX Journey

Before

User (Slack)            Archon                      Workflow Engine
────────────            ──────                      ───────────────
sends message ────────▶ handleMessage
                        detects workflow name
                        calls dispatchOrchestratorWorkflow
                          platform === 'slack'
                          → ELSE branch (no resume check)
                          → executeWorkflow(fresh cwd) ──────────▶ starts NEW run from node 0
                                                                   creates NEW worktree
                                                                   re-asks approval question ──▶ user sees duplicate question
                        (prior paused run abandoned, loop restarts)

After

User (Slack)            Archon                      Workflow Engine
────────────            ──────                      ───────────────
sends message ────────▶ handleMessage
                        detects workflow name
                        calls dispatchOrchestratorWorkflow
                          [findResumableRunByParentConversation(name, convId, codebaseId)]
                          → resumable run found (status=paused)
                          → hydrateResumableRun → prepared != null
                          → executeWorkflow(resumableRun.working_path) ──▶ RESUMES from paused node
                                                                           continues in original worktree
                                                                           workflow completes ──────────▶ user sees result

Architecture Diagram

Before

dispatchOrchestratorWorkflow
├── if platform === 'web'
│   ├── findResumableRunByParentConversation(name, convId)  ← resume lookup
│   │   ├── found: hydrateResumableRun → executeWorkflow(working_path)
│   │   └── not found + interactive: executeWorkflow(fresh cwd)
│   └── not found + !interactive: dispatchBackgroundWorkflow
└── else  (slack / telegram / discord / github)
    └── executeWorkflow(fresh cwd)  ← ALWAYS fresh, no resume check

After

dispatchOrchestratorWorkflow
├── [~] findResumableRunByParentConversation(name, convId, codebaseId)  ← ALL platforms
│   ├── found: hydrateResumableRun → executeWorkflow(working_path)
│   └── not found:
│       ├── if platform === 'web' && !interactive: dispatchBackgroundWorkflow
│       └── else: executeWorkflow(fresh cwd)
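
The "After" ordering can be sketched as a small pure function. This is only an illustration of the priority order in the diagram, not the code in `orchestrator-agent.ts`; `chooseDispatch`, `DispatchDecision`, and the parameter names are hypothetical.

```typescript
// Hypothetical sketch: only the ordering of the checks mirrors the diagram.
type PlatformType = 'web' | 'slack' | 'telegram' | 'discord' | 'github';

interface ResumableRun {
  id: string;
  working_path: string;
}

type DispatchDecision =
  | { kind: 'resume'; cwd: string }
  | { kind: 'background' }
  | { kind: 'fresh'; cwd: string };

function chooseDispatch(
  platform: PlatformType,
  interactive: boolean,
  resumable: ResumableRun | null,
  freshCwd: string
): DispatchDecision {
  // 1. The resume check runs first, for every platform.
  if (resumable !== null) {
    return { kind: 'resume', cwd: resumable.working_path };
  }
  // 2. Background dispatch stays limited to web + non-interactive.
  if (platform === 'web' && !interactive) {
    return { kind: 'background' };
  }
  // 3. Everything else runs in the foreground from a fresh working directory.
  return { kind: 'fresh', cwd: freshCwd };
}
```

Because the resume branch comes first, a web non-interactive dispatch with a resumable run resumes in the foreground instead of starting a background run.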

Connection inventory:

| From | To | Status | Notes |
| --- | --- | --- | --- |
| `orchestrator-agent.ts:dispatchOrchestratorWorkflow` | `workflowDb.findResumableRunByParentConversation` | modified | Now called for all platforms; adds `codebaseId` as 3rd arg |
| `workflows.ts:findResumableRunByParentConversation` | PostgreSQL/SQLite | modified | SQL gains `AND codebase_id = $3` |
| `orchestrator-agent.ts:dispatchOrchestratorWorkflow` | `executeWorkflow` | unchanged | Resume path: called with `working_path`; fresh path: called with `cwd` |
| `orchestrator-agent.ts:dispatchOrchestratorWorkflow` | `dispatchBackgroundWorkflow` | unchanged | Condition unchanged: web + non-interactive + no resumable run |
| `orchestrator-agent.ts:dispatchOrchestratorWorkflow` | `hydrateResumableRun` | unchanged | Called only when resume candidate found |
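
A rough sketch of what the scoped lookup might look like as a parameterized query. The table and column names come from this PR description; the function name `buildResumeLookup`, the `SELECT *` shape, and the `status = 'paused'` filter are assumptions, and the real query in `workflows.ts` may select, filter, or order differently.

```typescript
// Hypothetical query builder: illustrates the added third parameter only.
interface ParameterizedQuery {
  text: string;
  values: string[];
}

function buildResumeLookup(
  workflowName: string,
  conversationId: string,
  codebaseId: string
): ParameterizedQuery {
  return {
    text: [
      'SELECT * FROM remote_agent_workflow_runs',
      'WHERE workflow_name = $1',
      '  AND parent_conversation_id = $2',
      '  AND codebase_id = $3', // the clause this PR adds
      "  AND status = 'paused'", // assumption: the real status filter may be broader
    ].join('\n'),
    values: [workflowName, conversationId, codebaseId],
  };
}
```

Binding the codebase as a third placeholder keeps a persistent chat conversation ID from matching a run that belongs to another project.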

Label Snapshot

Risk: risk: low
Size: size: S
Scope: core
Module: core:orchestrator, core:db

Change Metadata

Change type: bug
Primary scope: core

Linked Issue

Closes Chat (Slack/Telegram) approval & interactive-loop workflows never resume — re-ask the same questions forever #1741

Validation Evidence (required)

bun run validate

All six checks passed:

| Check | Result |
| --- | --- |
| `check:bundled` | ✅ Pass: bundled-defaults.generated.ts up to date (36 commands, 20 workflows) |
| `check:bundled-skill` | ✅ Pass: bundled-skill.ts up to date (21 files) |
| `type-check` | ✅ Pass: 0 errors across all 10 packages |
| `lint` | ✅ Pass: 0 errors, 0 warnings (--max-warnings 0) |
| `format:check` | ✅ Pass: all files formatted |
| `test` | ✅ Pass: all packages, 0 failures |

New tests added to orchestrator-agent.test.ts:

chat resume: resumes a paused run on chat platform when one exists
chat resume: scopes resume query to (workflow, conversation, codebase)
chat resume: starts fresh run when no resumable run exists on chat platform
Evidence provided: All automated checks passed as listed above.
Intentionally skipped: None.

Security Impact (required)

New permissions/capabilities? No
New external network calls? No
Secrets/tokens handling changed? No
File system access scope changed? No

Compatibility / Migration

Backward compatible? Yes — the resume query gains an additional codebase_id filter; all callers already have codebase.id available.
Config/env changes? No
Database migration needed? No — codebase_id is an existing column on remote_agent_workflow_runs; no schema changes.

Human Verification (required)

Automated CI covers the logic paths via the three new unit tests. Manual end-to-end verification requires a live Slack/Telegram bot with an approval-gate workflow, which was not available in the worktree environment.

Verified scenarios: type-check, lint, format, all unit tests (including new chat-resume tests)
Edge cases checked (by tests): codebase-scoped query call, fresh-run fallback when no resumable run found, paused-run resume with correct working_path
What was not verified: live Slack/Telegram end-to-end round-trip

Side Effects / Blast Radius (required)

Affected subsystems: dispatchOrchestratorWorkflow (all dispatch paths now run the resume lookup), findResumableRunByParentConversation (new required codebaseId parameter)
Potential unintended effects: A stale paused run pointing to a deleted worktree will be picked up and fail with a clear error; user can bun run cli workflow abandon <id> to clear it. This was already the behavior on web.
Guardrails: hydrateResumableRun returns null if no completed nodes exist, causing a graceful fall-through to a fresh run on the same worktree.
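
The guardrail above can be sketched as follows. This is a simplified model under stated assumptions: `planFromHydration`, `PausedRun`, `PreparedRun`, and `RunPlan` are hypothetical names, and the real `hydrateResumableRun` return type is not shown in this PR.

```typescript
// Hypothetical model of the fall-through when hydration finds nothing to resume.
interface PausedRun {
  id: string;
  working_path: string;
}

// Assumed shape: only the id is needed for this illustration.
interface PreparedRun {
  id: string;
}

type RunPlan =
  | { mode: 'resume'; cwd: string; preCreatedRunId: string }
  | { mode: 'fresh'; cwd: string };

function planFromHydration(candidate: PausedRun, prepared: PreparedRun | null): RunPlan {
  if (prepared === null) {
    // No completed nodes to resume from: start fresh, but keep the same worktree.
    return { mode: 'fresh', cwd: candidate.working_path };
  }
  // Resume in the original worktree, reusing the hydrated run's id.
  return { mode: 'resume', cwd: candidate.working_path, preCreatedRunId: prepared.id };
}
```

In both branches the working directory is the candidate's `working_path`, which is why the fall-through does not create a second worktree.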

Rollback Plan (required)

Fast rollback: git revert de5808f2 — single commit, no migration to undo.
Feature flags: None.
Observable failure symptoms: Approval/loop workflows on chat platforms restart from scratch after user reply (the original symptom from #1741).

Risks and Mitigations

Risk: All existing dispatches now call findResumableRunByParentConversation; if the DB query is slow, chat dispatch latency increases slightly.
- Mitigation: The query is indexed on (workflow_name, parent_conversation_id, codebase_id, status) via existing indexes; expected sub-millisecond latency. The query was already executed on every web dispatch.

Interactive approval-gate and interactive-loop workflows started from Slack, Telegram, Discord, or GitHub never resumed after the user provided their answer: each approval response triggered a brand-new workflow run from node 0 in a fresh worktree, re-asking the same questions indefinitely.

The cause was a `platform.getPlatformType() === 'web'` gate that wrapped the entire resume-detection block in `dispatchOrchestratorWorkflow`, leaving all chat platforms to unconditionally fall through to a fresh `executeWorkflow`. The chat-side `resumeRun` mechanism that previously handled this was removed in #915 (natural-language approval routing) without lifting the resume lookup out of the web branch.

Changes:
- Restructure dispatchOrchestratorWorkflow so resume detection (findResumableRunByParentConversation + hydrateResumableRun) runs for every platform; only the background-dispatch branch remains web-only
- Add codebaseId parameter to findResumableRunByParentConversation so persistent chat conversation IDs (Telegram chat_id, Slack thread) cannot resume a stale run from a different project
- Add tests for chat resume, codebase scoping, and fresh-run fallback

Fixes #1741

coderabbitai · 2026-05-25T09:53:36Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8010f676-f6ef-4ee1-8464-44a9e4faf912

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Wirasm · 2026-05-25T10:03:39Z

Comprehensive PR Review

PR: #1756 — fix(orchestrator): resume interactive workflows on chat platforms
Reviewed by: 3 specialized agents (code-review, error-handling, test-coverage)
Date: 2026-05-25

Summary

The PR cleanly lifts the interactive-workflow resume block out of the web-only gate and applies it to all platforms, with correct codebase_id scoping to prevent cross-project resumes on persistent chat IDs. All three new code paths are covered by targeted tests. No silent error swallowing introduced.

Verdict: APPROVE

| Severity | Count |
| --- | --- |
| 🔴 CRITICAL | 0 |
| 🟠 HIGH | 0 |
| 🟡 MEDIUM | 1 |
| 🟢 LOW | 4 |

🟡 Medium Issues (Needs Decision)

Missing test: web non-interactive + resumable run dispatch priority

📍 packages/core/src/orchestrator/orchestrator-agent.ts / orchestrator-agent.test.ts

The refactor moved resume detection before the else if (web && !interactive) background-dispatch gate. The only "non-interactive web" test uses a null resumable run. A future refactor could accidentally reintroduce the old guard without test failure — a web user's paused run would silently get a fresh background dispatch instead of resuming.

View recommended test (LOW effort — copy-paste of existing pattern)

test('web non-interactive workflow with resumable run resumes foreground (not background)', async () => {
  mockGetOrCreateConversation.mockReturnValueOnce(Promise.resolve(makeDispatchConversation()));
  mockGetCodebase.mockReturnValueOnce(Promise.resolve(makeDispatchCodebase()));
  mockHandleCommand.mockReturnValueOnce(Promise.resolve(makeWorkflowResult(undefined))); // non-interactive
  mockFindResumableRunByParentConversation.mockReturnValueOnce(
    Promise.resolve({
      id: 'web-noninteractive-resume-1',
      workflow_name: 'test-workflow',
      working_path: '/repos/test-repo/worktrees/web-feature',
      parent_conversation_id: 'conv-1',
      status: 'paused',
    })
  );

  const platform = makePlatform(); // getPlatformType returns 'web'
  await handleMessage(platform, 'conv-1', '/workflow run test-workflow');

  expect(mockHydrateResumableRun).toHaveBeenCalled();
  expect(mockExecuteWorkflow).toHaveBeenCalled();
  expect(mockDispatchBackgroundWorkflow).not.toHaveBeenCalled();
  const callArgs = mockExecuteWorkflow.mock.calls[0] as unknown[];
  expect(callArgs[3]).toBe('/repos/test-repo/worktrees/web-feature');
});

🟢 Low Issues

View 4 low-priority observations

L1 — orchestrator.test.ts executor mock missing hydrateResumableRun
📍 orchestrator.test.ts:166-168

orchestrator-agent.ts imports both executeWorkflow and hydrateResumableRun from @archon/workflows/executor, but orchestrator.test.ts only mocks executeWorkflow. Safe today (all tests use null resumable run, so hydrateResumableRun is never called), but a future test exercising the resume path would get an opaque TypeError: hydrateResumableRun is not a function.

Fix: Add hydrateResumableRun: mock(() => Promise.resolve(null)) to the executor mock block.

L2 — DB resume lookup failure now blocks all platforms (behavioral scope expansion, not a bug)
📍 orchestrator-agent.ts:369-373

Previously a transient DB error only affected web dispatches. After the fix it blocks all platforms. This is correct per the fail-fast principle — launching fresh when a resumable run might exist risks duplicate worktrees. Flagged for awareness only; leave as-is.

L3 — "…starting fresh in the same worktree" message now shown on chat platforms (cosmetic)
📍 orchestrator-agent.ts:406-409

Pre-existing message, technically accurate. "Worktree" is opaque to chat users but not misleading. Out of scope for this PR.

L4 — GitHub platform not explicitly tested for chat resume path

Telegram/Slack/Discord are exercised by the 3 new tests. GitHub shares the same else branch so existing tests provide indirect coverage. Optional completeness addition; risk is low.

What's Good

Scoping is airtight: codebase_id added to both the SQL query and log context — a persistent Telegram chat_id spanning two projects cannot accidentally resume the wrong project's run.
Log improved: platformType field added to orchestrator.foreground_resume_detected — Slack/Telegram/Discord/GitHub resume events are now distinguishable from web in production logs.
Tests check the right things: The resume test verifies callArgs[3] is the prior working_path and opts.preCreatedRun.id comes from the hydrated run — not just toHaveBeenCalled().
No silent swallows: Every error handler re-throws, logs + re-throws, or explicitly notifies the user.
CLAUDE.md compliance: Type safety, fail-fast, YAGNI, no autonomous lifecycle mutation, logging format, DB error pattern, test isolation — all pass.

Reviewed by Archon prp-review-agents workflow

…ive resume test

- Add hydrateResumableRun to executor mock in orchestrator.test.ts to mirror the real module exports and prevent opaque TypeErrors for future test contributors
- Add test asserting that a web non-interactive workflow with a resumable run resumes foreground rather than dispatching a fresh background run, pinning the priority order of the if/else if dispatch block

Wirasm · 2026-05-25T10:10:04Z

⚡ Self-Fix Report (Aggressive)

Status: COMPLETE
Pushed: ✅ Changes pushed to archon/task-archon-fix-github-issue-experimental-1779701539931
Commit: c0c9565c
Philosophy: Fix everything unless clearly a new concern

Fixes Applied (2 total)

| Severity | Count |
| --- | --- |
| 🔴 CRITICAL | 0 |
| 🟠 HIGH | 0 |
| 🟡 MEDIUM | 1 |
| 🟢 LOW | 1 |

View all fixes

✅ Web non-interactive + resumable run has no test (packages/core/src/orchestrator/orchestrator-agent.test.ts) — Added test 'web non-interactive workflow with resumable run resumes foreground (not background)' pinning the dispatch priority order: resume check beats the background-dispatch gate. Asserts executeWorkflow is called (not dispatchBackgroundWorkflow) with the prior worktree path when a resumable run exists.
✅ Executor mock missing hydrateResumableRun (packages/core/src/orchestrator/orchestrator.test.ts:166-168) — Added hydrateResumableRun: mock(() => Promise.resolve(null)) to mirror real module exports and prevent opaque TypeError: hydrateResumableRun is not a function for future test contributors.

Tests Added

packages/core/src/orchestrator/orchestrator-agent.test.ts: web non-interactive workflow with resumable run resumes foreground (not background)

Skipped (3)

| Severity | Finding | Reason |
| --- | --- | --- |
| 🟢 LOW | DB resume lookup failure now blocks all platforms | Intentional fail-fast, correct per CLAUDE.md; launching fresh when lookup fails risks duplicate worktrees |
| 🟢 LOW | "starting fresh in the same worktree" message shown on chat platforms | Pre-existing message, out of scope, cosmetic only |
| 🟢 LOW | GitHub platform not explicitly tested | Shares same `else` branch as Telegram/Slack/Discord; indirect coverage sufficient |

Suggested Follow-up Issues

(none)

Validation

✅ Type check | ✅ Lint | ✅ Tests (all packages, 0 failures)

Self-fix by Archon · aggressive mode · fixes pushed to archon/task-archon-fix-github-issue-experimental-1779701539931

Wirasm · 2026-05-25T10:34:04Z

Review Summary

Verdict: minor-fixes-needed

This PR fixes a long-standing bug where chat platforms (Slack, Telegram, Discord, GitHub) always started a fresh workflow run instead of resuming a paused one after an approval gate. The implementation is clean, the codebaseId scoping prevents cross-project resume on shared chat IDs, and the new tests cover the key permutations. One test mock needs updating before merge, and one docs line needs a quick update.

Blocking issues

packages/core/src/orchestrator/orchestrator.test.ts:167: mock.module('@archon/workflows/executor') stubs executeWorkflow but is missing hydrateResumableRun, which dispatchOrchestratorWorkflow also imports (orchestrator-agent.ts:34). Any test that accidentally triggers the resume path will throw TypeError: hydrateResumableRun is not a function.
- Fix: Add hydrateResumableRun: mock(() => Promise.resolve(null)) to the mock block.
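
To illustrate why the incomplete mock is a latent hazard, here is a minimal sketch without any test framework. `ExecutorModule`, `incompleteMock`, `completeMock`, and `hydrateOrError` are hypothetical stand-ins; the real module's signatures are not reproduced here.

```typescript
// Hypothetical stand-in for the two executor exports the orchestrator imports.
interface ExecutorModule {
  executeWorkflow: (cwd: string) => Promise<string>;
  hydrateResumableRun: (runId: string) => Promise<null>;
}

// Mirrors the pre-fix mock: only executeWorkflow is stubbed.
const incompleteMock = {
  executeWorkflow: async (cwd: string) => `ran in ${cwd}`,
} as unknown as ExecutorModule;

// Mirrors the fix: every imported export has a stub.
const completeMock: ExecutorModule = {
  executeWorkflow: async (cwd) => `ran in ${cwd}`,
  hydrateResumableRun: async () => null,
};

// Calls the resume hook and reports whether the call itself blew up.
function hydrateOrError(executor: ExecutorModule): Promise<null> | 'TypeError' {
  try {
    return executor.hydrateResumableRun('run-1');
  } catch (err) {
    if (err instanceof TypeError) return 'TypeError';
    throw err;
  }
}
```

With `incompleteMock`, the call fails before any resume logic runs, which is the opaque failure the review describes; with `completeMock`, the stub resolves to `null` and the caller can fall through normally.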

Suggested fixes

packages/docs-web/src/content/docs/guides/authoring-workflows.md:~531: The "DAG Resume on Failure" section says "Chat (web): Approving or rejecting a paused workflow auto-resumes..." — this excludes Slack, Telegram, Discord, and GitHub, which now also resume correctly after this PR.
- Fix: Update to "Chat platforms (web, Slack, Telegram, Discord, GitHub)" or simply "Chat platforms". Optionally add a note that resume is scoped to the current codebase.

Minor / nice-to-have

packages/core/src/orchestrator/orchestrator-agent.test.ts:1318: A comment (// cwd comes from validateAndResolveIsolation (default '/test/cwd'), not a prior worktree) visually belongs to the wrong test block — the assertions themselves are correct.
packages/core/src/orchestrator/orchestrator-agent.test.ts:1378: Test name says "scopes resume query" but only verifies mock call args, not the null-return behavior for mismatched codebases. Not blocking — behavior is covered elsewhere.
packages/core/src/db/workflows.ts:342–344: Function JSDoc says "the web orchestrator" — should be "the orchestrator (all platforms)".
packages/docs-web/src/content/docs/guides/authoring-workflows.md:~531 (low priority): Consider adding a note that chat-platform resume is scoped to codebaseId for multi-project chat adapter safety.

Compliments

Excellent comments throughout: the block comment explaining why resume detection now runs for ALL platforms (orchestrator-agent.ts:364-369) and the test comment describing the ordering constraint for web non-interactive resume (orchestrator-agent.test.ts) are exactly the kind of non-obvious WHY documentation that prevents future regressions.
The #1741 reference in the test is appropriate — it's a permanent issue number that gives future engineers a trail to follow.
The codebaseId addition is a thoughtful safety measure that prevents cross-project resume on shared Telegram chat IDs without requiring users to change anything.

Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality, docs-impact.

Wirasm · 2026-05-26T06:26:42Z

Review Summary

Verdict: ready-to-merge

This PR lifts the findResumableRunByParentConversation resume-detection block out of the web-only guard and adds a codebaseId scope to the DB query, so chat platforms (Slack, Telegram, Discord, GitHub) now resume prior runs correctly — and won't accidentally resume a run from a different project on a shared conversation ID. Code quality is high and no error-handling issues were found.

Blocking issues

None.

Suggested fixes

packages/core/src/db/workflows.ts:341 — findResumableRunByParentConversation gained a codebaseId parameter and an AND codebase_id = $3 SQL clause but lacks a direct unit test. The SQL change is exercised only transitively through mocked orchestrator tests. Add a test in workflows.test.ts validating that: (1) a matching run is found when codebase_id matches, (2) no run is found when codebase_id differs even if workflow+conversation match, (3) null is returned when no run exists. This is an explicit regression guard for the cross-project-resume fix.
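
The three cases requested above can be modelled against an in-memory stand-in for the table. This is a sketch of the expected semantics only, not a test against the real database layer; `findResumable`, `WorkflowRunRow`, the sample rows, and the assumption that only `paused` rows are resumable are all hypothetical.

```typescript
// Hypothetical in-memory model of the codebase-scoped resume lookup.
interface WorkflowRunRow {
  id: string;
  workflow_name: string;
  parent_conversation_id: string;
  codebase_id: string;
  status: 'paused' | 'running' | 'completed' | 'failed'; // assumed status set
}

function findResumable(
  rows: WorkflowRunRow[],
  workflowName: string,
  conversationId: string,
  codebaseId: string
): WorkflowRunRow | null {
  const match = rows.find(
    (row) =>
      row.workflow_name === workflowName &&
      row.parent_conversation_id === conversationId &&
      row.codebase_id === codebaseId && // scoping added by this PR
      row.status === 'paused' // assumption: only paused runs are resumable here
  );
  return match ?? null;
}

// One paused run for project A on a persistent chat conversation.
const sampleRuns: WorkflowRunRow[] = [
  {
    id: 'run-a',
    workflow_name: 'test-workflow',
    parent_conversation_id: 'chat-42',
    codebase_id: 'project-a',
    status: 'paused',
  },
];

// (1) codebase matches: the run is found.
const sameProject = findResumable(sampleRuns, 'test-workflow', 'chat-42', 'project-a');
// (2) same workflow and conversation, different codebase: no run.
const otherProject = findResumable(sampleRuns, 'test-workflow', 'chat-42', 'project-b');
// (3) nothing stored at all: null.
const emptyTable = findResumable([], 'test-workflow', 'chat-42', 'project-a');
```

A direct test in `workflows.test.ts` would assert the same three outcomes against the real query rather than this model.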

Minor / nice-to-have

packages/core/src/orchestrator/orchestrator-agent.test.ts:1321 — The web background-dispatch test doesn't explicitly stub a codebase return, so it passes undefined as codebaseId to the now-3-arg findResumableRunByParentConversation. Works via default mock behavior, but making the null expectation explicit is cleaner.
packages/core/src/orchestrator/orchestrator-agent.ts:363: The 5-line block comment can be trimmed to a one-liner: "Check for a resumable run on this workflow before dispatching fresh." The platform-rationale detail belongs in the test comment (where it already exists with the #1741 reference).

Compliments

The regression test with the "Regression for #1741" comment documents the bug's consequence (worktree loss, repeated approval prompts) in a way that future maintainers will immediately understand, which is exactly what comments should do.
The codebase_id scoping rationale is documented in both the workflows.ts docstring and the test description, tying the fix to the invariant it's protecting.

Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality.

simplify: inline single-use mock vars in orchestrator.test.ts

8916bb2

This was referenced May 25, 2026

Chat (Slack/Telegram) approval & interactive-loop workflows never resume — re-ask the same questions forever #1741

Open

fix(orchestrator): check for resumable workflow run on all platforms (closes #1741) #1749

Closed

Wirasm mentioned this pull request May 25, 2026

fix(web): resume interactive loop workflows on approve #1420

Closed

6 tasks

Wirasm wants to merge 3 commits into dev from archon/task-archon-fix-github-issue-experimental-1779701539931
