fix(orchestrator): check for resumable workflow run on all platforms (closes #1741)#1749

Closed
kagura-agent wants to merge 3 commits into coleam00:dev from kagura-agent:fix/chat-workflow-resume-1741

Conversation

@kagura-agent
Contributor

@kagura-agent kagura-agent commented May 23, 2026

Summary

  • Problem: Chat platforms (Slack, Telegram, Discord, GitHub) never resume approval-gate or interactive-loop workflows. Each user answer starts a fresh run at node 0 instead of resuming where the gate paused.
  • Why it matters: Approval gates and interactive loops are unusable on non-web platforms — the workflow re-asks the same questions indefinitely.
  • What changed: Lifted the findResumableRunByParentConversation check out of the platform.getPlatformType() === "web" conditional so it runs for all platforms before dispatch.
  • What did not change (scope boundary): Web-specific dispatch logic (interactive foreground vs background) remains unchanged. The resume/hydrate behavior is identical — it just now triggers for non-web platforms too.

UX Journey

Before

  User (Slack)           Archon                   Workflow Engine
  ────────────           ──────                   ───────────────
  triggers workflow ──▶  dispatches fresh run
                         hits approval gate ─────▶ pauses, asks user
  approves ───────────▶  dispatches NEW fresh run  (ignores paused run)
                         hits approval gate ─────▶ pauses, asks user again
  (infinite loop)

After

  User (Slack)           Archon                   Workflow Engine
  ────────────           ──────                   ───────────────
  triggers workflow ──▶  dispatches fresh run
                         hits approval gate ─────▶ pauses, asks user
  approves ───────────▶  finds resumable run ────▶ hydrates + resumes
                         continues past gate ────▶ completes workflow
  sees result ◀────────  sends completion

Architecture Diagram

Before

dispatchOrchestratorWorkflow()
├─ if (platform === web)
│  ├─ findResumableRun()  ← resume check ONLY here
│  ├─ if resumable → hydrate + execute
│  ├─ elif interactive → foreground execute
│  └─ else → background dispatch
└─ else (chat platforms)
   └─ executeWorkflow()   ← NO resume check, always fresh

After

dispatchOrchestratorWorkflow()
├─ findResumableRun()     ← resume check for ALL platforms
├─ if resumable → hydrate + execute (any platform)
├─ elif (platform === web)
│  ├─ if interactive → foreground execute
│  └─ else → background dispatch
└─ else (chat platforms)
   └─ executeWorkflow()   ← only reached when nothing to resume
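
The reordering in the "After" diagram can be sketched as a pure routing decision. This is a minimal illustration, not the code in `orchestrator-agent.ts`: `chooseDispatchRoute`, `DispatchRoute`, and the `ResumableRun` shape are hypothetical names introduced here, and the real function performs the lookup and execution itself rather than returning a route.

```typescript
// Hypothetical types for illustration; only the platform names and
// `working_path` come from the PR description.
type PlatformType = "web" | "slack" | "telegram" | "discord" | "github";

interface ResumableRun {
  id: string;
  working_path: string;
}

type DispatchRoute =
  | { kind: "resume"; runId: string; workingPath: string }
  | { kind: "web-foreground" }
  | { kind: "web-background" }
  | { kind: "chat-fresh" };

// The resume check comes first and ignores the platform. Platform-specific
// routing is only reached when there is nothing to resume.
function chooseDispatchRoute(
  platform: PlatformType,
  isInteractive: boolean,
  resumableRun: ResumableRun | null,
): DispatchRoute {
  if (resumableRun !== null) {
    return {
      kind: "resume",
      runId: resumableRun.id,
      workingPath: resumableRun.working_path,
    };
  }
  if (platform === "web") {
    return isInteractive ? { kind: "web-foreground" } : { kind: "web-background" };
  }
  return { kind: "chat-fresh" };
}
```

Under this sketch, a Slack reply that arrives while a run is paused takes the `resume` route instead of `chat-fresh`, which is the behavior change this PR is after.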

Validation

  • bun run test — all tests pass
  • Reviewed findResumableRunByParentConversation query: uses workflow_name + parent_conversation_id (DB fields), both populated identically for web and chat platforms
  • Net diff: 56 insertions, 56 deletions (pure restructuring, no new logic)

Closes #1741

Summary by CodeRabbit

  • New Features
    • Workflow resumption now available across all platforms, letting users continue interrupted workflows from their last completed step.
    • If a previous run is detected, the workflow resumes from the detected working path; if no resumable state is found, a warning is shown and the workflow restarts in the same working path.
    • Web behavior preserved: interactive workflows run in foreground; non-interactive workflows continue as background runs.

…loses coleam00#1741)

Chat platforms (Slack, Telegram, Discord, GitHub) never resumed
approval/interactive-loop workflows because findResumableRunByParentConversation
was only called inside the web-specific branch.

Lift the resume check to run before the platform type conditional so all
platforms discover and hydrate a prior paused run. Platform-specific dispatch
(background vs foreground) only triggers when there is nothing to resume.
@coderabbitai

coderabbitai Bot commented May 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2a08b95f-8866-4477-9eaa-9136b84caf55

📥 Commits

Reviewing files that changed from the base of the PR and between 59c5b5e and b2a2c50.

📒 Files selected for processing (1)
  • packages/core/src/orchestrator/orchestrator-agent.ts
💤 Files with no reviewable changes (1)
  • packages/core/src/orchestrator/orchestrator-agent.ts

📝 Walkthrough

The PR moves resumable-run detection and hydration out of the web-only branch into a shared path after isolation resolution. The orchestrator checks for a resumable run by parent conversation; if found it hydrates and resumes on the prior working_path (or warns and runs fresh if hydration yields no prepared state); otherwise it continues with platform-specific dispatch logic.

Changes

Platform-agnostic workflow resumption

| Layer / File(s) | Summary |
|---|---|
| Resumable-run lookup and hydration for all platforms (packages/core/src/orchestrator/orchestrator-agent.ts) | Resumable-run detection now runs before platform branching and applies to all platforms. When a resumable run with working_path is found it hydrates via hydrateResumableRun and resumes with executeWorkflow using the prepared state; if hydration returns no prepared state it warns and executes a fresh run in the same working_path; if no resumable run exists it falls back to the existing web foreground/background or non-web dispatch logic. |

Sequence Diagram

sequenceDiagram
  participant Conversation
  participant OrchestratorAgent
  participant WorkflowDB
  participant Hydrator as hydrateResumableRun
  participant Executor as executeWorkflow
  Conversation->>OrchestratorAgent: dispatchOrchestratorWorkflow(conversation)
  OrchestratorAgent->>WorkflowDB: findResumableRunByParentConversation(workflow.name, conversation.id)
  WorkflowDB-->>OrchestratorAgent: resumableRun (with working_path) / null
  alt resumableRun with working_path
    OrchestratorAgent->>Hydrator: hydrateResumableRun(resumableRun)
    Hydrator-->>OrchestratorAgent: preparedState / null
    alt preparedState
      OrchestratorAgent->>Executor: executeWorkflow(preparedState, working_path, options)
    else no preparedState
      OrchestratorAgent->>Conversation: sendWarning("no completed nodes, running fresh in same working_path")
      OrchestratorAgent->>Executor: executeWorkflow(fresh, working_path, options)
    end
  else no resumableRun
    OrchestratorAgent->>Executor: platform-specific dispatch / executeWorkflow(cwd,...)
  end
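
The three branches of the diagram above (resume with prepared state, warn and run fresh in the same `working_path`, or fall through to platform dispatch) can be sketched with injected collaborators. This is a hedged illustration only: `resumeOrFallBack`, `ResumeDeps`, and the `PreparedState` shape are hypothetical, the callbacks are synchronous for brevity where the real functions are presumably async, and the real `hydrateResumableRun` and `executeWorkflow` signatures are not shown in this PR page.

```typescript
// Hypothetical shapes; only `working_path` and the warning text come from the
// walkthrough above.
interface PreparedState {
  completedNodeIds: string[];
}

interface ResumableRun {
  id: string;
  working_path: string | null;
}

interface ResumeDeps {
  hydrate: (run: ResumableRun) => PreparedState | null;
  sendWarning: (message: string) => void;
  execute: (workingPath: string, prepared: PreparedState | null) => void;
}

type ResumeOutcome = "resumed" | "fresh-in-same-path" | "platform-dispatch";

function resumeOrFallBack(run: ResumableRun | null, deps: ResumeDeps): ResumeOutcome {
  // Nothing to resume (or no working path recorded): the caller continues
  // with its existing platform-specific dispatch.
  if (run === null || run.working_path === null) {
    return "platform-dispatch";
  }
  const prepared = deps.hydrate(run);
  if (prepared === null) {
    // Hydration found no prior state: warn, then start over in the same path.
    deps.sendWarning("no completed nodes, running fresh in same working_path");
    deps.execute(run.working_path, null);
    return "fresh-in-same-path";
  }
  deps.execute(run.working_path, prepared);
  return "resumed";
}
```

Passing the collaborators in as `deps` is a choice made for this sketch so each branch can be exercised with simple fakes; it does not describe how the orchestrator wires its dependencies.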

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • #1741: Addresses the same bug where chat platforms never resumed interactive workflows because resume lookup was web-only.
  • #1549: Touches the same resumable-run lookup and executeWorkflow resume path.
  • #1350: Related to executeWorkflow options/refactor that interacts with the new shared resume path.
  • #1131: Also modifies dispatchOrchestratorWorkflow resume detection using parent conversation id.

Possibly related PRs

  • coleam00/Archon#1646: Refactors resume handling to hydrate resumable runs and pass resumed state into executeWorkflow.
  • coleam00/Archon#1530: Preserves completed-node state across resumes used by hydrate/execute resume flows.
  • coleam00/Archon#1329: Also modifies orchestrator resume and foreground dispatch logic.

Poem

🐰 A rabbit hops through workflow lanes,
Where paused states once were lost,
Now chats and web share the same plains—
No more questions asked twice at a cost.
Resume, dear run, and bound across.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: moving resumable-run detection from web-only to all platforms to fix a specific issue #1741. |
| Description check | ✅ Passed | The description covers all required template sections: problem, why it matters, what changed, scope boundaries, UX journey (before/after), architecture diagram, and validation evidence including test confirmation. |
| Linked Issues check | ✅ Passed | The PR successfully addresses the core objective from #1741: moving findResumableRunByParentConversation out of the web-only conditional so it runs for all platforms, enabling chat platforms to resume paused approval-gate and interactive-loop workflows. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the stated objective in #1741: restructuring dispatchOrchestratorWorkflow to run resumable-run detection across all platforms. No unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/core/src/orchestrator/orchestrator-agent.ts`:
- Around line 372-377: Remove the resumable run working path from the resume log
payload to avoid logging PII: in the block using getLog().info where the object
currently includes workflowName: workflow.name, resumableRunId: resumableRun.id,
workingPath: resumableRun.working_path, delete the workingPath/working_path
entry so only workflowName and resumableRunId are logged (leave getLog().info
and the other fields intact).
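
A sketch of the log payload after that change, assuming the two remaining field names quoted in the comment above. `buildResumeLogFields` is a hypothetical helper introduced for illustration; the real code passes an object literal to `getLog().info`, which is not reproduced here. The concern is that a filesystem path can embed a user or project name.

```typescript
// Hypothetical shape; `id` and `working_path` are the fields named in the
// review comment.
interface ResumableRun {
  id: string;
  working_path: string;
}

interface ResumeLogFields {
  workflowName: string;
  resumableRunId: string;
}

// Build the structured fields for the resume log line. `working_path` is
// deliberately left out so the path never reaches the log sink.
function buildResumeLogFields(workflowName: string, run: ResumableRun): ResumeLogFields {
  return { workflowName, resumableRunId: run.id };
}
```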

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: be0f7c83-2d4a-4203-92ea-11127b668766

📥 Commits

Reviewing files that changed from the base of the PR and between db2c294 and 59c5b5e.

📒 Files selected for processing (1)
  • packages/core/src/orchestrator/orchestrator-agent.ts

Comment thread packages/core/src/orchestrator/orchestrator-agent.ts
@Wirasm
Collaborator

Wirasm commented May 25, 2026

Review Summary

Verdict: minor-fixes-needed

This PR extends the resumable-run check to all platforms, fixing a silent regression where non-web platforms with approval/loop gates were restarting workflows instead of resuming them. Code logic and error handling look correct. One test is needed before merge.

Blocking issues

None.

Suggested fixes

  • packages/core/src/orchestrator/orchestrator-agent.test.ts: Add a test for non-web platform (e.g., Slack) with a resumable run found. The new path (findResumableRunByParentConversation → hydrateResumableRun → executeWorkflow with resumableRun.working_path) is untested for non-web platforms. Existing tests cover the web path and the non-web + no-resumable-run path. Suggested: a test in the workflow dispatch describe block asserting executeWorkflow is called with working_path from the resumable run (not cwd) and that parentConversationId / preCreatedRun opts are passed.

Minor / nice-to-have

  • packages/core/src/orchestrator/orchestrator-agent.test.ts: An explicit test for non-web + hydration-returning-null (mirroring the existing 'falls through to fresh run when hydration returns null' test but with platform.getPlatformType() === 'slack') would make the fallback intent clearer. Low priority.

Compliments

  • Clean refactor — the logic change is minimal and well-scoped.
  • Error handling is consistent with existing patterns throughout the file.
  • CLAUDE.md compliance is solid across all dimensions checked.

Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage.

@Wirasm
Collaborator

Wirasm commented May 25, 2026

Closing in favor of #1756.

Thanks @kagura-agent for surfacing this and the first-pass fix — your diagnosis of the if (platform === 'web') gate was exactly right and is what #1756 builds on. The additional change in #1756 is adding a codebase_id scope to findResumableRunByParentConversation so that two projects sharing the same persistent chat conversation ID (Slack thread, Telegram chat_id) don't cross-resume each other's stale runs. Without that scope it's a silent-bug class issue waiting to bite the first user with two projects in one chat thread.

Net of #1756 vs this PR: same primary fix + the codebase-collision bug closed in one shot + test coverage. Credit's yours on the diagnosis.
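
To make the collision concrete, here is a minimal in-memory sketch of the lookup with and without a `codebase_id` filter. It is an assumption-laden illustration, not the query in #1756: the row shape, the `"paused"` status value, and the optional fourth parameter are hypothetical, and the real `findResumableRunByParentConversation` runs against the database.

```typescript
// Hypothetical row shape; `workflow_name`, `parent_conversation_id`, and
// `codebase_id` are the field names mentioned in this thread.
interface WorkflowRunRow {
  id: string;
  workflow_name: string;
  parent_conversation_id: string;
  codebase_id: string;
  status: "paused" | "completed"; // assumed status values
}

// Find a paused run for this workflow and conversation. When `codebaseId` is
// given, runs belonging to other codebases are ignored.
function findResumableRun(
  rows: WorkflowRunRow[],
  workflowName: string,
  parentConversationId: string,
  codebaseId?: string,
): WorkflowRunRow | null {
  const match = rows.find(
    (row) =>
      row.status === "paused" &&
      row.workflow_name === workflowName &&
      row.parent_conversation_id === parentConversationId &&
      (codebaseId === undefined || row.codebase_id === codebaseId),
  );
  return match ?? null;
}

// Two projects share one Slack thread; project A left a paused run behind.
const rows: WorkflowRunRow[] = [
  {
    id: "run-a",
    workflow_name: "approve-deploy",
    parent_conversation_id: "slack-thread-1",
    codebase_id: "project-a",
    status: "paused",
  },
];

// Unscoped lookup on behalf of project B picks up project A's stale run.
const unscoped = findResumableRun(rows, "approve-deploy", "slack-thread-1");
// Scoped lookup for project B finds nothing to resume.
const scoped = findResumableRun(rows, "approve-deploy", "slack-thread-1", "project-b");
```

In this sketch `unscoped` is project A's run while `scoped` is `null`, which is the cross-resume case the extra scope is meant to rule out.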

@Wirasm Wirasm closed this May 25, 2026

Development

Successfully merging this pull request may close these issues.

Chat (Slack/Telegram) approval & interactive-loop workflows never resume — re-ask the same questions forever

2 participants