fix(cloud-frontend): track async job ids for suspend/snapshot/restart toasts by standujar · Pull Request #7813 · elizaOS/eliza

standujar · 2026-05-19T22:04:20Z

Why

Backend lifecycle ops moved to the job queue in #7810: suspend, restart, snapshot, logs all return 202 + jobId instead of completing inline.

The frontend was still firing the success toast immediately on 2xx and reloading the page, which lied about the operation being done. "Snapshot saved" appeared in < 1s while the daemon had just started a 30s job.

What changed

`packages/cloud-frontend/src/dashboard/containers/_components/agent-actions.tsx`:

Detection of 202 + jobId is now generic (any action), not hardcoded to provision/resume.
New "queued" toast variants for snapshot / suspend / shutdown.
For queued ops, the existing `useJobPoller` already handles `onComplete` / `onFailed` toasts + auto-refresh — no `window.location.reload()` needed on the queued path.

`packages/cloud-frontend/src/dashboard/containers/_components/eliza-agents-table.tsx`:

`handleSuspend` now reads 202 + jobId and `poller.track()`s it (same hook the provision flow uses).
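
A minimal sketch of the generic detection described above, assuming the `{ data: { jobId } }` body shape shown in the review diff further down. `extractJobId`, `detectQueued`, and `QUEUED_COPY` are hypothetical names for illustration and not the component's actual code; only the three queued strings come from this PR, and the generic fallback label is an assumption.

```typescript
// Hypothetical helper: pull a jobId out of a parsed response body.
// The { data: { jobId } } shape follows the diff quoted in this PR.
function extractJobId(body: unknown): string | undefined {
  if (typeof body !== "object" || body === null) return undefined;
  const data = (body as { data?: unknown }).data;
  if (typeof data !== "object" || data === null) return undefined;
  const jobId = (data as { jobId?: unknown }).jobId;
  return typeof jobId === "string" && jobId.length > 0 ? jobId : undefined;
}

// Queued copy quoted in the PR's test plan.
const QUEUED_COPY: Record<string, string> = {
  suspend: "Suspend queued",
  snapshot: "Snapshot queued",
  restart: "Restart queued",
};

// Returns the jobId and queued toast for ANY action answered with
// 202 + jobId, or null when the response is not a queued job.
function detectQueued(
  action: string,
  status: number,
  body: unknown,
): { jobId: string; toast: string } | null {
  const jobId = extractJobId(body);
  if (status !== 202 || !jobId) return null;
  // Fallback label for unlisted actions is an assumption of this sketch.
  return { jobId, toast: QUEUED_COPY[action] ?? `${action} queued` };
}
```

A caller would pass the returned `jobId` to `poller.track()` and skip the immediate reload whenever the result is non-null.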

Test plan

Click Suspend in the agents table → toast says "Suspend queued" → after ~5s of polling, table updates to stopped + toast "Agent provisioning completed" (provision-completed name is misleading but the hook fires it; copy can be tweaked later).
Click Save Snapshot in agent detail → "Snapshot queued" toast → poller resolves with onComplete callback.
Click Restart (if exposed via UI) → "Restart queued" → resolves.
Provision/Resume still work as before (no regression).
409 path still tracked via existing fallback message.

Depends on

#7810 (backend routes returning 202 + jobId). Mergeable in any order — without #7810 the frontend changes are dead code paths (no route returns 202 + jobId for those actions yet), but they don't break anything.

Follow-up

Generic poller toast copy currently says "Provisioning completed" / "Provisioning failed" regardless of which op finished. Untangle by passing the action name through `useJobPoller` and templating the toast.
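
One way the follow-up could look, as a sketch only: the action name is carried alongside the job and the completion copy is templated from it. `JobAction`, `ACTION_LABELS`, and `jobToastMessage` are hypothetical names, and the real `useJobPoller` signature is not shown in this PR.

```typescript
// Hypothetical action and outcome names for illustration.
type JobAction = "provision" | "resume" | "suspend" | "shutdown" | "snapshot" | "restart";
type JobOutcome = "completed" | "failed";

// Human-readable label per action. "Provisioning" keeps the existing copy.
const ACTION_LABELS: Record<JobAction, string> = {
  provision: "Provisioning",
  resume: "Resume",
  suspend: "Suspend",
  shutdown: "Shutdown",
  snapshot: "Snapshot",
  restart: "Restart",
};

// Builds the toast text for a finished job, appending the job error if any.
function jobToastMessage(action: JobAction, outcome: JobOutcome, error?: string): string {
  const base = `${ACTION_LABELS[action]} ${outcome}`;
  return outcome === "failed" && error ? `${base}: ${error}` : base;
}
```

With this shape, a snapshot job that finishes would produce "Snapshot completed" rather than the provisioning copy.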

Greptile Summary

This PR fixes premature "success" toasts for async lifecycle operations (suspend, snapshot, restart) by detecting the 202+jobId response pattern — previously used only for provision/resume — and routing those actions through useJobPoller instead of resolving immediately.

agent-actions.tsx: 202+jobId detection is generalised to any action with per-action queued toast messages; the premature window.location.reload() on the 202 path is removed (reload still happens after the job completes via autoRefresh). The 409+jobId path is also generalised.
eliza-agents-table.tsx: handleSuspend now reads the response body for 202+jobId and calls poller.track(), matching the pattern handleProvision already used. However, unlike agent-actions.tsx, the 409+jobId case is not handled — a concurrent suspend attempt throws a false error instead of attaching to the existing job.

Confidence Score: 3/5

Safe for most flows, but the table's suspend handler has an incomplete edge-case that produces a visible false error toast for concurrent suspend attempts.

The core fix (routing 202+jobId through the job poller) is sound and the happy path works correctly. The table's handleSuspend does not handle the 409+jobId case that agent-actions.tsx now covers — if a suspend is already running and the user clicks suspend again from the table, they see "Suspend failed" and the optimistic status reverts, even though the daemon is healthy. The completion toasts ("Agent provisioning completed") are also incorrect for suspend and snapshot operations, which is a user-visible lie each time those jobs finish.

eliza-agents-table.tsx needs the 409+jobId guard added to handleSuspend to match the pattern in agent-actions.tsx.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/cloud-frontend/src/dashboard/containers/_components/agent-actions.tsx | Generalises 202+jobId tracking to all actions (not just provision/resume), adds per-action queued toast messages, and removes premature window.location.reload() on the async path. Minor issues: shutdown reuses the suspendQueued i18n key, and onComplete/onFailed messages still say "provisioning" for non-provisioning operations. |
| packages/cloud-frontend/src/dashboard/containers/_components/eliza-agents-table.tsx | handleSuspend updated to read 202+jobId and call poller.track(), but misses the 409+jobId case that agent-actions.tsx now handles: a concurrent suspend attempt shows a false "Suspend failed" error instead of attaching to the in-flight job. |

Sequence Diagram

sequenceDiagram
    participant U as User
    participant FE as Frontend (agent-actions / table)
    participant API as Backend API
    participant P as useJobPoller
    participant J as /api/v1/jobs/:id

    U->>FE: Click Suspend / Snapshot / Restart
    FE->>API: PATCH/POST action
    alt 409 + jobId (already in flight)
        API-->>FE: "409 { data: { jobId } }"
        FE->>P: poller.track(agentId, jobId)
        FE-->>U: "toast.info("{action} already in progress")"
    else 202 + jobId (newly queued)
        API-->>FE: "202 { data: { jobId } }"
        FE->>P: poller.track(agentId, jobId)
        FE-->>U: "toast.success("{action} queued")"
    else 2xx no jobId (legacy inline)
        API-->>FE: "200 {}"
        FE-->>U: "toast.success("{action} done")"
        FE->>FE: window.location.reload()
    else error
        API-->>FE: 4xx/5xx
        FE-->>U: toast.error("Action failed: ...")
    end

    loop Every 5s while job active
        P->>J: GET /api/v1/jobs/:jobId
        J-->>P: "{ status, error }"
        alt completed
            P->>FE: onComplete() → toast "Agent provisioning completed"
            P->>FE: window.location.reload()
        else failed
            P->>FE: onFailed() → toast.error(job.error)
            P->>FE: window.location.reload()
        else timed out
            P->>FE: onFailed("Timed out waiting...")
        end
    end
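
The polling loop in the diagram can be read as one decision per tick. The sketch below models only that decision as a pure function. `JobSnapshot`, `PollDecision`, and `decidePoll` are hypothetical names, and the status strings `"completed"` and `"failed"` are taken from the diagram's branch labels, so the actual wire values are an assumption.

```typescript
// Hypothetical shape of the job status payload: { status, error }.
type JobSnapshot = { status: string; error?: string };

type PollDecision =
  | { kind: "completed" }              // fire onComplete, then refresh
  | { kind: "failed"; error: string }  // fire onFailed with the job error
  | { kind: "timedOut" }               // fire onFailed with a timeout message
  | { kind: "continue" };              // job still active: poll again later

// Decide what the poller should do after one status fetch.
function decidePoll(job: JobSnapshot, elapsedMs: number, timeoutMs: number): PollDecision {
  if (job.status === "completed") return { kind: "completed" };
  if (job.status === "failed") {
    return { kind: "failed", error: job.error ?? "unknown error" };
  }
  // Any other status is treated as still active until the deadline passes.
  if (elapsedMs >= timeoutMs) return { kind: "timedOut" };
  return { kind: "continue" };
}
```

Keeping the decision separate from the timer makes the terminal states easy to unit test without a fake clock.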

Comments Outside Diff (1)

packages/cloud-frontend/src/dashboard/containers/_components/agent-actions.tsx, line 34-48 (link)

Generic onComplete/onFailed messages are wrong for all newly-tracked actions

The PR extends job tracking to snapshot, suspend, and shutdown, but the useJobPoller callbacks remain hardcoded to "Agent provisioning completed" / "Provisioning failed". A user who clicks "Save Snapshot" sees "Snapshot queued", waits ~30 s, and then receives "Agent provisioning completed" — which directly contradicts the earlier toast. The same problem applies to suspend and shutdown in both this file and eliza-agents-table.tsx (line 278–286). The follow-up mentioned in the PR description would need to pass an action-to-message map into useJobPoller or use per-action pollers to fix this.


Greptile also left 2 inline comments on this PR.

… toasts

Backend lifecycle ops moved to the job queue in #7810: suspend, restart, snapshot, logs all return 202 + jobId instead of completing inline. The frontend was still firing the success toast immediately on 2xx and reloading the page, which lied about the operation being done when the daemon had only just started it.

Generalize the existing provision/resume 202+jobId path so any action that returns a jobId attaches to the existing useJobPoller. The toast now says "Suspend queued" / "Snapshot queued" up front and resolves through onComplete/onFailed once the daemon actually finishes. The window.location.reload() is gone for queued ops; the poller's onComplete callback handles the refresh.

Also covers the standalone handleSuspend on the agents table.

coderabbitai · 2026-05-19T22:04:28Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


claude · 2026-05-19T22:05:09Z

Claude encountered an error after 0s.

I'll analyze this and get back to you.

greptile-apps · 2026-05-19T22:08:54Z

+      if (!res.ok && res.status !== 202) {
        // Revert optimistic update
        void refreshData();
        throw new Error("Suspend failed");
      }
+
+      // 202 + jobId: the daemon executes the suspend asynchronously.
+      // Track the job so the table reflects the real completion (and
+      // the success toast doesn't lie before the container actually
+      // stops).
+      const data = await res.json().catch(() => ({}));
+      const jobId = (data as { data?: { jobId?: string } }).data?.jobId;
+      if (res.status === 202 && jobId) {
+        poller.track(id, jobId);
+        toast.success("Suspend queued");
+        return;


409+jobId case silently errors in the table's suspend handler

agent-actions.tsx (the detail page) now correctly extracts a jobId from a 409 response and attaches to the in-flight job. handleSuspend here does not — a 409 hits the !res.ok && res.status !== 202 guard (409 satisfies both conditions), throws "Suspend failed", and reverts the optimistic update. A user who clicks Suspend while a suspend is already running will see an error toast and a spurious "stopped"→"running" flicker even though the daemon is working correctly. The table should mirror the agent-actions pattern: read the body first, check for 409+jobId, and call poller.track() before falling through to the error throw.
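
A sketch of the ordering the review asks for, assuming the body has already been parsed with `res.json().catch(() => ({}))` as in the diff above. `SuspendPlan`, `readJobId`, and `planSuspend` are hypothetical names; the real handler would map each plan to `poller.track()`, a toast, or the existing revert-and-throw path.

```typescript
type SuspendPlan =
  | { kind: "attach"; jobId: string } // 409 + jobId: join the in-flight job
  | { kind: "track"; jobId: string }  // 202 + jobId: newly queued job
  | { kind: "inline" }                // 2xx without jobId: legacy inline path
  | { kind: "fail" };                 // anything else: revert and throw

// Hypothetical helper: read data.jobId from an already-parsed body.
function readJobId(body: unknown): string | undefined {
  if (typeof body !== "object" || body === null) return undefined;
  const data = (body as { data?: unknown }).data;
  if (typeof data !== "object" || data === null) return undefined;
  const jobId = (data as { jobId?: unknown }).jobId;
  return typeof jobId === "string" && jobId.length > 0 ? jobId : undefined;
}

function planSuspend(status: number, body: unknown): SuspendPlan {
  const jobId = readJobId(body);
  // Check 409 + jobId BEFORE the generic error guard, so a concurrent
  // suspend attaches to the running job instead of surfacing an error.
  if (status === 409 && jobId) return { kind: "attach", jobId };
  const ok = status >= 200 && status < 300; // same range as Response.ok
  if (!ok) return { kind: "fail" };
  if (status === 202 && jobId) return { kind: "track", jobId };
  return { kind: "inline" };
}
```

A 409 without a jobId still falls through to `fail`, which preserves the current error behaviour for genuine conflicts.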

greptile-apps · 2026-05-19T22:08:55Z

+          suspend: t("cloud.containers.agentActions.suspendQueued", {
+            defaultValue: "Suspend queued",
+          }),
+          shutdown: t("cloud.containers.agentActions.suspendQueued", {
+            defaultValue: "Suspend queued",
+          }),


The shutdown action uses the same i18n key as suspend (suspendQueued). If a "shutdownQueued" translation is ever added to the i18n catalogue, the shutdown action will still resolve to the suspend copy.

Suggested change

           suspend: t("cloud.containers.agentActions.suspendQueued", {
             defaultValue: "Suspend queued",
           }),
-          shutdown: t("cloud.containers.agentActions.suspendQueued", {
-            defaultValue: "Suspend queued",
-          }),
+          shutdown: t("cloud.containers.agentActions.shutdownQueued", {
+            defaultValue: "Shutdown queued",
+          }),
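
One way to make this kind of copy-paste slip harder is to derive both the key and the default copy from the action name. This is only a sketch: `QueuedAction`, `queuedKey`, and `queuedDefault` are hypothetical, and the `snapshotQueued` / `restartQueued` keys it would produce are assumed to follow the same naming as `suspendQueued`, which this PR does not confirm.

```typescript
// Hypothetical set of actions that can be queued.
type QueuedAction = "suspend" | "shutdown" | "snapshot" | "restart";

// Derive the i18n key from the action so each action gets its own key.
function queuedKey(action: QueuedAction): string {
  return `cloud.containers.agentActions.${action}Queued`;
}

// Derive the default English copy, e.g. "Shutdown queued".
function queuedDefault(action: QueuedAction): string {
  return `${action.charAt(0).toUpperCase()}${action.slice(1)} queued`;
}
```

The results would be passed to the translation call as the key and `defaultValue`, so shutdown can no longer resolve to the suspend copy by accident.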

Five small wins surfaced by a second /clean pass after the lifecycle queue stack merged (elizaOS#7810/elizaOS#7813/elizaOS#7815/elizaOS#7816):

- provisioning-jobs.ts: drop 4 redundant type casts. `status: "error"`, `"deletion_failed"`, and the `webhook_status` updates are all literals matching the inferred parameter type, so the `as Parameters<...>[1]` / `as Partial<Job>` casts added nothing.
- provisioning-jobs.ts: executeAgentProvision failure path now uses agentProvisionJobResultToRecord({...}) like every other executor, preserving the typed-serialization-boundary pattern.
- v1/eliza/agents/[id]/route.ts (DELETE): drop the redundant `instanceof Error && error.message === "Agent not found"` branch. failureResponse() already maps any "not found" error to 404.
- v1/agents/[id]/logs/route.ts: add `success: false` to the 404 body to match the response shape every sibling route uses.
- v1/eliza/agents/[id]/resume/route.ts: trim a forward-looking comment about a "future docker start fast path"; it belongs in a ticket, not the route. Kept the audit-log rationale.
- __tests__/provisioning-job-types.test.ts: lock the registry size with `expect(Object.keys(JOB_TYPES)).toHaveLength(7)`. A new entry without a matching wire-value assertion now fails CI instead of being silently under-covered.

Net diff: -5 LOC, 5 files. No behavior change.

greptile-apps Bot reviewed May 19, 2026


standujar mentioned this pull request May 19, 2026

refactor(cloud-shared): managed services enqueue jobs instead of inline shutdown() #7815

Merged

4 tasks

lalalune merged commit 511d9f2 into develop May 20, 2026
33 of 37 checks passed

lalalune deleted the fix/cloud-frontend-job-poller-async-routes branch May 20, 2026 02:09

standujar mentioned this pull request May 20, 2026

chore(cloud): post-merge /clean follow-up on lifecycle job code #7845

Merged

3 tasks
