fix(web): surface workflow-def fetch error in execution graph (#1683) #1698

Merged
Wirasm merged 3 commits into coleam00:dev from truffle-dev:fix/web-workflow-execution-graph-error-state-1683
May 19, 2026

truffle-dev (Contributor) commented May 15, 2026

Summary

  • Problem: WorkflowExecution calls useQuery for the workflow definition but only destructures data; when the fetch rejects, the right-pane graph stays on "Loading graph..." forever (issue #1683, "Web UI: workflow execution detail page hangs on 'Loading graph...' with no error state").
  • Why it matters: Users land on the run detail page with no way to tell whether the graph is loading, errored, or empty for this run. They wait, refresh, or refile the bug.
  • What changed: The same useQuery now also surfaces error and isPending. The fallback splits into three branches: error message, spinner, empty-state copy.
  • What did not change: Happy path (graph available → WorkflowDagViewer), the run query, the right-rail metadata, the API shape, the query key, the enabled gate.

UX Journey

Before

User                            WorkflowExecution                 server
────                            ─────────────────                 ──────
opens /workflow/runs/<id> ────▶ run query ───────────────────────▶ 200 OK
                                workflow-def query ──────────────▶ 4xx/5xx
                                isPending → false, data → undefined
                                renders spinner unconditionally
sees "Loading graph..." ◀───── (loops forever, no error path)

After

User                            WorkflowExecution                 server
────                            ─────────────────                 ──────
opens /workflow/runs/<id> ────▶ run query ───────────────────────▶ 200 OK
                                workflow-def query ──────────────▶ 4xx/5xx
                                [error captured]
                                [renders: "Failed to load workflow
                                 graph" + message string]
sees error UI ◀───────────────  (or spinner while pending, or
                                 "Workflow graph unavailable for
                                 this run." when neither nodes
                                 nor error are present)

Architecture Diagram

Before

packages/web/src/components/workflows/WorkflowExecution.tsx
  │
  ├─[useQuery: runQuery]─────▶ getWorkflowRun(runId)
  │     captures: data, error, isPending  ✓
  │
  ├─[useQuery: workflowDefQuery]─▶ getWorkflow(name, cwd?)
  │     captures: data only       ✗  error/pending discarded
  │
  └─render
        if (dagDefinitionNodes) → <WorkflowDagViewer />
        else → spinner (no error path)

After

packages/web/src/components/workflows/WorkflowExecution.tsx
  │
  ├─[useQuery: runQuery]─────▶ getWorkflowRun(runId)        [unchanged]
  │
  ├─[useQuery: workflowDefQuery]─▶ getWorkflow(name, cwd?)
  │     captures: data, [~] error, [~] isPending
  │     [+] dagDefinitionErrorMessage derived from error
  │
  └─render
        if (dagDefinitionNodes)         → <WorkflowDagViewer />
        else if (errorMessage)          → [+] error UI
        else if (workflowDefPending)    → spinner (now gated)
        else                            → [+] "unavailable for this run"
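
The render order in the After diagram can be sketched as a pure function, separate from the JSX. This is a minimal sketch, not the component's actual code: GraphPanelState, GraphPanelInput, and selectGraphPanelState are hypothetical names introduced for illustration, and the inputs stand in for dagDefinitionNodes, dagDefinitionErrorMessage, and the query's isPending flag.

```typescript
// Hypothetical model of the four graph-panel states described above.
type GraphPanelState =
  | { kind: 'graph' }
  | { kind: 'error'; message: string }
  | { kind: 'pending' }
  | { kind: 'unavailable' };

// Hypothetical input shape; the real component reads these from useQuery.
interface GraphPanelInput {
  hasNodes: boolean;
  errorMessage: string | null;
  isPending: boolean;
}

// Order matters: nodes win over error, error over pending, pending over empty.
function selectGraphPanelState(input: GraphPanelInput): GraphPanelState {
  if (input.hasNodes) return { kind: 'graph' };
  if (input.errorMessage !== null) return { kind: 'error', message: input.errorMessage };
  if (input.isPending) return { kind: 'pending' };
  return { kind: 'unavailable' };
}
```

Keeping the decision in one ordered chain means each state is reachable only when every earlier condition is false, which is what prevents the spinner from masking an error.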

Connection inventory:

| From | To | Status | Notes |
| --- | --- | --- | --- |
| WorkflowExecution | getWorkflow (api.ts) | unchanged | same call, same args |
| WorkflowExecution | WorkflowDagViewer | unchanged | only renders when nodes present |
| WorkflowExecution.workflowDefQuery | render fallback | modified | now consumes error + isPending |

Label Snapshot

  • Risk: risk: low
  • Size: size: XS
  • Scope: web
  • Module: web:workflows-execution

Change Metadata

  • Change type: bug
  • Primary scope: web

Linked Issue

Validation Evidence (required)

bun --cwd packages/web run type-check    # PASS (tsc --noEmit, no errors)
bun x eslint packages/web/src/components/workflows/WorkflowExecution.tsx --max-warnings 0  # PASS (no output)
bun x prettier --check packages/web/src/components/workflows/WorkflowExecution.tsx  # PASS ("All matched files use Prettier code style!")
bun --cwd packages/web run test          # PASS (158 tests across lib/stores/hooks)
bun --cwd packages/web run build         # PASS (tsc + vite, 7.48s)

Component tests are not currently in scope of packages/web's test runner (bun test src/lib/ && bun test src/stores/ && bun test src/hooks/), so no .test.tsx was added. Verified through the manual scenarios below.

Security Impact (required)

  • New permissions/capabilities? No
  • New external network calls? No
  • Secrets/tokens handling changed? No
  • File system access scope changed? No

Compatibility / Migration

  • Backward compatible? Yes — render fallback only fires when previous behavior was "spinner forever"; happy path unchanged.
  • Config/env changes? No
  • Database migration needed? No

Human Verification (required)

  • Verified scenarios:
    • Happy path: open an existing successful run; graph renders as before. (<WorkflowDagViewer /> branch.)
    • Error path (manual reproduction): point a run at a deleted workflow definition or unreachable codebase cwd; the panel now reads "Failed to load workflow graph" with the underlying message instead of spinning forever.
    • Pending path: brief spinner during the initial getWorkflow call still shows "Loading graph..."
    • Empty path: when neither nodes nor an error are present (older runs whose getWorkflow returned no nodes), the panel reads "Workflow graph unavailable for this run." instead of spinning indefinitely.
  • Edge cases checked:
    • Non-Error rejections (e.g. plain strings) are stringified via String(workflowDefError) to avoid [object Object].
    • enabled: !!initialData?.workflowName still gates the query, so no spurious spinner before the run query resolves.
  • What was not verified:
    • Server-side error shapes beyond what fetchJSON in packages/web/src/lib/api.ts already produces (it throws Error with the response body, captured by React Query).
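
The error-to-message conversion mentioned under the edge cases could look roughly like the sketch below. The helper name toErrorMessage is hypothetical; the PR only states that Error instances contribute their message and that other rejections go through String(...).

```typescript
// Hypothetical helper: converts whatever a query rejected with into display text.
// Returns null when there is no error so callers can use it as a branch condition.
function toErrorMessage(error: unknown): string | null {
  if (error === null || error === undefined) return null;
  // Error instances carry a message; anything else (e.g. a plain string) is stringified.
  return error instanceof Error ? error.message : String(error);
}
```

Returning null for the no-error case lets the render chain test a single value instead of checking both the raw error and the derived text.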

Side Effects / Blast Radius (required)

  • Affected subsystems/workflows: Workflow run detail page right-pane graph. No effect on left rail, run timeline, approval gates, or any non-web code path.
  • Potential unintended effects: None expected — added paths only fire when prior code displayed a spinner.
  • Guardrails/monitoring for early detection: Existing React Query devtools surface the same error; no new telemetry added.

Rollback Plan (required)

  • Fast rollback command/path: Revert this single-file commit; the previous const { data: workflowDef } destructure plus single-branch fallback returns.
  • Feature flags or config toggles: None.
  • Observable failure symptoms: Empty-state copy would render on a run that DID have a workflow definition. The fallback chain orders nodes first, so this only happens if workflow.nodes was unexpectedly nullish.

Risks and Mitigations

  • Risk: The empty-state copy could render while a request is still legitimately in flight if isPending became false before data arrived (a React Query v5 timing edge).
    • Mitigation: v5's isPending documentation guarantees isPending === true until either data or error is populated for an enabled query. The fallback order (nodes → error → pending → empty) means even if the empty branch were hit transiently, it would resolve to nodes on the very next render. No flicker risk in practice.
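
The mitigation leans on how React Query v5 derives its boolean flags from status and fetchStatus. The following is a minimal model of that derivation, offered as a sketch: deriveFlags is an illustrative name, not a library export, and the two type aliases are declared locally rather than imported.

```typescript
// Local stand-ins for the v5 status unions (declared here to stay self-contained).
type QueryStatus = 'pending' | 'error' | 'success';
type FetchStatus = 'fetching' | 'paused' | 'idle';

// Illustrative derivation of the flags a v5 query result exposes.
function deriveFlags(status: QueryStatus, fetchStatus: FetchStatus) {
  const isPending = status === 'pending'; // no data and no error yet
  const isFetching = fetchStatus === 'fetching'; // a request is in flight
  return {
    isPending,
    isFetching,
    isLoading: isPending && isFetching, // first load actually in progress
    isError: status === 'error',
    isSuccess: status === 'success',
  };
}
```

In this model a query with neither data nor an error reports isPending whether or not a fetch is in flight, and isPending turns false only once the status moves to error or success. That is why the pending branch sits after the nodes and error branches.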

Summary by CodeRabbit

  • Bug Fixes
    • Workflow graph panel now shows clear loading, error, and empty states: a loading spinner while fetching, a specific error screen with the failure message and a Retry button, and a clear "Workflow graph unavailable for this run" fallback when no graph is present.

WorkflowExecution dropped error/pending state from the workflowDefinition
useQuery, so the right-pane graph rendered "Loading graph..." indefinitely
when the fetch failed (issue coleam00#1683 example: graph build never finishes,
spinner loops forever).

Capture error and isPending, mirror the existing run-query message
conversion, and split the fallback into three branches:

  graph ready   -> WorkflowDagViewer
  fetch error   -> Failed to load workflow graph + message
  pending       -> spinner
  neither       -> "Workflow graph unavailable for this run."

The empty branch covers runs where workflowName resolves but the server
has no nodes (older runs, deleted workflow defs).

coderabbitai Bot commented May 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a351a73-f9e5-4560-88c6-a5eda03297d7

📥 Commits

Reviewing files that changed from the base of the PR and between a445d65 and d1ba4f0.

📒 Files selected for processing (1)
  • packages/web/src/components/workflows/WorkflowExecution.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/web/src/components/workflows/WorkflowExecution.tsx

Walkthrough

The WorkflowExecution component exposes isPending and error from the workflow-definition query, derives a dagDefinitionErrorMessage, and updates the Graph panel to render: the DAG viewer when nodes exist, an error screen with Retry when the query fails, a loading spinner when pending, or an "unavailable" fallback otherwise.

Changes

DAG Definition Loading Error Handling

| Layer / File(s) | Summary |
| --- | --- |
| Graph panel loading and error states (packages/web/src/components/workflows/WorkflowExecution.tsx) | The workflow-definition React Query now exposes error and isPending; a dagDefinitionErrorMessage is computed. The Graph panel conditionally renders WorkflowDagViewer when dagDefinitionNodes exists, an error screen (with message + Retry resetting the workflowDefinition query) on query error, a spinner while pending, or a final "Workflow graph unavailable for this run." message otherwise. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

  • #1683: Implements Graph-view error state and retry to address the "Loading graph..." hang described in the issue.

Possibly related PRs

  • coleam00/Archon#959: Also modifies packages/web/src/components/workflows/WorkflowExecution.tsx's workflow-definition query behavior and loading logic.

Poem

🐰 I nibble lines of code so neat,
The graph no longer skips a beat,
If nodes are found, I hop and show —
If errors come, a Retry we’ll go,
Spinner waits while queries flow.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: surfacing workflow-definition fetch errors in the execution graph panel. |
| Description check | ✅ Passed | The description is comprehensive and well-structured, covering problem, impact, changes, scope boundaries, UX journeys, architecture diagrams, validation evidence, security impact, compatibility, human verification, side effects, and rollback plan. |
| Linked Issues check | ✅ Passed | The PR addresses the primary objective from #1683 (add error state and pending indicator for graph panel), implements reviewer feedback to use resetQueries, and covers the four rendering branches specified. The PR does not implement secondary objectives (home-scoped lookup, staleTime removal), which are acceptable scope boundaries. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the single WorkflowExecution.tsx file and directly relate to the linked issue #1683: adding error handling, pending state, and conditional rendering for the workflow definition query. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
packages/web/src/components/workflows/WorkflowExecution.tsx (1)

564-578: ⚡ Quick win

Consider adding retry capability to the error UI.

The error state is now properly surfaced, but combined with staleTime: Infinity (line 293), once the query fails it will remain cached indefinitely. Users seeing "Failed to load workflow graph" have no way to retry without refreshing the page.

Consider adding a retry button to the error UI:

<div className="flex flex-col items-center justify-center h-full text-text-secondary px-4 text-center">
  <p className="text-error mb-1">Failed to load workflow graph</p>
  <p className="text-xs mb-3">{dagDefinitionErrorMessage}</p>
  <button
    onClick={() => queryClient.invalidateQueries({ queryKey: ['workflowDefinition', initialData?.workflowName, codebaseCwd] })}
    className="text-sm text-primary hover:text-accent-bright"
  >
    Retry
  </button>
</div>

Alternatively, reduce staleTime from Infinity to allow automatic retry on component remount or after a reasonable interval.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/components/workflows/WorkflowExecution.tsx` around lines 564
- 578, The error UI currently shows dagDefinitionErrorMessage but offers no
retry because the query uses staleTime: Infinity; update the error block (the
JSX branch that checks dagDefinitionErrorMessage) to add a Retry button that
calls queryClient.invalidateQueries with the same key used to fetch the workflow
definition (e.g., ['workflowDefinition', initialData?.workflowName,
codebaseCwd]) so users can re-trigger the fetch, or alternatively change the
query's staleTime from Infinity to a finite interval to allow automatic refetch
on remount; modify the component that defines the query and the JSX branch that
renders dagDefinitionErrorMessage to implement one of these fixes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e6271748-3231-4c6a-b774-c8dedb34a632

📥 Commits

Reviewing files that changed from the base of the PR and between 7bdf931 and e479799.

📒 Files selected for processing (1)
  • packages/web/src/components/workflows/WorkflowExecution.tsx

With staleTime: Infinity the cached error stays put until the page
reloads. Reuses the existing queryClient and the same queryKey the
useQuery declares, so invalidateQueries triggers a fresh fetch.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
packages/web/src/components/workflows/WorkflowExecution.tsx (1)

555-580: ⚡ Quick win

Fallback to runtime DAG data is still bypassed on definition fetch errors.

When isDag is true due to runtime nodes but dagDefinitionNodes is null, the dagDefinitionErrorMessage branch blocks graph rendering entirely. This leaves users with an error screen even when run-time DAG data exists. Please either (a) render a degraded graph from runtime nodes in this branch, or (b) remove/adjust the Line 301 fallback claim to match behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/components/workflows/WorkflowExecution.tsx` around lines 555
- 580, The error branch currently shown when dagDefinitionNodes is null prevents
rendering a runtime DAG even if isDag is true and workflow.dagNodes contains
runtime nodes; update the conditional so that when dagDefinitionNodes is falsy
but isDag (or workflow.dagNodes) exists you render a degraded WorkflowDagViewer
using workflow.dagNodes (pass liveStatus={workflow.dagNodes},
currentlyExecuting, isRunning, selectedNodeId, onNodeClick) instead of the
static error screen, otherwise fall back to showing dagDefinitionErrorMessage;
adjust the JSX around dagDefinitionNodes/dagDefinitionErrorMessage to prefer
runtime rendering when available.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7e559c02-1748-4961-8996-5571b3be973e

📥 Commits

Reviewing files that changed from the base of the PR and between e479799 and a445d65.

📒 Files selected for processing (1)
  • packages/web/src/components/workflows/WorkflowExecution.tsx

Wirasm (Collaborator) commented May 18, 2026

Review Summary

Verdict: minor-fixes-needed

Your PR adds a helpful user-facing improvement — surfacing errors and providing a retry button when the workflow graph can't be loaded. The code is clean and follows React Query conventions. One small error handling gap on the retry button needs a .catch() before merge.

Blocking issues

(None — no critical issues)

Suggested fixes

  • packages/web/src/components/workflows/WorkflowExecution.tsx:575: The Retry button calls queryClient.invalidateQueries() but discards the returned Promise with void. If the invalidation itself fails, the user clicks Retry and nothing happens — the error state persists silently.

    Add a .catch() to log the failure:

    onClick={(): void => {
      queryClient.invalidateQueries({
        queryKey: ['workflowDefinition', initialData?.workflowName, codebaseCwd],
      }).catch((err: unknown) => {
        console.error('[WorkflowExecution] Retry invalidate failed', {
          workflowName: initialData?.workflowName,
          error: err instanceof Error ? err.message : err,
        });
      });
    }}

    Alternatively, consider using queryClient.resetQueries(...) instead — it clears the cached error and restarts fetching, which is the more idiomatic React Query pattern for a user-triggered retry action.

Minor / nice-to-have

  • The "Workflow graph unavailable for this run." fallback at line 585 is correct, but could briefly appear during state transitions. A short comment explaining this covers workflows with no DAG nodes would help future maintainers — no functional change needed.

Compliments

  • The error extraction (error instanceof Error ? .message : String(error)) is robust and handles non-Error throwables correctly.
  • The three-branch conditional (error → pending → fallback) is well-ordered and covers all edge cases cleanly.

Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage.

Address review on coleam00#1698:
- Retry button now calls queryClient.resetQueries() instead of
  invalidateQueries(). resetQueries clears the cached error state
  before refetching, which is the idiomatic React Query pattern
  for a user-triggered retry.
- Wrap the returned Promise in .catch() so a queryClient failure
  is surfaced via console.error rather than silently swallowed.
- Add a brief comment above the final 'unavailable' fallback so
  future maintainers don't conflate it with the pending branch.
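
The retry change described in this commit could look roughly like the sketch below. It is written against a narrow hypothetical interface (QueryResetter) rather than the real QueryClient so that it stays self-contained; retryWorkflowDefinition and workflowDefinitionKey are illustrative names, and only the query key shape and the resetQueries-plus-.catch() pattern come from the review thread.

```typescript
// Hypothetical slice of the query client: only the method the retry needs.
interface QueryResetter {
  resetQueries(filters: { queryKey: readonly unknown[] }): Promise<void>;
}

type LogFn = (message: string, details: Record<string, unknown>) => void;

// Key shape taken from the review thread; the helper itself is illustrative.
function workflowDefinitionKey(
  workflowName: string | undefined,
  codebaseCwd: string | undefined,
): readonly unknown[] {
  return ['workflowDefinition', workflowName, codebaseCwd];
}

// Resets the cached query state and logs instead of swallowing a failure.
function retryWorkflowDefinition(
  client: QueryResetter,
  workflowName: string | undefined,
  codebaseCwd: string | undefined,
  log: LogFn = console.error,
): Promise<void> {
  return client
    .resetQueries({ queryKey: workflowDefinitionKey(workflowName, codebaseCwd) })
    .catch((err: unknown) => {
      log('[WorkflowExecution] Retry reset failed', {
        workflowName,
        error: err instanceof Error ? err.message : err,
      });
    });
}
```

In a button handler this would be called as `onClick={() => { void retryWorkflowDefinition(queryClient, initialData?.workflowName, codebaseCwd); }}`, assuming queryClient satisfies the interface above.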

truffle-dev (Contributor, Author) commented May 18, 2026

Suggested fixes

  1. Retry button now calls queryClient.resetQueries(...) (the idiomatic React Query pattern for a user-triggered retry; clears cached error state before refetching) and wraps the returned Promise in .catch() so a reset failure surfaces via console.error instead of silently leaving the user on the error pane. d1ba4f0

Minor / nice-to-have

  1. Added a two-line comment above the final "Workflow graph unavailable for this run." fallback explaining it covers older runs whose stored workflow has no DAG nodes. d1ba4f0

Verified bun x tsc --noEmit clean and prettier --check clean on the changed file.

Wirasm merged commit a7f8ae1 into coleam00:dev on May 19, 2026
4 checks passed

Development

Successfully merging this pull request may close these issues.

Web UI: workflow execution detail page hangs on 'Loading graph...' with no error state

2 participants