feat(webui): failed sectors dashboard with pipeline stage and failure details by Reiers · Pull Request #995 · filecoin-project/curio

Reiers · 2026-02-15T20:41:44Z

Motivation

When sectors fail in the sealing pipeline, operators currently have to SSH into the database and manually query sectors_sdr_pipeline to understand what went wrong, at what stage, and how long sectors have been stuck. This is one of the most common day-2 operational pain points — especially for operators running hundreds of sectors where failures are routine.

This was identified during a codebase audit as a high-impact operator experience improvement.

Changes

1. Backend: New RPC method `PipelineFailedSectors`

New file web/api/webrpc/pipeline_failed.go adds a WebRPC method that queries the 100 most recent failed sectors from sectors_sdr_pipeline, returning:

Sector identity (SP ID, sector number)
Failure details (reason code, full error message, failure timestamp)
Pipeline stage progression (all after_* boolean flags showing exactly how far the sector got)

2. Frontend: New "Failed Sectors" page

New page at /pages/pipeline_failed/ with a Lit web component that:

Polls PipelineFailedSectors every 10 seconds
Shows a table with columns: Miner, Sector #, Failed At, Stage, Reason, Details, Age
Stage is computed from the after_* flags to show the last completed pipeline stage (SDR → TreeD → TreeC → TreeR → PrecommitMsg → PoRep → Finalize → MoveStorage → CommitMsg)
Color-coded rows: red tint for failures <1h old, orange for <24h, default for older
Truncated error messages with full text on hover (tooltip)
Shows "No failed sectors 🎉" when the pipeline is clean
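
The stage and row-tint rules in the list above can be sketched as two plain functions. This is a minimal sketch, not the PR's component code: the individual `after_*` flag names, the `'Not started'` label, the returned tint names, and the function names `lastCompletedStage` and `rowTint` are all assumptions made for illustration. Only the stage order and the 1h/24h thresholds come from the description.

```javascript
// Pipeline stages in the order given in the PR description.
// The flag names are hypothetical; the PR only says they follow an after_* pattern.
const STAGES = [
  ['after_sdr', 'SDR'],
  ['after_tree_d', 'TreeD'],
  ['after_tree_c', 'TreeC'],
  ['after_tree_r', 'TreeR'],
  ['after_precommit_msg', 'PrecommitMsg'],
  ['after_porep', 'PoRep'],
  ['after_finalize', 'Finalize'],
  ['after_move_storage', 'MoveStorage'],
  ['after_commit_msg', 'CommitMsg'],
];

// Return the label of the furthest stage whose flag is set,
// or 'Not started' (an assumed label) when no flag is set.
function lastCompletedStage(sector) {
  let label = 'Not started';
  for (const [flag, name] of STAGES) {
    if (sector[flag]) label = name;
  }
  return label;
}

const HOUR_MS = 60 * 60 * 1000;

// Pick a row tint from the failure age: red under 1h, orange under 24h,
// empty string (default styling) for anything older.
function rowTint(failedAtMs, nowMs) {
  const ageMs = nowMs - failedAtMs;
  if (ageMs < HOUR_MS) return 'red';
  if (ageMs < 24 * HOUR_MS) return 'orange';
  return '';
}
```

Keeping both rules as pure functions of the row data means they can be checked without rendering the component or waiting for real failures.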

3. Nav menu entry

Added "Failed Sectors" with a warning triangle icon to the sidebar, positioned after "PoRep" for natural pipeline flow.

No new dependencies

Uses existing WebRPC infrastructure (auto-registered via go-jsonrpc reflection), Lit from CDN (same as all other pages), and Bootstrap 5 dark theme.

LexLuthr

Please attach screenshots

Reiers · 2026-02-16T14:15:05Z

Please attach screenshots

I don't have any failed sectors right now, so it's working... maybe.

How it looks in the menu sidebar:

LexLuthr · 2026-02-16T14:55:23Z

Can you please fail some sectors and get me some screenshots from that page? You can use devnet for testing and messing around. I use the same for easier UI testing and screenshots.

Add a new 'Failed Sectors' page to the Curio web UI that shows sectors which have failed during the sealing pipeline, with retry details.

Backend (web/api/webrpc/pipeline_failed.go):
- New FailedSectorDetail struct with pipeline stage booleans and failure info
- New PipelineFailedSectors RPC method querying sectors_sdr_pipeline WHERE failed = true, ordered by failed_at DESC, limited to 100

Frontend (web/static/pages/pipeline_failed/):
- index.html: page shell using curio-ux wrapper
- pipeline-failed.mjs: Lit component with 10s auto-refresh showing:
  - Miner ID, sector number, failure timestamp, last completed stage
  - Failure reason and details (truncated with tooltip)
  - Sector age since creation
  - Color-coded rows: red for <1h, orange for <24h failures
  - Green success message when no sectors have failed

Navigation (web/static/ux/curio-ux.mjs):
- Added 'Failed Sectors' nav item with warning triangle icon after PoRep

The Failed Sectors tab was only showing sectors with failed=true, which is set in very few cases (precommit-check, past-start-epoch, alloc-check). Most real failures (CommitMsg failure, PoRep crash, etc.) cause the poller to reset pipeline flags for retry, but the harmony_task gets cleaned up, leaving a dangling task_id reference. These sectors are stuck in a retry loop and never progress.

Now detects both:
- Terminal failures (failed=true): shown as FAILED (red)
- Stuck sectors with missing tasks: shown as STUCK (amber)

The query checks for task_id references that point to tasks no longer in harmony_task, matching the same logic the pipeline view uses for its FAILED badges.
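
The FAILED/STUCK distinction described in this commit can be illustrated with a small classifier. The PR performs this check in the SQL query; the version below only mirrors the rule in memory. The row shape (`failed`, `taskIds`), the `'OK'` result, and the function name `classifySector` are assumptions for illustration, not identifiers from the PR.

```javascript
// Classify one pipeline row.
// sector.failed   - the terminal failure flag (failed=true in the table)
// sector.taskIds  - hypothetical array of the row's non-null task_id references
// liveTaskIds     - Set of task ids that still exist in harmony_task
function classifySector(sector, liveTaskIds) {
  // Terminal failures take precedence over the dangling-task check.
  if (sector.failed) return 'FAILED';
  // A reference to a task that no longer exists means the retry can never run.
  const dangling = sector.taskIds.some((id) => !liveTaskIds.has(id));
  return dangling ? 'STUCK' : 'OK';
}
```

For example, a row that references task 7 is `'OK'` while task 7 still exists and becomes `'STUCK'` once that task has been cleaned up.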

Replace flat 100-row list with server-side grouped summary:
- PipelineFailedSectors returns groups by (status, stage, reason) with counts
- PipelineFailedSectorDetails returns paginated sectors per group
- No hard cap on total sectors: summary query is O(groups) not O(sectors)
- Click to expand group, lazy-loads first page, 'load more' for pagination
- Configurable page size (100/250/500)
- Scales to 10k+ sectors without payload bloat
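
The summary-plus-details split can be sketched in memory to show the shape of the two responses. In the PR both steps run server-side in the database, so this is only an illustration of the idea: the function names `summarize` and `pageOfGroup`, and the row fields `status`, `stage`, and `reason`, are assumptions.

```javascript
// Build one summary entry per (status, stage, reason) with a count.
// The result has one entry per group, however many sectors there are.
function summarize(sectors) {
  const groups = new Map();
  for (const s of sectors) {
    const key = JSON.stringify([s.status, s.stage, s.reason]);
    const g = groups.get(key) ||
      { status: s.status, stage: s.stage, reason: s.reason, count: 0 };
    g.count += 1;
    groups.set(key, g);
  }
  // Map preserves insertion order, so groups appear in first-seen order.
  return [...groups.values()];
}

// Return one page of sectors belonging to a group, as 'load more' would request.
function pageOfGroup(sectors, group, offset, limit) {
  return sectors
    .filter((s) => s.status === group.status &&
                   s.stage === group.stage &&
                   s.reason === group.reason)
    .slice(offset, offset + limit);
}
```

The first payload stays small because it carries only counts, and sector rows are fetched one page at a time, only for the group the operator expands.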

Reiers · 2026-02-18T12:51:18Z

Closing this — after testing, the existing Alert Manager combined with the pipeline dashboard already surfaces stuck/failed sectors well enough. Adding a separate tab creates redundancy without enough operational value to justify the extra surface area. May revisit if the need comes back.

Reiers requested a review from a team as a code owner February 15, 2026 20:41

Reiers force-pushed the feat/pipeline-retry-dashboard branch from e6f3f30 to fb54ac3 Compare February 15, 2026 20:46

LexLuthr approved these changes Feb 16, 2026

Reiers force-pushed the feat/pipeline-retry-dashboard branch from fb54ac3 to 5672e8d Compare February 18, 2026 10:47

Reiers added 2 commits February 18, 2026 12:33

Reiers closed this Feb 18, 2026

Reiers wants to merge 3 commits into main from feat/pipeline-retry-dashboard
