ci: triage_ci_failure pulls failure context from public GitHub API (#14210)

AvivYossef-starkware · claude · web-flow · commit 4cd57f76ea58 · 2026-06-03T11:45:22.000Z
The sequencer repo is public, so most of the GitHub Actions REST API is
reachable unauthenticated. Rework the triage_ci_failure skill to fetch
failure context directly (run/job/check-run metadata, annotations, re-run
history) instead of telling the user it can't see CI and asking them to
paste logs.

Key behaviors, validated live against real failures and via simulated
claude.ai-web (no-auth) runs:
- Check whether a pasted run is stale (Graphite force-pushes) and, if so,
  also report the current head's status.
- Detect "already green on re-run" and report flaky-but-not-blocking.
- Use filter=all so a failing earlier attempt isn't hidden by the default.
- Treat generic "exit code 1" annotations as no-signal; scope a hypothesis
  from the PR diff rather than going silent.
- Only ask the user when the cause is genuinely behind auth-gated logs.

Split the endpoint catalog / pagination / rate-limit / tool-tier notes into
references/github_api.md (progressive disclosure); SKILL.md is now the
79-line decision flow.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/.claude/skills/triage_ci_failure/SKILL.md b/.claude/skills/triage_ci_failure/SKILL.md
@@ -1,89 +1,79 @@
 ---
 name: triage_ci_failure
-description: Triage CI failures, flaky tests, and broken builds in the sequencer mono-repo. Auto-invoke when a user mentions a failing CI job, flaky test, red check, or pastes a GitHub Actions URL — context (PR link, CI job link, base branch) must be gathered BEFORE any code investigation begins.
+description: Triage CI failures, flaky tests, and broken builds in the sequencer mono-repo. Use when a user mentions a failing CI job, flaky test, red check, or shares a GitHub Actions / PR URL — the skill pulls failure context directly from the public GitHub REST API so you can usually diagnose and report a verdict without asking the user any follow-up questions.
 ---
 
 # Triage CI Failure
 
-When invoked (typically because someone tagged Claude in the mono-repo Slack channel about a CI failure or flaky test), follow this workflow to gather context before investigating.
+Usually invoked when someone tags Claude in the mono-repo Slack channel about a CI failure. The repo (`starkware-libs/sequencer`) is **public**, so most of the GitHub REST API is reachable without auth. Your goal: diagnose and report a verdict **without asking follow-up questions**. Pull what failed, on which step, with which annotation, on which attempts — then report. Only fall back to "please paste the logs" when the public API genuinely can't get you there; never ask to confirm context you can already fetch.
 
-## Step 1: Gather Required Context
+**Endpoint catalog, pagination, rate limits, and which tools to use in each environment live in `references/github_api.md`.** Read it when you need an exact URL; this file is the decision flow. Substitute `O=starkware-libs`, `R=sequencer` throughout.
 
-Before starting any investigation, you MUST have the following information. Check if any of these are missing from the message or thread:
+## Step 1: Resolve the input, then check it isn't stale
 
-### Required Information
+From the message, extract a PR URL, run URL (`/actions/runs/{run_id}`), job URL (`.../job/{job_id}`), check-run id, commit SHA, or branch name. Key fact: **`job.id == check_run.id`** — one numeric id bridges "I have a job link" and "I want its annotations." With only a branch name, list recent failed runs on it before asking the user anything.
 
-| Item | Why Needed | Example |
-|------|------------|---------|
-| **PR link** or **branch name** | To understand what code is being tested | `https://github.com/starkware-libs/sequencer/pull/12345` or `feature/my-branch` |
-| **Failed CI job link** | To get a `details_url` you can open and ask the user to paste relevant log lines from | `https://github.com/starkware-libs/sequencer/actions/runs/123456/job/789` |
-| **Base branch** | The branch this PR targets — check `scripts/parent_branch.txt` for the default, don't assume `main` | `main`, `release/v1.2`, `feature/epic-branch` |
-| **Is this a new failure or flaky?** | Determines investigation approach | "Started failing today" vs "Fails ~10% of runs" |
+**Triage the pasted link — it's usually accurate for the failure they want explained.** Diagnose that run/job even if the user re-ran it afterward (a common flow: paste a link, then re-run assuming it's flaky). As a *complementary* check, note whether the run is stale: this repo uses Graphite stacks, so a run's `head_sha` can lag the PR's current `head.sha` (`GET /pulls/{pr}`). If they differ, also report the current head's status — so a "merge-gatekeeper noise, harmless" verdict on an old SHA doesn't mask a genuinely red live head. Add that as context; don't discard the pasted run in favor of the current head.
 
-### Nice to Have
+## Step 2: Did it already go green on a re-run?
 
-- Error message snippet (the available GitHub MCP tools only expose check-run metadata, not raw Actions log output, so a pasted snippet often unblocks the fastest investigation)
-- Whether this was working before a recent rebase
-- Related PRs or recent merges that might have caused regression
+`GET /actions/runs/{run_id}` → check `run_attempt`, `previous_attempt_url`, `conclusion`. If it was re-run (`run_attempt > 1` or `previous_attempt_url` set) **and** the latest `conclusion` is `success`, the workflow already passed on retry — **lead with "the PR isn't blocked."**
 
----
-
-## Step 2: If Missing Information, Ask First
+But don't wave it away: a fail-then-pass with no code change is a **flaky test**, a real signal worth understanding. So still:
+1. Find the flaky job/step — `GET /actions/runs/{run_id}/jobs?filter=all` (the `filter=all` matters; the default hides the failed earlier attempt). Pull that job's annotations.
+2. Judge whether it's a known flake (see Step 4's flakiness note).
+3. Report a recommendation, e.g. *"Passed on re-run (attempt 2), PR not blocked. Attempt-1 failure was `run-integration-tests` — flaky; worth a tracking issue rather than per-PR re-runs."*
 
-If ANY required information is missing, reply in the thread (Slack or PR comment, wherever you were invoked) asking for it. Do NOT start investigating with incomplete context.
+Corollary trap: a pasted *job* link can point at a failed earlier attempt while the run is now green. Reconcile the job's `run_attempt` against the run's current one (via `filter=all`) before calling anything broken.
 
-**Template response:**
+**Diagnose the sporadic failure either way.** A green-on-latest run doesn't end the triage — if a failure happened (even once, even already re-run away), root-cause it via Steps 3–4. Continue below regardless; the only thing the green status changes is the "is the PR blocked?" answer.
 
-> To investigate this properly, I need a bit more context:
->
-> - [ ] **PR/Branch**: Which PR or branch is failing? (link preferred)
-> - [ ] **CI Job**: Link to the failed job and, if convenient, paste the relevant error lines
-> - [ ] **Base branch**: What branch is this targeting? (don't assume main)
-> - [ ] **Failure pattern**: Is this a new failure or has it been flaky?
->
-> Once I have these, I'll dig in!
+## Step 3: Fast path — PR to root cause in a few calls
 
-Adapt this based on what's already provided — only ask for what's missing.
+1. `GET /pulls/{pr}` → `head.sha`, `base.ref`
+2. `GET /commits/{head.sha}/check-runs?filter=all&per_page=100` → every check-run at that SHA
+3. Keep `conclusion in ('failure','timed_out')`; **skip `cancelled`/`skipped`/`neutral`** — a `cancelled` job usually means a sibling failed first, so the cause is elsewhere
+4. For each failing check, `GET /check-runs/{check_id}/annotations` → the inline error (file + line + text) is usually all you need
 
----
+**merge-gatekeeper / merge-gatekeeper-new** failing alone is a downstream alarm: something else failed first, so look at sibling check-runs at the same SHA or at the previous attempt. It has a second failure mode: the gatekeeper also fails by **timing out** while waiting on a required check that never reached `success` (e.g. a `cancelled` sibling). In that case there is *no* failed sibling at this SHA; the real red is usually on a newer SHA, i.e. the pasted run is stale (Step 1).
 
-## Step 3: Verify the Context
+## Step 4: When annotations aren't enough
 
-Once you have the required information:
+Annotations are the primary signal, but for test/build jobs they're often just `"Process completed with exit code 1"`. **Treat a generic exit-code annotation (or an empty one, or null `output.*`) as no signal** — the real assertion/panic is only in the raw step log, which needs auth (`/logs` is 403 unauthenticated; reach it via `gh run view --job {id} --log-failed` when authed — see `references/github_api.md`).
 
-1. **Open the PR** — use `mcp__github__pull_request_read` with `method=get` to confirm the base branch, changed files, and any existing review comments
-2. **Inspect the failed check** — use `method=get_check_runs` for status/conclusion and the `details_url`; for raw Actions logs you'll need the user to paste them (no MCP tool returns them directly)
-3. **Check if known flaky** — search CLAUDE.md "Common Gotchas" and recent Slack history for known flaky tests
-4. **Determine scope** — is this related to the PR's changes, or a pre-existing/infrastructure issue?
+When logs are unreachable, **don't go silent — narrow it from the diff.** Pull `pulls/{pr}/files`; if the failing job is `run-tests` and the PR edits `crates/foo/.../bar_test.rs` or a fixture, report a *scoped hypothesis* ("likely a `foo` test or stale fixture from this rename") plus the one confirming command. That beats both a bare "can't see logs" and a fabricated test name.
 
----
+**Flakiness check:** to tell flaky from newly-broken, see whether the same job fails in unrelated runs. Note many jobs here (`run-integration-tests`, `run-tests`) run only on `pull_request`, never `push` to `main` — so `branch=main&status=failure` won't show them and you'd wrongly conclude "not a known flake." For those, judge by (a) whether this run went green on re-run (Step 2, strongest signal) and (b) scanning recent failed runs of the same workflow across other PRs. Say which signal you used.
 
-## Step 4: Investigate and Report
+## Step 5: Report — ask only when genuinely blocked
 
-Only after completing steps 1-3, begin your investigation:
+You usually have enough to classify the failure yourself. Report directly; don't tack on reflexive questions — every needless "is this flaky for you?" trains the user to expect noise. Answer these yourself rather than asking:
+- **New or flaky?** → flakiness check above.
+- **Caused by this PR?** → diff `pulls/{pr}/files` against the failing crate/test path.
+- **Known pattern?** → see Common patterns below.
 
-1. **If it's a code issue in the PR**: identify the root cause, propose a fix
-2. **If it's a known flaky test**: link to prior discussions, explain the flakiness pattern
-3. **If it's infrastructure/transient**: suggest a re-run and explain why
-4. **If unclear**: share what you found and what you'd need to dig deeper
+Ask the user *only* when a tool genuinely can't close the gap:
+- annotation empty/generic AND `output.text` empty AND no `gh`/MCP raw-log access → ask for a paste;
+- the cause hinges on something only they know (e.g. "did your last rebase pick up commit X?") → ask that.
 
-Always report back in the thread with:
-- What you found
-- Whether action is needed
-- Proposed next steps (if any)
+Otherwise don't ask — report and move on.
 
----
+## Step 6: Classify and report
 
-## Step 5: Commit and Push
+1. **Code issue in the PR** — name the file/line, propose a fix
+2. **Known flaky test** — link prior discussion, suggest re-run
+3. **Infrastructure / transient** (network, action-download, GCloud) — suggest re-run, explain why
+4. **Pre-existing on the base branch** — call it out; the PR didn't cause it
 
-When fixing the issue, create one commit per PR.
+State what you found, whether action is needed, and the next step.
 
----
+## Step 7: Fix only if asked
 
-## Common Patterns in This Repo
+Apply a fix and commit **only** if the user explicitly asks. A triage request isn't an implicit "go patch it." Commit convention: `scope: subject` (no `feat:`/`fix:` prefix), one commit per PR.
 
-From CLAUDE.md — these failures are often NOT code bugs:
+## Common patterns in this repo
 
 - `blockifier_reexecution` — transient GCloud network issues; suggest re-run
-- `merge-gatekeeper` / `merge-gatekeeper-new` — downstream failures (other checks failed first)
-- Formatting failures — run `scripts/rust_fmt.sh` (uses pinned nightly toolchain), NOT `cargo fmt` directly
+- `merge-gatekeeper` / `merge-gatekeeper-new` — downstream/timeout failure; find the upstream cause (Step 3)
+- Formatting failures — run `scripts/rust_fmt.sh` (pinned nightly), NOT `cargo fmt` directly
+- Action-download failures from `codeload.github.com` (404/503) — GitHub-side flake; re-run
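The checks in Steps 1 to 4 of SKILL.md above can be sketched as pure functions over already-fetched API JSON. This is a minimal illustration, not part of the skill: the function names and the `FAILING` constant are hypothetical, and only the field names (`head_sha`, `head.sha`, `run_attempt`, `previous_attempt_url`, `conclusion`) come from the text.

```python
from typing import List, Optional

# Conclusions that count as a real failure (Step 3); everything else is skipped.
FAILING = {"failure", "timed_out"}


def is_stale(run: dict, pr: dict) -> bool:
    """Step 1: the pasted run is stale if its head_sha differs from the PR's head.sha."""
    return run["head_sha"] != pr["head"]["sha"]


def passed_on_rerun(run: dict) -> bool:
    """Step 2: the run was re-run and the latest conclusion is success."""
    rerun = run.get("run_attempt", 1) > 1 or bool(run.get("previous_attempt_url"))
    return rerun and run.get("conclusion") == "success"


def failing_checks(check_runs: List[dict]) -> List[dict]:
    """Step 3: keep failure/timed_out; drop cancelled/skipped/neutral."""
    return [c for c in check_runs if c.get("conclusion") in FAILING]


def is_no_signal(message: Optional[str]) -> bool:
    """Step 4: an empty or generic exit-code annotation carries no signal."""
    text = (message or "").strip()
    return not text or text.startswith("Process completed with exit code")
```

Keeping the checks free of network calls means the same logic applies whether the JSON came from `WebFetch`, `curl`, or `gh`.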
diff --git a/.claude/skills/triage_ci_failure/references/github_api.md b/.claude/skills/triage_ci_failure/references/github_api.md
@@ -0,0 +1,90 @@
+# GitHub REST API reference for CI triage
+
+All endpoints below are reachable **without authentication** on `starkware-libs/sequencer` (public repo) unless noted. Substitute `O=starkware-libs`, `R=sequencer`. Use `WebFetch` (claude.ai web), `curl`, or `mcp__github__*` tools.
+
+## Contents
+- [Around the PR](#around-the-pr)
+- [Around the commit](#around-the-commit)
+- [Around the workflow run](#around-the-workflow-run)
+- [Around the check-run](#around-the-check-run)
+- [Pagination](#pagination)
+- [filter=latest vs filter=all](#filterlatest-vs-filterall)
+- [Rate limit](#rate-limit)
+- [Where raw logs live](#where-raw-logs-live)
+- [Tool-tier cheat sheet](#tool-tier-cheat-sheet)
+- [Patterns to recognize](#patterns-to-recognize)
+
+## Around the PR
+
+| Endpoint | What you learn |
+|---|---|
+| `GET /repos/{O}/{R}/pulls/{pr}` | `head.sha`, `head.ref`, `base.ref`, mergeable/draft state |
+| `GET /repos/{O}/{R}/pulls/{pr}/files?per_page=100` | All changed files — judge whether the failing test/crate is plausibly affected by this PR |
+| `GET /repos/{O}/{R}/issues/{pr}/comments?per_page=100` | Earlier triage discussion, prior re-run requests |
+| `GET /repos/{O}/{R}/issues/{pr}/events?per_page=100` | Force-pushes (`head_ref_force_pushed`), base-branch changes (`base_ref_changed`) |
+
+Also check `scripts/parent_branch.txt` in the checked-out repo: the base branch isn't always `main` (stacked PRs target feature branches), and assuming `main` misleads your "is this on the base branch too?" check.
+
+## Around the commit
+
+| Endpoint | What you learn |
+|---|---|
+| `GET /repos/{O}/{R}/commits/{sha}` | Author, message, files touched |
+| `GET /repos/{O}/{R}/commits/{sha}/check-runs?filter=all&per_page=100` | Every check-run at this SHA, including re-runs |
+| `GET /repos/{O}/{R}/commits/{sha}/status` | Legacy combined-status checks (some third-party integrations live here, not under `/check-runs`) |
+| `GET /repos/{O}/{R}/actions/runs?head_sha={sha}&per_page=100` | All workflow runs triggered by this commit |
+
+## Around the workflow run
+
+| Endpoint | What you learn |
+|---|---|
+| `GET /repos/{O}/{R}/actions/runs/{run_id}` | `name`, `conclusion`, `head_sha`, `head_branch`, `run_attempt`, `previous_attempt_url` |
+| `GET /repos/{O}/{R}/actions/runs/{run_id}/jobs?filter=all&per_page=100` | Every job + each step's name and conclusion → which *step* failed without log access. **Always pass `filter=all`**: the default returns only the latest attempt, so on a re-run the original failing job is hidden and you'll see all-green. |
+| `GET /repos/{O}/{R}/actions/runs/{run_id}/attempts/{n}/jobs?per_page=100` | Jobs from a specific prior attempt — compare across re-runs |
+| `GET /repos/{O}/{R}/actions/runs/{run_id}/artifacts?per_page=100` | Artifact names + URLs (artifact *download* needs auth) |
+| `GET /repos/{O}/{R}/actions/runs/{run_id}/timing` | Billable time per job — for "CI got slower" triage |
+| `GET /repos/{O}/{R}/actions/jobs/{job_id}` | Single job's status + `check_run_url` (when you have a job id but not a check id) |
+
+## Around the check-run
+
+`job.id == check_run.id` in GitHub Actions — the same numeric id works in both the Actions and Checks APIs.
+
+| Endpoint | Notes |
+|---|---|
+| `GET /repos/{O}/{R}/check-runs/{check_id}/annotations?per_page=100` | **Primary failure signal.** Returns a bare JSON array (no `total_count`); paginate via `?page=N` + the `Link` header. |
+| `GET /repos/{O}/{R}/check-runs/{check_id}` | `output.title`/`summary`/`text` — useful when set, but `null` on most Actions check-runs. Treat as fallback. |
+
+## Pagination
+
+Default page size is 30; pass `per_page=100`. The Actions and check-run list endpoints return `{"total_count": N, "<items>": [...]}`; paginate until you've covered `total_count`. The annotations endpoint returns a bare array instead (as do the PR files and issue comments/events lists), so for those use `?page=N` + the `Link` header.
+
+## filter=latest vs filter=all
+
+On `/commits/{sha}/check-runs` and `/actions/runs/{id}/jobs`, `filter=latest` (the default) collapses re-runs to the latest attempt. Use `filter=all` to see re-run history and to catch a failing earlier attempt that the default hides — essential for flakiness diagnosis and for job links that point at an old attempt.
+
+## Rate limit
+
+Unauthenticated requests share a **60-per-hour-per-IP** quota. A thorough triage (PR + files + comments + check-runs + jobs + annotations + history) can exhaust it. If authenticated (`gh` locally or a GitHub MCP token), you get 5000/hr, so prefer that. If unauthenticated, prioritize the few high-signal calls (re-run check + fast path) and only pull wider context when needed.
+
+## Where raw logs live
+
+`GET /actions/jobs/{job_id}/logs` and `GET /actions/runs/{run_id}/logs` return **403 without auth**. Raw log text is only reachable via:
+
+1. **`gh` CLI, authed** — `gh run view --repo {O}/{R} --job {job_id} --log-failed` (add `--attempt {N}` for a specific attempt). Works on this public repo when `gh auth status` is logged in.
+2. **`mcp__github__*`** — exposes check-run metadata, not raw Actions logs (verify in-session; the surface evolves).
+3. **Ask the user to paste** — last resort.
+
+## Tool-tier cheat sheet
+
+| Environment | Primary fetch path | Raw logs? |
+|---|---|---|
+| Claude.ai web (no `gh`, no GitHub MCP) | `WebFetch` against the unauth endpoints above | No — ask user to paste if annotations didn't cover it |
+| Claude Code locally with `gh` authed | `gh` CLI for run/job/logs; `WebFetch`/`mcp__github__*` for the rest | Yes — `gh run view --log-failed` |
+| Claude Code, GitHub MCP only (no `gh`) | `mcp__github__*` for PR/check metadata; `WebFetch` for annotations + run/job lists | Verify if your MCP exposes Actions logs; if not, ask |
+
+The metadata endpoints work in all three — they're the portable baseline.
+
+## Patterns to recognize
+
+- `head_branch` like `gh-readonly-queue/main/pr-NNNN-...` → a merge-queue run, not a regular PR run. The PR number is in the branch name.
+- `run_attempt > 1` or a non-null `previous_attempt_url` → someone re-ran it; comparing attempts is a quick flakiness check.
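Two of the notes in github_api.md above lend themselves to small helpers: following the `Link` header on bare-array endpoints such as annotations, and recovering the PR number from a merge-queue `head_branch`. The sketch below is illustrative only. The helper names are hypothetical, and the branch pattern assumes the `gh-readonly-queue/<base>/pr-NNNN-...` shape quoted above, with whatever follows the number left unparsed.

```python
import re
from typing import Optional

# Matches the URL tagged rel="next" in a GitHub-style Link header.
_NEXT_LINK = re.compile(r'<([^>]+)>\s*;\s*rel="next"')
# Matches merge-queue branches of the form gh-readonly-queue/<base>/pr-NNNN-...
_QUEUE_BRANCH = re.compile(r"^gh-readonly-queue/.+/pr-(\d+)-")


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL, or None when there are no more pages."""
    match = _NEXT_LINK.search(link_header or "")
    return match.group(1) if match else None


def merge_queue_pr_number(head_branch: str) -> Optional[int]:
    """Return the PR number for a merge-queue run, or None for a regular branch."""
    match = _QUEUE_BRANCH.match(head_branch)
    return int(match.group(1)) if match else None
```

A caller would keep requesting `next_page_url(response_headers.get("Link"))` until it returns `None`, which avoids relying on a `total_count` that bare-array responses do not carry.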