# flake-check — Skill README

The `/flake-check` skill investigates CI failures on pull requests and classifies them as **confirmed flaky**, **suspected flaky**, or **likely real** using a combination of live CI data, symptom pattern matching, and the registry in `flaky-tests.yaml`.

## Commands

| Invocation | What it does |
|---|---|
| `/flake-check <PR>` | Investigate a single PR — fetches failing checks, reads logs, classifies each failure |
| `/flake-check <PR> --deep` | Same as above, plus detects checks that previously failed then passed on the same commit SHA |
| `/flake-check scan` | Lightweight survey of the last 20 PRs for recurring failure patterns (no log fetching) |
| `/flake-check scan <N>d` | Survey PRs from the last N days (e.g. `scan 7d`) |
| `/flake-check scan --file <filename>` | Find every PR in the window where a specific test file appeared in failures (fetches logs — slower) |
| `/flake-check mark-flaky` | Register a confirmed flaky test in `flaky-tests.yaml` |
| `/flake-check stats` | Show registry trends — most impactful tests, area breakdown, recently active entries |
---

## How We Identify Flaky Tests

Flaky classification is never based on a single signal. The skill accumulates evidence across multiple dimensions and applies it conservatively — a real regression can produce the same symptoms as a flaky test.

### Signal 1 — Registry match (strongest)

The test name or file is already in `flaky-tests.yaml` with prior PR occurrences. This is the only signal that produces a **Confirmed Flaky** verdict without further investigation. Cite the entry's `resolution` field and act on it.

### Signal 2 — Cross-PR recurrence (strong)

The same check or test file fails on multiple PRs whose code changes are in **unrelated areas**. For example, `pipelineCreateRuns.cy.ts` failing on a PR that only touched `api-keys/maas` code, and again on a PR that only touched `model-serving/` code — neither of which touches pipelines. When a test fails repeatedly across PRs with no common code thread, the failure is almost certainly independent of the code changes.

How to find this:
- `/flake-check scan` surfaces check-level recurrence across the last N PRs
- `/flake-check scan --file <filename>` surfaces file-level recurrence with log-level detail
- `/flake-check <PR> --deep`, followed by manually checking other recent PRs

### Signal 3 — No code overlap on a single PR (moderate)

A test fails on a PR whose changes are entirely in a different feature area than the one the test exercises. For example, a pipelines test failing on a PR that only modifies authentication code. This is a moderate signal on its own — it means the failure is *likely* unrelated to the PR, but it could still be a pre-existing regression on `main`.

The skill performs this analysis automatically during PR investigation: it fetches the PR's changed files and compares their feature area against the failing test's directory.
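The overlap comparison can be sketched roughly like this. It is a minimal illustration, not the skill's actual implementation: the path-skipping heuristic, the function names, and the example paths are all assumptions.

```python
from pathlib import PurePosixPath

def top_level_area(path: str) -> str:
    """Approximate a file's feature area from its first meaningful path segment."""
    # Skip common source roots so e.g. "frontend/src/pipelines/x.ts" maps to "pipelines".
    # These root names are assumptions for the sketch.
    skip = {"frontend", "src", "cypress", "tests", "e2e"}
    for part in PurePosixPath(path).parts[:-1]:
        if part not in skip:
            return part
    return "unknown"

def has_code_overlap(changed_files: list[str], failing_test: str) -> bool:
    """True if any changed file shares a feature area with the failing test."""
    test_area = top_level_area(failing_test)
    return any(top_level_area(f) == test_area for f in changed_files)

# Hypothetical PR touching only auth code, vs. a failing pipelines test:
changed = ["frontend/src/auth/login.ts", "frontend/src/auth/session.ts"]
test = "frontend/src/pipelines/pipelineCreateRuns.cy.ts"
print(has_code_overlap(changed, test))  # False: no overlap, a moderate flake signal
```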

### Signal 4 — Rerun detection (moderate)

A check **failed then passed on the same commit SHA** without any new code being pushed. This means a developer triggered a re-run and it passed — a strong behavioural indicator that the failure was transient. These hidden failures don't appear in GitHub's final check status, so they're easy to miss.

How to find this:
- `/flake-check <PR> --deep` — reports `rerun_detected` entries for the specific PR
- `/flake-check scan --deep` — surfaces `rerun_patterns` across many PRs, identifying checks that developers routinely re-run to get past failures
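Rerun detection amounts to grouping check attempts by commit SHA and check name, then flagging any group where a failure is followed by a success. A minimal sketch, assuming a flattened list of check attempts (the field names here are hypothetical, not GitHub's API shape):

```python
from collections import defaultdict

def detect_reruns(check_runs: list[dict]) -> list[str]:
    """Return names of checks that failed then later passed on the same commit SHA."""
    by_key = defaultdict(list)
    for run in check_runs:
        by_key[(run["sha"], run["name"])].append(run)

    rerun_detected = []
    for (_sha, name), runs in by_key.items():
        runs.sort(key=lambda r: r["started_at"])
        conclusions = [r["conclusion"] for r in runs]
        # A failure followed by an eventual success on the same SHA = transient failure.
        if "failure" in conclusions and conclusions[-1] == "success":
            rerun_detected.append(name)
    return rerun_detected

# Hypothetical attempts: cypress-e2e failed, was re-run, and passed on the same SHA.
runs = [
    {"sha": "abc123", "name": "cypress-e2e", "conclusion": "failure", "started_at": 1},
    {"sha": "abc123", "name": "cypress-e2e", "conclusion": "success", "started_at": 2},
    {"sha": "abc123", "name": "lint", "conclusion": "success", "started_at": 1},
]
print(detect_reruns(runs))  # ['cypress-e2e']
```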

### Signal 5 — Symptom pattern match (weak, starting point only)

The error message matches a known timing or infrastructure error pattern:

| Pattern | What it usually indicates |
|---|---|
| `CypressError: Timed out retrying after` | Race condition — element didn't become interactive in time |
| `cy.click() failed because it requires a DOM element` | Element disappeared or never mounted |
| `cy.type() failed because it requires a DOM element` | Same as above, for input fields |
| `AssertionError: Timed out retrying` | Assertion never became true — **distinguish**: if it names a specific element that should always exist, this may be a real defect |
| `socket hang up` / `ECONNRESET` | Network instability in CI |
| `net::ERR_CONNECTION_REFUSED` | CI service failed to start or crashed |
| `Cannot read properties of null` | Race condition — component unmounted or not yet mounted |

**A symptom match is a starting signal, not a verdict.** Always cross-reference with code overlap and cross-PR recurrence before classifying. A broken selector, a missing mock, or a genuine product bug can produce identical error messages.
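The table above can be applied mechanically as a first-pass filter. A minimal sketch using substring matching (the function name and the matching rule are illustrative; the skill's actual matching may differ):

```python
# Known timing/infrastructure symptom patterns from the table above, paired with
# what each usually indicates. Matching here is plain substring containment.
FLAKE_PATTERNS = {
    "CypressError: Timed out retrying after": "race condition (element not interactive in time)",
    "cy.click() failed because it requires a DOM element": "element disappeared or never mounted",
    "cy.type() failed because it requires a DOM element": "input field disappeared or never mounted",
    "AssertionError: Timed out retrying": "assertion never became true (may be a real defect)",
    "socket hang up": "network instability in CI",
    "ECONNRESET": "network instability in CI",
    "net::ERR_CONNECTION_REFUSED": "CI service failed to start or crashed",
    "Cannot read properties of null": "race condition (component not mounted)",
}

def match_symptoms(log_excerpt: str) -> list[str]:
    """Return the indication for every known pattern found in a log excerpt."""
    return [hint for pattern, hint in FLAKE_PATTERNS.items() if pattern in log_excerpt]

# Hypothetical log line:
log = "AssertionError: Timed out retrying: expected '.run-row' to exist"
print(match_symptoms(log))  # ['assertion never became true (may be a real defect)']
```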

---

## Confidence Model

| Tier | Verdict | Criteria |
|---|---|---|
| 1 | **Confirmed Flaky** | Registry match in `flaky-tests.yaml` |
| 2 | **Suspected Flaky** | Symptom pattern match, or cross-PR recurrence with no code overlap, or rerun detection — but not in the registry |
| 3 | **Likely Real** | No registry match and no symptom pattern match, or the failure overlaps with the PR's code changes |

For suspected flaky, the skill further annotates based on code overlap:
- **No overlap** — failure is likely unrelated to this PR; safe to rerun, but register if it passes
- **Overlap detected** — a real regression is plausible; investigate before dismissing
- **Unclear** — PR spans many areas or the test area is ambiguous; treat with caution
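Read as a decision procedure, the tier table and the overlap annotations combine roughly like this. This is an illustrative reading of the table, not the skill's code; the parameter names and `code_overlap` values are assumptions.

```python
def classify(in_registry: bool, symptom_match: bool, cross_pr_recurrence: bool,
             rerun_detected: bool, code_overlap: str) -> tuple[str, str]:
    """Apply the tiers: registry match > suspicion signals > likely real.

    `code_overlap` is one of "none", "overlap", "unclear".
    """
    if in_registry:  # Tier 1: the only signal that confirms on its own
        return ("Confirmed Flaky", "follow the registry entry's resolution")
    # Tier 2: any suspicion signal; recurrence only counts without code overlap
    suspected = symptom_match or rerun_detected or (cross_pr_recurrence and code_overlap == "none")
    if suspected:
        annotations = {
            "none": "likely unrelated to this PR; safe to rerun, register if it passes",
            "overlap": "a real regression is plausible; investigate before dismissing",
            "unclear": "PR spans many areas or test area is ambiguous; treat with caution",
        }
        return ("Suspected Flaky", annotations[code_overlap])
    return ("Likely Real", "investigate as a genuine failure")  # Tier 3

print(classify(False, True, False, False, "none"))
# ('Suspected Flaky', 'likely unrelated to this PR; safe to rerun, register if it passes')
```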

---

## What to Do When You Suspect a Flaky Test

1. **Check if it's already registered** — `/flake-check stats` or look at `flaky-tests.yaml`
2. **Rerun the failing check** — if it passes, that's strong evidence of flakiness
3. **Check for cross-PR recurrence** — `/flake-check scan --file <filename>` to see if it has happened before
4. **Look at the test code** — is there a missing `cy.wait('@alias')`, a missing `.should('be.visible')` guard, or an obvious race condition?
5. **If confirmed flaky** — run `/flake-check mark-flaky` to register it, then raise a Jira ticket to fix the root cause

---

## The Registry (`flaky-tests.yaml`)

Machine-readable source of truth for known flaky tests. Each entry has:

| Field | Description |
|---|---|
| `id` | Unique ID in `<area>-<NNN>` format (e.g. `pipelines-001`) |
| `test` | Exact test name from the `it()` block |
| `file` | Path relative to repo root |
| `area` | Short slug (e.g. `model-catalog`, `pipelines`, `workbenches`) |
| `symptoms` | List of error strings or patterns observed |
| `first_seen` / `last_seen` | ISO dates |
| `pr_occurrences` | PR numbers where this was observed |
| `status` | `active` / `intermittent` / `resolved` |
| `resolution` | What to do when this failure appears |
| `jira` | Tracking ticket (optional) |
| `notes` | Additional context (optional) |

Entries are created and updated via `/flake-check mark-flaky` — do not edit by hand unless correcting a mistake.
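For orientation, an entry might look like the following. This is purely illustrative: every value, path, and ticket ID below is hypothetical.

```yaml
# Illustrative flaky-tests.yaml entry — all values are hypothetical.
- id: pipelines-001
  test: "creates a pipeline run from the details page"   # hypothetical it() name
  file: cypress/e2e/pipelines/pipelineCreateRuns.cy.ts   # hypothetical path
  area: pipelines
  symptoms:
    - "CypressError: Timed out retrying after"
  first_seen: 2024-01-10
  last_seen: 2024-02-02
  pr_occurrences: [1234, 1301]
  status: active
  resolution: "Rerun the check; if it fails twice in a row, investigate as real."
  jira: PROJ-0000          # hypothetical ticket
  notes: "Run-creation modal races the pipelines API on slow CI nodes."
```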