jharan1
diff --git a/packages/gen-ai/.claude/skills/flake-check/README.md b/packages/gen-ai/.claude/skills/flake-check/README.md
Lines changed: 0 additions & 122 deletions
diff --git a/packages/gen-ai/.claude/skills/flake-check/SKILL.md b/packages/gen-ai/.claude/skills/flake-check/SKILL.md
Lines changed: 68 additions & 3 deletions
@@ -1,20 +1,81 @@
 ---
 name: flake-check
-description: Investigates what's blocking a PR by running scripts to deterministically collect CI data, then classifying failures as suspected flaky or likely real. Pass a PR number to investigate, or "scan [N]" to survey recent PRs for patterns.
+description: Investigates CI failures tied to GitHub PRs and highlights suspected flaky tests. Uses live CI data, symptom pattern matching, and cross-PR scanning.
 ---
 
 # Flake check — Investigates CI failures on a GitHub odh-dashboard PR and classifies them
 
 Investigate failing CI checks on a PR and classify each failure using live CI data, symptom pattern matching, and cross-PR overlap analysis.
 
+## Quick Reference
+
+| Invocation | What it does |
+|---|---|
+| `/flake-check <PR>` | Investigate a single PR — fetches failing checks, reads logs, classifies each failure |
+| `/flake-check <PR> --deep` | Same as above, plus detects checks that previously failed then passed on the same commit SHA |
+| `/flake-check scan` | Lightweight survey of the last 20 PRs for recurring failure patterns (no log fetching) |
+| `/flake-check scan --deep` | Same as above, plus rerun detection and cross-PR code overlap analysis for each pattern |
+| `/flake-check scan <N>d` | Survey PRs from the last N days (e.g. `scan 7d`) |
+| `/flake-check scan --file <filename>` | Find every PR in the window where a specific test file appeared in failures (fetches logs — slower) |
+
+```
+/flake-check 7301
+/flake-check 7301 --deep
+/flake-check scan
+/flake-check scan --deep
+/flake-check scan 7d
+/flake-check scan 14d --deep
+/flake-check scan --file pipelineCreateRuns.cy.ts
+/flake-check scan 30d --file pipelineCreateRuns.cy.ts
+/flake-check scan --since 14d --until 7d
+/flake-check scan --since 2026-04-01 --until 2026-04-15
+```
+
+---
+
+## How We Identify Flaky Tests
+
+Flakiness is a pattern, not a single event. The more signals you see, the more confident you can be. In rough order of strength:
+
+**Signal 1 — Cross-PR recurrence (strong)**
+The same test or check fails on multiple unrelated PRs that touch different parts of the codebase. This is the strongest signal because it argues against a regression introduced by any one PR: if the test fails whether or not the relevant code changed, the test itself is the likely problem.
+
+*Example:* `pipelineCreateRuns.cy.ts` fails on a PR that only changes `model-serving/`, an area unrelated to pipeline runs.
+
+*How to find it:*
+- `/flake-check scan` — lightweight survey of the last 20 PRs
+- `/flake-check scan 30d --file pipelineCreateRuns.cy.ts` — targeted search for a specific file
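The recurrence signal above amounts to counting the distinct PRs on which each test file failed. A minimal sketch, assuming failures arrive as `(pr_number, test_file)` pairs; that input shape, the PR number `7310`, and the file `modelServing.cy.ts` are hypothetical and not taken from the skill's scripts. It only counts PRs; judging whether those PRs are actually unrelated still needs the code-overlap check.

```python
from collections import defaultdict


def recurring_failures(failures, min_prs=2):
    """Return {test_file: sorted PR numbers} for files failing on >= min_prs PRs.

    `failures` is an iterable of (pr_number, test_file) pairs. This shape is an
    assumption for illustration, not the skill's actual JSON schema.
    """
    prs_by_file = defaultdict(set)
    for pr_number, test_file in failures:
        # A set deduplicates repeated failures of the same file on the same PR.
        prs_by_file[test_file].add(pr_number)
    return {
        test_file: sorted(prs)
        for test_file, prs in prs_by_file.items()
        if len(prs) >= min_prs
    }


sample = [
    (7301, "pipelineCreateRuns.cy.ts"),
    (7310, "pipelineCreateRuns.cy.ts"),
    (7310, "modelServing.cy.ts"),
    (7301, "pipelineCreateRuns.cy.ts"),  # repeat on the same PR, counted once
]
print(recurring_failures(sample))
# → {'pipelineCreateRuns.cy.ts': [7301, 7310]}
```

Raising `min_prs` trades recall for confidence: a file that failed on two PRs may be coincidence, while five unrelated PRs is hard to explain any other way.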
+
+**Signal 2 — Rerun detection (strong)**
+The check failed on an earlier run, then passed on a later run of the same commit SHA without any code change in between. Same code, different result: the test is non-deterministic by definition.
+
+*How to find it:* `/flake-check <PR> --deep` adds a "Previously Failed — Rerun Detected" section to the report.
+
+**Signal 3 — No code overlap on a single PR (moderate)**
+The failing test exercises feature area X, but the PR only touches feature area Y. Not conclusive on its own (unrelated changes can still expose race conditions in a shared subsystem), but it lowers suspicion that the PR caused the failure.
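Signals 2 and 3 both reduce to simple comparisons once the CI data is collected. A minimal sketch, assuming each check run is a dict with `name`, `sha`, `conclusion`, and `completed_at` keys and that a feature area maps to a directory name in the changed paths. Both shapes, and the check names used below, are hypothetical and not the skill's actual schema.

```python
from collections import defaultdict


def detect_reruns(runs):
    """Return names of checks that failed and later passed on the same commit SHA.

    Assumes `completed_at` values are ISO 8601 UTC strings in one format,
    so plain string comparison orders them chronologically.
    """
    by_check = defaultdict(list)
    for run in runs:
        by_check[(run["name"], run["sha"])].append(run)

    rerun_passed = set()
    for (name, _sha), group in by_check.items():
        seen_failure = False
        for run in sorted(group, key=lambda r: r["completed_at"]):
            if run["conclusion"] == "failure":
                seen_failure = True
            elif run["conclusion"] == "success" and seen_failure:
                rerun_passed.add(name)
                break
    return sorted(rerun_passed)


def has_code_overlap(changed_files, test_area):
    """True if any changed path has `test_area` as one of its path segments."""
    return any(test_area in path.split("/") for path in changed_files)


runs = [
    {"name": "cypress-mock", "sha": "abc123", "conclusion": "failure",
     "completed_at": "2026-04-01T10:00:00Z"},
    {"name": "cypress-mock", "sha": "abc123", "conclusion": "success",
     "completed_at": "2026-04-01T10:30:00Z"},
    # Failure and success on different SHAs: code changed, so not a rerun signal.
    {"name": "unit", "sha": "abc123", "conclusion": "failure",
     "completed_at": "2026-04-01T10:05:00Z"},
    {"name": "unit", "sha": "def456", "conclusion": "success",
     "completed_at": "2026-04-01T11:00:00Z"},
]
print(detect_reruns(runs))  # → ['cypress-mock']
print(has_code_overlap(["packages/model-serving/src/index.ts"], "pipelines"))  # → False
```

Grouping by `(name, sha)` is what keeps the rerun signal strong: a pass after a new commit says nothing about determinism, so only same-SHA pairs count.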
+
+**Signal 4 — Symptom pattern match (weak, starting point only)**
+The error output matches a pattern that timing or infrastructure problems commonly produce:
+
+| Pattern | What it usually indicates |
+|---------|--------------------------|
+| `CypressError: Timed out retrying after` | DOM timing race |
+| `cy.click() / cy.type() failed — requires a DOM element` | Element disappeared mid-test |
+| `AssertionError: Timed out retrying` | Timing race — but check the error detail; a missing named element may be a real defect |
+| `socket hang up` / `ECONNRESET` / `ERR_CONNECTION_REFUSED` | Infrastructure hiccup |
+| `Cannot read properties of null` | Race condition in test setup |
+
+A symptom match is a starting signal, not a verdict. Real failures can produce identical patterns, so always check the error detail and look for corroborating signals before concluding flakiness.
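The table above can be expressed as an ordered list of regular expressions. A minimal sketch, assuming the log is available as plain text; the function name and the exact regexes are illustrative and may differ from how the skill actually matches symptoms.

```python
import re

# Patterns and labels mirror the symptom table; the regexes are an assumption.
SYMPTOM_PATTERNS = [
    (re.compile(r"CypressError: Timed out retrying after"), "DOM timing race"),
    (re.compile(r"cy\.(click|type)\(\) failed.*requires a DOM element"),
     "Element disappeared mid-test"),
    (re.compile(r"AssertionError: Timed out retrying"),
     "Timing race (check the error detail)"),
    (re.compile(r"socket hang up|ECONNRESET|ERR_CONNECTION_REFUSED"),
     "Infrastructure hiccup"),
    (re.compile(r"Cannot read properties of null"), "Race condition in test setup"),
]


def match_symptoms(log_text):
    """Return the labels of every pattern found in the log, in table order."""
    return [label for pattern, label in SYMPTOM_PATTERNS if pattern.search(log_text)]


print(match_symptoms("Error: read ECONNRESET"))
# → ['Infrastructure hiccup']
print(match_symptoms("TypeError: x is not a function"))
# → []
```

An empty result does not mean the failure is real, and a match does not mean it is flaky; the labels are only prompts for which corroborating signal to check next.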
+
+---
+
 ## Architecture
 
 This skill separates **data collection** (deterministic Python scripts) from **analysis** (Claude reasoning):
 
 | Phase | Who | What |
 |---|---|---|
 | Data collection | Scripts | Fetch PR state, CI logs — output clean JSON |
-| Classification | Claude | Apply confidence model using symptom patterns and code overlap |
+| Classification | Claude | Apply the confidence model using symptom patterns, code overlap with the failing tests, and related signals |
 | Report | Claude | Generate structured, actionable output |
 
 Scripts live in `<base_path>/scripts/`. `<base_path>` is the absolute path to the directory containing this SKILL.md file — resolve it at skill load time from the skill's own path (e.g. `/Users/myuser/code/odh-dashboard/packages/gen-ai/.claude/skills/flake-check`). Run scripts from the repo root (the user's working directory) using their absolute path. All scripts output JSON to stdout; errors go to stderr.
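The JSON-on-stdout, errors-on-stderr contract described above can be consumed with a small wrapper. A minimal sketch, assuming a script signals failure with a non-zero exit code (the excerpt does not say so); `run_json_command` is a hypothetical helper, and an inline `-c` program stands in for a real script because no script filenames appear in this excerpt.

```python
import json
import subprocess
import sys


def run_json_command(command):
    """Run a command, parse its stdout as JSON, and surface stderr on failure."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Assumption: a non-zero exit code means the script reported an error.
        raise RuntimeError(f"command failed: {result.stderr.strip()}")
    return json.loads(result.stdout)


# Stand-in for a collection script: prints one JSON object to stdout.
demo = [sys.executable, "-c", "import json; print(json.dumps({'ok': True}))"]
print(run_json_command(demo))  # → {'ok': True}
```

For a real script, the command would be `[sys.executable, <absolute script path>, <args>]`, run from the repo root as the paragraph above describes.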
@@ -29,7 +90,11 @@ Scripts have no external dependencies. They use only the Python standard library
 - A PR number (e.g. `4821`) → run the **Main Investigation** workflow
 - A PR number with `--deep` (e.g. `4821 --deep`) → Main Investigation with rerun detection
 - `scan` or `scan <N>` → run the **scan** workflow (N PRs, default 20)
+- `scan <N>d` (e.g. `scan 7d`) → scan PRs from the last N days
+- `scan --since <date/period> --until <date/period>` → scan a specific time window (e.g. `--since 14d --until 7d` or `--since 2026-04-01 --until 2026-04-15`)
+- `scan --deep` → scan with rerun detection and cross-PR code overlap analysis
 - `scan --file <filename>` (e.g. `scan --file pipelineCreateRuns.cy.ts`) → run the **file-scoped scan** workflow — fetches CI logs to find every PR where that specific test file failed
+- Any `scan` variant may combine modifiers (e.g. `scan 30d --deep --file pipelineCreateRuns.cy.ts`)
 - Empty → ask the user for a PR number, then run **Main Investigation**
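The argument forms listed above can be captured in one small parser. A minimal sketch, assuming whitespace-separated tokens and assuming the default of 20 PRs applies only when no day or date window is given; `parse_invocation` and its result keys are hypothetical, and the skill may interpret arguments differently.

```python
import re


def parse_invocation(arg_string):
    """Map a /flake-check argument string to a workflow description."""
    tokens = arg_string.split()
    deep = "--deep" in tokens
    tokens = [t for t in tokens if t != "--deep"]
    if not tokens:
        return {"mode": "ask"}  # empty input: ask the user for a PR number
    if tokens[0] != "scan":
        # int() raises ValueError if the first token is not a PR number.
        return {"mode": "investigate", "pr": int(tokens[0]), "deep": deep}

    parsed = {"mode": "scan", "deep": deep, "count": None, "days": None,
              "since": None, "until": None, "file": None}
    rest = tokens[1:]
    i = 0
    while i < len(rest):
        token = rest[i]
        if token in ("--file", "--since", "--until"):
            if i + 1 >= len(rest):
                raise ValueError(f"{token} needs a value")
            parsed[token[2:]] = rest[i + 1]
            i += 2
        elif re.fullmatch(r"\d+d", token):
            parsed["days"] = int(token[:-1])  # e.g. "7d" means the last 7 days
            i += 1
        elif token.isdigit():
            parsed["count"] = int(token)  # e.g. "scan 50" means 50 PRs
            i += 1
        else:
            raise ValueError(f"unrecognized argument: {token}")

    no_window = parsed["days"] is None and not parsed["since"] and not parsed["until"]
    if parsed["count"] is None and no_window:
        parsed["count"] = 20  # documented default for a plain `scan`
    return parsed


print(parse_invocation("7301 --deep"))
# → {'mode': 'investigate', 'pr': 7301, 'deep': True}
```

Stripping `--deep` before dispatch lets the modifier appear in any position, which matches the combined forms such as `scan 30d --deep --file pipelineCreateRuns.cy.ts`.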
 
 ---
@@ -44,7 +109,7 @@ Apply this conservatively — a real regression can produce the same symptoms as
 - If your analysis concludes the failure is likely deterministic (wrong selector, missing element, new test with a bug), classify as Likely real instead and explain why
 - Do NOT label it flaky — surface it as a possibility
 - Prompt the dev: is this related to their changes? Has this test passed on `main` recently?
-- Recommended action: gather multiple signals to confirm — see "If You Confirm It's Flaky" in any investigation report; then log a RHOAIENG Task with label `flaky-test`
+- Recommended action: gather multiple signals to confirm — see "If You Confirm It's Flaky" in any investigation report; then log a RHOAIENG Task with label `flaky-test` if not already tracked
 
 **Likely real**
 - No symptom pattern match and no cross-PR recurrence signal