The `/flake-check` skill investigates CI failures tied to GitHub PRs and highlights suspected flaky tests, using live CI data, symptom pattern matching, and cross-PR overlap analysis. A Jira may be logged for a suspected or confirmed flaky test to track its resolution.
## Commands
| Command | Description |
|---|---|
| `/flake-check <PR>` | Investigate a single PR — fetches failing checks, reads logs, classifies each failure |
| `/flake-check <PR> --deep` | Same as above, plus detects checks that previously failed then passed on the same commit SHA |
| `/flake-check scan` | Lightweight survey of the last 20 PRs for recurring failure patterns (no log fetching) |
| `/flake-check scan --deep` | Same as above, plus rerun detection and cross-PR code overlap analysis for each pattern |
| `/flake-check scan <N>d` | Survey PRs from the last N days (e.g. `scan 7d`) |
| `/flake-check scan --file <filename>` | Find every PR in the window where a specific test file appeared in failures (fetches logs — slower) |
**Examples — copy/paste and substitute your own values:**
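The PR number here is hypothetical, and the filename is one of the test files discussed later in this document:

```
/flake-check 1234
/flake-check 1234 --deep
/flake-check scan 7d
/flake-check scan --file pipelineCreateRuns.cy.ts
```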
Flaky classification is never based on a single signal. The skill accumulates evidence across multiple dimensions and applies them conservatively — a real regression can produce the same symptoms as a flaky test.
### Signal 1 — Cross-PR recurrence (strong)
The same check or test file fails on multiple PRs whose code changes are in **unrelated areas**. For example, `pipelineCreateRuns.cy.ts` failing on a PR that only touched `api-keys/maas` code, and again on a PR that only touched `model-serving/` code — neither of which touches pipelines. When a test fails repeatedly across PRs with no common code thread, the failure is almost certainly independent of the code changes.
How to find this:
- `/flake-check scan --file <filename>` surfaces file-level recurrence with log-level detail
- `/flake-check <PR> --deep`, then checking other recent PRs manually
### Signal 2 — No code overlap on a single PR (moderate)
A test fails on a PR whose changes are entirely in a different feature area than the test exercises. For example, a pipelines test failing on a PR that only modifies authentication code. This is a moderate signal on its own — it means the failure is *likely* unrelated to the PR, but it could still be a pre-existing regression on `main`.
The skill performs this analysis automatically during PR investigation: it fetches the PR's changed files and compares the domain against the failing test's directory.
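That comparison can be pictured as a directory-prefix heuristic — a minimal sketch, assuming a file's feature area is its first path segment (the skill's actual heuristic may be richer):

```typescript
// Assumption for illustration: a file's feature area is its first path segment.
function areaOf(path: string): string {
  return path.split("/")[0] ?? "";
}

// Compare a PR's changed files against the failing test's area.
function overlapVerdict(
  changedFiles: string[],
  testFile: string
): "overlap" | "no-overlap" | "unclear" {
  const testArea = areaOf(testFile);
  const changedAreas = new Set(changedFiles.map(areaOf));
  if (changedAreas.has(testArea)) return "overlap"; // a real regression is plausible
  if (changedAreas.size > 3) return "unclear";      // PR spans many areas
  return "no-overlap";                              // failure likely unrelated to this PR
}
```

For example, a `pipelines/` test failing on a PR that only touched `model-serving/` files yields `no-overlap`.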
### Signal 3 — Rerun detection (moderate)
A check **failed then passed on the same commit SHA** without any new code being pushed. This means a developer triggered a re-run and it passed — a strong behavioural indicator that the failure was transient.
How to find this:
- `/flake-check <PR> --deep` — reports `rerun_detected` entries for the specific PR
- `/flake-check scan --deep` — surfaces `rerun_patterns` across many PRs
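Conceptually, rerun detection groups check runs by commit SHA and check name, then flags any group containing a failure followed by a success. A sketch over already-fetched check-run data (the field names here are illustrative, not a real API shape):

```typescript
// Illustrative shape of a fetched check run; field names are assumptions.
interface CheckRun {
  sha: string;
  name: string;
  conclusion: "success" | "failure";
  startedAt: string; // ISO timestamp
}

// Flag checks that failed and then passed on the same commit SHA.
function detectReruns(runs: CheckRun[]): string[] {
  const byKey = new Map<string, CheckRun[]>();
  for (const run of runs) {
    const key = `${run.sha}:${run.name}`;
    if (!byKey.has(key)) byKey.set(key, []);
    byKey.get(key)!.push(run);
  }
  const flagged: string[] = [];
  for (const [key, group] of byKey) {
    group.sort((a, b) => a.startedAt.localeCompare(b.startedAt));
    const firstFailure = group.findIndex((r) => r.conclusion === "failure");
    const passedLater =
      firstFailure >= 0 &&
      group.slice(firstFailure + 1).some((r) => r.conclusion === "success");
    if (passedLater) flagged.push(key);
  }
  return flagged;
}
```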
When a developer posts a `/retest` comment on a PR, they are manually triggering a CI rerun — the human-visible equivalent of the automated signal above. A PR with one or more `/retest` comments *may* indicate a flaky test, but this signal is weak on its own because:
- `/retest` often follows a real fix (e.g. after pushing a correction) — not every rerun is flakiness
- A PR with multiple `/retest` comments on a check that keeps failing is *more* suggestive of an intermittent issue
- This signal is only visible when manually reading PR comments; the skill does not scan for it automatically
Use it as a prompt for investigation, not as a classification. If you notice `/retest` comments while reviewing a PR and the check eventually passed, treat that as supporting evidence alongside Signals 1–3 above.
### Signal 4 — Symptom pattern match (weak, starting point only)
The error message matches a known timing or infrastructure error pattern:
| Pattern | Typical cause |
|---|---|
| `net::ERR_CONNECTION_REFUSED` | CI service failed to start or crashed |
| `Cannot read properties of null` | Race condition — component unmounted or not yet mounted |
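Symptom matching can be sketched as a first-match scan over a pattern list (the patterns below are illustrative, not the skill's exact set):

```typescript
// Illustrative symptom patterns; the skill's real list is longer.
const SYMPTOM_PATTERNS: Array<[RegExp, string]> = [
  [/net::ERR_CONNECTION_REFUSED/, "infrastructure"],
  [/Cannot read properties of null/, "race condition"],
];

// Return the first matching symptom category, or null when nothing matches.
function matchSymptom(errorText: string): string | null {
  for (const [pattern, category] of SYMPTOM_PATTERNS) {
    if (pattern.test(errorText)) return category;
  }
  return null; // no symptom match — not evidence of flakiness on its own
}
```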
**A symptom match is a starting signal, not a verdict.** Always cross-reference with code overlap and cross-PR recurrence before classifying.
---
## Confidence Model
| Tier | Verdict | Criteria |
|---|---|---|
| 1 | **Suspected Flaky** | Symptom pattern match, or cross-PR recurrence with no code overlap, or rerun detection |
| 2 | **Likely Real** | No symptom pattern match, or code overlap with the PR's changes detected |
For suspected flaky failures, the skill further annotates based on code overlap:
- **No overlap** — failure is likely unrelated to this PR; safe to rerun, but log a Jira if it passes
- **Overlap detected** — a real regression is plausible; investigate before dismissing
- **Unclear** — PR spans many areas or the test area is ambiguous; treat with caution
---
## What to Do When You Suspect a Flaky Test
1. **Rerun the failing check** — if it passes, that's strong evidence of flakiness
2. **Check for cross-PR recurrence** — `/flake-check scan --file <filename>` to see if it's happened before
3. **Look at the test code** — is there a missing `cy.wait('@alias')`, a missing `.should('be.visible')` guard, or an obvious race condition?
4. **If confirmed flaky** — say "create a Jira" and I'll search for an existing ticket and file a Task with the `flaky-test` label if none exists, then fix the root cause
---
## Tracking Flaky Tests
Suspected or confirmed flaky tests can be tracked as **Jira Tasks** with the `flaky-test` label in RHOAIENG. When an investigation identifies a suspected flaky test with strong evidence, the report includes pre-filled Jira fields — say "create the Jira" to file it.
To find all open flaky test tickets:
```
project = RHOAIENG AND labels = "flaky-test" AND status != Done
```