update statuses used in flaky-tests.yaml (and related scripts)

jharan1 · jharan1 · commit fcf58fbeb0d3 · 2026-04-20T17:46:14.000+01:00
diff --git a/packages/gen-ai/.claude/skills/flake-check/README.md b/packages/gen-ai/.claude/skills/flake-check/README.md
@@ -47,6 +47,16 @@ How to find this:
 - `/flake-check <PR> --deep` — reports `rerun_detected` entries for the specific PR
 - `/flake-check scan --deep` — surfaces `rerun_patterns` across many PRs, identifying checks that developers routinely re-run to get past
 
+**Related: `/retest` PR comments (low signal, manual only)**
+
+When a developer posts a `/retest` comment on a PR, they are manually triggering a CI rerun, the human-visible equivalent of the automated signal above. A PR with one or more `/retest` comments *may* indicate a flaky test, but the signal is weak on its own. Keep in mind:
+
+- `/retest` often follows a real fix (e.g. after pushing a correction), so not every rerun indicates flakiness
+- Multiple `/retest` comments on a check that keeps failing are *more* suggestive of an intermittent issue
+- This signal is only visible when manually reading PR comments; the skill does not scan for it automatically
+
+Use it as a prompt for investigation, not as a classification. If you notice `/retest` comments while reviewing a PR and the check eventually passed, treat that as supporting evidence alongside Signals 1–3 above.
+
 ### Signal 5 — Symptom pattern match (weak, starting point only)
 
 The error message matches a known timing or infrastructure error pattern:
@@ -103,7 +113,7 @@ Machine-readable source of truth for known flaky tests. Each entry has:
 | `symptoms` | List of error strings or patterns observed |
 | `first_seen` / `last_seen` | ISO dates |
 | `pr_occurrences` | PR numbers where this was observed |
-| `status` | `active` / `intermittent` / `resolved` |
+| `status` | `suspected` / `confirmed` / `resolved` |
 | `resolution` | What to do when this failure appears |
 | `jira` | Tracking ticket (optional) |
 | `notes` | Additional context (optional) |
diff --git a/packages/gen-ai/.claude/skills/flake-check/SKILL.md b/packages/gen-ai/.claude/skills/flake-check/SKILL.md
@@ -144,7 +144,7 @@ If `parse_warnings` is non-empty, note that in the report and explain what likel
 python3 <base_path>/scripts/check_registry.py --active-only
 ```
 
-This returns all active and intermittent registry entries as JSON. Load this once and use it for all Confirmed flaky checks across every failing test.
+This returns all suspected and confirmed registry entries as JSON. Load this once and use it for all registry-match checks across every failing test.
 
 ### Step 4 — Classify each failing test
 
@@ -254,7 +254,7 @@ For each suspected flaky test, compare the PR's changed file paths against the f
 
 ## `mark-flaky` Sub-command
 
-Register a confirmed flaky test in `packages/gen-ai/.claude/skills/flake-check/flaky-tests.yaml` after a developer has verified the test is genuinely intermittent.
+Register a flaky test in `packages/gen-ai/.claude/skills/flake-check/flaky-tests.yaml` after a developer has seen it fail in a way that looks intermittent. Only genuinely flaky tests belong here; consistent failures caused by product bugs should be tracked in Jira and quarantined with `@Bug` in the test suite.
 
 ### Step 1 — Gather information
 
@@ -265,7 +265,7 @@ Ask the user for (or infer from a just-completed investigation):
 - **PR numbers** — where it was observed (e.g. `#4821,#4897`)
 - **Symptom** — the actual error message seen
 - **Resolution** — what to do when it appears (e.g. `Rerun — passes on retry`)
-- **Status** — `intermittent` (default) or `active` (consistent blocker) or `resolved`
+- **Status** — `suspected` (default, seen once or twice) or `confirmed` (verified across multiple PRs) or `resolved`
 - **Jira** — ticket key if one exists (optional)
 
 ### Step 2 — Write the registry entry
@@ -277,7 +277,7 @@ python3 <base_path>/scripts/mark_flaky.py \
     --area <area> \
     --prs "#4821,#4897" \
     --symptom "<error pattern>" \
-    --status intermittent \
+    --status suspected \
     --resolution "<what to do>" \
     --jira "RHOAIENG-12345"
 ```
@@ -430,7 +430,7 @@ Use this format instead of the standard report when `filters.file_filter` is non
   - PRs: #<n> (author: <author>), #<n> (author: <author>), ...
   - <If N >= 3 PRs and authors differ:> Strong cross-PR recurrence across unrelated changes — high flaky signal
   - <If N == 1:> Single occurrence — insufficient data to classify; investigate before assuming flaky
-  - <If N >= 2:> Consider running `/flake-check mark-flaky` if confirmed intermittent
+  - <If N >= 2:> Consider running `/flake-check mark-flaky` if confirmed flaky
 
 ### No matches found
 <Only present when with_failures == 0:>
@@ -465,21 +465,21 @@ python3 <base_path>/scripts/check_registry.py
 ### Step 2 — Compute and output stats
 
 From the returned entries, calculate:
-- Total tracked tests by status (`active`, `intermittent`, `resolved`)
+- Total tracked tests by status (`suspected`, `confirmed`, `resolved`)
 - Count of tests per area
 - Total PR occurrences per area (sum of `pr_occurrences` list lengths)
 - Tests with the most PR occurrences (top 5)
-- Most recently seen (`last_seen`) active tests
+- Most recently seen (`last_seen`) suspected/confirmed tests
 
 ```
 ## Flaky Test Registry Stats
 
-**Total tracked:** <N> tests (<N> active, <N> intermittent, <N> resolved)
+**Total tracked:** <N> tests (<N> suspected, <N> confirmed, <N> resolved)
 
 ### By Area
 
-| Area | Active | Intermittent | Total PRs Impacted | Most Recent |
-|------|--------|--------------|--------------------|-------------|
+| Area | Suspected | Confirmed | Total PRs Impacted | Most Recent |
+|------|-----------|-----------|--------------------|-------------|
 | <area> | <N> | <N> | <N> | <date> |
 
 ### Most Impactful (by PR occurrences)
diff --git a/packages/gen-ai/.claude/skills/flake-check/flaky-tests.yaml b/packages/gen-ai/.claude/skills/flake-check/flaky-tests.yaml
@@ -3,9 +3,9 @@
 # Updated automatically via /flake-check mark-flaky in Claude Code.
 #
 # Status values:
-#   active       — currently failing intermittently or consistently
-#   intermittent — fails occasionally; rerun usually resolves it
-#   resolved     — root cause fixed; entry kept for historical reference
+#   suspected  — seen failing in a flaky pattern but not yet confirmed across multiple PRs
+#   confirmed  — verified flaky across multiple unrelated PRs; rerun with confidence
+#   resolved   — root cause fixed; entry kept for historical reference
 #
 # Schema:
 #   id             — unique ID in <area>-<NNN> format (e.g. model-catalog-003)
@@ -16,27 +16,13 @@
 #   first_seen     — ISO date first observed (YYYY-MM-DD)
 #   last_seen      — ISO date last observed (YYYY-MM-DD)
 #   pr_occurrences — list of PR numbers affected (e.g. ["#4821", "#4897"])
-#   status         — active | intermittent | resolved
+#   status         — suspected | confirmed | resolved
 #   resolution     — what to do when this failure appears
 #   jira           — Jira ticket key (optional)
 #   notes          — additional context (optional)
 
 entries:
   - id: model-catalog-001
-    test: "Admin can enable and disable model catalog sources"
-    file: "packages/cypress/cypress/tests/e2e/modelCatalog/testSourceEnableDisable.cy.ts"
-    area: model-catalog
-    symptoms:
-      - "Timed out retrying"
-    first_seen: "2026-04-13"
-    last_seen: "2026-04-13"
-    pr_occurrences: []
-    status: active
-    resolution: "Product bug — do not rerun expecting it to pass. Consistent failure until RHOAIENG-53704 is resolved."
-    jira: "RHOAIENG-53704"
-    notes: "Tagged @Bug in test suite. Root cause is a product defect, not a flaky test."
-
-  - id: model-catalog-002
     test: "Performance view toggle enables filters and shows benchmark data on validated models"
     file: "packages/cypress/cypress/tests/e2e/modelCatalog/testPerformanceFiltersAvailable.cy.ts"
     area: model-catalog
@@ -46,7 +32,7 @@ entries:
     first_seen: "2026-04-13"
     last_seen: "2026-04-13"
     pr_occurrences: []
-    status: intermittent
+    status: suspected
     resolution: "Rerun may help. Underlying automation issue tracked in RHOAIENG-55621."
     jira: "RHOAIENG-55621"
     notes: "Automation bug tagged @Maintain. Intermittent failures related to performance filter rendering timing."
diff --git a/packages/gen-ai/.claude/skills/flake-check/scripts/check_registry.py b/packages/gen-ai/.claude/skills/flake-check/scripts/check_registry.py
@@ -24,7 +24,7 @@
     # Combine (either match qualifies)
     python3 check_registry.py --test "Admin can enable" --file "testSourceEnableDisable.cy.ts"
 
-    # Only active/intermittent entries (exclude resolved)
+    # Only suspected/confirmed entries (exclude resolved)
     python3 check_registry.py --active-only
 
 Output JSON shape:
@@ -99,7 +99,7 @@ def main() -> None:
     parser.add_argument(
         "--active-only",
         action="store_true",
-        help="Only return active or intermittent entries (exclude resolved)",
+        help="Only return suspected or confirmed entries (exclude resolved)",
     )
     args = parser.parse_args()
 
@@ -121,7 +121,7 @@ def main() -> None:
     entries: list[dict] = data.get("entries", []) if data else []
 
     if args.active_only:
-        entries = [e for e in entries if e.get("status", "active") in ("active", "intermittent")]
+        entries = [e for e in entries if e.get("status", "suspected") in ("suspected", "confirmed")]
 
     matched = [
         e for e in entries
diff --git a/packages/gen-ai/.claude/skills/flake-check/scripts/mark_flaky.py b/packages/gen-ai/.claude/skills/flake-check/scripts/mark_flaky.py
@@ -136,9 +136,9 @@ def main() -> None:
     parser.add_argument("--symptom", default=None, help="Error message or pattern observed")
     parser.add_argument(
         "--status",
-        default="intermittent",
-        choices=["active", "intermittent", "resolved"],
-        help="Entry status (default: intermittent)",
+        default="suspected",
+        choices=["suspected", "confirmed", "resolved"],
+        help="Entry status (default: suspected)",
     )
     parser.add_argument("--resolution", required=True, help="What to do when this failure appears")
     parser.add_argument("--jira", default=None, help="Jira ticket key (optional)")
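
The status rename above changes which entries `--active-only` returns: an entry that still carries `active` or `intermittent` would no longer match the new filter. The sketch below reproduces the filter expression from `check_registry.py` in isolation and adds a check for leftover legacy statuses. `filter_open_entries`, `find_legacy_statuses`, and the sample entry ids are hypothetical names made up for illustration; they are not part of this commit.

```python
from __future__ import annotations

# Status values removed by this change (taken from the diff above).
LEGACY_STATUSES = ("active", "intermittent")


def filter_open_entries(entries: list[dict]) -> list[dict]:
    """Mirror the --active-only filter from the diff.

    Entries without a "status" key default to "suspected" and are kept.
    """
    return [
        e for e in entries
        if e.get("status", "suspected") in ("suspected", "confirmed")
    ]


def find_legacy_statuses(entries: list[dict]) -> list[str]:
    """Hypothetical helper: ids of entries still using a removed status."""
    return [
        e.get("id", "<no id>")
        for e in entries
        if e.get("status") in LEGACY_STATUSES
    ]


if __name__ == "__main__":
    # Made-up sample entries, not taken from flaky-tests.yaml.
    sample = [
        {"id": "example-001", "status": "suspected"},
        {"id": "example-002", "status": "resolved"},
        {"id": "example-003"},  # no status key
        {"id": "example-004", "status": "intermittent"},  # legacy value
    ]
    print([e["id"] for e in filter_open_entries(sample)])
    print(find_legacy_statuses(sample))
```

Running a check like this against the registry before merging would show whether any entry needs its status updated by hand. How a legacy `active` entry should be mapped is a judgment call the diff does not settle, so the sketch only reports such entries rather than rewriting them.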