Commit fcf58fb

update statuses used in flaky-tests.yaml (and related scripts)
1 parent 9c01c62 commit fcf58fb

5 files changed

Lines changed: 32 additions & 36 deletions

packages/gen-ai/.claude/skills/flake-check/README.md

Lines changed: 11 additions & 1 deletion
@@ -47,6 +47,16 @@ How to find this:
 - `/flake-check <PR> --deep` — reports `rerun_detected` entries for the specific PR
 - `/flake-check scan --deep` — surfaces `rerun_patterns` across many PRs, identifying checks that developers routinely re-run to get past

+**Related: `/retest` PR comments (low signal, manual only)**
+
+When a developer posts a `/retest` comment on a PR, they are manually triggering a CI rerun — the human-visible equivalent of the automated signal above. A PR with one or more `/retest` comments *may* indicate a flaky test, but this signal is weak on its own because:
+
+- `/retest` often follows a real fix (e.g. after pushing a correction) — not every rerun is flakiness
+- A PR with multiple `/retest` comments on a check that keeps failing is *more* suggestive of an intermittent issue
+- This signal is only visible when manually reading PR comments; the skill does not scan for it automatically
+
+Use it as a prompt for investigation, not as a classification. If you notice `/retest` comments while reviewing a PR and the check eventually passed, treat that as supporting evidence alongside Signal 1–3 above.
+
 ### Signal 5 — Symptom pattern match (weak, starting point only)

 The error message matches a known timing or infrastructure error pattern:
@@ -103,7 +113,7 @@ Machine-readable source of truth for known flaky tests. Each entry has:
 | `symptoms` | List of error strings or patterns observed |
 | `first_seen` / `last_seen` | ISO dates |
 | `pr_occurrences` | PR numbers where this was observed |
-| `status` | `active` / `intermittent` / `resolved` |
+| `status` | `suspected` / `confirmed` / `resolved` |
 | `resolution` | What to do when this failure appears |
 | `jira` | Tracking ticket (optional) |
 | `notes` | Additional context (optional) |
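
Reviewer note: because the `status` vocabulary changes in this commit, entries written under the old names will no longer match the documented values. A minimal sketch of a check for that, assuming the registry has already been parsed into a list of dicts; the function name `find_invalid_statuses` is hypothetical and not part of the skill's scripts.

```python
# Status values documented in the schema table after this commit.
VALID_STATUSES = {"suspected", "confirmed", "resolved"}


def find_invalid_statuses(entries: list[dict]) -> list[tuple[str, object]]:
    """Return (id, status) pairs whose status is not one of the new values.

    Hypothetical helper for illustration. Entries with no status key are
    reported too, with a status of None.
    """
    return [
        (e.get("id", "<no id>"), e.get("status"))
        for e in entries
        if e.get("status") not in VALID_STATUSES
    ]
```

Running this over the parsed registry before merging would surface any entry still marked `active` or `intermittent`.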

packages/gen-ai/.claude/skills/flake-check/SKILL.md

Lines changed: 10 additions & 10 deletions
@@ -144,7 +144,7 @@ If `parse_warnings` is non-empty, note that in the report and explain what likel
 python3 <base_path>/scripts/check_registry.py --active-only
 ```

-This returns all active and intermittent registry entries as JSON. Load this once and use it for all Confirmed flaky checks across every failing test.
+This returns all suspected and confirmed registry entries as JSON. Load this once and use it for all registry-match checks across every failing test.

 ### Step 4 — Classify each failing test

@@ -254,7 +254,7 @@ For each suspected flaky test, compare the PR's changed file paths against the f

 ## `mark-flaky` Sub-command

-Register a confirmed flaky test in `packages/gen-ai/.claude/skills/flake-check/flaky-tests.yaml` after a developer has verified the test is genuinely intermittent.
+Register a flaky test in `packages/gen-ai/.claude/skills/flake-check/flaky-tests.yaml` after a developer has seen it fail in a way that looks intermittent. Only genuine flaky tests belong here — consistent product bug failures should be tracked in Jira and quarantined with `@Bug` in the test suite.

 ### Step 1 — Gather information

@@ -265,7 +265,7 @@ Ask the user for (or infer from a just-completed investigation):
 - **PR numbers** — where it was observed (e.g. `#4821,#4897`)
 - **Symptom** — the actual error message seen
 - **Resolution** — what to do when it appears (e.g. `Rerun — passes on retry`)
-- **Status** — `intermittent` (default) or `active` (consistent blocker) or `resolved`
+- **Status** — `suspected` (default, seen once or twice) or `confirmed` (verified across multiple PRs) or `resolved`
 - **Jira** — ticket key if one exists (optional)

 ### Step 2 — Write the registry entry
@@ -277,7 +277,7 @@ python3 <base_path>/scripts/mark_flaky.py \
   --area <area> \
   --prs "#4821,#4897" \
   --symptom "<error pattern>" \
-  --status intermittent \
+  --status suspected \
   --resolution "<what to do>" \
   --jira "RHOAIENG-12345"
 ```
@@ -430,7 +430,7 @@ Use this format instead of the standard report when `filters.file_filter` is non
 - PRs: #<n> (author: <author>), #<n> (author: <author>), ...
 - <If N >= 3 PRs and authors differ:> Strong cross-PR recurrence across unrelated changes — high flaky signal
 - <If N == 1:> Single occurrence — insufficient data to classify; investigate before assuming flaky
-- <If N >= 2:> Consider running `/flake-check mark-flaky` if confirmed intermittent
+- <If N >= 2:> Consider running `/flake-check mark-flaky` if confirmed flaky

 ### No matches found
 <Only present when with_failures == 0:>
@@ -465,21 +465,21 @@ python3 <base_path>/scripts/check_registry.py
 ### Step 2 — Compute and output stats

 From the returned entries, calculate:
-- Total tracked tests by status (`active`, `intermittent`, `resolved`)
+- Total tracked tests by status (`suspected`, `confirmed`, `resolved`)
 - Count of tests per area
 - Total PR occurrences per area (sum of `pr_occurrences` list lengths)
 - Tests with the most PR occurrences (top 5)
-- Most recently seen (`last_seen`) active tests
+- Most recently seen (`last_seen`) suspected/confirmed tests

 ```
 ## Flaky Test Registry Stats

-**Total tracked:** <N> tests (<N> active, <N> intermittent, <N> resolved)
+**Total tracked:** <N> tests (<N> suspected, <N> confirmed, <N> resolved)

 ### By Area

-| Area | Active | Intermittent | Total PRs Impacted | Most Recent |
-|------|--------|--------------|--------------------|-------------|
+| Area | Suspected | Confirmed | Total PRs Impacted | Most Recent |
+|------|-----------|-----------|--------------------|-------------|
 | <area> | <N> | <N> | <N> | <date> |

 ### Most Impactful (by PR occurrences)
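
Reviewer note: the stats listed in Step 2 can be sketched as below under the renamed statuses. This assumes entries are plain dicts shaped like the registry schema; the function name `registry_stats` and the keys of the returned dict are hypothetical and not taken from the skill.

```python
from collections import Counter

# Status values introduced by this commit.
STATUSES = ("suspected", "confirmed", "resolved")


def registry_stats(entries: list[dict]) -> dict:
    """Summarise registry entries (hypothetical helper, not part of the skill)."""
    by_status = Counter(e.get("status") for e in entries)
    tests_per_area = Counter(e.get("area") for e in entries)

    # Sum the lengths of the pr_occurrences lists per area.
    prs_per_area: Counter = Counter()
    for e in entries:
        prs_per_area[e.get("area")] += len(e.get("pr_occurrences") or [])

    # Top five entries by number of PR occurrences.
    top = sorted(
        entries, key=lambda e: len(e.get("pr_occurrences") or []), reverse=True
    )[:5]

    # Unresolved entries, most recently seen first.
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    open_entries = [e for e in entries if e.get("status") in ("suspected", "confirmed")]
    recent = sorted(open_entries, key=lambda e: e.get("last_seen", ""), reverse=True)

    return {
        "by_status": {s: by_status.get(s, 0) for s in STATUSES},
        "tests_per_area": dict(tests_per_area),
        "prs_per_area": dict(prs_per_area),
        "top_by_prs": [e["id"] for e in top],
        "most_recent_open": [e["id"] for e in recent],
    }
```

Counting by the fixed `STATUSES` tuple keeps a zero in the report for any status with no entries, which matches the `<N> suspected, <N> confirmed, <N> resolved` template line.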

packages/gen-ai/.claude/skills/flake-check/flaky-tests.yaml

Lines changed: 5 additions & 19 deletions
@@ -3,9 +3,9 @@
 # Updated automatically via /flake-check mark-flaky in Claude Code.
 #
 # Status values:
-# active — currently failing intermittently or consistently
-# intermittent — fails occasionally; rerun usually resolves it
-# resolved — root cause fixed; entry kept for historical reference
+# suspected — seen failing in a flaky pattern but not yet confirmed across multiple PRs
+# confirmed — verified flaky across multiple unrelated PRs; rerun with confidence
+# resolved — root cause fixed; entry kept for historical reference
 #
 # Schema:
 # id — unique ID in <area>-<NNN> format (e.g. model-catalog-003)
@@ -16,27 +16,13 @@
 # first_seen — ISO date first observed (YYYY-MM-DD)
 # last_seen — ISO date last observed (YYYY-MM-DD)
 # pr_occurrences — list of PR numbers affected (e.g. ["#4821", "#4897"])
-# status — active | intermittent | resolved
+# status — suspected | confirmed | resolved
 # resolution — what to do when this failure appears
 # jira — Jira ticket key (optional)
 # notes — additional context (optional)

 entries:
   - id: model-catalog-001
-    test: "Admin can enable and disable model catalog sources"
-    file: "packages/cypress/cypress/tests/e2e/modelCatalog/testSourceEnableDisable.cy.ts"
-    area: model-catalog
-    symptoms:
-      - "Timed out retrying"
-    first_seen: "2026-04-13"
-    last_seen: "2026-04-13"
-    pr_occurrences: []
-    status: active
-    resolution: "Product bug — do not rerun expecting it to pass. Consistent failure until RHOAIENG-53704 is resolved."
-    jira: "RHOAIENG-53704"
-    notes: "Tagged @Bug in test suite. Root cause is a product defect, not a flaky test."
-
-  - id: model-catalog-002
     test: "Performance view toggle enables filters and shows benchmark data on validated models"
     file: "packages/cypress/cypress/tests/e2e/modelCatalog/testPerformanceFiltersAvailable.cy.ts"
     area: model-catalog
@@ -46,7 +32,7 @@ entries:
     first_seen: "2026-04-13"
     last_seen: "2026-04-13"
     pr_occurrences: []
-    status: intermittent
+    status: suspected
     resolution: "Rerun may help. Underlying automation issue tracked in RHOAIENG-55621."
     jira: "RHOAIENG-55621"
     notes: "Automation bug tagged @Maintain. Intermittent failures related to performance filter rendering timing."

packages/gen-ai/.claude/skills/flake-check/scripts/check_registry.py

Lines changed: 3 additions & 3 deletions
@@ -24,7 +24,7 @@
 # Combine (either match qualifies)
 python3 check_registry.py --test "Admin can enable" --file "testSourceEnableDisable.cy.ts"

-# Only active/intermittent entries (exclude resolved)
+# Only suspected/confirmed entries (exclude resolved)
 python3 check_registry.py --active-only

 Output JSON shape:
@@ -99,7 +99,7 @@ def main() -> None:
     parser.add_argument(
         "--active-only",
         action="store_true",
-        help="Only return active or intermittent entries (exclude resolved)",
+        help="Only return suspected or confirmed entries (exclude resolved)",
     )
     args = parser.parse_args()

@@ -121,7 +121,7 @@ def main() -> None:
     entries: list[dict] = data.get("entries", []) if data else []

     if args.active_only:
-        entries = [e for e in entries if e.get("status", "active") in ("active", "intermittent")]
+        entries = [e for e in entries if e.get("status", "suspected") in ("suspected", "confirmed")]

     matched = [
         e for e in entries
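
Reviewer note: the changed filter has two edge cases worth seeing in isolation. The sketch below copies the list comprehension from the new line into a standalone function; the name `filter_open` is hypothetical.

```python
OPEN_STATUSES = ("suspected", "confirmed")


def filter_open(entries: list[dict]) -> list[dict]:
    """Standalone copy of the --active-only filter as written in this commit."""
    # An entry with no status key is treated as "suspected", so it is kept.
    # An entry still carrying an old value ("active", "intermittent") is dropped.
    return [e for e in entries if e.get("status", "suspected") in OPEN_STATUSES]
```

The second case means any registry entry not migrated to the new names would silently disappear from `--active-only` output, so it may be worth checking the YAML for leftovers.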

packages/gen-ai/.claude/skills/flake-check/scripts/mark_flaky.py

Lines changed: 3 additions & 3 deletions
@@ -136,9 +136,9 @@ def main() -> None:
     parser.add_argument("--symptom", default=None, help="Error message or pattern observed")
     parser.add_argument(
         "--status",
-        default="intermittent",
-        choices=["active", "intermittent", "resolved"],
-        help="Entry status (default: intermittent)",
+        default="suspected",
+        choices=["suspected", "confirmed", "resolved"],
+        help="Entry status (default: suspected)",
     )
     parser.add_argument("--resolution", required=True, help="What to do when this failure appears")
     parser.add_argument("--jira", default=None, help="Jira ticket key (optional)")
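
Reviewer note: with the new `choices`, callers that still pass `--status intermittent` or `--status active` will be rejected by argparse. A reduced sketch containing only the `--status` option from the diff; `build_parser` and `parse_status` are hypothetical names used for illustration.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Reduced parser containing only the --status option from the diff."""
    parser = argparse.ArgumentParser(prog="mark_flaky.py")
    parser.add_argument(
        "--status",
        default="suspected",
        choices=["suspected", "confirmed", "resolved"],
        help="Entry status (default: suspected)",
    )
    return parser


def parse_status(argv: list[str]) -> Optional[str]:
    """Return the parsed status, or None if argparse rejects the value."""
    try:
        return build_parser().parse_args(argv).status
    except SystemExit:
        # argparse prints a usage error and exits when a value is not in choices.
        return None
```

Any documentation, alias, or automation that still passes the old status names would need updating alongside this change.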
