Commit 5de3024

committed
remove flaky-tests.yaml and local flaky tests register to simplify and try more jira based approach initially, skill readme updates
1 parent fcf58fb commit 5de3024

5 files changed

Lines changed: 96 additions & 626 deletions

Lines changed: 37 additions & 50 deletions
@@ -1,6 +1,6 @@
11
# flake-check — Skill README
22

3-
The `/flake-check` skill investigates CI failures on pull requests and classifies them as **confirmed flaky**, **suspected flaky**, or **likely real** using a combination of live CI data, symptom pattern matching, and the registry in `flaky-tests.yaml`.
3+
The `/flake-check` skill investigates CI failures tied to GitHub PRs and highlights suspected flaky tests using live CI data, symptom pattern matching, and cross-PR overlap analysis. A Jira ticket may be logged for a suspected or confirmed flaky test to track its resolution.
44

55
## Commands
66

@@ -9,22 +9,32 @@ The `/flake-check` skill investigates CI failures on pull requests and classifie
99
| `/flake-check <PR>` | Investigate a single PR — fetches failing checks, reads logs, classifies each failure |
1010
| `/flake-check <PR> --deep` | Same as above, plus detects checks that previously failed then passed on the same commit SHA |
1111
| `/flake-check scan` | Lightweight survey of the last 20 PRs for recurring failure patterns (no log fetching) |
12+
| `/flake-check scan --deep` | Same as above, plus rerun detection and cross-PR code overlap analysis for each pattern |
1213
| `/flake-check scan <N>d` | Survey PRs from the last N days (e.g. `scan 7d`) |
1314
| `/flake-check scan --file <filename>` | Find every PR in the window where a specific test file appeared in failures (fetches logs — slower) |
14-
| `/flake-check mark-flaky` | Register a confirmed flaky test in `flaky-tests.yaml` |
15-
| `/flake-check stats` | Show registry trends — most impactful tests, area breakdown, recently active entries |
15+
16+
**Examples — copy/paste and substitute your own values:**
17+
18+
```
19+
/flake-check 7301
20+
/flake-check 7301 --deep
21+
/flake-check scan
22+
/flake-check scan --deep
23+
/flake-check scan 7d
24+
/flake-check scan 14d --deep
25+
/flake-check scan --file pipelineCreateRuns.cy.ts
26+
/flake-check scan 30d --file pipelineCreateRuns.cy.ts
27+
/flake-check scan --since 14d --until 7d
28+
/flake-check scan --since 2026-04-01 --until 2026-04-15
29+
```
1630

1731
---
1832

1933
## How We Identify Flaky Tests
2034

2135
Flaky classification is never based on a single signal. The skill accumulates evidence across multiple dimensions and applies them conservatively — a real regression can produce the same symptoms as a flaky test.
2236

23-
### Signal 1 — Registry match (strongest)
24-
25-
The test name or file is already in `flaky-tests.yaml` with prior PR occurrences. This is the only signal that produces a **Confirmed Flaky** verdict without further investigation. Cite the entry's `resolution` field and act on it.
26-
27-
### Signal 2 — Cross-PR recurrence (strong)
37+
### Signal 1 — Cross-PR recurrence (strong)
2838

2939
The same check or test file fails on multiple PRs whose code changes are in **unrelated areas**. For example, `pipelineCreateRuns.cy.ts` failing on a PR that only touched `api-keys/maas` code, and again on a PR that only touched `model-serving/` code — neither of which touches pipelines. When a test fails repeatedly across PRs with no common code thread, the failure is almost certainly independent of the code changes.
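
As a rough illustration of the recurrence idea above, the following sketch groups failures by test file and keeps only those seen on several PRs whose changes lie outside the test's own area. The record layout, the PR numbers, and the function name are hypothetical and are not taken from the skill's implementation.

```python
from collections import defaultdict


def recurring_unrelated_failures(failures, min_prs=2):
    """Return test files that failed on at least `min_prs` unrelated PRs.

    Each failure is a dict with hypothetical keys:
      pr            - PR number
      test_file     - failing test file
      test_area     - feature area the test exercises
      changed_areas - set of areas touched by the PR
    """
    prs_by_file = defaultdict(set)
    for failure in failures:
        # Count a PR only if its changes do not touch the test's own area.
        if failure["test_area"] not in failure["changed_areas"]:
            prs_by_file[failure["test_file"]].add(failure["pr"])
    return {
        test_file: sorted(prs)
        for test_file, prs in prs_by_file.items()
        if len(prs) >= min_prs
    }


# Illustrative data only; the PR numbers are made up.
sample_failures = [
    {"pr": 101, "test_file": "pipelineCreateRuns.cy.ts",
     "test_area": "pipelines", "changed_areas": {"api-keys/maas"}},
    {"pr": 102, "test_file": "pipelineCreateRuns.cy.ts",
     "test_area": "pipelines", "changed_areas": {"model-serving"}},
    {"pr": 103, "test_file": "pipelineCreateRuns.cy.ts",
     "test_area": "pipelines", "changed_areas": {"pipelines"}},
]
recurring = recurring_unrelated_failures(sample_failures)
```

PR 103 is excluded because it touches the pipelines area itself, so a failure there could be a real regression.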
3040

@@ -33,31 +43,21 @@ How to find this:
3343
- `/flake-check scan --file <filename>` surfaces file-level recurrence with log-level detail
3444
- `/flake-check <PR> --deep`, then manually checking other recent PRs
3545

36-
### Signal 3 — No code overlap on a single PR (moderate)
46+
### Signal 2 — No code overlap on a single PR (moderate)
3747

3848
A test fails on a PR whose changes are entirely in a different feature area than the test exercises. For example, a pipelines test failing on a PR that only modifies authentication code. This is a moderate signal on its own — it means the failure is *likely* unrelated to the PR, but it could still be a pre-existing regression on `main`.
3949

4050
The skill performs this analysis automatically during PR investigation: it fetches the PR's changed files and compares the domain against the failing test's directory.
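
A minimal sketch of this kind of comparison, assuming (purely for illustration) that a file's feature area is its top-level directory. The paths, the `max_areas` threshold, and the function names are hypothetical; the skill's actual domain mapping may differ.

```python
from pathlib import PurePosixPath


def top_level_area(path):
    """Treat the first path component as the feature area (an assumption)."""
    return PurePosixPath(path).parts[0]


def code_overlap(changed_files, failing_test_file, max_areas=3):
    """Return 'overlap detected', 'no overlap', or 'unclear'."""
    changed_areas = {top_level_area(p) for p in changed_files}
    test_area = top_level_area(failing_test_file)
    if test_area in changed_areas:
        # The PR touches the area the test exercises: a regression is plausible.
        return "overlap detected"
    if len(changed_areas) > max_areas:
        # The PR spans many areas, so the comparison is not conclusive.
        return "unclear"
    return "no overlap"


# Hypothetical paths, for illustration only.
result = code_overlap(
    ["authentication/login.ts"],
    "pipelines/pipelineCreateRuns.cy.ts",
)
```

A "no overlap" result here is still only a moderate signal, since the failure could be a pre-existing regression on `main`.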
4151

42-
### Signal 4 — Rerun detection (moderate)
52+
### Signal 3 — Rerun detection (moderate)
4353

44-
A check **failed then passed on the same commit SHA** without any new code being pushed. This means a developer triggered a re-run and it passed — a strong behavioural indicator that the failure was transient. These hidden failures don't appear in GitHub's final check status, so they're easy to miss.
54+
A check **failed then passed on the same commit SHA** without any new code being pushed. This means a developer triggered a re-run and it passed — a strong behavioural indicator that the failure was transient.
4555

4656
How to find this:
4757
- `/flake-check <PR> --deep` — reports `rerun_detected` entries for the specific PR
48-
- `/flake-check scan --deep` — surfaces `rerun_patterns` across many PRs, identifying checks that developers routinely re-run to get past
49-
50-
**Related: `/retest` PR comments (low signal, manual only)**
58+
- `/flake-check scan --deep` — surfaces `rerun_patterns` across many PRs
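
The rerun signal can be sketched as a single pass over a PR's check runs, ordered oldest first. The record fields (`check`, `sha`, `conclusion`) and the check names are assumptions made for this example, not the skill's actual data model or output format.

```python
def detect_reruns(runs):
    """Find checks that failed and later passed on the same commit SHA.

    `runs` is a list of dicts ordered oldest first, each with the
    hypothetical keys 'check', 'sha', and 'conclusion'.
    """
    failed = set()
    reruns = []
    for run in runs:
        key = (run["check"], run["sha"])
        if run["conclusion"] == "failure":
            failed.add(key)
        elif run["conclusion"] == "success" and key in failed:
            # Same check, same SHA, earlier failure: a rerun that passed.
            reruns.append({"check": run["check"], "sha": run["sha"]})
            failed.discard(key)
    return reruns


# Illustrative data only.
sample_runs = [
    {"check": "cypress", "sha": "abc123", "conclusion": "failure"},
    {"check": "lint", "sha": "abc123", "conclusion": "success"},
    {"check": "cypress", "sha": "abc123", "conclusion": "success"},
    {"check": "cypress", "sha": "def456", "conclusion": "success"},
]
detected = detect_reruns(sample_runs)
```

Keying on the SHA as well as the check name is what separates a rerun from a pass that followed a new push.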
5159

52-
When a developer posts a `/retest` comment on a PR, they are manually triggering a CI rerun — the human-visible equivalent of the automated signal above. A PR with one or more `/retest` comments *may* indicate a flaky test, but this signal is weak on its own because:
53-
54-
- `/retest` often follows a real fix (e.g. after pushing a correction) — not every rerun is flakiness
55-
- A PR with multiple `/retest` comments on a check that keeps failing is *more* suggestive of an intermittent issue
56-
- This signal is only visible when manually reading PR comments; the skill does not scan for it automatically
57-
58-
Use it as a prompt for investigation, not as a classification. If you notice `/retest` comments while reviewing a PR and the check eventually passed, treat that as supporting evidence alongside Signal 1–3 above.
59-
60-
### Signal 5 — Symptom pattern match (weak, starting point only)
60+
### Signal 4 — Symptom pattern match (weak, starting point only)
6161

6262
The error message matches a known timing or infrastructure error pattern:
6363

@@ -71,51 +71,38 @@ The error message matches a known timing or infrastructure error pattern:
7171
| `net::ERR_CONNECTION_REFUSED` | CI service failed to start or crashed |
7272
| `Cannot read properties of null` | Race condition — component unmounted or not yet mounted |
7373

74-
**A symptom match is a starting signal, not a verdict.** Always cross-reference with code overlap and cross-PR recurrence before classifying. A broken selector, a missing mock, or a genuine product bug can produce identical error messages.
74+
**A symptom match is a starting signal, not a verdict.** Always cross-reference with code overlap and cross-PR recurrence before classifying.
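
For illustration, a symptom match could be a simple scan of the log text against known patterns. This sketch includes only the two patterns visible in the table rows above, and the names `SYMPTOM_PATTERNS` and `match_symptoms` are hypothetical.

```python
import re

# Only the two patterns shown in the table above; any others are omitted.
SYMPTOM_PATTERNS = [
    (re.compile(r"net::ERR_CONNECTION_REFUSED"),
     "CI service failed to start or crashed"),
    (re.compile(r"Cannot read properties of null"),
     "Race condition: component unmounted or not yet mounted"),
]


def match_symptoms(log_text):
    """Return the likely causes whose pattern appears in the log text."""
    return [cause for pattern, cause in SYMPTOM_PATTERNS
            if pattern.search(log_text)]


causes = match_symptoms(
    "TypeError: Cannot read properties of null (reading 'click')"
)
```

An empty result does not mean the failure is real, and a non-empty one does not mean it is flaky; the match only suggests where to look first.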
7575

7676
---
7777

7878
## Confidence Model
7979

8080
| Tier | Verdict | Criteria |
8181
|---|---|---|
82-
| 1 | **Confirmed Flaky** | Registry match in `flaky-tests.yaml` |
83-
| 2 | **Suspected Flaky** | Symptom pattern match, or cross-PR recurrence with no code overlap, or rerun detection — but not in the registry |
84-
| 3 | **Likely Real** | No registry match, no symptom pattern, and/or code overlap detected |
82+
| 1 | **Suspected Flaky** | Symptom pattern match, or cross-PR recurrence with no code overlap, or rerun detection |
83+
| 2 | **Likely Real** | No symptom pattern and/or code overlap detected |
8584

8685
For suspected flaky, the skill further annotates based on code overlap:
87-
- **No overlap** — failure is likely unrelated to this PR; safe to rerun, but register if it passes
86+
- **No overlap** — failure is likely unrelated to this PR; safe to rerun, but log a Jira if it passes
8887
- **Overlap detected** — a real regression is plausible; investigate before dismissing
8988
- **Unclear** — PR spans many areas or the test area is ambiguous; treat with caution
9089

9190
---
9291

9392
## What to Do When You Suspect a Flaky Test
9493

95-
1. **Check if it's already registered** with `/flake-check stats`, or look at `flaky-tests.yaml`
96-
2. **Rerun the failing check** — if it passes, that's strong evidence of flakiness
97-
3. **Check for cross-PR recurrence** with `/flake-check scan --file <filename>` to see if it's happened before
98-
4. **Look at the test code** — is there a missing `cy.wait('@alias')`, a missing `.should('be.visible')` guard, or an obvious race condition?
99-
5. **If confirmed flaky** — run `/flake-check mark-flaky` to register it, then raise a Jira to fix the root cause
94+
1. **Rerun the failing check** — if it passes, that's strong evidence of flakiness
95+
2. **Check for cross-PR recurrence** with `/flake-check scan --file <filename>` to see if it's happened before
96+
3. **Look at the test code** — is there a missing `cy.wait('@alias')`, a missing `.should('be.visible')` guard, or an obvious race condition?
97+
4. **If confirmed flaky** — say "create a Jira" and I'll search for an existing ticket and file a Task with label `flaky-test` if none exists, then fix the root cause
10098

10199
---
102100

103-
## The Registry (`flaky-tests.yaml`)
101+
## Tracking Flaky Tests
104102

105-
Machine-readable source of truth for known flaky tests. Each entry has:
103+
Suspected or confirmed flaky tests can be tracked as **Jira Tasks** with the `flaky-test` label in the RHOAIENG project. When an investigation identifies a suspected flaky test with strong evidence, the report includes pre-filled Jira fields; say "create the Jira" to file it.
106104

107-
| Field | Description |
108-
|---|---|
109-
| `id` | Unique ID in `<area>-<NNN>` format (e.g. `pipelines-001`) |
110-
| `test` | Exact test name from the `it()` block |
111-
| `file` | Path relative to repo root |
112-
| `area` | Short slug (e.g. `model-catalog`, `pipelines`, `workbenches`) |
113-
| `symptoms` | List of error strings or patterns observed |
114-
| `first_seen` / `last_seen` | ISO dates |
115-
| `pr_occurrences` | PR numbers where this was observed |
116-
| `status` | `suspected` / `confirmed` / `resolved` |
117-
| `resolution` | What to do when this failure appears |
118-
| `jira` | Tracking ticket (optional) |
119-
| `notes` | Additional context (optional) |
120-
121-
Entries are created and updated via `/flake-check mark-flaky` — do not edit by hand unless correcting a mistake.
105+
To find all open flaky test tickets:
106+
```
107+
project = RHOAIENG AND labels = "flaky-test" AND status != Done
108+
```
