packages/gen-ai/.claude/skills/flake-check/README.md
+14 lines changed: 14 additions & 0 deletions
@@ -98,6 +98,20 @@ For suspected flaky, the skill further annotates based on code overlap:
 
 ---
 
+## Supporting Scripts
+
+The skill uses four Python scripts in `scripts/` for deterministic data collection. All are zero-dependency (Python stdlib + `gh` CLI).
+
+| Script | Purpose |
+|---|---|
+| `fetch_pr_failures.py` | Fetches PR metadata and the list of failing/pending checks from GitHub |
+| `fetch_test_failures.py` | Downloads CI job logs and extracts individual failing test names and errors |
+| `scan_prs.py` | Surveys recent PRs for recurring failure patterns without fetching logs |
+
+Check name classification (deterministic vs. non-deterministic, flaky risk level) is handled via an inline whitelist table in SKILL.md. The whitelist covers the set of check names that appear in practice — including non-obvious ones like `test-and-build` (a high-flaky-risk Cypress suite) and `test` (a Go BFF test). If a new check type is added to CI, update the table in SKILL.md.
+
+---
+
 ## Tracking Flaky Tests
 
 Suspected/Confirmed flaky tests can be tracked as **Jira Tasks** with the `flaky-test` label in RHOAIENG. When an investigation identifies a suspected flaky test with strong evidence, the report includes pre-filled Jira fields — say "create the Jira" to file it.
packages/gen-ai/.claude/skills/flake-check/SKILL.md
+15 -11 lines changed: 15 additions & 11 deletions
@@ -90,16 +90,20 @@ This returns a JSON object with:
 
 If there are no `failing_checks` and no `pending_checks`, report that all checks are passing and note any remaining review blockers from `review_decision`.
 
-For each `failing_check`, use judgment about whether to fetch logs:
-- **Self-explanatory deterministic checks** (`Lint`, `Type-Check`, `Build *`, `Application Quality Gate`) — classify immediately as a real failure without fetching logs; these jobs don't flake
-- **Ambiguous or unrecognised check names** — if the check name doesn't clearly indicate its job type, run the workflow catalog tool (see below) before deciding whether to fetch logs
-- **Non-deterministic checks** (`Cypress-Mock-Tests`, `Unit-Tests`, `Contract-Tests`, and any check with a matrix suffix in parentheses) — proceed to Step 2
-
-**Workflow catalog tool** — use when a check name is ambiguous or unrecognised:
-The `check_prefix_index` maps check name prefixes to `{category, deterministic, flaky_risk, description}`. Match by finding the longest prefix that is a prefix of the failing check name (matrix jobs append values in parentheses, e.g. `Cypress-Mock-Tests (mcpCatalog, ...)` matches prefix `Cypress-Mock-Tests`).
+For each `failing_check`, use this table to decide whether to fetch logs. Matrix jobs append values in parentheses — match by prefix (e.g. `Cypress-Mock-Tests (mcpCatalog, ...)` matches `Cypress-Mock-Tests`):
+
+| Check name prefix | Type | Flaky risk | Action |
+| `Lint` | lint | none | Skip — deterministic, classify as ❌ Likely Real |
+| `Type-Check` | type-check | none | Skip — deterministic, classify as ❌ Likely Real |
+| `Build *` | build | none | Skip — deterministic, classify as ❌ Likely Real |
+| `Application Quality Gate` | quality-gate | none | Skip — deterministic, classify as ❌ Likely Real |
+| anything else | unknown | unknown | Use judgment — if name suggests infra/setup, skip; otherwise fetch logs |
 
 ### Step 2 — Fetch test failure details for each non-deterministic failing check
 
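The prefix matching described in this hunk can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the skill: `match_check_prefix` and `KNOWN_PREFIXES` are hypothetical names, the prefix list contains only check names that appear in this diff, and `Build *` is assumed to mean "any name starting with `Build`".

```python
from typing import Iterable, Optional

# Hypothetical prefix list: only names that appear in this diff.
# "Build" stands in for the `Build *` table row (assumption).
KNOWN_PREFIXES = [
    "Lint",
    "Type-Check",
    "Build",
    "Application Quality Gate",
    "Cypress-Mock-Tests",
    "test-and-build",
    "test",
]


def match_check_prefix(check_name: str, prefixes: Iterable[str]) -> Optional[str]:
    """Return the longest known prefix of check_name, or None if nothing matches."""
    # Matrix jobs append values in parentheses, e.g. "Name (a, b)"; drop that suffix.
    base = check_name.split(" (", 1)[0].strip()
    candidates = [p for p in prefixes if base.startswith(p)]
    # Prefer the longest candidate so "test-and-build" is not swallowed by "test".
    return max(candidates, key=len) if candidates else None
```

Choosing the longest candidate matters once one prefix is itself a prefix of another, as with `test` and `test-and-build`. A name that matches nothing returns `None`, which corresponds to the "anything else" row.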
@@ -312,7 +316,7 @@ Run this once per PR number across all patterns (deduplicate PR numbers to avoid
 **Signal assignment logic (in priority order):**
 1. `⚠️ Suspected Flaky` — check name appears in `patterns` (cross-PR recurrence); annotate with overlap verdict when `--deep` was used
 2. `❌ Likely Real` — check name is clearly deterministic (Lint, Type-Check, Build, kustomize) — these don't flake
-3. `❓ Unknown` — check name is ambiguous or is a test runner check with no cross-PR pattern yet; run `load_workflows.py --pr-only` and look up the name in `check_prefix_index` to resolve it; note that no pattern at scan level doesn't mean the failure isn't flaky — it means there isn't enough signal yet without fetching logs
+3. `❓ Unknown` — check name is ambiguous or is a test runner check with no cross-PR pattern yet; look it up in the check name table in Step 1 of Main Investigation to classify it; note that no pattern at scan level doesn't mean the failure isn't flaky — it means there isn't enough signal yet without fetching logs
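The priority order above could be expressed roughly like this. It is a sketch only: `assign_signal` is a hypothetical helper, `patterns` is assumed to be a collection of check names that recur across PRs, and the deterministic names are assumed to match as case-sensitive prefixes. The skill itself may represent these differently.

```python
from typing import Collection

SUSPECTED_FLAKY = "⚠️ Suspected Flaky"
LIKELY_REAL = "❌ Likely Real"
UNKNOWN = "❓ Unknown"

# Assumption: the deterministic names listed in rule 2 are matched as prefixes.
DETERMINISTIC_PREFIXES = ("Lint", "Type-Check", "Build", "kustomize")


def assign_signal(check_name: str, patterns: Collection[str]) -> str:
    """Apply the three rules in priority order and return the signal label."""
    # Rule 1: cross-PR recurrence wins over everything else.
    if check_name in patterns:
        return SUSPECTED_FLAKY
    # Rule 2: clearly deterministic checks are treated as real failures.
    if check_name.startswith(DETERMINISTIC_PREFIXES):
        return LIKELY_REAL
    # Rule 3: not enough signal at scan level without fetching logs.
    return UNKNOWN
```

Because rule 1 is evaluated first, a check that recurs across PRs is labelled `⚠️ Suspected Flaky` even if its name looks deterministic. That follows the stated order literally.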