Commit f95c890

remove load_workflows.py in favor of inline whitelist in SKILL.md
1 parent 5de3024 commit f95c890

3 files changed

Lines changed: 29 additions & 338 deletions

packages/gen-ai/.claude/skills/flake-check/README.md

Lines changed: 14 additions & 0 deletions
@@ -98,6 +98,20 @@ For suspected flaky, the skill further annotates based on code overlap:
 
 ---
 
+## Supporting Scripts
+
+The skill uses four Python scripts in `scripts/` for deterministic data collection. All are zero-dependency (Python stdlib + `gh` CLI).
+
+| Script | Purpose |
+|---|---|
+| `fetch_pr_failures.py` | Fetches PR metadata and the list of failing/pending checks from GitHub |
+| `fetch_test_failures.py` | Downloads CI job logs and extracts individual failing test names and errors |
+| `scan_prs.py` | Surveys recent PRs for recurring failure patterns without fetching logs |
+
+Check name classification (deterministic vs. non-deterministic, flaky risk level) is handled via an inline whitelist table in SKILL.md. The whitelist covers the set of check names that appear in practice — including non-obvious ones like `test-and-build` (a high-flaky-risk Cypress suite) and `test` (a Go BFF test). If a new check type is added to CI, update the table in SKILL.md.
+
+---
+
 ## Tracking Flaky Tests
 
 Suspected/Confirmed flaky tests can be tracked as **Jira Tasks** with the `flaky-test` label in RHOAIENG. When an investigation identifies a suspected flaky test with strong evidence, the report includes pre-filled Jira fields — say "create the Jira" to file it.

packages/gen-ai/.claude/skills/flake-check/SKILL.md

Lines changed: 15 additions & 11 deletions
@@ -90,16 +90,20 @@ This returns a JSON object with:
 
 If there are no `failing_checks` and no `pending_checks`, report that all checks are passing and note any remaining review blockers from `review_decision`.
 
-For each `failing_check`, use judgment about whether to fetch logs:
-- **Self-explanatory deterministic checks** (`Lint`, `Type-Check`, `Build *`, `Application Quality Gate`) — classify immediately as a real failure without fetching logs; these jobs don't flake
-- **Ambiguous or unrecognised check names** — if the check name doesn't clearly indicate its job type, run the workflow catalog tool (see below) before deciding whether to fetch logs
-- **Non-deterministic checks** (`Cypress-Mock-Tests`, `Unit-Tests`, `Contract-Tests`, and any check with a matrix suffix in parentheses) — proceed to Step 2
-
-**Workflow catalog tool** — use when a check name is ambiguous or unrecognised:
-```bash
-python3 <base_path>/scripts/load_workflows.py --pr-only
-```
-The `check_prefix_index` maps check name prefixes to `{category, deterministic, flaky_risk, description}`. Match by finding the longest prefix that is a prefix of the failing check name (matrix jobs append values in parentheses, e.g. `Cypress-Mock-Tests (mcpCatalog, ...)` matches prefix `Cypress-Mock-Tests`).
+For each `failing_check`, use this table to decide whether to fetch logs. Matrix jobs append values in parentheses — match by prefix (e.g. `Cypress-Mock-Tests (mcpCatalog, ...)` matches `Cypress-Mock-Tests`):
+
+| Check name prefix | Type | Flaky risk | Action |
+|---|---|---|---|
+| `Cypress-Mock-Tests` | cypress-mock | high | Fetch logs → Step 2 |
+| `test-and-build` | cypress-mock | high | Fetch logs → Step 2 |
+| `Unit-Tests` | jest | low | Fetch logs → Step 2 |
+| `Contract-Tests` | contract | medium | Fetch logs → Step 2 |
+| `test` | go-test | low | Fetch logs → Step 2 |
+| `Lint` | lint | none | Skip — deterministic, classify as ❌ Likely Real |
+| `Type-Check` | type-check | none | Skip — deterministic, classify as ❌ Likely Real |
+| `Build *` | build | none | Skip — deterministic, classify as ❌ Likely Real |
+| `Application Quality Gate` | quality-gate | none | Skip — deterministic, classify as ❌ Likely Real |
+| anything else | unknown | unknown | Use judgment — if name suggests infra/setup, skip; otherwise fetch logs |
 
 ### Step 2 — Fetch test failure details for each non-deterministic failing check
 
@@ -312,7 +316,7 @@ Run this once per PR number across all patterns (deduplicate PR numbers to avoid
 **Signal assignment logic (in priority order):**
 1. `⚠️ Suspected Flaky` — check name appears in `patterns` (cross-PR recurrence); annotate with overlap verdict when `--deep` was used
 2. `❌ Likely Real` — check name is clearly deterministic (Lint, Type-Check, Build, kustomize) — these don't flake
-3. `❓ Unknown` — check name is ambiguous or is a test runner check with no cross-PR pattern yet; run `load_workflows.py --pr-only` and look up the name in `check_prefix_index` to resolve it; note that no pattern at scan level doesn't mean the failure isn't flaky — it means there isn't enough signal yet without fetching logs
+3. `❓ Unknown` — check name is ambiguous or is a test runner check with no cross-PR pattern yet; look it up in the check name table in Step 1 of Main Investigation to classify it; note that no pattern at scan level doesn't mean the failure isn't flaky — it means there isn't enough signal yet without fetching logs
 
 ```
 ## Recent PR Scan — <since>–<until> | <N> PRs scanned (<bots_excluded> bots excluded)

packages/gen-ai/.claude/skills/flake-check/scripts/load_workflows.py

Lines changed: 0 additions & 327 deletions
This file was deleted.
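
The table that replaces `load_workflows.py` can be read as a longest-prefix lookup. The sketch below is illustrative only and not part of the skill: `CheckInfo`, `CHECK_TABLE`, and `classify_check` are hypothetical names, the rows are copied from the SKILL.md table above, and treating `Build *` as the prefix `Build ` is an assumption. Taking the longest match matters because `test` is itself a prefix of `test-and-build`.

```python
from typing import NamedTuple, Optional


class CheckInfo(NamedTuple):
    kind: str          # "Type" column of the table
    flaky_risk: str    # "Flaky risk" column
    fetch_logs: bool   # True when the action is "Fetch logs -> Step 2"


# Rows copied from the SKILL.md table. Assumption: `Build *` is stored as
# the plain prefix "Build " (with a trailing space).
CHECK_TABLE = {
    "Cypress-Mock-Tests": CheckInfo("cypress-mock", "high", True),
    "test-and-build": CheckInfo("cypress-mock", "high", True),
    "Unit-Tests": CheckInfo("jest", "low", True),
    "Contract-Tests": CheckInfo("contract", "medium", True),
    "test": CheckInfo("go-test", "low", True),
    "Lint": CheckInfo("lint", "none", False),
    "Type-Check": CheckInfo("type-check", "none", False),
    "Build ": CheckInfo("build", "none", False),
    "Application Quality Gate": CheckInfo("quality-gate", "none", False),
}


def classify_check(name: str) -> Optional[CheckInfo]:
    """Return the row with the longest prefix matching `name`, else None."""
    matches = [prefix for prefix in CHECK_TABLE if name.startswith(prefix)]
    if not matches:
        # Corresponds to the "anything else" row: left to judgment.
        return None
    return CHECK_TABLE[max(matches, key=len)]
```

With these rows, `classify_check("test-and-build")` resolves to the Cypress row rather than the Go `test` row, and a name with no matching prefix returns `None`, which corresponds to the table's "anything else" row.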
