packages/gen-ai/.claude/skills/flake-check/SKILL.md (7 additions, 9 deletions)
```diff
@@ -354,15 +354,13 @@ When `--file` is passed, `scan_prs.py` fetches CI logs for all failing checks no
 
 `scan_prs.py` returns:
 
 - `prs` — list of PRs with their failing check names
-- `patterns` — check names that **visibly failed** on more than one PR, each with `failure_rate` (failures / appearances) and the list of PR numbers it failed on
-- `rerun_patterns` — *(with `--deep` or `--file`)* check names that **failed then passed on the same commit SHA** on two or more PRs, with `rerun_count` and `rerun_rate` (reruns / PRs scanned). This surfaces flaky tests that devs routinely re-run — they won't appear in `patterns` because `statusCheckRollup` only shows the final passing state.
-- `test_patterns` — *(with `--deep` or `--file`)* **individual test names** that failed on two or more PRs, each with `failure_count`, `failure_rate`, `pr_numbers`, and distinct `errors` seen. A test appearing here means the *same specific `it()` block* recurred across PRs — much stronger evidence than the same job recurring. With `--deep`: fetches logs for all failing checks not excluded by `_DETERMINISTIC_PREFIXES` across all scanned PRs (the same test can recur across different check matrix variants — this catches it). With `--file`: same log fetching but filtered to a specific file.
+- `patterns` — check names that **visibly failed** on more than one PR, each with `failure_count` and the list of PR numbers it failed on
+- `rerun_patterns` — *(with `--deep` or `--file`)* check names that **failed then passed on the same commit SHA** on two or more PRs, with `rerun_count` and PR numbers. This surfaces flaky tests that devs routinely re-run — they won't appear in `patterns` because `statusCheckRollup` only shows the final passing state.
+- `test_patterns` — *(with `--deep` or `--file`)* **individual test names** that failed on two or more PRs, each with `failure_count`, `pr_numbers`, and distinct `errors` seen. A test appearing here means the *same specific `it()` block* recurred across PRs — much stronger evidence than the same job recurring. With `--deep`: fetches logs for all failing checks not excluded by `_DETERMINISTIC_PREFIXES` across all scanned PRs (the same test can recur across different check matrix variants — this catches it). With `--file`: same log fetching but filtered to a specific file.
 - `bots_excluded` — count of bot PRs filtered out
 - `all_passing_count` — PRs where everything passed
 - `filters` — the resolved `since`/`until`/`limit`/`deep` values actually used
 
-**Note on rates:** both `failure_rate` and `rerun_rate` are relative to the scan window. Always read them alongside the counts (e.g. `3/20 PRs = 15%`) rather than comparing rates across different scans.
-
 *Skip this step when neither `--deep` nor `--file` was passed, or when `test_patterns` is empty.*
```
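Concretely, a result with the fields described in this hunk might look like the sketch below. This is a hypothetical illustration shaped after the documented field names only; every PR number, check name, test name, and error string is invented and the exact nesting of `scan_prs.py`'s real output may differ.

```python
# Hypothetical scan_prs.py result, shaped after the documented fields.
# All PR numbers, check names, and test names below are invented.
result = {
    "prs": [
        {"number": 101, "failing_checks": ["e2e-tests (chrome)"]},
        {"number": 102, "failing_checks": ["e2e-tests (chrome)", "lint"]},
    ],
    "patterns": [
        # A check that visibly failed on more than one PR.
        {"check_name": "e2e-tests (chrome)", "failure_count": 2,
         "pr_numbers": [101, 102]},
    ],
    "rerun_patterns": [  # only populated with --deep or --file
        # Failed then passed on the same commit SHA on two or more PRs.
        {"check_name": "unit-tests", "rerun_count": 3,
         "pr_numbers": [101, 103, 104]},
    ],
    "test_patterns": [  # only populated with --deep or --file
        # The same it() block recurring across PRs.
        {"test_name": "Dashboard page should render the table",
         "failure_count": 2, "pr_numbers": [101, 102],
         "errors": ["AssertionError: expected table to exist"]},
    ],
    "bots_excluded": 4,
    "all_passing_count": 11,
    "filters": {"since": "2024-01-01", "until": None,
                "limit": 20, "deep": True},
}

# Sanity pass: every pattern should implicate at least two PRs.
for pattern in result["patterns"] + result["test_patterns"]:
    assert len(pattern["pr_numbers"]) >= 2
```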
```diff
@@ -402,14 +400,14 @@ Run this once per PR number across all test patterns (deduplicate to avoid redun
 
 ### Patterns Observed (visible failures)
 <For each entry in patterns:>
-- "<check_name>" (<job_type>) failed in <N>/<scanned> PRs (<rate>%)
+- "<check_name>" (<job_type>) failed in <N>/<scanned> PRs
 - PRs: #<n>, #<n>, ...
 - Classify as: ⚠️ Suspected Flaky
 
 ### Test-Level Patterns — only present with --deep or --file (when test_patterns is non-empty)
 <The test_name field is the full Cypress name: describe chain + it() block concatenated. Split it for display using this approach: if the it() description starts with "should", locate the first "should" in the test_name and treat everything from that word onward as the it() description, and everything before as the suite path. If the it() description does NOT start with "should", look up the actual test file (using the `file` field) with Grep to find the exact `it('...')` string that matches the test_name suffix — this gives the correct it() boundary. Never guess the split point when the test doesn't follow the "should…" convention. Format each entry as shown below.>
 <For each entry in test_patterns, incorporating overlap verdict from Step 3:>
```
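The "should" heuristic in that hunk can be sketched as a small helper. This is hypothetical code, not part of `scan_prs.py`; per the instructions above, the non-"should" case must fall back to grepping the spec file, which is left to the caller here.

```python
def split_test_name(test_name: str) -> tuple[str, str]:
    """Split a full Cypress test name (describe chain + it() text
    concatenated) into (suite_path, it_description).

    Heuristic from the skill: take the FIRST occurrence of "should" as
    the start of the it() description. Tests that don't follow the
    "should..." convention must instead be resolved by grepping the spec
    file for the matching it('...') string, so we raise for the caller.
    """
    idx = test_name.find("should")
    if idx == -1:
        raise ValueError(
            "it() description does not start with 'should'; "
            "look up the spec file to find the correct boundary"
        )
    # Everything before "should" is the describe chain; strip the
    # joining whitespace left behind by the concatenation.
    return test_name[:idx].rstrip(), test_name[idx:]
```

For example, `split_test_name("Dashboard page should render the table")` yields the suite path `"Dashboard page"` and the it() description `"should render the table"`. Note the heuristic deliberately uses the first `"should"`, so a suite path that itself contains the word would split too early; that is exactly why the skill forbids guessing for unconventional names.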