
Commit aacbd17

remove surfacing of rates
1 parent a0e9139 commit aacbd17

2 files changed: 8 additions & 17 deletions

packages/gen-ai/.claude/skills/flake-check/SKILL.md
Lines changed: 7 additions & 9 deletions
```diff
@@ -354,15 +354,13 @@ When `--file` is passed, `scan_prs.py` fetches CI logs for all failing checks no
 
 `scan_prs.py` returns:
 - `prs` — list of PRs with their failing check names
-- `patterns` — check names that **visibly failed** on more than one PR, each with `failure_rate` (failures / appearances) and the list of PR numbers it failed on
-- `rerun_patterns` *(with `--deep` or `--file`)* check names that **failed then passed on the same commit SHA** on two or more PRs, with `rerun_count` and `rerun_rate` (reruns / PRs scanned). This surfaces flaky tests that devs routinely re-run — they won't appear in `patterns` because `statusCheckRollup` only shows the final passing state.
-- `test_patterns` *(with `--deep` or `--file`)* **individual test names** that failed on two or more PRs, each with `failure_count`, `failure_rate`, `pr_numbers`, and distinct `errors` seen. A test appearing here means the *same specific `it()` block* recurred across PRs — much stronger evidence than the same job recurring. With `--deep`: fetches logs for all failing checks not excluded by `_DETERMINISTIC_PREFIXES` across all scanned PRs (the same test can recur across different check matrix variants — this catches it). With `--file`: same log fetching but filtered to a specific file.
+- `patterns` — check names that **visibly failed** on more than one PR, each with `failure_count` and the list of PR numbers it failed on
+- `rerun_patterns` *(with `--deep` or `--file`)* check names that **failed then passed on the same commit SHA** on two or more PRs, with `rerun_count` and PR numbers. This surfaces flaky tests that devs routinely re-run — they won't appear in `patterns` because `statusCheckRollup` only shows the final passing state.
+- `test_patterns` *(with `--deep` or `--file`)* **individual test names** that failed on two or more PRs, each with `failure_count`, `pr_numbers`, and distinct `errors` seen. A test appearing here means the *same specific `it()` block* recurred across PRs — much stronger evidence than the same job recurring. With `--deep`: fetches logs for all failing checks not excluded by `_DETERMINISTIC_PREFIXES` across all scanned PRs (the same test can recur across different check matrix variants — this catches it). With `--file`: same log fetching but filtered to a specific file.
 - `bots_excluded` — count of bot PRs filtered out
 - `all_passing_count` — PRs where everything passed
 - `filters` — the resolved `since`/`until`/`limit`/`deep` values actually used
 
-**Note on rates:** both `failure_rate` and `rerun_rate` are relative to the scan window. Always read them alongside the counts (e.g. `3/20 PRs = 15%`) rather than comparing rates across different scans.
-
 ### Step 3 — Test-file overlap analysis (deep mode and file mode)
 
 *Skip this step when neither `--deep` nor `--file` was passed, or when `test_patterns` is empty.*
```
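The post-change return shape above can be sketched as a Python literal. This is an illustrative sketch only: the field names follow the diff, but all values are made up, and the `prs` entry is omitted because its exact per-PR shape is not documented here.

```python
# Illustrative sketch of the scan_prs.py output after this commit: entries
# carry counts and PR numbers only; the failure_rate / rerun_rate fields
# are gone. All values below are made up for illustration.
example_output = {
    "patterns": [
        {
            "check_name": "Cypress-Mock-Tests (mcpCatalog, ...)",
            "failure_count": 3,
            "pr_numbers": [7194, 7191, 7190],
        }
    ],
    "rerun_patterns": [  # present with --deep or --file
        {
            "check_name": "Cypress-Mock-Tests (mcpCatalog, ...)",
            "rerun_count": 4,
            "pr_numbers": [7200, 7194, 7191, 7190],
        }
    ],
    "test_patterns": [  # present with --deep or --file
        {
            "test_name": "User can create a pipeline run",
            "file": "packages/cypress/cypress/tests/mocked/pipelines/pipelineCreateRuns.cy.ts",
            "failure_count": 3,
            "pr_numbers": [7350, 7300, 7200],
            "errors": ["Timed out retrying after 4000ms"],
        }
    ],
    "bots_excluded": 2,       # illustrative
    "all_passing_count": 12,  # illustrative
    "filters": {"since": None, "until": None, "limit": 20, "deep": True},
}

# After this commit, no pattern entry carries a *_rate field.
assert not any(
    key.endswith("_rate")
    for section in ("patterns", "rerun_patterns", "test_patterns")
    for entry in example_output[section]
    for key in entry
)
```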
```diff
@@ -402,14 +400,14 @@ Run this once per PR number across all test patterns (deduplicate to avoid redun
 
 ### Patterns Observed (visible failures)
 <For each entry in patterns:>
-- "<check_name>" (<job_type>) failed in <N>/<scanned> PRs (<rate>%)
+- "<check_name>" (<job_type>) failed in <N>/<scanned> PRs
   - PRs: #<n>, #<n>, ...
   - Classify as: ⚠️ Suspected Flaky
 
 ### Test-Level Patterns — only present with --deep or --file (when test_patterns is non-empty)
 <The test_name field is the full Cypress name: describe chain + it() block concatenated. Split it for display using this approach: if the it() description starts with "should", locate the first "should" in the test_name and treat everything from that word onward as the it() description, and everything before as the suite path. If the it() description does NOT start with "should", look up the actual test file (using the `file` field) with Grep to find the exact `it('...')` string that matches the test_name suffix — this gives the correct it() boundary. Never guess the split point when the test doesn't follow the "should…" convention. Format each entry as shown below.>
 <For each entry in test_patterns, incorporating overlap verdict from Step 3:>
-- **it:** `<it_description>` — `<file>` — <N>/<scanned> PRs (<rate>%)
+- **it:** `<it_description>` — `<file>` — <N>/<scanned> PRs
   - Suite: `<describe_chain>`
   - PRs: #<n>, #<n>, ...
   - Error(s): `<errors[0]>` <and any additional distinct errors>
```
```diff
@@ -423,7 +421,7 @@ No individual test recurred across multiple PRs — different tests failed withi
 
 ### Rerun Patterns (hidden failures) — only present with --deep
 <For each entry in rerun_patterns:>
-- "<check_name>" reran on <N>/<scanned> PRs (<rate>%) — failed then passed on the same commit SHA; not visible in statusCheckRollup
+- "<check_name>" reran on <N>/<scanned> PRs — failed then passed on the same commit SHA; not visible in statusCheckRollup
 
 ### PRs with no failures
 <N> PRs had all checks passing.
```
```diff
@@ -450,7 +448,7 @@ Use this format instead of the standard report when `filters.file_filter` is non
 ### Recurring Tests (same test on multiple PRs)
 <For each entry in test_patterns, run the file-level overlap check from Step 3 using test_patterns[].file:>
 <Split test_name into it() and describe chain as described in the Test-Level Patterns section above.>
-- **it:** `<it_description>` — `<file>` — <N>/<scanned> PRs (<rate>%)
+- **it:** `<it_description>` — `<file>` — <N>/<scanned> PRs
   - Suite: `<describe_chain>`
   - PRs: #<n> (author: <author>), #<n> (author: <author>), ...
   - Error(s): `<errors[0]>` <and any additional distinct errors>
```
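Both report formats rely on the test_name split rule described above (split at the first "should"; otherwise grep the test file for the real `it('...')` boundary). A minimal sketch of the "should" branch, with a hypothetical helper name; the Grep fallback is deliberately left unimplemented:

```python
def split_test_name(test_name: str) -> tuple[str, str]:
    """Split a concatenated Cypress test name into (suite_path, it_description).

    Sketch of the rule in the report sections: if the it() description starts
    with "should", everything from the first "should" onward is the it()
    description and everything before it is the describe() chain. Names
    without "should" require grepping the test file for the exact it('...')
    string; that fallback is not attempted here.
    """
    idx = test_name.find("should")
    if idx == -1:
        # Boundary can't be derived from the name alone: the caller must
        # Grep the test file for the matching it('...') string instead.
        raise ValueError("it() boundary not derivable from test name")
    return test_name[:idx].rstrip(), test_name[idx:]
```

For example, `split_test_name("Pipelines page should create a run")` yields the suite path `"Pipelines page"` and the it() description `"should create a run"`.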

packages/gen-ai/.claude/skills/flake-check/scripts/scan_prs.py
Lines changed: 1 addition & 8 deletions
```diff
@@ -61,15 +61,13 @@
     {
       "check_name": "Cypress-Mock-Tests (mcpCatalog, ...)",
       "failure_count": 3,
-      "failure_rate": 0.15,
       "pr_numbers": [7190, 7191, 7194]
     }
   ],
   "rerun_patterns": [  # present with --deep or --file (--file implies --deep)
     {
       "check_name": "Cypress-Mock-Tests (mcpCatalog, ...)",
       "rerun_count": 4,
-      "rerun_rate": 0.20,
       "pr_numbers": [7190, 7191, 7194, 7200]
     }
   ],
```
```diff
@@ -78,7 +76,6 @@
       "test_name": "User can create a pipeline run",
       "file": "packages/cypress/cypress/tests/mocked/pipelines/pipelineCreateRuns.cy.ts",
       "failure_count": 3,
-      "failure_rate": 0.15,
       "pr_numbers": [7200, 7300, 7350],
       "errors": ["Timed out retrying after 4000ms"]
     }
```
```diff
@@ -88,9 +85,9 @@
 
 import argparse
 import json
+import os
 import re
 import subprocess
-import os
 import sys
 from collections import defaultdict
 from concurrent.futures import ThreadPoolExecutor, as_completed
```
```diff
@@ -425,7 +422,6 @@ def main() -> None:
         {
             "check_name": name,
             "failure_count": len(pr_nums),
-            "failure_rate": round(len(pr_nums) / scanned_count, 2) if scanned_count else 1.0,
            "pr_numbers": sorted(pr_nums, reverse=True),
         }
         for name, pr_nums in sorted(failure_index.items(), key=lambda kv: -len(kv[1]))
```
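The comprehension retained above reduces to a small count-only aggregation. A runnable sketch under stated assumptions: the `failures` pairs are made-up data, and the more-than-one-PR threshold comes from the SKILL.md description of `patterns`, not from this hunk.

```python
from collections import defaultdict

# Sketch of the aggregation the retained code performs: collect failing
# check names per PR, keep checks seen on more than one PR, and emit a
# count plus PR numbers (no rate field, per this commit).
failures = [  # (pr_number, check_name) pairs -- illustrative data
    (7190, "Cypress-Mock-Tests (mcpCatalog, ...)"),
    (7191, "Cypress-Mock-Tests (mcpCatalog, ...)"),
    (7194, "Cypress-Mock-Tests (mcpCatalog, ...)"),
    (7191, "lint"),
]

failure_index: dict[str, set[int]] = defaultdict(set)
for pr_number, check_name in failures:
    failure_index[check_name].add(pr_number)

patterns = [
    {
        "check_name": name,
        "failure_count": len(pr_nums),
        "pr_numbers": sorted(pr_nums, reverse=True),
    }
    for name, pr_nums in sorted(failure_index.items(), key=lambda kv: -len(kv[1]))
    if len(pr_nums) > 1  # a pattern needs more than one PR (per SKILL.md)
]
```

With the sample data above, "lint" fails on only one PR and is filtered out, leaving a single pattern entry with `failure_count` 3.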
```diff
@@ -454,12 +450,10 @@ def main() -> None:
     }
 
     if run_deep:
-        total_prs = len(pr_results)
         output["rerun_patterns"] = [
             {
                 "check_name": name,
                 "rerun_count": len(pr_nums),
-                "rerun_rate": round(len(pr_nums) / total_prs, 2) if total_prs else 0.0,
                 "pr_numbers": sorted(pr_nums, reverse=True),
             }
             for name, pr_nums in sorted(rerun_index.items(), key=lambda kv: -len(kv[1]))
```
```diff
@@ -490,7 +484,6 @@ def main() -> None:
                 "test_name": name,
                 "file": data["file"],
                 "failure_count": len(data["pr_numbers"]),
-                "failure_rate": round(len(data["pr_numbers"]) / scanned_count, 2) if scanned_count else 1.0,
                 "pr_numbers": sorted(data["pr_numbers"], reverse=True),
                 "errors": data["errors"],
             }
```
