
Commit aacbd17

remove surfacing of rates
1 parent a0e9139 commit aacbd17

2 files changed: 8 additions & 17 deletions

packages/gen-ai/.claude/skills/flake-check/SKILL.md
Lines changed: 7 additions & 9 deletions
```diff
@@ -354,15 +354,13 @@ When `--file` is passed, `scan_prs.py` fetches CI logs for all failing checks no
 
 `scan_prs.py` returns:
 - `prs` — list of PRs with their failing check names
-- `patterns` — check names that **visibly failed** on more than one PR, each with `failure_rate` (failures / appearances) and the list of PR numbers it failed on
-- `rerun_patterns` *(with `--deep` or `--file`)* check names that **failed then passed on the same commit SHA** on two or more PRs, with `rerun_count` and `rerun_rate` (reruns / PRs scanned). This surfaces flaky tests that devs routinely re-run — they won't appear in `patterns` because `statusCheckRollup` only shows the final passing state.
-- `test_patterns` *(with `--deep` or `--file`)* **individual test names** that failed on two or more PRs, each with `failure_count`, `failure_rate`, `pr_numbers`, and distinct `errors` seen. A test appearing here means the *same specific `it()` block* recurred across PRs — much stronger evidence than the same job recurring. With `--deep`: fetches logs for all failing checks not excluded by `_DETERMINISTIC_PREFIXES` across all scanned PRs (the same test can recur across different check matrix variants — this catches it). With `--file`: same log fetching but filtered to a specific file.
+- `patterns` — check names that **visibly failed** on more than one PR, each with `failure_count` and the list of PR numbers it failed on
+- `rerun_patterns` *(with `--deep` or `--file`)* check names that **failed then passed on the same commit SHA** on two or more PRs, with `rerun_count` and PR numbers. This surfaces flaky tests that devs routinely re-run — they won't appear in `patterns` because `statusCheckRollup` only shows the final passing state.
+- `test_patterns` *(with `--deep` or `--file`)* **individual test names** that failed on two or more PRs, each with `failure_count`, `pr_numbers`, and distinct `errors` seen. A test appearing here means the *same specific `it()` block* recurred across PRs — much stronger evidence than the same job recurring. With `--deep`: fetches logs for all failing checks not excluded by `_DETERMINISTIC_PREFIXES` across all scanned PRs (the same test can recur across different check matrix variants — this catches it). With `--file`: same log fetching but filtered to a specific file.
 - `bots_excluded` — count of bot PRs filtered out
 - `all_passing_count` — PRs where everything passed
 - `filters` — the resolved `since`/`until`/`limit`/`deep` values actually used
 
-**Note on rates:** both `failure_rate` and `rerun_rate` are relative to the scan window. Always read them alongside the counts (e.g. `3/20 PRs = 15%`) rather than comparing rates across different scans.
-
 ### Step 3 — Test-file overlap analysis (deep mode and file mode)
 
 *Skip this step when neither `--deep` nor `--file` was passed, or when `test_patterns` is empty.*
```
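The post-change return shape above can be sketched as a Python literal. This is an illustrative sketch only: the field names follow the diff, but all values are made up, and the `prs` entry is omitted because its exact per-PR shape is not documented here.

```python
# Illustrative sketch of the scan_prs.py output after this commit: entries
# carry counts and PR numbers only; the failure_rate / rerun_rate fields
# are gone. All values below are made up for illustration.
example_output = {
    "patterns": [
        {
            "check_name": "Cypress-Mock-Tests (mcpCatalog, ...)",
            "failure_count": 3,
            "pr_numbers": [7194, 7191, 7190],
        }
    ],
    "rerun_patterns": [  # present with --deep or --file
        {
            "check_name": "Cypress-Mock-Tests (mcpCatalog, ...)",
            "rerun_count": 4,
            "pr_numbers": [7200, 7194, 7191, 7190],
        }
    ],
    "test_patterns": [  # present with --deep or --file
        {
            "test_name": "User can create a pipeline run",
            "file": "packages/cypress/cypress/tests/mocked/pipelines/pipelineCreateRuns.cy.ts",
            "failure_count": 3,
            "pr_numbers": [7350, 7300, 7200],
            "errors": ["Timed out retrying after 4000ms"],
        }
    ],
    "bots_excluded": 2,       # illustrative
    "all_passing_count": 12,  # illustrative
    "filters": {"since": None, "until": None, "limit": 20, "deep": True},
}

# After this commit, no pattern entry carries a *_rate field.
assert not any(
    key.endswith("_rate")
    for section in ("patterns", "rerun_patterns", "test_patterns")
    for entry in example_output[section]
    for key in entry
)
```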
```diff
@@ -402,14 +400,14 @@ Run this once per PR number across all test patterns (deduplicate to avoid redun
 
 ### Patterns Observed (visible failures)
 <For each entry in patterns:>
-- "<check_name>" (<job_type>) failed in <N>/<scanned> PRs (<rate>%)
+- "<check_name>" (<job_type>) failed in <N>/<scanned> PRs
   - PRs: #<n>, #<n>, ...
   - Classify as: ⚠️ Suspected Flaky
 
 ### Test-Level Patterns — only present with --deep or --file (when test_patterns is non-empty)
 <The test_name field is the full Cypress name: describe chain + it() block concatenated. Split it for display using this approach: if the it() description starts with "should", locate the first "should" in the test_name and treat everything from that word onward as the it() description, and everything before as the suite path. If the it() description does NOT start with "should", look up the actual test file (using the `file` field) with Grep to find the exact `it('...')` string that matches the test_name suffix — this gives the correct it() boundary. Never guess the split point when the test doesn't follow the "should…" convention. Format each entry as shown below.>
 <For each entry in test_patterns, incorporating overlap verdict from Step 3:>
-- **it:** `<it_description>` — `<file>` — <N>/<scanned> PRs (<rate>%)
+- **it:** `<it_description>` — `<file>` — <N>/<scanned> PRs
   - Suite: `<describe_chain>`
   - PRs: #<n>, #<n>, ...
   - Error(s): `<errors[0]>` <and any additional distinct errors>
```
```diff
@@ -423,7 +421,7 @@ No individual test recurred across multiple PRs — different tests failed withi
 
 ### Rerun Patterns (hidden failures) — only present with --deep
 <For each entry in rerun_patterns:>
-- "<check_name>" reran on <N>/<scanned> PRs (<rate>%) — failed then passed on the same commit SHA; not visible in statusCheckRollup
+- "<check_name>" reran on <N>/<scanned> PRs — failed then passed on the same commit SHA; not visible in statusCheckRollup
 
 ### PRs with no failures
 <N> PRs had all checks passing.
```
```diff
@@ -450,7 +448,7 @@ Use this format instead of the standard report when `filters.file_filter` is non
 ### Recurring Tests (same test on multiple PRs)
 <For each entry in test_patterns, run the file-level overlap check from Step 3 using test_patterns[].file:>
 <Split test_name into it() and describe chain as described in the Test-Level Patterns section above.>
-- **it:** `<it_description>` — `<file>` — <N>/<scanned> PRs (<rate>%)
+- **it:** `<it_description>` — `<file>` — <N>/<scanned> PRs
   - Suite: `<describe_chain>`
   - PRs: #<n> (author: <author>), #<n> (author: <author>), ...
   - Error(s): `<errors[0]>` <and any additional distinct errors>
```
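Both report formats rely on the test_name split rule described above (split at the first "should"; otherwise grep the test file for the real `it('...')` boundary). A minimal sketch of the "should" branch, with a hypothetical helper name; the Grep fallback is deliberately left unimplemented:

```python
def split_test_name(test_name: str) -> tuple[str, str]:
    """Split a concatenated Cypress test name into (suite_path, it_description).

    Sketch of the rule in the report sections: if the it() description starts
    with "should", everything from the first "should" onward is the it()
    description and everything before it is the describe() chain. Names
    without "should" require grepping the test file for the exact it('...')
    string; that fallback is not attempted here.
    """
    idx = test_name.find("should")
    if idx == -1:
        # Boundary can't be derived from the name alone: the caller must
        # Grep the test file for the matching it('...') string instead.
        raise ValueError("it() boundary not derivable from test name")
    return test_name[:idx].rstrip(), test_name[idx:]
```

For example, `split_test_name("Pipelines page should create a run")` yields the suite path `"Pipelines page"` and the it() description `"should create a run"`.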

packages/gen-ai/.claude/skills/flake-check/scripts/scan_prs.py
Lines changed: 1 addition & 8 deletions
```diff
@@ -61,15 +61,13 @@
     {
       "check_name": "Cypress-Mock-Tests (mcpCatalog, ...)",
       "failure_count": 3,
-      "failure_rate": 0.15,
       "pr_numbers": [7190, 7191, 7194]
     }
   ],
   "rerun_patterns": [  # present with --deep or --file (--file implies --deep)
     {
       "check_name": "Cypress-Mock-Tests (mcpCatalog, ...)",
       "rerun_count": 4,
-      "rerun_rate": 0.20,
       "pr_numbers": [7190, 7191, 7194, 7200]
     }
   ],
```
```diff
@@ -78,7 +76,6 @@
       "test_name": "User can create a pipeline run",
       "file": "packages/cypress/cypress/tests/mocked/pipelines/pipelineCreateRuns.cy.ts",
       "failure_count": 3,
-      "failure_rate": 0.15,
       "pr_numbers": [7200, 7300, 7350],
       "errors": ["Timed out retrying after 4000ms"]
     }
```
```diff
@@ -88,9 +85,9 @@
 
 import argparse
 import json
+import os
 import re
 import subprocess
-import os
 import sys
 from collections import defaultdict
 from concurrent.futures import ThreadPoolExecutor, as_completed
```
```diff
@@ -425,7 +422,6 @@ def main() -> None:
         {
             "check_name": name,
             "failure_count": len(pr_nums),
-            "failure_rate": round(len(pr_nums) / scanned_count, 2) if scanned_count else 1.0,
            "pr_numbers": sorted(pr_nums, reverse=True),
         }
         for name, pr_nums in sorted(failure_index.items(), key=lambda kv: -len(kv[1]))
```
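The comprehension retained above reduces to a small count-only aggregation. A runnable sketch under stated assumptions: the `failures` pairs are made-up data, and the more-than-one-PR threshold comes from the SKILL.md description of `patterns`, not from this hunk.

```python
from collections import defaultdict

# Sketch of the aggregation the retained code performs: collect failing
# check names per PR, keep checks seen on more than one PR, and emit a
# count plus PR numbers (no rate field, per this commit).
failures = [  # (pr_number, check_name) pairs -- illustrative data
    (7190, "Cypress-Mock-Tests (mcpCatalog, ...)"),
    (7191, "Cypress-Mock-Tests (mcpCatalog, ...)"),
    (7194, "Cypress-Mock-Tests (mcpCatalog, ...)"),
    (7191, "lint"),
]

failure_index: dict[str, set[int]] = defaultdict(set)
for pr_number, check_name in failures:
    failure_index[check_name].add(pr_number)

patterns = [
    {
        "check_name": name,
        "failure_count": len(pr_nums),
        "pr_numbers": sorted(pr_nums, reverse=True),
    }
    for name, pr_nums in sorted(failure_index.items(), key=lambda kv: -len(kv[1]))
    if len(pr_nums) > 1  # a pattern needs more than one PR (per SKILL.md)
]
```

With the sample data above, "lint" fails on only one PR and is filtered out, leaving a single pattern entry with `failure_count` 3.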
```diff
@@ -454,12 +450,10 @@ def main() -> None:
     }
 
     if run_deep:
-        total_prs = len(pr_results)
         output["rerun_patterns"] = [
             {
                 "check_name": name,
                 "rerun_count": len(pr_nums),
-                "rerun_rate": round(len(pr_nums) / total_prs, 2) if total_prs else 0.0,
                 "pr_numbers": sorted(pr_nums, reverse=True),
             }
             for name, pr_nums in sorted(rerun_index.items(), key=lambda kv: -len(kv[1]))
```
```diff
@@ -490,7 +484,6 @@ def main() -> None:
                 "test_name": name,
                 "file": data["file"],
                 "failure_count": len(data["pr_numbers"]),
-                "failure_rate": round(len(data["pr_numbers"]) / scanned_count, 2) if scanned_count else 1.0,
                 "pr_numbers": sorted(data["pr_numbers"], reverse=True),
                 "errors": data["errors"],
             }
```
