Commit 81c4393
ci(perf-gate): loosen per-kernel thresholds to absorb between-run drift
The first real-CI run of the perf gate (CI run 25614982597 on this
PR's own no-op change) tripped a hard fail on `Magnitude` at 0.840x
and soft-warned on five other kernels in [0.913, 0.965], despite
this PR not changing any kernel code. The PR-side and main-side
binaries should be byte-identical (same source, same auto-detected
SIMD features, same `-C target-cpu=x86-64-v3`), so the only
plausible explanation is between-run noise on the same VM.
Empirical observation: between-run drift on otherwise-identical
binaries on the same runner VM reaches ~10-15% per kernel, which is
considerably higher than the within-run CV% the bench reports
(typically <1%). Likely culprits are cache state across the two
consecutive `./openvx-mark` process launches, thermal headroom, and
VM-host neighbour load. The within-run CV% filter (`--max-cv 5.0`)
doesn't catch this because it only inspects samples within a single
bench process.
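To illustrate why a within-run filter cannot see this kind of drift, here is a minimal sketch. The timings are made up, the helper names are hypothetical, and the ratio is assumed to be main time over PR time (below 1.0 means the PR side looks slower); the real bench may compute both quantities differently.

```python
import statistics

def cv_percent(samples):
    """Coefficient of variation of one process's samples, in percent."""
    return 100.0 * statistics.stdev(samples) / statistics.mean(samples)

def speed_ratio(main_samples, pr_samples):
    """Assumed definition: median main time / median PR time."""
    return statistics.median(main_samples) / statistics.median(pr_samples)

# Hypothetical per-iteration timings (ms) from two launches of the SAME binary.
main_run = [10.00, 10.05, 9.95, 10.02, 9.98]
pr_run = [11.90, 11.95, 11.85, 11.92, 11.88]  # whole launch shifted slower

MAX_CV = 5.0  # mirrors --max-cv 5.0

# Each launch is internally tight, so both pass the CV filter...
print(cv_percent(main_run) < MAX_CV, cv_percent(pr_run) < MAX_CV)
# ...yet the launch-to-launch ratio still shows a large apparent regression.
print(round(speed_ratio(main_run, pr_run), 3))
```

Both runs have a CV well under 1%, but the ratio between them comes out near 0.840, which is the shape of the `Magnitude` result described above.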
Recalibration:
  --kernel-floor 0.85 -> 0.75
      A per-kernel hard fail now requires a >25% regression. This is
      generous enough to absorb the worst between-run drift we've
      observed (the 16% `Magnitude` blip on the failed CI run, at
      0.840x, sits comfortably above the new floor).
  --warn-floor 0.97 -> 0.90
      The soft-warn band moves from "any kernel more than 3% slower"
      to "individual kernels in [-25%, -10%)". Per-kernel slowdowns
      smaller than 10% are treated as noise and not flagged.
  --geomean-floor 0.97 (unchanged)
      An aggregate slowdown of more than 3% across 50+ verified
      kernels stays the primary gate signal. Single-kernel noise is
      very unlikely to produce that much aggregate drift: it
      requires a real software-side regression that touches the hot
      path. Keeping this strict.
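The three thresholds can be sketched as a single classification step. This is an illustration only, not the gate's actual code: the function name `gate`, the dict layout, every kernel name except `Magnitude`, and every ratio except 0.840 are assumptions, and ratios are assumed to be PR speed relative to main (below 1.0 is slower).

```python
import math

def geomean(values):
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

def gate(ratios, kernel_floor=0.75, warn_floor=0.90, geomean_floor=0.97):
    """Classify per-kernel speed ratios (hypothetical helper).

    Hard fail: ratio < kernel_floor.
    Soft warn: kernel_floor <= ratio < warn_floor.
    Overall pass: no hard fails and geomean >= geomean_floor.
    """
    hard = sorted(k for k, r in ratios.items() if r < kernel_floor)
    soft = sorted(k for k, r in ratios.items()
                  if kernel_floor <= r < warn_floor)
    g = geomean(ratios.values())
    return {"pass": not hard and g >= geomean_floor,
            "geomean": g, "hard_fail": hard, "soft_warn": soft}

# Illustrative data: one noisy kernel at 0.840, two mildly slow ones,
# and 47 filler kernels at exactly 1.0 (all but 0.840 are made up).
ratios = {"Magnitude": 0.840, "hypothetical_a": 0.913, "hypothetical_b": 0.965}
ratios.update({f"filler_{i}": 1.0 for i in range(47)})

old = gate(ratios, kernel_floor=0.85, warn_floor=0.97)
new = gate(ratios)
print(old["pass"], old["hard_fail"], old["soft_warn"])
print(new["pass"], new["hard_fail"], new["soft_warn"])
```

With the old floors the 0.840 kernel is a hard fail; with the new floors it drops into the soft-warn band and the run passes. The geomean is barely affected by the single outlier, which is the reason for keeping `--geomean-floor` strict.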
Self-tests on the four reference input pairs (PR12 vs pre-PR12
main, reversed, identity, same-side) still behave correctly with
the new thresholds: PASS with verdict 1.375x on the real perf wins
(PR12 vs pre-PR12 main), FAIL with verdict 0.727x and 7 hard-failed
kernels on the simulated regression (the reversed pair), and PASS
with 1.000x on the identity pair. Applying the new thresholds to the
offending CI run's data turns its 1 hard-fail / 5 soft-warn output
into the PASS verdict it should have had on a no-op PR.
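The forward and reversed verdicts above are reciprocals of each other (1 / 1.375 ≈ 0.727). That is what one would expect if the verdict figure is a geometric mean of per-kernel ratios, which is an assumption here rather than something the commit states. A short check with made-up ratios:

```python
import math

def geomean(values):
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical per-kernel speed ratios for a forward comparison.
forward = [1.10, 1.50, 1.25, 2.00]
# Swapping the two sides inverts every per-kernel ratio.
reversed_pair = [1.0 / r for r in forward]
# Comparing an input against itself gives 1.0 everywhere.
identity = [1.0] * len(forward)

# The geomean of the inverted ratios is the inverse of the geomean.
print(math.isclose(geomean(reversed_pair), 1.0 / geomean(forward)))
print(geomean(identity))
print(f"{1 / 1.375:.3f}")  # 0.727
```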
Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 068ea31 commit 81c4393
2 files changed
Lines changed: 45 additions & 11 deletions