@@ -917,32 +917,12 @@ jobs:
 exit 1
 fi
 
- # MinMaxLoc / Not / Xor are skipped from the per-kernel gate as a
- # documented code-alignment workaround. Empirical finding on this PR:
- # adding ~50 KB of additive UDO code at the back of `.text` in
- # libopenvx_ffi.so shifts these three kernels' hot-loop entry
- # points onto less-favorable cache-line alignments, producing
- # 0.77x / 0.86x / 0.89x ratios that are bit-reproducible across
- # reruns on different EPYC VMs. The same kernels showed 0.999x /
- # 0.995x / 1.001x on PR #25 (also a non-kernel-touching change)
- # under the same methodology, confirming the noise floor is
- # sub-1% when layout is incidentally favorable — i.e. the
- # variance source is purely binary-layout, not algorithm.
- # The geomean gate (0.97x) and all other kernel-floor checks
- # still apply, so a real regression would still trip the gate.
- # TODO(perf-gate-layout): permanent fix is to pin function
- # alignment in the cdylib build (e.g.
- # `-Cllvm-args=-align-all-functions=6`) so additive PRs can't
- # perturb downstream kernel layout. Tracked in a follow-up issue.
 python3 ${{ github.workspace }}/.github/scripts/perf_gate.py \
 "$MAIN" "$PR" \
 --geomean-floor 0.97 \
 --kernel-floor 0.90 \
 --warn-floor 0.95 \
 --max-cv 5.0 \
- --skip-name MinMaxLoc \
- --skip-name Not \
- --skip-name Xor \
 --summary-out "$GITHUB_STEP_SUMMARY"
 
 - name: Upload PR rustVX benchmark results (perf-gate)
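
Removing the three `--skip-name` flags puts MinMaxLoc, Not, and Xor back under the per-kernel floor. `perf_gate.py` itself is not part of this diff, so the following is only a minimal sketch of how such a gate might combine the thresholds above. The function name `evaluate_gate`, the input shape (kernel name mapped to a PR/main speed ratio, higher is better), and the choice to include skipped kernels in the geomean are all assumptions for illustration, not the script's actual behavior. The `--max-cv` check is left out.

```python
import math


def evaluate_gate(ratios, geomean_floor=0.97, kernel_floor=0.90,
                  warn_floor=0.95, skip=()):
    """Hypothetical perf gate over {kernel name: PR/main speed ratio}."""
    # Geometric mean across every kernel (assumption: skipped ones count too).
    geomean = math.exp(sum(math.log(r) for r in ratios.values()) / len(ratios))

    kernel_failures, warnings = [], []
    for name in sorted(ratios):
        if name in skip:
            continue  # exempt from the per-kernel checks only
        if ratios[name] < kernel_floor:
            kernel_failures.append(name)
        elif ratios[name] < warn_floor:
            warnings.append(name)

    return {
        "geomean": geomean,
        "geomean_ok": geomean >= geomean_floor,
        "kernel_failures": kernel_failures,
        "warnings": warnings,
    }


# Ratios quoted in the removed comment, assuming they are listed in the
# same order as the kernel names; "Add" is a made-up extra kernel.
sample = {"MinMaxLoc": 0.77, "Not": 0.86, "Xor": 0.89, "Add": 1.0}
print(evaluate_gate(sample, skip={"MinMaxLoc", "Not", "Xor"}))
print(evaluate_gate(sample))
```

Under these assumptions, all three quoted ratios sit below the 0.90 kernel floor, so with the skips gone the gate would fail on them again unless the layout sensitivity described in the removed comment has been addressed elsewhere.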