Skip to content

Commit 09b4b6a

Browse files
CI: organize pairwise comparison summary (TL;DR matrix + collapsed details)
Default-visible step summary length: ~600 lines → ~40 lines (15× shorter). Full per-kernel detail is still emitted, but collapsed inside <details> blocks — one click away instead of unconditionally dumped. Problem ------- After PR #17 added the 3 OpenVX-vs-OpenCV pairwise comparisons (bringing the total to 6), the compare-job GitHub Step Summary became unscannable. Each comparison emitted its own heading + headline-stats table + the full `scripts/compare_reports.py` output (system info, conformance & scores, category sub-scores, summary, per-kernel detail) — all six sections shown unconditionally, ~600 lines total. The headline geomean that reviewers actually want at-a-glance got buried under repeated system-info/conformance tables that say the same thing across all six comparisons (same runner, same hardware). Solution — three scannable parts, with detail one click away ------------------------------------------------------------ 1. TL;DR speedup matrix at the top — `row impl / column impl` geomean for every loaded pair of reports. One glance answers "which impl beats which, and by how much?" across the full N×N relationship, including pairings not explicitly enumerated in the groups below. Cells render bold when the row impl wins, italic when it loses, so the visual scan works even at small zoom. 2. Two grouped headline tables: * "OpenVX-vs-OpenCV — does adopting OpenVX pay off vs cv::?" * "OpenVX-vs-OpenVX — cross-implementation" Each row: candidate / baseline / geomean / median / count / wins / losses / best kernel / worst kernel. Six rows total, two compact tables — the headline answer for every comparison fits in one screen. 3. Per-kernel detail in <details> blocks (collapsed by default). Same `compare_reports.py` output as before (system info, conformance, category sub-scores, per-kernel table), but with the duplicate `# OpenVX Benchmark Comparison` + `**A** vs **B**` header lines stripped since the <details><summary> already says them. Implementation -------------- New `scripts/ci_pairwise_summary.py` (415 lines, fully documented) — takes a JSON config describing reports + pair groups + detail dir, and emits the structured summary to stdout. The CI step redirects it into $GITHUB_STEP_SUMMARY. Config schema lives in the script docstring. The CI's `Pairwise comparisons` step is correspondingly simpler — drops the inline ~90-line do_compare function and the inline Python heredoc, keeping just a small loop that runs `compare_reports.py` per pair (for the per-kernel detail .md files) and a single call to the new helper script. Net effect on the yaml: 133 lines removed, 97 added. Same orientation as before (`speedup = candidate / baseline`, >1.00x = candidate faster) so the artifact filenames in `comparisons/` and the existing `benchmark-comparisons` artifact don't change shape. Edge cases — same behavior as the old layout: * Missing input JSON (impl build failed) → row appears with "—" cells and a "no comparable benchmarks ({impl}: ✗)" note in the headline table; matrix simply omits that impl's row/column; detail block renders a "_Detail file missing_" message. * No shared verified benchmarks between two impls → same "—" / "no comparable benchmarks" path. Drive-by: .gitignore adds `__pycache__/` and `*.pyc` now that we have committable Python scripts that pytest etc. could exercise. Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent ffae0ec commit 09b4b6a

3 files changed

Lines changed: 514 additions & 133 deletions

File tree

.github/workflows/ci.yml

Lines changed: 97 additions & 133 deletions
Original file line numberDiff line numberDiff line change
@@ -636,154 +636,118 @@ jobs:
636636
# ----- Pairwise comparisons -----
637637
#
638638
# Each comparison is oriented as "<candidate> over <baseline>" so
639-
# the Speedup column reads as `candidate / baseline` (>1.00x =
640-
# candidate is faster). The orientation is deliberate: we want
641-
# MIVisionX-vs-* and rustVX-vs-Khronos to read as "how much faster
642-
# is the more-tuned implementation than the reference", giving an
643-
# at-a-glance headline number that grows when the candidate wins:
639+
# the speedup column reads as `candidate / baseline` (>1.00x =
640+
# candidate is faster). The orientation is deliberate:
644641
#
645-
# * MIVisionX over Khronos sample (AMD over reference)
646-
# * MIVisionX over rustVX (AMD over Rust impl)
647-
# * rustVX over Khronos sample (Rust impl over reference)
642+
# OpenVX-vs-OpenVX trio — "how much faster is the more-tuned
643+
# impl than the reference":
644+
# * MIVisionX over Khronos sample (AMD over reference)
645+
# * MIVisionX over rustVX (AMD over Rust impl)
646+
# * rustVX over Khronos sample (Rust impl over reference)
648647
#
649-
# Mechanically, `compare_reports.py` computes
648+
# OpenVX-vs-OpenCV trio — "does adopting OpenVX pay off vs cv::":
649+
# * MIVisionX over OpenCV
650+
# * Khronos sample over OpenCV
651+
# * rustVX over OpenCV
652+
#
653+
# Mechanically, `scripts/compare_reports.py` computes
650654
# speedup = throughput(arg2) / throughput(arg1)
651655
# so the candidate is passed as the SECOND positional arg.
656+
#
657+
# The step does two things:
658+
# 1. Runs `compare_reports.py` once per pair to produce a
659+
# per-kernel detail .md in comparisons/. These also become
660+
# the `benchmark-comparisons` artifact for downstream tools.
661+
# 2. Invokes `scripts/ci_pairwise_summary.py` once to render
662+
# an organized GitHub Step Summary — TL;DR speedup matrix
663+
# at top, two grouped headline tables, and the per-kernel
664+
# detail tables collapsed inside <details> blocks. See the
665+
# script docstring for the config schema; this used to be a
666+
# ~115-line bash + inline-Python block and rendered ~600
667+
# lines into the summary.
652668
- name: Pairwise comparisons
653669
if: always()
654670
run: |
655671
set -euo pipefail
656672
mkdir -p comparisons
657673
658-
M="build-mivisionx/results/benchmark_results.json"
659-
K="build-khronos/results/benchmark_results.json"
660-
R="build-rustvx/results/benchmark_results.json"
661-
O="build-opencv-bench/results/benchmark_results.json"
662-
663-
do_compare() {
664-
local candidate="$1" baseline="$2"
665-
local path_candidate="$3" path_baseline="$4"
666-
local cand_label="$5" base_label="$6"
667-
local out="comparisons/${candidate}-over-${baseline}"
674+
# Per-impl JSON report paths (parallel arrays keyed by impl id).
675+
IDS=(mivisionx khronos rustvx opencv)
676+
PATHS=(
677+
"build-mivisionx/results/benchmark_results.json"
678+
"build-khronos/results/benchmark_results.json"
679+
"build-rustvx/results/benchmark_results.json"
680+
"build-opencv-bench/results/benchmark_results.json"
681+
)
682+
LABELS=(
683+
"MIVisionX (AMD OpenVX)"
684+
"Khronos sample"
685+
"rustVX"
686+
"OpenCV"
687+
)
668688
669-
{
670-
echo "## ${cand_label} over ${base_label}"
671-
echo ""
672-
} >> "$GITHUB_STEP_SUMMARY"
689+
# The 6 pairs, "<candidate> <baseline>". Order matches the
690+
# rendered summary table order: OpenVX-vs-OpenCV (headline
691+
# question) first, then OpenVX-vs-OpenVX.
692+
PAIRS=(
693+
"mivisionx opencv"
694+
"khronos opencv"
695+
"rustvx opencv"
696+
"mivisionx khronos"
697+
"mivisionx rustvx"
698+
"rustvx khronos"
699+
)
673700
674-
if [ ! -f "$path_candidate" ] || [ ! -f "$path_baseline" ]; then
675-
{
676-
echo "_Skipped: one or both reports missing (${cand_label}: $([ -f "$path_candidate" ] && echo OK || echo MISSING), ${base_label}: $([ -f "$path_baseline" ] && echo OK || echo MISSING))_"
677-
echo ""
678-
echo "---"
679-
echo ""
680-
} >> "$GITHUB_STEP_SUMMARY"
681-
return 0
701+
# Phase 1 — per-kernel detail .md per pair where both inputs
702+
# exist. Missing-input pairs are silently skipped here; the
703+
# summary script renders a friendly "_Detail missing_" note
704+
# for them inside the collapsed <details> block.
705+
path_of() {
706+
for i in "${!IDS[@]}"; do
707+
if [ "${IDS[$i]}" = "$1" ]; then echo "${PATHS[$i]}"; return; fi
708+
done
709+
}
710+
for pair in "${PAIRS[@]}"; do
711+
read -r CAND BASE <<< "$pair"
712+
CAND_PATH=$(path_of "$CAND")
713+
BASE_PATH=$(path_of "$BASE")
714+
OUT="comparisons/${CAND}-over-${BASE}"
715+
if [ -f "$CAND_PATH" ] && [ -f "$BASE_PATH" ]; then
716+
python3 scripts/compare_reports.py "$BASE_PATH" "$CAND_PATH" --output "$OUT"
717+
else
718+
echo "Skipping detail for ${CAND}-over-${BASE}: missing ${CAND_PATH} or ${BASE_PATH}"
682719
fi
720+
done
683721
684-
# Headline geomean — adapted from the perf-gate Python block
685-
# in rustVX's conformance CI. Keys benchmarks by
686-
# (name, mode, resolution), filters to verified-on-both
687-
# entries with positive throughput, then computes geomean,
688-
# median, win/loss counts, and best/worst kernels.
689-
python3 - "$path_candidate" "$path_baseline" "$cand_label" "$base_label" \
690-
>> "$GITHUB_STEP_SUMMARY" <<'PY'
691-
import json, math, sys
692-
693-
cand_path, base_path = sys.argv[1], sys.argv[2]
694-
cand_label, base_label = sys.argv[3], sys.argv[4]
695-
with open(cand_path) as f: cand = json.load(f)
696-
with open(base_path) as f: base = json.load(f)
697-
698-
def by_key(report):
699-
return {(r['name'], r['mode'], r['resolution']): r
700-
for r in report.get('results', [])}
701-
702-
c = by_key(cand)
703-
b = by_key(base)
704-
shared = sorted(set(c) & set(b))
705-
706-
speedups = []
707-
wins, losses = 0, 0
708-
best = (None, 0.0)
709-
worst = (None, math.inf)
710-
711-
for key in shared:
712-
rc, rb = c[key], b[key]
713-
if not (rc.get('verified', True) and rb.get('verified', True)):
714-
continue
715-
mc = rc.get('megapixels_per_sec', 0)
716-
mb = rb.get('megapixels_per_sec', 0)
717-
if mc <= 0 or mb <= 0:
718-
continue
719-
s = mc / mb # >1 means candidate is faster
720-
speedups.append(s)
721-
if s > 1.0: wins += 1
722-
elif s < 1.0: losses += 1
723-
if s > best[1]: best = (key, s)
724-
if s < worst[1]: worst = (key, s)
725-
726-
if not speedups:
727-
print(f'_No verified benchmarks were directly comparable between {cand_label} and {base_label}._')
728-
print()
729-
else:
730-
geomean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))
731-
median = sorted(speedups)[len(speedups) // 2]
732-
print('| Metric | Value |')
733-
print('|:---|---:|')
734-
print(f'| Geomean speedup ({cand_label} / {base_label}) | **{geomean:.2f}x** |')
735-
print(f'| Median speedup ({cand_label} / {base_label}) | {median:.2f}x |')
736-
print(f'| Benchmarks compared | {len(speedups)} |')
737-
print(f'| {cand_label} faster | {wins} |')
738-
print(f'| {base_label} faster | {losses} |')
739-
if best[0]:
740-
bk, bv = best
741-
print(f'| Best {cand_label} speedup | {bv:.2f}x ({bk[0]} / {bk[1]} / {bk[2]}) |')
742-
if worst[0] and worst[1] != math.inf:
743-
wk, wv = worst
744-
print(f'| Worst {cand_label} speedup | {wv:.2f}x ({wk[0]} / {wk[1]} / {wk[2]}) |')
745-
print()
746-
if geomean >= 1.0:
747-
print(f'> **{cand_label}** is **{geomean:.2f}x** faster than **{base_label}** on average (geomean across {len(speedups)} verified benchmarks).')
748-
else:
749-
print(f'> **{cand_label}** is **{1.0/geomean:.2f}x slower** than **{base_label}** on average (geomean across {len(speedups)} verified benchmarks).')
750-
print()
751-
PY
752-
753-
# Detailed per-kernel comparison from compare_reports.py.
754-
# Pass baseline first so the Speedup column reads as
755-
# "candidate / baseline" — same orientation as the headline.
756-
python3 scripts/compare_reports.py "$path_baseline" "$path_candidate" --output "$out"
757-
{
758-
echo "_Detailed per-kernel comparison — Speedup column reads as **${cand_label} / ${base_label}** (>1.00x = ${cand_label} faster)._"
759-
echo ""
760-
cat "${out}.md"
761-
echo ""
762-
echo "---"
763-
echo ""
764-
} >> "$GITHUB_STEP_SUMMARY"
722+
# Phase 2 — render the organized step summary. The config
723+
# below is the only place pair-grouping & intent text lives;
724+
# the helper handles matrix rendering, headline tables, and
725+
# the collapsed <details> blocks.
726+
cat > /tmp/pairwise-config.json <<'JSON'
727+
{
728+
"reports": {
729+
"mivisionx": {"label": "MIVisionX (AMD OpenVX)", "path": "build-mivisionx/results/benchmark_results.json"},
730+
"khronos": {"label": "Khronos sample", "path": "build-khronos/results/benchmark_results.json"},
731+
"rustvx": {"label": "rustVX", "path": "build-rustvx/results/benchmark_results.json"},
732+
"opencv": {"label": "OpenCV", "path": "build-opencv-bench/results/benchmark_results.json"}
733+
},
734+
"groups": [
735+
{
736+
"title": "OpenVX-vs-OpenCV — does adopting OpenVX pay off vs cv::?",
737+
"intent": "Speedup reads as `<OpenVX impl> / OpenCV`. Values >1.00x mean adopting that OpenVX impl pays off vs writing the equivalent directly in OpenCV — the headline question this comparison phase exists to answer. Ordered most-tuned (MIVisionX) → reference (Khronos sample) → Rust impl (rustVX) so the table walks the realistic best→worst range of the trade-off.",
738+
"pairs": [["mivisionx", "opencv"], ["khronos", "opencv"], ["rustvx", "opencv"]]
739+
},
740+
{
741+
"title": "OpenVX-vs-OpenVX — cross-implementation",
742+
"intent": "Speedup reads as `<candidate> / <baseline>`. MIVisionX (AMD, most-tuned) compared against both reference impls, then rustVX vs Khronos sample (Rust impl over reference).",
743+
"pairs": [["mivisionx", "khronos"], ["mivisionx", "rustvx"], ["rustvx", "khronos"]]
744+
}
745+
],
746+
"detail_dir": "comparisons"
765747
}
766-
767-
# OpenVX-vs-OpenVX trio (existing): MIVisionX (AMD) over both
768-
# reference impls, then rustVX over Khronos sample (the
769-
# slowest of the three).
770-
do_compare mivisionx khronos "$M" "$K" "MIVisionX (AMD OpenVX)" "Khronos sample"
771-
do_compare mivisionx rustvx "$M" "$R" "MIVisionX (AMD OpenVX)" "rustVX"
772-
do_compare rustvx khronos "$R" "$K" "rustVX" "Khronos sample"
773-
774-
# OpenVX-vs-OpenCV trio (new in this PR). OpenCV is the
775-
# baseline so the speedup column reads as `<OpenVX impl> /
776-
# OpenCV` — values >1.00x mean adopting that OpenVX impl pays
777-
# off vs writing the equivalent in cv:: directly. This is the
778-
# headline question opencv-mark exists to answer.
779-
#
780-
# Ordered MIVisionX → Khronos → rustVX so the table walks
781-
# from the most-tuned OpenVX impl (best case for OpenVX) to
782-
# the reference (worst case) — readers see the realistic
783-
# range of the OpenVX-vs-OpenCV trade-off.
784-
do_compare mivisionx opencv "$M" "$O" "MIVisionX (AMD OpenVX)" "OpenCV"
785-
do_compare khronos opencv "$K" "$O" "Khronos sample" "OpenCV"
786-
do_compare rustvx opencv "$R" "$O" "rustVX" "OpenCV"
748+
JSON
749+
python3 scripts/ci_pairwise_summary.py --config /tmp/pairwise-config.json \
750+
>> "$GITHUB_STEP_SUMMARY"
787751
788752
echo "--- comparison artifacts ---"
789753
ls -la comparisons/ || true

.gitignore

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1 +1,3 @@
11
build/
2+
__pycache__/
3+
*.pyc

0 commit comments

Comments
 (0)