CI: organize pairwise comparison summary (TL;DR matrix + collapsed details)
Default-visible step summary length: ~600 lines → ~40 lines (15× shorter).
Full per-kernel detail is still emitted, but collapsed inside <details>
blocks — one click away instead of unconditionally dumped.
Problem
-------
After PR #17 added the 3 OpenVX-vs-OpenCV pairwise comparisons (bringing
the total to 6), the compare-job GitHub Step Summary became unscannable.
Each comparison emitted its own heading + headline-stats table + the
full `scripts/compare_reports.py` output (system info, conformance &
scores, category sub-scores, summary, per-kernel detail) — all six
sections shown unconditionally, ~600 lines total. The headline geomean
that reviewers actually want at a glance got buried under repeated
system-info/conformance tables that say the same thing across all six
comparisons (same runner, same hardware).
Solution — three scannable parts, with detail one click away
------------------------------------------------------------
1. TL;DR speedup matrix at the top: `row impl / column impl` geomean
   for every loaded pair of reports. One glance answers "which impl
   beats which, and by how much?" across the full N×N relationship,
   including pairings not explicitly enumerated in the groups below.
   Cells render bold when the row impl wins, italic when it loses, so
   the visual scan works even at small zoom.
2. Two grouped headline tables:
     * "OpenVX-vs-OpenCV — does adopting OpenVX pay off vs cv::?"
     * "OpenVX-vs-OpenVX — cross-implementation"
   Each row: candidate / baseline / geomean / median / count / wins /
   losses / best kernel / worst kernel. Six rows total, two compact
   tables: the headline answer for every comparison fits in one
   screen.
3. Per-kernel detail in <details> blocks (collapsed by default). Same
   `compare_reports.py` output as before (system info, conformance,
   category sub-scores, per-kernel table), but with the duplicate
   `# OpenVX Benchmark Comparison` + `**A** vs **B**` header lines
   stripped since the <details><summary> already says them.
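The matrix in part 1 could be rendered roughly as below. This is a
minimal sketch, not the code in `scripts/ci_pairwise_summary.py`: the
function name `render_matrix`, the `(row, column) -> geomean` dict
shape, and the "-" placeholder for the diagonal and for pairs with no
value are all assumptions made for illustration.

```python
def render_matrix(impls, geomeans):
    """Render an N x N Markdown table of `row impl / column impl` geomeans.

    impls:    list of implementation names whose reports loaded.
    geomeans: dict mapping (row, column) -> geomean speedup (hypothetical shape).
    """
    lines = [
        "| row / column | " + " | ".join(impls) + " |",
        "|---|" + "---|" * len(impls),
    ]
    for row in impls:
        cells = []
        for col in impls:
            g = None if row == col else geomeans.get((row, col))
            if g is None:
                cells.append("-")              # diagonal or no data (assumed placeholder)
            elif g > 1.0:
                cells.append(f"**{g:.2f}x**")  # bold: row impl wins
            elif g < 1.0:
                cells.append(f"_{g:.2f}x_")    # italic: row impl loses
            else:
                cells.append(f"{g:.2f}x")      # exact tie, left plain
        lines.append(f"| {row} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Passing only the impls whose reports loaded keeps a failed build out of
the matrix entirely, which matches the "omit that impl's row/column"
behavior described under the edge cases.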
Implementation
--------------
New `scripts/ci_pairwise_summary.py` (415 lines, fully documented) —
takes a JSON config describing reports + pair groups + detail dir, and
emits the structured summary to stdout. The CI step redirects it into
$GITHUB_STEP_SUMMARY. Config schema lives in the script docstring.
The CI's `Pairwise comparisons` step is correspondingly simpler — drops
the inline ~90-line do_compare function and the inline Python heredoc,
keeping just a small loop that runs `compare_reports.py` per pair (for
the per-kernel detail .md files) and a single call to the new helper
script. Net effect on the yaml: 133 lines removed, 97 added.
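For illustration, wrapping one per-kernel detail file in a collapsed
block might look like the sketch below. The function name `wrap_detail`
and its arguments are hypothetical; only the two stripped header lines
and the "_Detail file missing_" message come from the description
above.

```python
import re

TITLE = "# OpenVX Benchmark Comparison"
# Matches the duplicate "**A** vs **B**" header line.
VS_LINE = re.compile(r"^\*\*.+\*\* vs \*\*.+\*\*$")

def wrap_detail(summary, detail_md):
    """Wrap compare_reports.py Markdown in a collapsed <details> block.

    detail_md is the file contents, or None when the detail file is missing.
    """
    if detail_md is None:
        body = "_Detail file missing_"
    else:
        lines = detail_md.splitlines()
        i = 0
        # Skip leading blank lines and the two duplicate header lines only;
        # everything after the first real content line is kept untouched.
        while i < len(lines):
            s = lines[i].strip()
            if s == "" or s == TITLE or VS_LINE.match(s):
                i += 1
            else:
                break
        body = "\n".join(lines[i:])
    # Blank lines around the body let GitHub render the Markdown inside.
    return f"<details><summary>{summary}</summary>\n\n{body}\n\n</details>"
```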
Same orientation as before (`speedup = candidate / baseline`, >1.00x =
candidate faster) so the artifact filenames in `comparisons/` and the
existing `benchmark-comparisons` artifact don't change shape.
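As a worked example of that orientation, the headline columns could be
derived as in the sketch below. It assumes each report reduces to a
`kernel -> throughput` dict where higher is better, which is what makes
`candidate / baseline > 1.00x` mean "candidate faster"; the real report
format and the "verified" filter are not modeled here, and
`headline_stats` is a hypothetical name.

```python
import math
import statistics

def headline_stats(candidate, baseline):
    """Headline row for one pair, or None if nothing is comparable.

    candidate / baseline: dict kernel -> throughput (higher is better; assumed).
    """
    shared = sorted(set(candidate) & set(baseline))
    # speedup = candidate / baseline, so >1.0 means the candidate is faster.
    ratios = {
        k: candidate[k] / baseline[k]
        for k in shared
        if candidate[k] > 0 and baseline[k] > 0
    }
    if not ratios:
        return None  # caller renders the "no comparable benchmarks" row
    values = list(ratios.values())
    return {
        "geomean": math.exp(sum(math.log(v) for v in values) / len(values)),
        "median": statistics.median(values),
        "count": len(values),
        "wins": sum(v > 1.0 for v in values),
        "losses": sum(v < 1.0 for v in values),
        "best": max(ratios, key=ratios.get),
        "worst": min(ratios, key=ratios.get),
    }
```

The geomean is used rather than the arithmetic mean so that a 2.00x win
and a 0.50x loss cancel to 1.00x instead of averaging to 1.25x.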
Edge cases — same behavior as the old layout:
  * Missing input JSON (impl build failed) → row appears with "—" cells
    and a "no comparable benchmarks ({impl}: ✗)" note in the headline
    table; matrix simply omits that impl's row/column; detail block
    renders a "_Detail file missing_" message.
  * No shared verified benchmarks between two impls → same "—" /
    "no comparable benchmarks" path.
Drive-by: .gitignore adds `__pycache__/` and `*.pyc` now that we have
committable Python scripts that pytest etc. could exercise.
Co-authored-by: Cursor <cursoragent@cursor.com>
"title": "OpenVX-vs-OpenCV — does adopting OpenVX pay off vs cv::?",
"intent": "Speedup reads as `<OpenVX impl> / OpenCV`. Values >1.00x mean adopting that OpenVX impl pays off vs writing the equivalent directly in OpenCV — the headline question this comparison phase exists to answer. Ordered most-tuned (MIVisionX) → reference (Khronos sample) → Rust impl (rustVX) so the table walks the realistic best→worst range of the trade-off.",
"intent": "Speedup reads as `<candidate> / <baseline>`. MIVisionX (AMD, most-tuned) compared against both reference impls, then rustVX vs Khronos sample (Rust impl over reference).",