opencv-mark #1: scaffold OpenCV baseline companion + 5 sentinel kernels #15

Merged: kiritigowda merged 2 commits into kg/opencv-baseline from kg/opencv-baseline-01-scaffold on May 16, 2026

@kiritigowda

Owner

First PR of the kg/opencv-baseline stack. Targets the umbrella branch, not main.
See the umbrella PR description for the full 3-PR roadmap.

Why this exists

openvx-mark answers "which OpenVX implementation is fastest?" It cannot answer the question every team actually asks first when evaluating OpenVX adoption: "is OpenVX faster than the OpenCV code I already have?"

This stack adds an OpenCV-backed companion binary (opencv-mark) so users can compare each OpenVX implementation against the de facto vision baseline. The end state (after PR #2 + #3) is a 4-way pairwise matrix in CI:

                  vs MIVisionX  vs Khronos  vs rustVX  vs OpenCV
MIVisionX             —          (#14)       (#14)      (new!)
Khronos             (#14)          —         (#14)      (new!)
rustVX              (#14)        (#14)         —        (new!)
OpenCV              (new!)       (new!)      (new!)       —

What this PR does (PR #1: scaffold + 5 sentinel kernels)

Architecture

  • New bench_core static library carved out of the existing reporter infra (BenchmarkStats, BenchmarkTimer, BenchmarkReport, SystemInfo). Linked by both `openvx-mark` and the new `opencv-mark` binary so JSON schema, comparison logic, and stats math are guaranteed identical across implementations.
  • `BenchmarkReport` now takes a POD `BenchmarkCatalog` snapshot instead of a live `KernelRegistry&`, decoupling the reporter from any OpenVX-specific kernel-enum probing. `openvx-mark` builds its catalog from `KernelRegistry::snapshot()`; `opencv-mark` builds its catalog from the registered cv:: benchmark list.
  • `benchmark_timer.h` no longer transitively pulls in `<VX/vx.h>`; the `vx_perf_t` graph/node query helpers move to a new `openvx_perf_query.h` header owned solely by `openvx-mark`.
  • Optional CMake target `opencv-mark` is built when OpenCV 4 is found (`-DOPENVX_MARK_BUILD_OPENCV=OFF` to disable). Skipped silently on hosts without OpenCV — existing OpenVX-only build environments are unaffected.

opencv-mark scope (PR1)

  • CLI mirrors openvx-mark — same shell scripts and CI steps drive both binaries with identical flags.
  • Output JSON is byte-compatible with openvx-mark — drops straight into `scripts/compare_reports.py`. `opencv-mark --compare a.json b.json` delegates to the same shared comparator.
  • Mode = "graph" — the (name, mode, resolution) join key in `compare_reports.py` lines up with the typical openvx-mark graph-mode entry. `setup_fn` pre-allocates `cv::Mat` buffers outside the timing budget; only the cv:: kernel call itself is timed, mirroring openvx-mark's policy of timing only `vxProcessGraph`.
  • PSNR + max-diff verification helpers exposed for a future cross-impl tolerance check (PSNR ≥ 30 dB OR max-diff ≤ 5 grey levels per the project's verification policy). PR1 only uses these for self-verification inside each benchmark's `verify_fn`.
  • 5 sentinel kernels across 3 categories with parameter-mapping comments documenting what's apples-to-apples vs documented difference (border modes, integer rounding, etc.):
    • `filters` — Box3x3, Gaussian3x3, Sobel3x3
    • `color` — ColorConvert_RGB2IYUV (swapped from the originally proposed `RGB2Gray` — RGB2Gray isn't a separate OpenVX kernel; RGB2IYUV is, so the JSON join key matches a real openvx-mark entry)
    • `geometric` — WarpAffine
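
The verification policy quoted in the list above (PSNR ≥ 30 dB OR max-diff ≤ 5 grey levels) can be sketched in a few lines. This is a pure-Python illustration on flat 8-bit pixel lists, not the helpers that ship in `opencv-mark`; the function names `psnr_db`, `max_abs_diff`, and `within_tolerance` are hypothetical.

```python
import math

def max_abs_diff(a, b):
    """Largest absolute per-pixel difference between two equal-length buffers."""
    return max(abs(x - y) for x, y in zip(a, b))

def psnr_db(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the buffers are identical."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)

def within_tolerance(a, b, min_psnr_db=30.0, max_diff=5):
    """Pass when PSNR >= 30 dB OR max-diff <= 5 grey levels."""
    if len(a) != len(b) or not a:
        return False  # mismatched or empty buffers can never pass
    return psnr_db(a, b) >= min_psnr_db or max_abs_diff(a, b) <= max_diff

reference = [10, 20, 30, 40]
candidate = [12, 20, 27, 40]  # differs by at most 3 grey levels
print(within_tolerance(reference, candidate))
```

Because the two criteria are joined with OR, a single outlier pixel does not fail an otherwise close image as long as PSNR stays above the threshold, and a uniformly small offset passes on max-diff alone.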

Local verification

Built and ran end-to-end on macOS against OpenCV 4.13.0 + MIVisionX:

```bash
openvx-mark --kernel Box3x3,Gaussian3x3,Sobel3x3,ColorConvert_RGB2IYUV,WarpAffine \
--resolution VGA --output-dir /tmp/openvx
opencv-mark --resolution VGA --output-dir /tmp/opencv
python3 scripts/compare_reports.py /tmp/openvx/benchmark_results.json \
/tmp/opencv/benchmark_results.json \
--output /tmp/cv-vs-vx
```

The cross-comparison renders cleanly, with all 5 benchmarks verified on both sides. Representative result on this host:

| Kernel | MIVisionX | OpenCV | Speedup (OpenCV/VX) | Winner |
|---|---:|---:|---:|---|
| Box3x3 | 5895 MP/s | 1378 MP/s | 0.23x | MIVisionX |
| Gaussian3x3 | 5658 MP/s | 14084 MP/s | 2.49x | OpenCV |
| Sobel3x3 | 2841 MP/s | 1797 MP/s | 0.63x | MIVisionX |
| WarpAffine | 1506 MP/s | 549 MP/s | 0.36x | MIVisionX |
| ColorConvert_RGB2IYUV | 566 MP/s | 7622 MP/s | 13.47x | OpenCV |

This is the exact "does adopting OpenVX pay off?" insight the feature is designed to surface — at the per-kernel level, MIVisionX wins 3/5 sentinels but loses badly on color conversion and Gaussian.
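
For readers who want to check the speedup column, the arithmetic is a plain throughput ratio. The sketch below recomputes it from the MP/s figures reported above; the dictionary layout and the `speedup` helper are illustrative only and are not taken from `compare_reports.py`.

```python
# Throughput in MP/s from the results above: (MIVisionX, OpenCV).
results = {
    "Box3x3": (5895, 1378),
    "Gaussian3x3": (5658, 14084),
    "Sobel3x3": (2841, 1797),
    "WarpAffine": (1506, 549),
    "ColorConvert_RGB2IYUV": (566, 7622),
}

def speedup(openvx_mps, opencv_mps):
    """OpenCV/VX throughput ratio; above 1.0 means OpenCV is faster."""
    return opencv_mps / openvx_mps

for kernel, (vx, cv) in results.items():
    ratio = speedup(vx, cv)
    winner = "OpenCV" if ratio > 1.0 else "MIVisionX"
    print(f"{kernel:<24}{ratio:6.2f}x  {winner} wins")

# Count the kernels where the OpenVX side comes out ahead.
openvx_wins = sum(1 for vx, cv in results.values() if speedup(vx, cv) < 1.0)
```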

Test plan

  • `openvx-mark` (post-refactor) builds clean and runs end-to-end
  • `opencv-mark` builds clean against OpenCV 4.13.0 (Homebrew, macOS)
  • `opencv-mark` runs end-to-end and produces JSON / CSV / Markdown reports
  • All 5 sentinel kernels pass self-verification (`verify_fn` returns true)
  • `scripts/compare_reports.py` ingests both JSONs and produces a rendered cross-comparison Markdown
  • CI on this PR: confirm refactor doesn't break the existing OpenVX build matrix

Not in scope (deferred)

  • Cross-impl tolerance check at runtime: PSNR/max-diff helpers are exposed but not yet wired into a cross-comparator. Planned follow-up.
  • CI integration: adding a `build-opencv` parallel job and the 3 new pairwise comparisons against the OpenVX implementations is PR #3 of the stack.
  • Full kernel coverage: the remaining ~30 vision kernels (Median3x3, Erode/Dilate, Magnitude, Phase, Add/Sub/Mul, AbsDiff, MinMaxLoc, Histogram, EqualizeHist, ScaleImage, ChannelExtract/Combine, Threshold, IntegralImage, ConvertDepth, Canny, Harris, FAST, OpticalFlow, Remap, WarpPerspective, GaussianPyramid, HalfScaleGaussian, LaplacianPyramid, NonLinearFilter, MeanStdDev, TableLookup, WeightedAverage, etc.) are PR #2 of the stack.
  • "Conformance" semantics for OpenCV-side reports: currently `opencv-mark` reports conformance as `PASS (5/5)` because every registered cv:: benchmark runs. This is technically meaningless for OpenCV (it has no kernel-enum spec to conform to) but harmless. It may be suppressed in PR #3 if it confuses downstream readers.

Made with Cursor

First PR of the kg/opencv-baseline stack — adds an OpenCV-backed
companion binary so users can answer "does adopting OpenVX actually
pay off vs the cv:: code I already have?" at the per-kernel level.

Architecture
------------
* New `bench_core` static library carved out of the existing reporter
  infrastructure (BenchmarkStats, BenchmarkTimer, BenchmarkReport,
  SystemInfo). Linked by both openvx-mark and the new opencv-mark
  binary so JSON schema, comparison logic, and stats math are
  guaranteed identical across implementations.
* `BenchmarkReport` now takes a POD `BenchmarkCatalog` snapshot
  instead of a live `KernelRegistry&`, decoupling the reporter from
  any OpenVX-specific kernel-enum probing. openvx-mark continues to
  build its catalog from `KernelRegistry::snapshot()`; opencv-mark
  builds its catalog from the registered cv:: benchmark list.
* `benchmark_timer.h` no longer transitively pulls in <VX/vx.h>; the
  vx_perf_t graph/node query helpers move to a new
  `openvx_perf_query.h` header owned solely by openvx-mark.
* Optional new CMake target `opencv-mark` is built when OpenCV 4 is
  found (-DOPENVX_MARK_BUILD_OPENCV=OFF to disable). Skipped silently
  on hosts without OpenCV so existing OpenVX-only build environments
  are unaffected.

opencv-mark scope (PR1)
-----------------------
* CLI mirrors openvx-mark so the same shell scripts and CI steps
  drive both binaries with identical flags.
* Output JSON is byte-compatible with openvx-mark — drops straight
  into scripts/compare_reports.py for cross-vendor comparison.
* Kernel mode is "graph" so the (name, mode, resolution) join key in
  compare_reports.py lines up with the typical openvx-mark graph-mode
  entry. setup_fn pre-allocates cv::Mat buffers outside the timing
  budget; only the cv:: kernel call itself is timed, mirroring the
  openvx-mark policy of timing only vxProcessGraph.
* PSNR + max-diff verification helpers exposed for a future
  cross-implementation tolerance check (PSNR >= 30dB OR max-diff <= 5
  grey levels per the project's verification policy). PR1 only uses
  these for self-verification inside each benchmark's verify_fn.
* 5 sentinel kernels across 3 categories with parameter-mapping
  comments explaining what's apples-to-apples vs documented difference:
    * filters    — Box3x3, Gaussian3x3, Sobel3x3
    * color      — ColorConvert_RGB2IYUV
    * geometric  — WarpAffine
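
The timing policy described above (setup outside the timing budget, only the kernel call timed) could be sketched as follows. This is a Python stand-in, not the C++ harness in this PR; `run_benchmark`, `setup`, and `invert` are hypothetical names, and the defaults of 30 iterations and 5 warm-up runs are arbitrary.

```python
import time

def run_benchmark(setup_fn, kernel_fn, iterations=30, warmup=5):
    """Time only kernel_fn; setup_fn runs once, outside the timed region."""
    state = setup_fn()                 # buffer allocation is not timed
    for _ in range(warmup):            # warm-up runs are discarded
        kernel_fn(state)
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        kernel_fn(state)               # the only call inside the timed region
        samples_ms.append((time.perf_counter() - start) * 1e3)
    return samples_ms

def setup():
    # Stand-in for pre-allocating source and destination images (VGA, 1 channel).
    return {"src": bytearray(640 * 480), "dst": bytearray(640 * 480)}

def invert(state):
    # Stand-in kernel: invert every pixel into the preallocated destination.
    state["dst"][:] = bytes(255 - p for p in state["src"])

samples = run_benchmark(setup, invert, iterations=5, warmup=1)
print(f"median {sorted(samples)[len(samples) // 2]:.3f} ms over {len(samples)} runs")
```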

Local verification
------------------
Built and ran end-to-end on macOS against OpenCV 4.13.0:
  $ openvx-mark --kernel Box3x3,Gaussian3x3,Sobel3x3,ColorConvert_RGB2IYUV,WarpAffine --resolution VGA
  $ opencv-mark --resolution VGA
  $ compare_reports.py openvx.json opencv.json

Cross-comparison renders cleanly with all 5 benchmarks verified on
both sides; representative result on this host: MIVisionX wins
Box3x3/Sobel3x3/WarpAffine, OpenCV wins Gaussian3x3/ColorConvert.

Next in the stack
-----------------
* PR2 (kg/opencv-baseline-02-coverage) — expand from 5 sentinel
  kernels to ~35 covering the full vision feature set.
* PR3 (kg/opencv-baseline-03-ci) — wire opencv-mark into the existing
  parallel CI build matrix and add 3 new pairwise comparisons
  (mivisionx-over-opencv, khronos-over-opencv, rustvx-over-opencv).

Co-authored-by: Cursor <cursoragent@cursor.com>

CI: add opencv-mark smoke job to PR1 (4th parallel build)

PR #15 silently passed CI because no job ever built opencv-mark — the
existing matrix only installs OpenVX impls, and CMake's `find_package
(OpenCV 4 QUIET)` no-ops on those runners. So a broken opencv-mark
target would never show up red.

Add `build-opencv` as a 4th parallel Phase-1 job:
  * apt-installs `libopencv-dev`
  * builds the `opencv-mark` target via the parent CMake
  * hard-fails if the binary isn't produced (catches the silent
    `find_package ... QUIET / return()` skip path)
  * runs the same VGA × 5-iter smoke shape as the OpenVX-impl jobs
    so opencv-mark's 5 sentinel kernels are actually exercised on
    every PR
  * uploads the JSON report as `smoke-results-opencv` for the future
    PR3 comparison phase to consume

Drive-by fix: opencv-mark's `--help` was returning exit 1 because
`parseArgs` collapsed both "asked for help" and "parse error" into
`return false`. Under `set -eo pipefail` that kills the CI step. Make
`--help` exit 0 inline (UNIX convention) and reserve `false` for
genuine parse failures.
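
The underlying problem is a two-valued return that cannot distinguish "asked for help" from "parse error". The commit fixes it by exiting 0 inline; an alternative way to express the same distinction is a three-state result, sketched here in Python. `ParseResult`, `parse_args`, and `exit_code` are hypothetical names and only one flag is modelled, so this is an illustration of the idea rather than the project's C++ `parseArgs`.

```python
import enum

class ParseResult(enum.Enum):
    OK = "ok"
    HELP = "help"      # user asked for usage text; not an error
    ERROR = "error"    # malformed or unknown argument

def parse_args(argv):
    """Hypothetical tri-state parser: returns (result, options)."""
    options = {"resolution": "VGA"}
    args = iter(argv)
    for arg in args:
        if arg in ("-h", "--help"):
            return ParseResult.HELP, options
        if arg == "--resolution":
            value = next(args, None)   # consume the flag's value, if present
            if value is None:
                return ParseResult.ERROR, options
            options["resolution"] = value
        else:
            return ParseResult.ERROR, options
    return ParseResult.OK, options

def exit_code(result):
    """Help exits 0 so `set -e` scripts keep going; only parse errors exit 1."""
    return 1 if result is ParseResult.ERROR else 0
```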

Updates the architecture comment block to document the four-job
Phase-1 layout and why the OpenCV baseline differs in shape (no
from-source build, no impl tarball — stages JSON directly).

Co-authored-by: Cursor <cursoragent@cursor.com>
@kiritigowda kiritigowda merged commit 8d629fe into kg/opencv-baseline May 16, 2026
5 checks passed
kiritigowda added a commit that referenced this pull request May 16, 2026
* opencv-mark #1: scaffold OpenCV baseline companion + 5 sentinel kernels (#15)

* opencv-mark: scaffold OpenCV baseline companion + 5 sentinel kernels


* CI: add opencv-mark smoke job to PR1 (4th parallel build)


---------

Co-authored-by: Cursor <cursoragent@cursor.com>

* opencv-mark: full vision feature-set coverage (5 -> 41 kernels) (#16)

Second PR of the kg/opencv-baseline stack. Builds on PR #1's scaffold
to expand opencv-mark from 5 sentinel kernels to 41, covering the
full openvx-mark vision feature-set with cv:: equivalents.

Kernel additions by category
----------------------------
* filters    +4: Median3x3, Erode3x3, Dilate3x3, CustomConvolution
* color      +3: ChannelExtract, ChannelCombine, ConvertDepth
* geometric  +4: ScaleImage_Half, ScaleImage_Double, WarpPerspective, Remap
* pixelwise  +8: And, Or, Xor, Not, Add, Subtract, Multiply, AbsDiff
* statistical +5: Histogram, EqualizeHist, MeanStdDev, MinMaxLoc, IntegralImage
* misc       +6: Magnitude, Phase, TableLookup, Threshold_Binary,
                 Threshold_Range, WeightedAverage
* multiscale +3: GaussianPyramid, LaplacianPyramid, HalfScaleGaussian
* feature    +3: CannyEdgeDetector, HarrisCorners, FastCorners

Total: +36 kernels across 5 new files (cv_pixelwise.cpp,
cv_statistical.cpp, cv_misc.cpp, cv_multiscale.cpp, cv_feature.cpp)
and extensions to cv_filters.cpp / cv_color.cpp / cv_geometric.cpp.
Each new benchmark carries an apples-to-apples parameter-mapping
comment (border mode, kernel weights, interpolation, saturation
policy) so a reader of a comparison report can tell what is — and
isn't — directly comparable to the OpenVX side.

OpenCV linkage now requires the features2d module (for cv::FAST).

Test data generator extensions
------------------------------
opencv_test_data.h gains five new generators needed by the new
benchmarks: makeS16 (signed 16-bit input for Magnitude/Phase),
makePerspectiveMatrix (3x3 near-identity homography),
makeRemap (dst-sized CV_32FC1 mapX/mapY pair), makeConvolution3x3
(matching openvx-mark's CustomConvolution weights), and makeLUT
(256-entry CV_8UC1 lookup table).

Documented exclusions
---------------------
Three openvx-mark vision kernels are intentionally NOT mirrored into
opencv-mark; rationale documented in the per-category source headers:
* NonLinearFilter — no clean OpenCV equivalent (OpenCV's
  morphology API doesn't generalise to arbitrary order-statistic
  filters in a single kernel call).
* OpticalFlowPyrLK — needs sparse keypoint input which doesn't fit
  the single-call per-iteration timing model. Could be added as a
  future extension with proper test fixture.
* ColorConvert_RGB2NV12 — OpenCV has no direct cvtColor for the
  forward RGB->NV12 path; emulating it would require a manual U/V
  interleave step that isn't an apples-to-apples cv:: kernel call.

Local verification
------------------
$ opencv-mark --resolution VGA --iterations 30 --warmup 5
  -> Summary: 41 total | 41 passed | 0 skipped | 0 failed
$ openvx-mark --feature-set vision --resolution VGA --iterations 30 --warmup 5
$ scripts/compare_reports.py openvx.json opencv.json --output cmp
  -> Compared 2 implementations across 52 benchmarks
  -> 41 / 41 opencv-mark kernels join an openvx-mark counterpart
  -> 11 openvx-mark vision kernels have no opencv-mark counterpart
     (the 8 multi-node pipelines + the 3 documented exclusions)
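
The joined and unmatched counts above come from matching entries on the (name, mode, resolution) key. A minimal sketch of that join, assuming each result entry is a dict with `name`, `mode`, and `resolution` fields (the field names are an assumption, not the confirmed JSON schema):

```python
def index_by_key(results):
    """Map (name, mode, resolution) -> result entry."""
    return {(r["name"], r["mode"], r["resolution"]): r for r in results}

def join_reports(openvx_results, opencv_results):
    """Return keys present in both reports, and keys only the OpenVX report has."""
    vx, cv = index_by_key(openvx_results), index_by_key(opencv_results)
    joined = sorted(vx.keys() & cv.keys())
    openvx_only = sorted(vx.keys() - cv.keys())
    return joined, openvx_only

openvx = [
    {"name": "Box3x3", "mode": "graph", "resolution": "VGA"},
    {"name": "NonLinearFilter", "mode": "graph", "resolution": "VGA"},
]
opencv = [{"name": "Box3x3", "mode": "graph", "resolution": "VGA"}]

joined, openvx_only = join_reports(openvx, opencv)
print(len(joined), "joined;", len(openvx_only), "without an opencv-mark counterpart")
```

This is also why opencv-mark reports its mode as "graph": with any other mode string, the key would not match and every kernel would land in the unmatched set.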

Sample of the resulting cross-comparison (MIVisionX vs OpenCV, VGA):
| Kernel              | speedup (OpenCV/VX) | Faster side            |
|---------------------|--------------------:|------------------------|
| Box3x3              |               0.23x | MIVisionX 4.3x faster  |
| EqualizeHist        |               0.33x | MIVisionX 3.0x faster  |
| Multiply            |               2.46x | OpenCV 2.5x faster     |
| Magnitude           |               1.65x | OpenCV 1.7x faster     |
| HarrisCorners       |               5.01x | OpenCV 5.0x faster     |
| LaplacianPyramid    |             210.03x | OpenCV 210x faster     |

The pattern that emerges — OpenVX wins on tight per-pixel filter
loops, OpenCV wins on multi-pass statistics and feature detection
where its IPP/SIMD backends shine — is the kind of insight this
benchmark suite is designed to surface.

Next in the stack
-----------------
PR #3 (kg/opencv-baseline-03-ci) — wire opencv-mark into the existing
parallel CI build matrix and add 3 new pairwise comparisons
(mivisionx-over-opencv, khronos-over-opencv, rustvx-over-opencv).

Co-authored-by: Cursor <cursoragent@cursor.com>

* CI: wire opencv-mark into Phase 2 — adds 3 OpenVX-vs-OpenCV pairwise reports (#17)

Promotes opencv-mark from "Phase 1 smoke only" (PR1) to a full
participant in the same-runner pairwise comparison phase, alongside
the three OpenVX impls. This is what makes the headline question —
"does adopting OpenVX actually pay off vs the cv:: code I already
have?" — surface as a numbered geomean speedup on every PR's CI
summary, instead of being a manual local exercise.

Phase 1
-------
* `build-opencv` job comment block updated: drops the stale "PR3 will
  extend this" reference, documents that smoke is fast-feedback only
  and that the comparison-grade FHD × 20 iter benchmark now lives in
  Phase 2 alongside the OpenVX impl benches (strict same-runner
  fairness).

Phase 2 (compare job)
---------------------
* Renamed: "Pairwise comparison (MIVisionX, Khronos, rustVX)" →
  "Pairwise comparison (MIVisionX, Khronos, rustVX, OpenCV)".
* `needs:` adds `build-opencv` so a broken opencv-mark smoke gates
  the long comparison run.
* `Install dependencies` adds `libopencv-dev` so opencv-mark can
  re-link on this runner (no impl-tarball staging needed — opencv-
  mark IS the OpenCV-side binary).
* New step: `Build & bench opencv-mark` runs `--feature-set vision
  --resolution FHD --iterations 20 --warmup 5` (same shape as the
  OpenVX impl benches) and writes JSON to
  `build-opencv-bench/results/`.
* Three new `do_compare` invocations:
    * MIVisionX (AMD OpenVX) over OpenCV   — best-tuned OpenVX vs cv::
    * Khronos sample          over OpenCV   — reference OpenVX vs cv::
    * rustVX                   over OpenCV   — Rust OpenVX vs cv::
  Speedup column reads `<OpenVX impl> / OpenCV` — values >1.00x mean
  adopting that OpenVX impl pays off vs writing the equivalent
  directly in OpenCV. Ordered MIVisionX → Khronos → rustVX so the
  table walks from best-case to worst-case OpenVX positioning.
* `benchmark-results` artifact upload now also includes
  `build-opencv-bench/results/` so reviewers can inspect the raw
  OpenCV-side JSON.
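
To make the orientation concrete, here is a small sketch of a per-pair summary with speedup defined as candidate / baseline and aggregated as a geometric mean. It assumes the ratio is taken over throughput (higher is better) and uses made-up numbers; `geomean` and `summarize` are hypothetical helpers, not the code in `compare_reports.py`.

```python
import math

def geomean(values):
    """Geometric mean of positive ratios, the usual average for speedups."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

def summarize(candidate_mps, baseline_mps):
    """Per-kernel speedup = candidate / baseline over kernels in both reports."""
    shared = sorted(candidate_mps.keys() & baseline_mps.keys())
    speedups = {k: candidate_mps[k] / baseline_mps[k] for k in shared}
    return {
        "geomean": geomean(speedups.values()),
        "wins": sum(1 for s in speedups.values() if s > 1.0),
        "losses": sum(1 for s in speedups.values() if s < 1.0),
    }

# Made-up throughputs (MP/s), chosen so the arithmetic is easy to check.
openvx_impl = {"KernelA": 400.0, "KernelB": 100.0}
opencv = {"KernelA": 100.0, "KernelB": 400.0}
summary = summarize(openvx_impl, opencv)
print(summary)
```

A 4x win and a 4x loss cancel to a geomean of 1.00x, which is the behaviour one wants from a ratio average; an arithmetic mean of the same two ratios would report 2.125x.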

Top-of-file architecture comment updated to document the four-job
Phase-1 layout and the six-comparison Phase-2 layout (3 OpenVX-vs-
OpenVX + 3 OpenVX-vs-OpenCV).

Out of scope
------------
* Cross-impl numerical conformance gating (PSNR/max-abs-diff between
  OpenVX and OpenCV outputs of the same kernel) — opencv-mark already
  exposes the verify helpers, but threading them through the report
  schema and the comparison gate is its own follow-up.
* Adding `enhanced_vision` to the OpenCV side — left intentionally
  undone; same Tensor* impl-quirk reasoning as the existing OpenVX
  comparison shape.

Co-authored-by: Cursor <cursoragent@cursor.com>

* CI: organize pairwise comparison summary (TL;DR matrix + collapsed details)

Default-visible step summary length: ~600 lines → ~40 lines (15× shorter).
Full per-kernel detail is still emitted, but collapsed inside <details>
blocks — one click away instead of unconditionally dumped.

Problem
-------
After PR #17 added the 3 OpenVX-vs-OpenCV pairwise comparisons (bringing
the total to 6), the compare-job GitHub Step Summary became unscannable.
Each comparison emitted its own heading + headline-stats table + the
full `scripts/compare_reports.py` output (system info, conformance &
scores, category sub-scores, summary, per-kernel detail) — all six
sections shown unconditionally, ~600 lines total. The headline geomean
that reviewers actually want at-a-glance got buried under repeated
system-info/conformance tables that say the same thing across all six
comparisons (same runner, same hardware).

Solution — three scannable parts, with detail one click away
------------------------------------------------------------
1. TL;DR speedup matrix at the top — `row impl / column impl` geomean
   for every loaded pair of reports. One glance answers "which impl
   beats which, and by how much?" across the full N×N relationship,
   including pairings not explicitly enumerated in the groups below.
   Cells render bold when the row impl wins, italic when it loses, so
   the visual scan works even at small zoom.

2. Two grouped headline tables:
     * "OpenVX-vs-OpenCV — does adopting OpenVX pay off vs cv::?"
     * "OpenVX-vs-OpenVX — cross-implementation"
   Each row: candidate / baseline / geomean / median / count / wins /
   losses / best kernel / worst kernel. Six rows total, two compact
   tables — the headline answer for every comparison fits in one
   screen.

3. Per-kernel detail in <details> blocks (collapsed by default). Same
   `compare_reports.py` output as before (system info, conformance,
   category sub-scores, per-kernel table), but with the duplicate
   `# OpenVX Benchmark Comparison` + `**A** vs **B**` header lines
   stripped since the <details><summary> already says them.
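
The matrix in part 1 can be pictured with a short rendering sketch. It assumes a mapping from (row impl, column impl) to the row/column geomean is already available and uses a plain hyphen as the placeholder cell; `format_cell` and `speedup_matrix` are hypothetical names and the numbers are illustrative, not measured.

```python
def format_cell(ratio):
    """Bold when the row impl wins (>1.00x), italic when it loses."""
    text = f"{ratio:.2f}x"
    if ratio > 1.0:
        return f"**{text}**"
    if ratio < 1.0:
        return f"*{text}*"
    return text

def speedup_matrix(impls, geomeans):
    """Render a Markdown N x N table; geomeans maps (row, col) -> row/col geomean."""
    lines = [
        "| row / column | " + " | ".join(impls) + " |",
        "|---|" + "---:|" * len(impls),
    ]
    for row in impls:
        cells = []
        for col in impls:
            ratio = geomeans.get((row, col))
            # Diagonal and unavailable pairs get a placeholder cell.
            cells.append("-" if row == col or ratio is None else format_cell(ratio))
        lines.append(f"| {row} | " + " | ".join(cells) + " |")
    return "\n".join(lines)

# Illustrative geomeans only.
impls = ["ImplA", "ImplB"]
geomeans = {("ImplA", "ImplB"): 2.0, ("ImplB", "ImplA"): 0.5}
print(speedup_matrix(impls, geomeans))
```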

Implementation
--------------
New `scripts/ci_pairwise_summary.py` (415 lines, fully documented) —
takes a JSON config describing reports + pair groups + detail dir, and
emits the structured summary to stdout. The CI step redirects it into
$GITHUB_STEP_SUMMARY. Config schema lives in the script docstring.
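
The collapsed per-kernel detail could be produced along the following lines. This is a guess at the shape of the transformation, not an excerpt from `ci_pairwise_summary.py`; `collapse_detail` is a hypothetical name, and the header-matching rules are inferred from the two duplicate lines the commit message mentions.

```python
def collapse_detail(title, detail_markdown):
    """Wrap one comparison report in a collapsed <details> block."""
    if detail_markdown is None:
        body = "_Detail file missing_"
    else:
        kept = []
        for line in detail_markdown.splitlines():
            text = line.strip()
            if text.startswith("# OpenVX Benchmark Comparison"):
                continue  # heading duplicates the <summary> title
            if text.startswith("**") and text.endswith("**") and " vs " in text:
                continue  # "**A** vs **B**" line duplicates it too
            kept.append(line)
        body = "\n".join(kept).strip()
    # Blank lines around the body let GitHub render Markdown inside <details>.
    return f"<details>\n<summary>{title}</summary>\n\n{body}\n\n</details>"

sample = (
    "# OpenVX Benchmark Comparison\n"
    "**ImplA** vs **ImplB**\n"
    "\n"
    "## Summary\n"
    "| Kernel | Speedup |"
)
print(collapse_detail("ImplA over ImplB", sample))
```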

The CI's `Pairwise comparisons` step is correspondingly simpler — drops
the inline ~90-line do_compare function and the inline Python heredoc,
keeping just a small loop that runs `compare_reports.py` per pair (for
the per-kernel detail .md files) and a single call to the new helper
script. Net effect on the yaml: 133 lines removed, 97 added.

Same orientation as before (`speedup = candidate / baseline`, >1.00x =
candidate faster) so the artifact filenames in `comparisons/` and the
existing `benchmark-comparisons` artifact don't change shape.

Edge cases — same behavior as the old layout:
  * Missing input JSON (impl build failed) → row appears with "—" cells
    and a "no comparable benchmarks ({impl}: ✗)" note in the headline
    table; matrix simply omits that impl's row/column; detail block
    renders a "_Detail file missing_" message.
  * No shared verified benchmarks between two impls → same "—" /
    "no comparable benchmarks" path.

Drive-by: .gitignore adds `__pycache__/` and `*.pyc` now that we have
committable Python scripts that pytest etc. could exercise.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>