opencv-mark #1: scaffold OpenCV baseline companion + 5 sentinel kernels #15
Merged
First PR of the kg/opencv-baseline stack — adds an OpenCV-backed
companion binary so users can answer "does adopting OpenVX actually
pay off vs the cv:: code I already have?" at the per-kernel level.
Architecture
------------
* New `bench_core` static library carved out of the existing reporter
infrastructure (BenchmarkStats, BenchmarkTimer, BenchmarkReport,
SystemInfo). Linked by both openvx-mark and the new opencv-mark
binary so JSON schema, comparison logic, and stats math are
guaranteed identical across implementations.
* `BenchmarkReport` now takes a POD `BenchmarkCatalog` snapshot
instead of a live `KernelRegistry&`, decoupling the reporter from
any OpenVX-specific kernel-enum probing. openvx-mark continues to
build its catalog from `KernelRegistry::snapshot()`; opencv-mark
builds its catalog from the registered cv:: benchmark list.
* `benchmark_timer.h` no longer transitively pulls in <VX/vx.h>; the
vx_perf_t graph/node query helpers move to a new
`openvx_perf_query.h` header owned solely by openvx-mark.
* Optional new CMake target `opencv-mark` is built when OpenCV 4 is
found (-DOPENVX_MARK_BUILD_OPENCV=OFF to disable). Skipped silently
on hosts without OpenCV so existing OpenVX-only build environments
are unaffected.
opencv-mark scope (PR1)
-----------------------
* CLI mirrors openvx-mark so the same shell scripts and CI steps
drive both binaries with identical flags.
* Output JSON is byte-compatible with openvx-mark — drops straight
into scripts/compare_reports.py for cross-vendor comparison.
* Kernel mode is "graph" so the (name, mode, resolution) join key in
compare_reports.py lines up with the typical openvx-mark graph-mode
entry. setup_fn pre-allocates cv::Mat buffers outside the timing
budget; only the cv:: kernel call itself is timed, mirroring the
openvx-mark policy of timing only vxProcessGraph.
* PSNR + max-diff verification helpers exposed for a future
cross-implementation tolerance check (PSNR >= 30dB OR max-diff <= 5
grey levels per the project's verification policy). PR1 only uses
these for self-verification inside each benchmark's verify_fn.
* 5 sentinel kernels across 3 categories with parameter-mapping
comments explaining what's apples-to-apples vs documented difference:
* filters — Box3x3, Gaussian3x3, Sobel3x3
* color — ColorConvert_RGB2IYUV
* geometric — WarpAffine
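The tolerance rule quoted above (PSNR >= 30dB OR max-diff <= 5 grey levels) can be sketched in a few lines. This is an illustrative Python version for 8-bit images held as flat lists, not the C++ helpers that opencv-mark exposes; the function names `psnr_db`, `max_abs_diff`, and `within_tolerance` are hypothetical.

```python
import math

def max_abs_diff(a, b):
    """Largest per-pixel absolute difference between two equal-length buffers."""
    return max(abs(x - y) for x, y in zip(a, b))

def psnr_db(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the buffers are identical."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)

def within_tolerance(a, b, min_psnr_db=30.0, max_diff=5):
    """Pass when EITHER criterion holds, matching the 'OR' in the stated policy."""
    if len(a) != len(b) or not a:
        return False
    return psnr_db(a, b) >= min_psnr_db or max_abs_diff(a, b) <= max_diff

# Tiny hand-made buffers, for illustration only.
reference = [10, 20, 30, 40]
candidate = [12, 19, 30, 43]
print(within_tolerance(reference, candidate))
```

Because the two criteria are OR-ed, a result with a few large outliers can still pass on PSNR, and a uniformly small offset can still pass on max-diff.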
Local verification
------------------
Built and ran end-to-end on macOS against OpenCV 4.13.0:
$ openvx-mark --kernel Box3x3,Gaussian3x3,Sobel3x3,ColorConvert_RGB2IYUV,WarpAffine --resolution VGA
$ opencv-mark --resolution VGA
$ compare_reports.py openvx.json opencv.json
Cross-comparison renders cleanly with all 5 benchmarks both verified;
representative result on this host: MIVisionX wins Box3x3/Sobel3x3/
WarpAffine, OpenCV wins Gaussian3x3/ColorConvert.
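The numbers compared here depend on the timing policy described in the scope list: buffers are allocated in setup, and only the kernel call is timed. A minimal sketch of that policy, written in Python rather than the project's C++; `run_benchmark`, `setup_fn`, and `kernel_fn` are hypothetical names, and the statistics returned are an assumption about what a report might carry.

```python
import statistics
import time

def run_benchmark(setup_fn, kernel_fn, iterations=30, warmup=5):
    """Time kernel_fn only; setup_fn runs once, outside the timed region."""
    state = setup_fn()                 # pre-allocate inputs/outputs (untimed)
    for _ in range(warmup):            # warm-up runs; timings discarded
        kernel_fn(state)
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        kernel_fn(state)               # the only timed call
        samples_ms.append((time.perf_counter() - start) * 1e3)
    return {
        "iterations": len(samples_ms),
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
    }

def setup():
    # Stand-in for allocating a VGA-sized (640x480) 8-bit source and destination.
    width, height = 640, 480
    return {"src": bytearray(width * height), "dst": bytearray(width * height)}

def kernel(state):
    # Stand-in workload: a plain copy, in place of a real cv:: call.
    state["dst"][:] = state["src"]

print(run_benchmark(setup, kernel, iterations=10, warmup=2))
```

Keeping allocation out of the timed loop means per-iteration samples reflect the kernel itself, which is what makes a cv:: call comparable to a timed vxProcessGraph.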
Next in the stack
-----------------
* PR2 (kg/opencv-baseline-02-coverage) — expand from 5 sentinel
kernels to ~35 covering the full vision feature set.
* PR3 (kg/opencv-baseline-03-ci) — wire opencv-mark into the existing
parallel CI build matrix and add 3 new pairwise comparisons
(mivisionx-over-opencv, khronos-over-opencv, rustvx-over-opencv).
Co-authored-by: Cursor <cursoragent@cursor.com>
PR #15 silently passed CI because no job ever built opencv-mark: the
existing matrix only installs OpenVX impls, and CMake's
`find_package (OpenCV 4 QUIET)` no-ops on those runners. So a broken
opencv-mark target would never show up red.

Add `build-opencv` as a 4th parallel Phase-1 job:
* apt-installs `libopencv-dev`
* builds the `opencv-mark` target via the parent CMake
* hard-fails if the binary isn't produced (catches the silent
  `find_package ... QUIET / return()` skip path)
* runs the same VGA × 5-iter smoke shape as the OpenVX-impl jobs so
  opencv-mark's 5 sentinel kernels are actually exercised on every PR
* uploads the JSON report as `smoke-results-opencv` for the future
  PR3 comparison phase to consume

Drive-by fix: opencv-mark's `--help` was returning exit 1 because
`parseArgs` collapsed both "asked for help" and "parse error" into
`return false`. Under `set -eo pipefail` that kills the CI step. Make
`--help` exit 0 inline (UNIX convention) and reserve `false` for
genuine parse failures.

Updates the architecture comment block to document the four-job
Phase-1 layout and why the OpenCV baseline differs in shape (no
from-source build, no impl tarball; it stages JSON directly).

Co-authored-by: Cursor <cursoragent@cursor.com>
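The `--help` drive-by fix amounts to distinguishing three parse outcomes instead of two. A minimal sketch of that pattern, written in Python for brevity (the real `parseArgs` is C++); the enum, the function names, and the tiny flag set handled here are illustrative assumptions, not the opencv-mark CLI.

```python
import enum

class ParseResult(enum.Enum):
    OK = 0       # arguments parsed; run the benchmarks
    HELP = 1     # user asked for help: print usage, exit 0
    ERROR = 2    # genuine parse failure: exit non-zero

def parse_args(argv):
    """Return (result, options) for a tiny, hypothetical flag set."""
    options = {"resolution": "VGA"}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("-h", "--help"):
            return ParseResult.HELP, options
        if arg == "--resolution" and i + 1 < len(argv):
            options["resolution"] = argv[i + 1]
            i += 2
            continue
        return ParseResult.ERROR, options   # unknown flag or missing value
    return ParseResult.OK, options

def exit_code_for(argv):
    """Map the parse outcome to a process exit status."""
    result, _ = parse_args(argv)
    return 1 if result is ParseResult.ERROR else 0

print(exit_code_for(["--help"]))    # help is not an error
print(exit_code_for(["--bogus"]))   # unknown flag is
```

With a zero status for the help path, a CI step running under `set -eo pipefail` can invoke `--help` without aborting, while real parse errors still fail the step.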
kiritigowda added a commit that referenced this pull request on May 16, 2026
* opencv-mark #1: scaffold OpenCV baseline companion + 5 sentinel kernels (#15)
  * opencv-mark: scaffold OpenCV baseline companion + 5 sentinel kernels
  * CI: add opencv-mark smoke job to PR1 (4th parallel build)

* opencv-mark: full vision feature-set coverage (5 -> 41 kernels) (#16)

Second PR of the kg/opencv-baseline stack. Builds on PR #1's scaffold
to expand opencv-mark from 5 sentinel kernels to 41, covering the
full openvx-mark vision feature-set with cv:: equivalents.

Kernel additions by category
----------------------------
* filters +4: Median3x3, Erode3x3, Dilate3x3, CustomConvolution
* color +3: ChannelExtract, ChannelCombine, ConvertDepth
* geometric +4: ScaleImage_Half, ScaleImage_Double, WarpPerspective,
  Remap
* pixelwise +8: And, Or, Xor, Not, Add, Subtract, Multiply, AbsDiff
* statistical +5: Histogram, EqualizeHist, MeanStdDev, MinMaxLoc,
  IntegralImage
* misc +6: Magnitude, Phase, TableLookup, Threshold_Binary,
  Threshold_Range, WeightedAverage
* multiscale +3: GaussianPyramid, LaplacianPyramid, HalfScaleGaussian
* feature +3: CannyEdgeDetector, HarrisCorners, FastCorners

Total: +36 kernels across 5 new files (cv_pixelwise.cpp,
cv_statistical.cpp, cv_misc.cpp, cv_multiscale.cpp, cv_feature.cpp)
and extensions to cv_filters.cpp / cv_color.cpp / cv_geometric.cpp.
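The per-category additions can be tallied to confirm the headline count. The dictionary below only restates the list above; the variable names are arbitrary.

```python
# Kernels added per category in this PR, copied from the list above.
added = {
    "filters": 4, "color": 3, "geometric": 4, "pixelwise": 8,
    "statistical": 5, "misc": 6, "multiscale": 3, "feature": 3,
}
sentinels = 5  # kernels already present from the first PR

total_added = sum(added.values())
print(total_added, sentinels + total_added)
```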
Each new benchmark carries an apples-to-apples parameter-mapping
comment (border mode, kernel weights, interpolation, saturation
policy) so a reader of a comparison report can tell what is, and
isn't, directly comparable to the OpenVX side.

OpenCV linkage now requires the features2d module (for cv::FAST).

Test data generator extensions
------------------------------
opencv_test_data.h gains five new generators sized by the new
benchmarks: makeS16 (signed 16-bit input for Magnitude/Phase),
makePerspectiveMatrix (3x3 near-identity homography), makeRemap
(dst-sized CV_32FC1 mapX/mapY pair), makeConvolution3x3 (matching
openvx-mark's CustomConvolution weights), and makeLUT (256-entry
CV_8UC1 lookup table).

Documented exclusions
---------------------
Three openvx-mark vision kernels are intentionally NOT mirrored into
opencv-mark; rationale documented in the per-category source headers:
* NonLinearFilter: no clean OpenCV equivalent (OpenCV's morphology
  API doesn't generalise to arbitrary order-statistic filters in a
  single kernel call).
* OpticalFlowPyrLK: needs sparse keypoint input which doesn't fit the
  single-call per-iteration timing model. Could be added as a future
  extension with proper test fixture.
* ColorConvert_RGB2NV12: OpenCV has no direct cvtColor for the
  forward RGB->NV12 path; emulating it would require a manual U/V
  interleave step that isn't an apples-to-apples cv:: kernel call.
Local verification
------------------
$ opencv-mark --resolution VGA --iterations 30 --warmup 5
-> Summary: 41 total | 41 passed | 0 skipped | 0 failed
$ openvx-mark --feature-set vision --resolution VGA --iterations 30 --warmup 5
$ scripts/compare_reports.py openvx.json opencv.json --output cmp
-> Compared 2 implementations across 52 benchmarks
-> 41 / 41 opencv-mark kernels join an openvx-mark counterpart
-> 11 openvx-mark vision kernels have no opencv-mark counterpart
   (the 8 multi-node pipelines + the 3 documented exclusions)

Sample of the resulting cross-comparison (MIVisionX vs OpenCV, VGA):

| Kernel              | speedup (OpenCV/VX) |
|---------------------|--------------------:|
| Box3x3              |               0.23x | (MIVisionX 4.3x faster)
| EqualizeHist        |               0.33x | (MIVisionX 3.0x faster)
| Multiply            |               2.46x | (OpenCV 2.5x faster)
| Magnitude           |               1.65x | (OpenCV 1.7x faster)
| HarrisCorners       |               5.01x | (OpenCV 5.0x faster)
| LaplacianPyramid    |             210.03x | (OpenCV 210x faster)

The pattern that emerges (OpenVX wins on tight per-pixel filter
loops, OpenCV wins on multi-pass statistics and feature detection
where its IPP/SIMD backends shine) is the kind of insight this
benchmark suite is designed to surface.

Next in the stack
-----------------
PR #3 (kg/opencv-baseline-03-ci): wire opencv-mark into the existing
parallel CI build matrix and add 3 new pairwise comparisons
(mivisionx-over-opencv, khronos-over-opencv, rustvx-over-opencv).

Co-authored-by: Cursor <cursoragent@cursor.com>

* CI: wire opencv-mark into Phase 2, adds 3 OpenVX-vs-OpenCV pairwise reports (#17)

Promotes opencv-mark from "Phase 1 smoke only" (PR1) to a full
participant in the same-runner pairwise comparison phase, alongside
the three OpenVX impls. This is what makes the headline question,
"does adopting OpenVX actually pay off vs the cv:: code I already
have?", surface as a numbered geomean speedup on every PR's CI
summary, instead of being a manual local exercise.
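The parenthetical notes in the #16 sample table follow from the orientation of the ratio: with the column defined as OpenCV/VX, a value below 1 favours the OpenVX side by the reciprocal. A small sketch of that arithmetic using ratios taken from the table; the helper name `describe` is hypothetical, and which side a ratio favours is read off the table's own annotations rather than from the comparison script.

```python
def describe(ratio, above_one="OpenCV", below_one="MIVisionX"):
    """Turn a speedup ratio into '<winner> N.Nx faster', per the table's annotations."""
    if ratio >= 1.0:
        return f"{above_one} {ratio:.1f}x faster"
    return f"{below_one} {1.0 / ratio:.1f}x faster"

# Ratios copied from the sample table above.
for kernel, ratio in [("Box3x3", 0.23), ("EqualizeHist", 0.33),
                      ("Multiply", 2.46), ("HarrisCorners", 5.01)]:
    print(f"{kernel:14s} {ratio:5.2f}x  ({describe(ratio)})")
```

For example, 0.23x corresponds to 1 / 0.23, roughly 4.3, which is where "MIVisionX 4.3x faster" comes from.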
Phase 1
-------
* `build-opencv` job comment block updated: drops the stale "PR3 will
  extend this" reference, documents that smoke is fast-feedback only
  and that the comparison-grade FHD × 20 iter benchmark now lives in
  Phase 2 alongside the OpenVX impl benches (strict same-runner
  fairness).

Phase 2 (compare job)
---------------------
* Renamed: "Pairwise comparison (MIVisionX, Khronos, rustVX)" →
  "Pairwise comparison (MIVisionX, Khronos, rustVX, OpenCV)".
* `needs:` adds `build-opencv` so a broken opencv-mark smoke gates the
  long comparison run.
* `Install dependencies` adds `libopencv-dev` so opencv-mark can
  re-link on this runner (no impl-tarball staging needed, since
  opencv-mark IS the OpenCV-side binary).
* New step: `Build & bench opencv-mark` runs
  `--feature-set vision --resolution FHD --iterations 20 --warmup 5`
  (same shape as the OpenVX impl benches) and writes JSON to
  `build-opencv-bench/results/`.
* Three new `do_compare` invocations:
  * MIVisionX (AMD OpenVX) over OpenCV: best-tuned OpenVX vs cv::
  * Khronos sample over OpenCV: reference OpenVX vs cv::
  * rustVX over OpenCV: Rust OpenVX vs cv::
  Speedup column reads `<OpenVX impl> / OpenCV`; values >1.00x mean
  adopting that OpenVX impl pays off vs writing the equivalent
  directly in OpenCV. Ordered MIVisionX → Khronos → rustVX so the
  table walks from best-case to worst-case OpenVX positioning.
* `benchmark-results` artifact upload now also includes
  `build-opencv-bench/results/` so reviewers can inspect the raw
  OpenCV-side JSON.

Top-of-file architecture comment updated to document the four-job
Phase-1 layout and the six-comparison Phase-2 layout (3
OpenVX-vs-OpenVX + 3 OpenVX-vs-OpenCV).

Out of scope
------------
* Cross-impl numerical conformance gating (PSNR/max-abs-diff between
  OpenVX and OpenCV outputs of the same kernel): opencv-mark already
  exposes the verify helpers, but threading them through the report
  schema and the comparison gate is its own follow-up.
* Adding `enhanced_vision` to the OpenCV side: left intentionally
  undone; same Tensor* impl-quirk reasoning as the existing OpenVX
  comparison shape.

Co-authored-by: Cursor <cursoragent@cursor.com>

* CI: organize pairwise comparison summary (TL;DR matrix + collapsed details)

Default-visible step summary length: ~600 lines → ~40 lines (15×
shorter). Full per-kernel detail is still emitted, but collapsed
inside <details> blocks: one click away instead of unconditionally
dumped.

Problem
-------
After PR #17 added the 3 OpenVX-vs-OpenCV pairwise comparisons
(bringing the total to 6), the compare-job GitHub Step Summary became
unscannable. Each comparison emitted its own heading + headline-stats
table + the full `scripts/compare_reports.py` output (system info,
conformance & scores, category sub-scores, summary, per-kernel
detail), with all six sections shown unconditionally, ~600 lines
total.

The headline geomean that reviewers actually want at-a-glance got
buried under repeated system-info/conformance tables that say the
same thing across all six comparisons (same runner, same hardware).

Solution: three scannable parts, with detail one click away
-----------------------------------------------------------
1. TL;DR speedup matrix at the top: `row impl / column impl` geomean
   for every loaded pair of reports. One glance answers "which impl
   beats which, and by how much?" across the full N×N relationship,
   including pairings not explicitly enumerated in the groups below.
   Cells render bold when the row impl wins, italic when it loses, so
   the visual scan works even at small zoom.
2. Two grouped headline tables:
   * "OpenVX-vs-OpenCV — does adopting OpenVX pay off vs cv::?"
   * "OpenVX-vs-OpenVX — cross-implementation"
   Each row: candidate / baseline / geomean / median / count / wins /
   losses / best kernel / worst kernel. Six rows total, two compact
   tables, so the headline answer for every comparison fits in one
   screen.
3. Per-kernel detail in <details> blocks (collapsed by default). Same
   `compare_reports.py` output as before (system info, conformance,
   category sub-scores, per-kernel table), but with the duplicate
   `# OpenVX Benchmark Comparison` + `**A** vs **B**` header lines
   stripped since the <details><summary> already says them.

Implementation
--------------
New `scripts/ci_pairwise_summary.py` (415 lines, fully documented)
takes a JSON config describing reports + pair groups + detail dir,
and emits the structured summary to stdout. The CI step redirects it
into $GITHUB_STEP_SUMMARY. Config schema lives in the script
docstring.

The CI's `Pairwise comparisons` step is correspondingly simpler: it
drops the inline ~90-line do_compare function and the inline Python
heredoc, keeping just a small loop that runs `compare_reports.py` per
pair (for the per-kernel detail .md files) and a single call to the
new helper script. Net effect on the yaml: 133 lines removed, 97
added.

Same orientation as before (`speedup = candidate / baseline`, >1.00x
= candidate faster) so the artifact filenames in `comparisons/` and
the existing `benchmark-comparisons` artifact don't change shape.

Edge cases (same behavior as the old layout):
* Missing input JSON (impl build failed) → row appears with "—" cells
  and a "no comparable benchmarks ({impl}: ✗)" note in the headline
  table; matrix simply omits that impl's row/column; detail block
  renders a "_Detail file missing_" message.
* No shared verified benchmarks between two impls → same "—" / "no
  comparable benchmarks" path.

Drive-by: .gitignore adds `__pycache__/` and `*.pyc` now that we have
committable Python scripts that pytest etc. could exercise.

Co-authored-by: Cursor <cursoragent@cursor.com>
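The TL;DR matrix described above can be sketched as follows: for each ordered pair of implementations, take the per-kernel speedups over the benchmarks both sides report, then show their geometric mean, bold for a win and italic for a loss. This is an illustrative sketch, not the code in `scripts/ci_pairwise_summary.py`. The input shape (implementation name mapped to per-kernel milliseconds) and the function names are assumptions, and speedup is computed here as column time divided by row time so that values above 1.00x mean the row implementation is faster.

```python
import math

def geomean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def pair_speedup(times_ms, row, col):
    """Geomean of col_time / row_time over kernels both impls report, or None."""
    shared = sorted(set(times_ms[row]) & set(times_ms[col]))
    if not shared:
        return None
    return geomean([times_ms[col][k] / times_ms[row][k] for k in shared])

def speedup_matrix(times_ms):
    """Render a Markdown table: bold when the row impl wins, italic when it loses."""
    impls = sorted(times_ms)
    lines = ["| row / column | " + " | ".join(impls) + " |",
             "|---|" + "---:|" * len(impls)]
    for row in impls:
        cells = []
        for col in impls:
            s = None if row == col else pair_speedup(times_ms, row, col)
            if s is None:
                cells.append("-")            # diagonal or nothing comparable
            elif s > 1.0:
                cells.append(f"**{s:.2f}x**")
            elif s < 1.0:
                cells.append(f"*{s:.2f}x*")
            else:
                cells.append(f"{s:.2f}x")
        lines.append(f"| {row} | " + " | ".join(cells) + " |")
    return "\n".join(lines)

# Made-up timings (milliseconds) with placeholder names, for illustration only.
times_ms = {
    "implA": {"Box3x3": 1.0, "Sobel3x3": 2.0},
    "implB": {"Box3x3": 2.0, "Sobel3x3": 8.0, "Remap": 5.0},
}
print(speedup_matrix(times_ms))
```

A geometric mean suits ratios because a 2x win and a 2x loss cancel to 1.00x, which an arithmetic mean would not do.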
Why this exists
openvx-mark answers "which OpenVX implementation is fastest?" It cannot answer the question every team actually asks first when evaluating OpenVX adoption: "is OpenVX faster than the OpenCV code I already have?"
This stack adds an OpenCV-backed companion binary (`opencv-mark`) so users can compare each OpenVX implementation against the de facto vision baseline. The end state (after PR #2 + #3) is a 4-way pairwise matrix in CI (MIVisionX, Khronos sample, rustVX, and OpenCV).
What this PR does (PR #1: scaffold + 5 sentinel kernels)
Architecture
`bench_core` static library carved out of the existing reporter infra (BenchmarkStats, BenchmarkTimer, BenchmarkReport, SystemInfo). Linked by both `openvx-mark` and the new `opencv-mark` binary so JSON schema, comparison logic, and stats math are guaranteed identical across implementations.
opencv-mark scope (PR1)
Local verification
Built and ran end-to-end on macOS against OpenCV 4.13.0 + MIVisionX:
```bash
openvx-mark --kernel Box3x3,Gaussian3x3,Sobel3x3,ColorConvert_RGB2IYUV,WarpAffine \
--resolution VGA --output-dir /tmp/openvx
opencv-mark --resolution VGA --output-dir /tmp/opencv
python3 scripts/compare_reports.py /tmp/openvx/benchmark_results.json \
/tmp/opencv/benchmark_results.json \
--output /tmp/cv-vs-vx
```
The cross-comparison renders cleanly, with all 5 benchmarks verified on both sides. Representative result on this host: MIVisionX wins Box3x3, Sobel3x3, and WarpAffine; OpenCV wins Gaussian3x3 and ColorConvert_RGB2IYUV.
This is the exact "does adopting OpenVX pay off?" insight the feature is designed to surface: at the per-kernel level, MIVisionX wins 3/5 sentinels but loses badly on color conversion and Gaussian.
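The comparison only works because both reports use the same join key. A minimal sketch of matching two result lists on (name, mode, resolution); the field names, the record shape, and the "immediate" mode label are assumptions for illustration, not the actual JSON schema or the logic inside `scripts/compare_reports.py`.

```python
def index_by_key(results):
    """Map (name, mode, resolution) -> record for quick lookup."""
    return {(r["name"], r["mode"], r["resolution"]): r for r in results}

def join_reports(left, right):
    """Return matched pairs plus the keys present on only one side."""
    li, ri = index_by_key(left), index_by_key(right)
    matched = {k: (li[k], ri[k]) for k in li.keys() & ri.keys()}
    only_left = sorted(li.keys() - ri.keys())
    only_right = sorted(ri.keys() - li.keys())
    return matched, only_left, only_right

# Placeholder records; the median_ms values are invented for the example.
openvx = [
    {"name": "Box3x3", "mode": "graph", "resolution": "VGA", "median_ms": 1.0},
    {"name": "Box3x3", "mode": "immediate", "resolution": "VGA", "median_ms": 1.2},
]
opencv = [
    {"name": "Box3x3", "mode": "graph", "resolution": "VGA", "median_ms": 2.0},
]

matched, only_vx, only_cv = join_reports(openvx, opencv)
print(sorted(matched), only_vx, only_cv)
```

Because opencv-mark labels its entries with mode "graph", the OpenCV record pairs with the graph-mode OpenVX record instead of being left unmatched.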
Test plan
Not in scope (deferred)
Made with Cursor