openvx-mark is a vendor-agnostic benchmark suite for OpenVX implementations (1.0 through 1.3+). It measures the performance of individual vision kernels, multi-node pipelines, and immediate-mode operations across configurable resolutions, producing composite scores, conformance reports, and detailed analytics.
openvx-mark works with any conformant OpenVX implementation — AMD OpenVX (MIVisionX), Intel OpenVX, Khronos Sample Implementation, or any other vendor's runtime.
- 60 standard OpenVX kernels across vision and enhanced vision feature sets
- Graph mode and immediate mode benchmarking
- Multi-resolution testing — VGA, HD, FHD, 4K, 8K, or custom
- Composite scoring — geometric mean of megapixels/sec (OpenVX Vision Score)
- Conformance checking — verifies all available kernels produce valid results
- Stability gating — CV% threshold with automatic retries for unstable results
- Multi-resolution scaling analysis — measures throughput scaling efficiency across resolutions
- Peak vs sustained performance — compares best-case to typical latency
- Baseline comparison — compare JSON reports across runs or vendors
- Reports — JSON, CSV, and Markdown output with glossary
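The multi-resolution scaling analysis compares throughput at different resolutions. Below is a minimal Python sketch of the Scaling Efficiency ratio. It assumes MP/s has already been measured at each resolution and that efficiency is taken relative to the smallest tested resolution. The `PRESETS` pixel dimensions are the conventional sizes for these names; they are an assumption here, not values read from openvx-mark.

```python
# Assumed pixel dimensions for the resolution presets (conventional values).
PRESETS = {
    "VGA": (640, 480),
    "HD": (1280, 720),
    "FHD": (1920, 1080),
    "4K": (3840, 2160),
    "8K": (7680, 4320),
}

def scaling_efficiency(mps_by_res):
    """Return {resolution: MP/s at that resolution / MP/s at the smallest one}.

    1.0 means throughput held steady as the image grew; values below 1.0
    mean each pixel became more expensive to process at higher resolutions.
    """
    ordered = sorted(mps_by_res, key=lambda r: PRESETS[r][0] * PRESETS[r][1])
    base = mps_by_res[ordered[0]]
    return {res: mps_by_res[res] / base for res in ordered[1:]}

# Hypothetical throughput numbers, for illustration only.
print(scaling_efficiency({"VGA": 1000.0, "FHD": 800.0, "4K": 500.0}))
# -> {'FHD': 0.8, '4K': 0.5}
```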

It is recommended that the OpenVX implementation first pass the Khronos OpenVX Conformance Test Suite before you run openvx-mark. Benchmark results are only meaningful when the underlying implementation is conformant: a non-conformant implementation may produce incorrect outputs, which openvx-mark's output verification flags and excludes from composite scores.

Requirements:
- C++17 compiler
- CMake 3.10+
- An OpenVX implementation with `libopenvx` and `libvxu` libraries

If your OpenVX implementation is installed in a standard location (`/opt/rocm`, `/usr/local`, `/usr`), CMake will find it automatically:
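```bash
mkdir build && cd build
cmake ..
cmake --build .
```

For a ROCm-based implementation such as AMD MIVisionX, you can pass the ROCm location explicitly:

```bash
mkdir build && cd build
cmake -DROCM_PATH=/opt/rocm ..
cmake --build .
```

For an implementation installed elsewhere, point CMake to your OpenVX headers and libraries: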
```bash
mkdir build && cd build
cmake -DOPENVX_INCLUDES=/path/to/openvx/include \
      -DOPENVX_LIB_DIR=/path/to/openvx/lib ..
cmake --build .
```

For the Khronos OpenVX Sample Implementation:

```bash
mkdir build && cd build
cmake -DOPENVX_INCLUDES=/path/to/OpenVX-sample-impl/include \
      -DOPENVX_LIB_DIR=/path/to/OpenVX-sample-impl/build/lib ..
cmake --build .
```

Usage:

```bash
./openvx-mark [OPTIONS]
```

Examples:

```bash
# Default run: graph mode, VGA+FHD+4K, 100 iterations
./openvx-mark

# Quick test run
./openvx-mark --resolution VGA --iterations 10 --warmup 3

# Full benchmark with all feature sets
./openvx-mark --all --iterations 200

# Include immediate-mode benchmarks
./openvx-mark --mode both --resolution FHD
```

Benchmark selection options:

| Option | Description | Default |
|---|---|---|
| `--all` | Run all benchmarks (vision + enhanced_vision) | |
| `--feature-set SET[,SET,...]` | Feature sets: `vision`, `enhanced_vision`, `all` | `vision` |
| `--category CAT[,CAT,...]` | Filter by category | all |
| `--kernel NAME[,NAME,...]` | Filter by kernel name | all |
| `--mode graph\|immediate\|both` | Execution mode | `graph` |
| `--skip-pipelines` | Skip multi-node pipeline benchmarks | |

Resolution options:

| Option | Description | Default |
|---|---|---|
| `--resolution RES[,RES,...]` | Presets: VGA, HD, FHD, 4K, 8K | VGA,FHD,4K |
| `--width W --height H` | Custom resolution | |

Measurement options:

| Option | Description | Default |
|---|---|---|
| `--iterations N` | Measurement iterations per benchmark | 100 |
| `--warmup N` | Warm-up iterations (not measured) | 10 |
| `--seed N` | PRNG seed for reproducible test data | 42 |
| `--stability-threshold N` | CV% threshold for stability warnings | 15 |
| `--max-retries N` | Maximum retries for unstable benchmarks (each retry runs 2x iterations) | 0 |
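Here is a minimal Python sketch of how CV%-based stability gating with retries could work. `run_iterations` is a hypothetical stand-in for the benchmark loop. The sketch interprets "2x iterations each retry" as doubling the iteration count on every retry. For brevity, it computes CV% on the raw samples rather than after IQR outlier removal. openvx-mark's actual policy may differ.

```python
import statistics

def cv_percent(samples):
    """Coefficient of variation: (sample stddev / mean) * 100."""
    return statistics.stdev(samples) / statistics.mean(samples) * 100.0

def run_with_stability_gate(run_iterations, iterations=100, threshold=15.0, max_retries=0):
    """Measure, then re-measure with more iterations while CV% exceeds the threshold.

    run_iterations(n) is a hypothetical callable that runs the benchmark n times
    and returns the n latency samples. Returns (samples, cv, stable).
    """
    samples = run_iterations(iterations)
    cv = cv_percent(samples)
    retries = 0
    while cv > threshold and retries < max_retries:
        retries += 1
        iterations *= 2  # assumption: each retry doubles the previous iteration count
        samples = run_iterations(iterations)
        cv = cv_percent(samples)
    # With max_retries=0 this only reports instability; it never re-runs.
    return samples, cv, cv <= threshold
```

With the defaults (`--stability-threshold 15`, `--max-retries 0`), a gate like this only produces a stability warning. Raising `--max-retries` trades extra runtime for a better chance of a stable result.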

Output options:

| Option | Description | Default |
|---|---|---|
| `--output-dir DIR` | Output directory for reports | `./benchmark_results` |
| `--format json,csv,markdown` | Output formats (comma-separated) | all three |
| `--verbose` | Verbose output with per-benchmark warnings | |
| `--quiet` | Minimal output (suppresses per-benchmark lines) | |
| `--compare file1.json,file2.json` | Compare two or more JSON reports | |

Vision feature set (41 kernels):

| Category | Kernels |
|---|---|
| Pixelwise | And, Or, Xor, Not, AbsDiff, Add, Subtract, Multiply |
| Filters | Box3x3, Gaussian3x3, Median3x3, Erode3x3, Dilate3x3, Sobel3x3, CustomConvolution, NonLinearFilter |
| Color | ColorConvert (RGB2IYUV, RGB2NV12), ChannelExtract, ChannelCombine, ConvertDepth |
| Geometric | ScaleImage (Half, Double), WarpAffine, WarpPerspective, Remap |
| Statistical | Histogram, EqualizeHist, MeanStdDev, MinMaxLoc, IntegralImage |
| Multi-scale | GaussianPyramid, LaplacianPyramid, HalfScaleGaussian |
| Feature Detection | CannyEdgeDetector, HarrisCorners, FastCorners, OpticalFlowPyrLK |
| Misc | Magnitude, Phase, TableLookup, Threshold (Binary, Range), WeightedAverage |

Enhanced vision feature set (19 kernels):

| Category | Kernels |
|---|---|
| Pixelwise | Min, Max, Copy |
| Extraction | MatchTemplate, LBP, NonMaxSuppression, HOGCells, HOGFeatures, HoughLinesP |
| Tensor | TensorAdd, TensorSub, TensorMul, TensorTranspose, TensorConvertDepth, TensorMatMul, TensorTableLookup |
| Misc | BilateralFilter, Select, ScalarOperation |

Multi-node pipelines:

| Pipeline | Nodes |
|---|---|
| EdgeDetection | ColorConvert + ChannelExtract + Gaussian3x3 + CannyEdgeDetector |
| SobelMagnitudePhase | Sobel3x3 + Magnitude + Phase |
| MorphologyOpen | Erode3x3 + Dilate3x3 |
| MorphologyClose | Dilate3x3 + Erode3x3 |
| DualFilter | Box3x3 + Median3x3 |
| HistogramEqualize | ColorConvert + ChannelExtract + EqualizeHist |
| HarrisTracker | ColorConvert + ChannelExtract + HarrisCorners |
| ThresholdedEdge | Sobel3x3 + Magnitude + ConvertDepth + Threshold |

Example end-of-run summary:

```
=============================================================
Summary: 156 total | 156 passed | 0 skipped | 0 failed
OpenVX Vision Score: 1586.05 MP/s (156 benchmarks)
vision Conformance: PASS (41/41)
vision Top-5 Fastest:
1. Not 26835.8 MP/s (graph, FHD)
2. Threshold_Binary 25550.0 MP/s (graph, VGA)
3. Threshold_Binary 25037.7 MP/s (graph, FHD)
4. Threshold_Range 21545.9 MP/s (graph, FHD)
5. Not 21533.7 MP/s (graph, VGA)
vision Top-5 Slowest:
1. LaplacianPyramid 727.501 ms (graph, 4K)
2. NonLinearFilter 580.589 ms (graph, 4K)
3. LaplacianPyramid 225.209 ms (graph, FHD)
4. FastCorners 191.288 ms (graph, 4K)
5. HarrisTracker 160.251 ms (graph, 4K)
=============================================================
```
Reports are written to the output directory (`./benchmark_results` by default):

| File | Description |
|---|---|
| `benchmark_results.json` | Full results with scores, conformance, scaling analysis, and per-result timing statistics |
| `benchmark_results.csv` | Tabular data for spreadsheet analysis |
| `benchmark_results.md` | Human-readable report with tables, top-10 lists, and a glossary |

Composite scores:

- OpenVX Vision Score: geometric mean of MP/s across all passing graph-mode vision benchmarks
- Enhanced Vision Score: geometric mean of MP/s, reported when enhanced_vision benchmarks are included
- Category Sub-Scores: per-category geometric mean of MP/s (pixelwise, filters, color, etc.)
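A geometric mean weights proportional speedups equally across kernels. As a result, a kernel running at tens of thousands of MP/s does not swamp one running at hundreds. Below is a minimal Python sketch. It assumes the inputs are MP/s values from passing graph-mode benchmarks; the `(category, mps)` input shape is hypothetical.

```python
import math
from collections import defaultdict

def geometric_mean(values):
    """n-th root of the product, computed as exp(mean(log x)) to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs one or more positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def category_sub_scores(results):
    """results: iterable of (category, mps) pairs -> {category: geometric mean of MP/s}."""
    by_category = defaultdict(list)
    for category, mps in results:
        by_category[category].append(mps)
    return {cat: geometric_mean(vals) for cat, vals in by_category.items()}

# A 100x spread between two kernels gives a composite of 1000, not the
# arithmetic mean of 5050, so no single fast kernel dominates the score.
print(round(geometric_mean([100.0, 10000.0]), 6))
# -> 1000.0
```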

Conformance is checked per feature set. The report shows PASS only if every kernel the implementation makes available produced a valid graph-mode result; otherwise it shows FAIL along with a list of the missing kernels.
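The check reduces to a set difference between the kernels the implementation reports as available and those with a passing graph-mode result. Below is a minimal Python sketch. The function name is hypothetical, and the `PASS (n/m)` string simply mirrors the `vision Conformance: PASS (41/41)` summary line.

```python
def conformance_summary(available, passed):
    """Return ("PASS (n/m)" or "FAIL (n/m)", sorted list of missing kernels).

    available: kernels the implementation exposes for this feature set.
    passed:    kernels that produced a valid graph-mode result.
    """
    available = set(available)
    missing = sorted(available - set(passed))
    status = "PASS" if not missing else "FAIL"
    total = len(available)
    return f"{status} ({total - len(missing)}/{total})", missing

print(conformance_summary(["Not", "Box3x3", "Remap"], ["Not", "Box3x3"]))
# -> ('FAIL (2/3)', ['Remap'])
```

Kernels the implementation does not expose never enter `available`, so they do not count against conformance.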

Run the benchmark on two different implementations, then compare the JSON reports:

```bash
# Run on Vendor A
./openvx-mark --output-dir results_vendor_a

# Run on Vendor B (different machine/implementation)
./openvx-mark --output-dir results_vendor_b

# Compare
./openvx-mark --compare results_vendor_a/benchmark_results.json,results_vendor_b/benchmark_results.json
```

This generates a `comparison.md` with a side-by-side table showing median latency, throughput, and % change for each benchmark.
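The % change for each benchmark can be computed from the two reports' median latencies. Below is a minimal Python sketch. It assumes each report has already been reduced to a dict keyed by a benchmark identifier; the `"Kernel/RES"` keys are hypothetical, and the JSON report schema is not described here. It also treats the first report as the baseline. The sign convention is an assumption: here, a negative latency change means the second implementation is faster.

```python
def percent_change(baseline, candidate):
    """Relative change from baseline to candidate, in percent."""
    return (candidate - baseline) / baseline * 100.0

def compare_medians(baseline_ms, candidate_ms):
    """% change in median latency for every benchmark present in both reports."""
    common = sorted(baseline_ms.keys() & candidate_ms.keys())
    return {name: percent_change(baseline_ms[name], candidate_ms[name]) for name in common}

vendor_a = {"Not/FHD": 2.0, "Box3x3/FHD": 4.0}  # hypothetical medians (ms)
vendor_b = {"Not/FHD": 1.5, "Remap/4K": 9.0}
print(compare_medians(vendor_a, vendor_b))
# -> {'Not/FHD': -25.0}
```

Benchmarks present in only one report are skipped. MP/s is inversely proportional to median latency, so a 25% latency reduction corresponds to roughly a 33% throughput gain (1 / 0.75 ≈ 1.333).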

A Python comparison script is also provided for more flexibility:

```bash
python3 scripts/compare_reports.py results_vendor_a/benchmark_results.json \
    results_vendor_b/benchmark_results.json \
    --output comparison
```

Metrics glossary:

| Metric | Description |
|---|---|
| Median (ms) | Median wall-clock execution time across all iterations (50th percentile). More stable than mean for benchmarking. |
| Mean (ms) | Arithmetic mean of wall-clock execution times. |
| Min (ms) | Fastest observed execution time (best case). |
| Max (ms) | Slowest observed execution time (worst case). |
| StdDev (ms) | Standard deviation of execution times after IQR outlier removal. |
| P5/P95/P99 (ms) | 5th, 95th, and 99th percentile execution times from the raw (pre-outlier-removal) samples. |
| CV% | Coefficient of Variation — (stddev / mean) * 100. Lower values indicate more stable/repeatable results. |
| MP/s | Megapixels per second: (width * height) / median_time / 1e6, with median_time in seconds. Primary throughput metric. |
| Samples | Number of timing samples after IQR outlier removal. |
| Outliers | Number of samples removed by the IQR (Interquartile Range) method. |
| Peak (ms) | Best-case execution time (min_ns). Represents peak achievable performance. |
| Sustained (ms) | Typical execution time (median_ns). Represents sustained real-world performance. |
| Sustained Ratio | min_ns / median_ns. Values near 1.0 indicate consistent performance; lower values suggest variance from caching, scheduling, or thermal effects. |
| Scaling Efficiency | (MP/s at high res) / (MP/s at low res). 1.0 = perfect scaling; values below 1.0 indicate memory or bandwidth bottlenecks at higher resolutions. |
| Vision Score | Geometric mean of MP/s across all passing graph-mode vision benchmarks. Single-number summary for cross-vendor comparison. |
| Stability Warning | Flagged when CV% exceeds the stability threshold (default: 15%). Indicates the result may not be reliable — increase iterations or reduce system load. |
| Conformance | Whether all available kernels in a feature set produced valid graph-mode results. PASS = all kernels benchmarked successfully. |
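The glossary statistics can be reproduced from raw per-iteration timings. Below is a minimal Python sketch. It assumes nanosecond timings, linear-interpolation percentiles, and the conventional 1.5 x IQR outlier fences. It also assumes that median, min, max, and the percentiles use the raw samples, while mean, StdDev, and CV% use the samples kept after outlier removal. openvx-mark's exact choices may differ, so treat the results as an approximation.

```python
import statistics

def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in 0..100) of an already sorted list."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)

def summarize(samples_ns, width, height):
    """Compute glossary-style statistics from raw per-iteration timings (ns)."""
    raw = sorted(samples_ns)
    # IQR outlier removal with the conventional 1.5 x IQR fences (an assumption).
    q1, q3 = percentile(raw, 25), percentile(raw, 75)
    fence = 1.5 * (q3 - q1)
    kept = [s for s in raw if q1 - fence <= s <= q3 + fence]

    median_ns = statistics.median(raw)        # median over all iterations
    mean_ns = statistics.mean(kept)           # assumption: mean/stddev use kept samples
    stddev_ns = statistics.stdev(kept) if len(kept) > 1 else 0.0
    to_ms = 1e-6
    return {
        "median_ms": median_ns * to_ms,       # also the "Sustained" time
        "mean_ms": mean_ns * to_ms,
        "min_ms": raw[0] * to_ms,             # also the "Peak" time
        "max_ms": raw[-1] * to_ms,
        "stddev_ms": stddev_ns * to_ms,
        "p5_ms": percentile(raw, 5) * to_ms,  # percentiles use the raw samples
        "p95_ms": percentile(raw, 95) * to_ms,
        "p99_ms": percentile(raw, 99) * to_ms,
        "cv_percent": stddev_ns / mean_ns * 100.0,
        "mp_per_s": (width * height) / (median_ns / 1e9) / 1e6,
        "samples": len(kept),
        "outliers": len(raw) - len(kept),
        "sustained_ratio": raw[0] / median_ns,  # min_ns / median_ns
    }

timings = [1_000_000 + 10_000 * i for i in range(9)] + [9_000_000]  # one slow outlier
stats = summarize(timings, 1920, 1080)
print(stats["samples"], stats["outliers"])
# -> 9 1
```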

Project structure:

```
openvx-mark/
├── CMakeLists.txt # Build system
├── cmake/
│   └── FindOpenVX.cmake # Vendor-agnostic OpenVX discovery
├── include/
│   ├── benchmark_config.h # Configuration and defaults
│   ├── benchmark_context.h # OpenVX context wrapper
│   ├── benchmark_report.h # Report generation + analytics
│   ├── benchmark_runner.h # Benchmark execution engine
│   ├── benchmark_stats.h # Statistical computation
│   ├── benchmark_timer.h # High-resolution timing
│   ├── kernel_registry.h # OpenVX kernel catalog + availability probing
│   ├── resource_tracker.h # RAII resource management
│   ├── system_info.h # Host system information
│   └── test_data_generator.h # Deterministic test data generation
├── scripts/
│   └── compare_reports.py # Python cross-vendor comparison tool
└── src/
    ├── main.cpp # CLI entry point
    ├── benchmark_context.cpp
    ├── benchmark_runner.cpp # Graph/immediate mode execution + stability gating
    ├── benchmark_report.cpp # JSON/CSV/Markdown generation + analytics
    ├── benchmark_stats.cpp # Percentiles, IQR outlier removal
    ├── benchmark_timer.cpp
    ├── kernel_registry.cpp # 60 standard kernel definitions
    ├── system_info.cpp # Cross-platform system info collection
    ├── test_data_generator.cpp # Random image/tensor/auxiliary object creation
    └── benchmarks/
        ├── node_pixelwise.cpp # And, Or, Xor, Not, AbsDiff, Add, Subtract, Multiply, Min, Max, Copy
        ├── node_filters.cpp # Box3x3, Gaussian3x3, Median3x3, Erode3x3, Dilate3x3, Sobel3x3, CustomConvolution, NonLinearFilter
        ├── node_color.cpp # ColorConvert, ChannelExtract, ChannelCombine, ConvertDepth
        ├── node_geometric.cpp # ScaleImage, WarpAffine, WarpPerspective, Remap
        ├── node_statistical.cpp # Histogram, EqualizeHist, MeanStdDev, MinMaxLoc, IntegralImage
        ├── node_multiscale.cpp # GaussianPyramid, LaplacianPyramid, HalfScaleGaussian
        ├── node_feature.cpp # CannyEdgeDetector, HarrisCorners, FastCorners, OpticalFlowPyrLK
        ├── node_extraction.cpp # MatchTemplate, LBP, NonMaxSuppression
        ├── node_tensor.cpp # TensorAdd, TensorSub, TensorMul, TensorTranspose, TensorConvertDepth, TensorTableLookup
        ├── node_misc.cpp # Magnitude, Phase, TableLookup, Threshold, WeightedAverage, Select
        ├── immediate_benchmarks.cpp # vxu* immediate-mode variants
        ├── pipeline_vision.cpp # EdgeDetection, SobelMagnitudePhase, MorphologyOpen/Close, DualFilter
        └── pipeline_feature.cpp # HistogramEqualize, HarrisTracker, ThresholdedEdge
```
This project is licensed under the MIT License. See LICENSE for details.