Releases: kiritigowda/openvx-mark
openvx-mark v1.0.0 — Framework Mark v1
The first major openvx-mark release that benchmarks the OpenVX graph framework itself, not just individual kernels. Adds a new family of framework benchmarks — scenarios that exercise the OpenVX graph runtime (verification, virtual-image fusion, parallel scheduling, async dispatch, per-node attribution) and that no per-kernel benchmark can surface — alongside the existing 60-kernel suite, which is unchanged.
Added — Framework benchmarks (opt-in)
Run with `--feature-set framework` (only framework scenarios) or `--feature-set everything` (kernels + framework). Default `./openvx-mark` runs are unchanged.
- `GraphDividend_Box3x3_x4` and `GraphDividend_MixedFilters`: time the same N-node chain three ways (sum of immediate `vxu*` calls, graph with real intermediates, graph with virtual intermediates) and emit `sum_immediate_ms`, `graph_real_ms`, `graph_virtual_ms`, `graph_speedup`, `virtual_dividend`. The headline `graph_speedup > 1.0` is the framework dividend.
- `VerifyChain_Box3x3`: sweeps chain depths (configurable via `--framework-chain-depths`, default `1,4,16,64`) and reports per-N create / verify / first-process / steady-process timings, plus regression-derived `verify_per_node_ms`, `verify_intercept_ms`, and `first_process_overhead_ms`.
- `ParallelBranches_Box3x3`: K = 4 independent Box3x3 nodes sharing one input image, compared against K back-to-back `vxuBox3x3` immediate calls. Reports `parallelism_speedup` and `parallelism_efficiency` (where 1.0 = perfect K-way parallelism).
- `Async_Single_Box3x3_x4`: quantifies the per-call cost of `vxScheduleGraph` + `vxWaitGraph` vs `vxProcessGraph` on the same graph. Reports `async_overhead_ratio` (lower is better).
- `Async_Concurrent_Box3x3_x2`: schedules two independent graphs concurrently and reports `concurrency_speedup`, direct evidence of whether the runtime overlaps independent work.
- Per-node `VX_NODE_PERFORMANCE` attribution on both `GraphDividend_*` chains: emits `node_count`, `node_sum_ms`, `graph_perf_ms`, and `fusion_ratio` (`node_sum_ms / graph_perf_ms`). `≈ 1.0` = strict back-to-back, `> 1.0` = fusion / overlap detected, `≈ node_count` = the runtime reports graph time per node and isn't attributing per-node performance.
Added — OpenVX Framework Score
A new dimensionless headline number, computed as the equal-weight geometric mean of every `graph_speedup`, `virtual_dividend`, `parallelism_efficiency`, and `concurrency_speedup` value produced by framework benchmarks. `framework_score > 1.0` means the OpenVX graph framework adds aggregate value over a kernel-only baseline. Lower-is-better metrics and the scenario-specific `fusion_ratio` are intentionally excluded so the score has a single monotonic interpretation. Only emitted when framework benchmarks are run.
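The equal-weight geometric mean described above can be sketched as follows. The function name `frameworkScore` and the `0.0` sentinel for empty or non-positive input are assumptions of this sketch, not the project's actual code.

```cpp
#include <cmath>
#include <vector>

// Equal-weight geometric mean, accumulated in log space so that a long list
// of ratios cannot overflow or underflow the running product.
// Returns 0.0 when the list is empty or contains a non-positive value;
// that sentinel is an assumption made for this illustration.
double frameworkScore(const std::vector<double>& metrics) {
    if (metrics.empty()) {
        return 0.0;
    }
    double log_sum = 0.0;
    for (double value : metrics) {
        if (!(value > 0.0)) {  // also rejects NaN
            return 0.0;
        }
        log_sum += std::log(value);
    }
    return std::exp(log_sum / static_cast<double>(metrics.size()));
}
```

A geometric mean suits ratio-style metrics because a 2x gain in one scenario and a 0.5x loss in another cancel to exactly 1.0, which an arithmetic mean would not do.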
Surfaced everywhere the Vision Score appears:
- Terminal summary: `OpenVX Framework Score: <x>x (geomean of <N> framework metrics)`.
- JSON `scores.framework_score` and `scores.framework_metric_count`.
- Markdown report's Composite Scores table plus a new dedicated Framework Benchmarks section listing every metric per scenario with its unit and direction.
- Both the C++ `--compare` path and `scripts/compare_reports.py` add a Framework Score row to Conformance & Scores and a new Framework Metrics Comparison table whose ratio column is direction-aware (so `> 1.00` always means the second implementation is better).
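One way to read "direction-aware" is that the ratio is inverted for lower-is-better metrics, so a value above 1.0 always favors the second implementation. The sketch below illustrates that reading; `directionAwareRatio` is a hypothetical helper, and the `0.0` sentinel for unusable input is an assumption.

```cpp
// Returns a ratio in which > 1.0 always means the second implementation is
// better, regardless of the metric's direction.
// Hypothetical helper: not the function used by --compare or compare_reports.py.
double directionAwareRatio(double first, double second, bool higher_is_better) {
    if (first <= 0.0 || second <= 0.0) {
        return 0.0;  // assumed sentinel: no meaningful ratio
    }
    // Higher-is-better (e.g. graph_speedup): second / first.
    // Lower-is-better (e.g. async_overhead_ratio): first / second.
    return higher_is_better ? second / first : first / second;
}
```

For example, an `async_overhead_ratio` that drops from 2.0 to 1.0 yields 2.0, the same value a `graph_speedup` that rises from 1.0 to 2.0 would produce.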
Added — Plumbing
- New `FrameworkMetric` struct: `{name, value, unit, higher_is_better}`. `BenchmarkResult` gains a `framework_metrics` vector (empty for kernel results, so backward-compatible).
- New `BenchmarkCase::framework_run` callback: framework benchmarks own their entire timing loop and return a populated `BenchmarkResult`. Existing 60-kernel codepath is untouched.
- New CLI flag `--framework-chain-depths` for `verify_chain` depth sweeps.
- New `--feature-set` values: `framework` (only) and `everything` (kernels + framework).
- CI workflow runs framework benchmarks for every vendor (Khronos sample-impl, MIVisionX) in a dedicated step and posts the headline metrics to the GitHub Actions job summary.
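As an illustration of how a comma-separated value such as the default `1,4,16,64` might be turned into a depth list, here is a small sketch. `parseChainDepths` is a hypothetical name, and silently skipping malformed or non-positive tokens is an assumption; the real parser may reject them instead.

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Splits a comma-separated list such as "1,4,16,64" into positive integers.
// Tokens that are empty, non-numeric, have trailing characters, or are not
// positive are skipped (an assumption of this sketch).
std::vector<int> parseChainDepths(const std::string& arg) {
    std::vector<int> depths;
    std::stringstream stream(arg);
    std::string token;
    while (std::getline(stream, token, ',')) {
        try {
            std::size_t consumed = 0;
            const int depth = std::stoi(token, &consumed);
            if (consumed == token.size() && depth > 0) {
                depths.push_back(depth);
            }
        } catch (const std::exception&) {
            // std::stoi throws on empty or non-numeric tokens; skip them.
        }
    }
    return depths;
}
```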
Changed
- `BenchmarkRunner::runAll` dispatches to `framework_run` when set, with a pre-check for required kernels (so framework cases skip cleanly on implementations missing Box3x3 etc.).
- README adds a Framework Benchmarks section, glossary entries for every framework metric, and a Framework Score entry. Example terminal summary updated.
- JSON schema adds `scores.framework_score`, `scores.framework_metric_count`, and a per-result `framework_metrics` array. Existing kernel results emit an empty `framework_metrics` array. No breaking change for tools that consumed the previous schema.
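The dispatch behavior described above can be sketched roughly as follows. Only the names `FrameworkMetric`, `BenchmarkResult`, `BenchmarkCase`, `framework_run`, `framework_metrics`, and the four metric fields come from these notes. The field types, `case_name`, `skipped`, `required_kernels`, `runCase`, and the `kernel_available` predicate are assumptions made for illustration.

```cpp
#include <functional>
#include <string>
#include <vector>

struct FrameworkMetric {
    std::string name;
    double value;            // assumed type
    std::string unit;
    bool higher_is_better;
};

struct BenchmarkResult {
    std::string case_name;   // hypothetical field
    bool skipped = false;    // hypothetical field
    std::vector<FrameworkMetric> framework_metrics;  // empty for kernel results
};

struct BenchmarkCase {
    std::string name;
    std::vector<std::string> required_kernels;        // hypothetical field
    std::function<BenchmarkResult()> framework_run;   // unset for kernel cases
};

// Runs one case: skip if a required kernel is missing, hand control to
// framework_run when it is set, otherwise fall through to the kernel path.
BenchmarkResult runCase(
    const BenchmarkCase& bench,
    const std::function<bool(const std::string&)>& kernel_available) {
    BenchmarkResult result;
    result.case_name = bench.name;
    for (const std::string& kernel : bench.required_kernels) {
        if (!kernel_available(kernel)) {
            result.skipped = true;  // skip cleanly instead of failing
            return result;
        }
    }
    if (bench.framework_run) {
        // The framework benchmark owns its whole timing loop.
        result = bench.framework_run();
        result.case_name = bench.name;
        return result;
    }
    // Placeholder for the existing per-kernel timing path (not shown here).
    return result;
}
```

Keeping `framework_run` as an optional callback means kernel cases need no changes: they leave it unset and produce an empty `framework_metrics` vector.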
Notes for implementers
- `fusion_ratio` is implementation-quality-dependent: a value `≈ node_count` (e.g. `4.0` on a 4-node chain) usually means the runtime is reporting whole-graph time on every node. Useful cross-vendor signal in its own right; intentionally excluded from the Framework Score because not every conformant runtime populates `VX_NODE_PERFORMANCE` cleanly.
- `concurrency_speedup < 1.0` at small resolutions is expected and meaningful: it means async dispatch overhead exceeds concurrency gain at that work size.
- Pipelined streaming via the optional `vx_khr_pipelining` extension is intentionally out of scope for v1; only standard OpenVX APIs are used.
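The three readings of `fusion_ratio` given in these notes can be expressed as a small classifier. This is only a sketch: the 10% tolerance used for the "approximately equal" checks, the enum, and the function name are assumptions, not part of openvx-mark.

```cpp
#include <cmath>

enum class FusionReading {
    StrictSequential,   // ratio approximately 1.0
    FusionOrOverlap,    // ratio above 1.0
    WholeGraphPerNode,  // ratio approximately node_count
    Unclear             // ratio below 1.0 or unusable input
};

// Classifies fusion_ratio = node_sum_ms / graph_perf_ms.
// The relative tolerance is an assumed threshold for this illustration.
FusionReading classifyFusionRatio(double node_sum_ms, double graph_perf_ms,
                                  int node_count, double tolerance = 0.1) {
    if (graph_perf_ms <= 0.0 || node_count <= 0) {
        return FusionReading::Unclear;
    }
    const double ratio = node_sum_ms / graph_perf_ms;
    // Test the node_count case first: such a ratio is also above 1.0 and
    // would otherwise be misread as fusion.
    if (node_count > 1 &&
        std::fabs(ratio - node_count) <= tolerance * node_count) {
        return FusionReading::WholeGraphPerNode;
    }
    if (std::fabs(ratio - 1.0) <= tolerance) {
        return FusionReading::StrictSequential;
    }
    if (ratio > 1.0) {
        return FusionReading::FusionOrOverlap;
    }
    return FusionReading::Unclear;
}
```

The order of the checks matters: a ratio of 4.0 on a 4-node chain satisfies `> 1.0`, but per the note above it more likely signals whole-graph time reported on every node than genuine fusion.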
v2 backlog (separate future PRs)
- `vxMapImagePatch` / `vxUnmapImagePatch` round-trip cost (host ↔ device tax).
- User-kernel dispatch tax via `vxAddUserKernel` no-op.
- Context lifecycle stress (`vxCreateContext` / `vxReleaseContext` × N).
- Determinism under load (single-graph CV% while K other graphs are scheduled).
- NN / extension-gated benchmarks.
See docs/framework-mark-plan.md for the full v1 design rationale.