Framework Mark Plan

Status: v1 shipped. All four v1 scenarios plus the OpenVX Framework Score and the first v2 backlog item (per-node VX_NODE_PERFORMANCE attribution) are merged. See CHANGELOG.md for the release summary. This document is preserved as the design rationale and tracks remaining v2 backlog items.

A concrete plan for adding a framework benchmark suite to openvx-mark that measures what only a graph framework can do — graph construction/verification cost, the "graph dividend" vs immediate mode, virtual-image savings, parallel/heterogeneous scheduling, async/streaming throughput, per-node attribution, user-kernel dispatch tax, and lifecycle costs.

This complements (not replaces) the existing per-kernel suite.

v1 Delivery Summary

PR	Slice	Status	Key artifact
#5	Plumbing — `FrameworkMetric`, `framework_run`, `--feature-set framework`	✅ merged	`include/benchmark_stats.h`, `include/benchmark_runner.h`
#6	`graph_dividend` (Box3x3×4 + MixedFilters chains)	✅ merged	`src/benchmarks/framework_benchmarks.cpp`
#7	`verify_chain` (depth sweep + linear regression)	✅ merged	`--framework-chain-depths` flag
#8	`parallel_branches` (K = 4 independent Box3x3)	✅ merged	`parallelism_efficiency` metric
#9	`async_streaming` (single-graph overhead + concurrent multi-graph)	✅ merged	`concurrency_speedup` metric
#10	OpenVX Framework Score + cross-vendor comparison	✅ merged	`framework_score` in JSON / Markdown / terminal / comparison
#11	Per-node `VX_NODE_PERFORMANCE` attribution (first v2 item)	✅ merged	`node_count`, `node_sum_ms`, `graph_perf_ms`, `fusion_ratio`
#12	Rollup → `kg/framework-mark`	✅ merged	(no new code)

1. Goal & Non-goals

Goal. Add a framework benchmark suite that exercises the OpenVX graph runtime itself — the orchestration layer that differentiates OpenVX from a kernel library. The current suite measures vxProcessGraph of single-node graphs and short pipelines, which captures kernel performance but not framework value.

Non-goals.

Not a replacement for the OpenVX-CTS conformance suite.
Not a cross-framework benchmark (OpenVX vs OpenCV vs CUDA). That is a separate project.
v1 will not require any vendor extensions. Anything that needs an extension (pipelining, NN, targets) is gated and clearly labeled.

2. Conceptual unit: "framework benchmark"

A framework benchmark is not "single kernel @ resolution → MP/s". It is a scenario that produces one or more named scalar metrics.

Scenario	Reported metric(s)	Unit
`verify_chain` (chain of N Box3x3 nodes, N ∈ {1,4,16,64})	verify time, ms/node slope, first-process overhead	ms, ms/node
`graph_dividend` (4-node chain)	sum(immediate), graph latency, speedup, per-node overhead	ms, × ratio
`virtual_intermediate`	real-buffer time, virtual-buffer time, dividend	× ratio
`parallel_branches`	serial baseline, scheduled time, parallelism efficiency	× ratio
`async_streaming`	sync latency, async sustained throughput	ms, FPS
`user_kernel_overhead`	per-call dispatch cost above a no-op host function	ns/call
`context_lifecycle`	create+release time, build+teardown @ N	ms

A framework benchmark therefore needs to emit a small set of typed metrics rather than a single MP/s value. That is the only real model change; everything else is additive.

3. v1 Scope

The four scenarios with the highest "value per line of code" and the cleanest cross-vendor story:

verify_chain — graph build + verify cost vs N.
graph_dividend — N-node chain timed three ways: sum of immediate vxu* calls, graph with real intermediates, graph with virtual intermediates.
parallel_branches — DAG with K independent branches feeding a join; default scheduler vs all-pinned-to-one-target (when targets are exposed via extension; otherwise just default + serial reference).
async_streaming — sync vxProcessGraph loop vs vxScheduleGraph + vxWaitGraph loop (and pipelined queue if the impl supports vx_khr_pipelining).

Each runs across the same --resolution set as kernels, so results scale with image size.

4. v2 Backlog (later, separate PRs)

~~Per-node VX_NODE_PERFORMANCE attribution (infer fusion when sum(node_perf) > graph_perf).~~ — landed in PR #7 as node_count / node_sum_ms / graph_perf_ms / fusion_ratio on graph_dividend results.
vxMapImagePatch / vxUnmapImagePatch round-trip cost (host↔device tax).
User-kernel dispatch tax via vxAddUserKernel no-op.
Context lifecycle stress (vxCreateContext / vxReleaseContext × N; same graph built/torn down × N).
Determinism under load (single-graph CV% while K other graphs are scheduled).
NN / extension-gated benchmarks.

5. Data-model changes

include/benchmark_stats.h — extend BenchmarkResult (additive, won't break existing JSON/CSV/Markdown consumers):

struct FrameworkMetric {
    std::string name;     // "verify_ms", "graph_speedup", "parallel_efficiency", ...
    double value;
    std::string unit;     // "ms", "ms/node", "x", "ns/call", "FPS"
    bool higher_is_better;
};

struct BenchmarkResult {
    // ... existing fields ...

    // For framework benchmarks: zero or more named scalars.
    // For kernel benchmarks: stays empty.
    std::vector<FrameworkMetric> framework_metrics;

    // For framework benchmarks the existing megapixels_per_sec/wall_clock
    // are interpreted as the "primary" timing if applicable, else 0.
};

Pre-existing kernel results emit an empty framework_metrics array. No schema break.

6. Runner changes

Today BenchmarkRunner::runGraphMode and runImmediateMode are hard-coded to "build graph, warmup, time vxProcessGraph × N." Framework benchmarks need their own execution loop because they may run multiple graphs, time graph construction, or compare two execution modes inside one case.

Cleanest extension to include/benchmark_runner.h:

struct BenchmarkCase {
    // ... existing fields ...

    // Optional: for framework benchmarks. If set, this is called instead of
    // graph_setup / immediate_func, and is fully responsible for timing.
    using FrameworkRunFn = std::function<BenchmarkResult(
        vx_context ctx, const Resolution& res,
        const BenchmarkConfig& cfg, TestDataGenerator& gen)>;
    FrameworkRunFn framework_run;
};

In BenchmarkRunner::runAll, add a branch: if bc.framework_run is set, call it (skip graph mode / immediate mode); the returned BenchmarkResult already carries framework_metrics.

This keeps the existing 60-kernel codepath untouched.

7. New file: `src/benchmarks/framework_benchmarks.cpp`

One file with a registerFrameworkBenchmarks() function returning all v1 cases, mirroring pipeline_vision.cpp. Each case implements its own framework_run lambda. Sketch:

// graph_dividend: 4-node chain (Gaussian -> Sobel -> Magnitude -> Threshold)
BenchmarkCase bc;
bc.name = "GraphDividend_4node";
bc.category = "framework_dividend";
bc.feature_set = "framework";
bc.required_kernels = { VX_KERNEL_GAUSSIAN_3x3, VX_KERNEL_SOBEL_3x3,
                        VX_KERNEL_MAGNITUDE,    VX_KERNEL_THRESHOLD };
bc.framework_run = [](vx_context ctx, const Resolution& res,
                      const BenchmarkConfig& cfg, TestDataGenerator& gen)
                   -> BenchmarkResult {
    BenchmarkResult r = makeFrameworkResult("GraphDividend_4node", res);

    double t_imm    = timeImmediateChain(ctx, res, cfg, gen);    // sum vxu*
    double t_graph  = timeGraphChain(ctx, res, cfg, gen, /*virtual=*/false);
    double t_virt   = timeGraphChain(ctx, res, cfg, gen, /*virtual=*/true);

    r.framework_metrics = {
        {"sum_immediate_ms",   t_imm   / 1e6, "ms",  false},
        {"graph_real_ms",      t_graph / 1e6, "ms",  false},
        {"graph_virtual_ms",   t_virt  / 1e6, "ms",  false},
        {"graph_speedup",      t_imm / t_graph, "x", true},
        {"virtual_dividend",   t_graph / t_virt, "x", true},
    };
    r.wall_clock.median_ns = t_virt;  // primary timing = best graph form
    r.megapixels_per_sec   = BenchmarkStats::computeThroughput(
                                 res.width, res.height, t_virt);
    return r;
};

The four v1 scenarios are each one ~50–80 line lambda. Helper functions (timeImmediateChain, timeGraphChain, timeAsyncLoop) live in the same file.

8. CLI changes (`src/main.cpp`)

Minimal additions, no breaking changes:

New feature-set value: --feature-set framework.
--all keeps current meaning (vision,enhanced_vision); introduce --feature-set everything (or --include-framework) for the kitchen-sink run so default kernel runs aren't perturbed.
New category strings: framework_compile, framework_dividend, framework_parallel, framework_async (so users can --category framework_dividend).
New flag: --framework-chain-depths 1,4,16,64 (overrides the N-vector for verify_chain).
--list-framework prints the framework scenarios with one-line descriptions.

registerFrameworkBenchmarks() is added to runner.addCases(...) in main only when the user opts in (or when everything is selected).

9. Reporting changes (`src/benchmark_report.cpp`)

JSON. Each result already serializes; just add "framework_metrics": [{name, value, unit, higher_is_better}, ...]. Pre-existing kernel results emit an empty array. No schema break.
CSV. Long-form rows — emit one row per (benchmark, metric) for framework benchmarks; existing kernel rows unchanged. Long-form is safer than wide-form with empty columns.
Markdown. New section "Framework Benchmarks" with one table per scenario showing each metric per resolution, plus a one-line interpretation under each table (e.g., "graph_speedup > 1.0 means the graph form beats summing immediate-mode calls").
Composite score. Add an OpenVX Framework Score = geomean of graph_speedup, virtual_dividend, parallel_efficiency, async_speedup across resolutions (only for those that produced valid values). Print alongside Vision Score in the terminal summary. Do not fold framework numbers into the existing Vision Score — keep the two scoreboards separate so existing comparisons stay valid.
Comparison (compareReports). Extend the diff to print framework metrics side-by-side (vendor A vs vendor B, % delta).

10. Build (`CMakeLists.txt`)

One line:

src/benchmarks/framework_benchmarks.cpp

added to BENCHMARK_SOURCES. No new dependencies.

11. README & glossary

New "Framework Benchmarks" section explaining the philosophy: "kernel scores measure the kernel; framework scores measure what OpenVX adds as a framework."
Glossary entries for: graph_speedup, virtual_dividend, parallel_efficiency, async_speedup, verify_per_node_ms, OpenVX Framework Score.
Explicit caveat: framework metrics are most useful intra-vendor / cross-version; cross-vendor framework comparison still requires interpretation since vendors differ in target exposure.

12. Phased delivery (suggested PR slicing)

PR	Scope	Mergeable independently?
#1 — Plumbing	`FrameworkMetric` struct, `framework_run` field, runner branch, `--feature-set framework` flag, JSON `framework_metrics: []` (empty for kernels), CSV/Markdown unchanged. No new benchmarks.	Yes
#2 — `graph_dividend`	One scenario, helper functions, README section, terminal-summary entry.	Yes
#3 — `verify_chain`	Adds `--framework-chain-depths`, reports verify_ms and ms/node slope.	Yes
#4 — `parallel_branches`	Optional `VX_NODE_TARGET` pinning where supported.	Yes
#5 — `async_streaming`	`vxScheduleGraph` / `vxWaitGraph` loop; gated pipelining-extension probe.	Yes
#6 — Framework Score	Composite scoring, `comparison.md` extension.	After #2–#5
#7 — v2 backlog	per-node perf, map/unmap cost, user-kernel dispatch, context lifecycle, determinism under load.	Each its own PR

PRs #1 and #2 together are the minimum useful slice — they give the headline "graph dividend" number on day one.

13. Open questions — resolved during v1 delivery

Default behavior. ✅ Opt-in. Framework benchmarks ship behind --feature-set framework (only) or --feature-set everything (kernels + framework). Bare ./openvx-mark is unchanged and produces the same Vision Score it always did.
Framework Score weight. ✅ Equal-weight geomean of graph_speedup, virtual_dividend, parallelism_efficiency, and concurrency_speedup. Lower-is-better metrics (verify_per_node_ms, async_overhead_ratio) and fusion_ratio (graph_dividend-only) are intentionally excluded so the score has a single monotonic interpretation and isn't over-weighted by any one scenario.
Chain content. ✅ Both. graph_dividend ships two chains: GraphDividend_Box3x3_x4 (4 × Box3x3 — pure framework signal) and GraphDividend_MixedFilters (Gaussian → Box → Median → Erode — realistic).
Targets for parallel_branches. ✅ Vendor-neutral v1. No VX_NODE_TARGET pinning, no vendor-extension target enumeration. Scenario compares default-scheduler graph vs strictly-serial immediate-mode dispatch and lets parallelism_efficiency speak.
Pipelining extension. ✅ Skipped. vx_khr_pipelining was deliberately not used in v1; async_streaming uses only standard vxScheduleGraph / vxWaitGraph, so it runs on every conformant implementation.
Naming. ✅ "OpenVX Framework Score" — emitted as framework_score in JSON and printed alongside Vision Score in the terminal summary.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Framework Mark Plan

v1 Delivery Summary

1. Goal & Non-goals

2. Conceptual unit: "framework benchmark"

3. v1 Scope

4. v2 Backlog (later, separate PRs)

5. Data-model changes

6. Runner changes

7. New file: `src/benchmarks/framework_benchmarks.cpp`

8. CLI changes (`src/main.cpp`)

9. Reporting changes (`src/benchmark_report.cpp`)

10. Build (`CMakeLists.txt`)

11. README & glossary

12. Phased delivery (suggested PR slicing)

13. Open questions — resolved during v1 delivery

FilesExpand file tree

framework-mark-plan.md

Latest commit

History

framework-mark-plan.md

File metadata and controls

Framework Mark Plan

v1 Delivery Summary

1. Goal & Non-goals

2. Conceptual unit: "framework benchmark"

3. v1 Scope

4. v2 Backlog (later, separate PRs)

5. Data-model changes

6. Runner changes

7. New file: src/benchmarks/framework_benchmarks.cpp

8. CLI changes (src/main.cpp)

9. Reporting changes (src/benchmark_report.cpp)

10. Build (CMakeLists.txt)

11. README & glossary

12. Phased delivery (suggested PR slicing)

13. Open questions — resolved during v1 delivery

7. New file: `src/benchmarks/framework_benchmarks.cpp`

8. CLI changes (`src/main.cpp`)

9. Reporting changes (`src/benchmark_report.cpp`)

10. Build (`CMakeLists.txt`)