| 1 | +# Changelog |
| 2 | + |
| 3 | +All notable changes to **openvx-mark** are documented here. |
| 4 | + |
| 5 | +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project follows semantic versioning where the major version tracks backward compatibility of the JSON report schema. |
| 6 | + |
| 7 | +## [Unreleased] |
| 8 | + |
| 9 | +## [1.0.0] — Framework Mark v1 |
| 10 | + |
| 11 | +This is the first major openvx-mark release and the first to benchmark the OpenVX **graph framework** itself rather than only individual kernels. It adds a new family of *framework benchmarks*: scenarios that exercise the OpenVX graph runtime (verification, virtual-image fusion, parallel scheduling, async dispatch, per-node attribution) and measure effects that **no per-kernel benchmark can surface**. The existing 60-kernel suite is unchanged. |
| 12 | + |
| 13 | +### Added — Framework benchmarks (opt-in) |
| 14 | + |
| 15 | +Run with `--feature-set framework` (only framework scenarios) or `--feature-set everything` (kernels + framework). Default `./openvx-mark` runs are unchanged. |
| 16 | + |
| 17 | +- **`GraphDividend_Box3x3_x4`** and **`GraphDividend_MixedFilters`**: time the same N-node chain three ways (sum of immediate `vxu*` calls, graph with real intermediates, graph with virtual intermediates) and emit `sum_immediate_ms`, `graph_real_ms`, `graph_virtual_ms`, `graph_speedup`, `virtual_dividend`. The headline metric is `graph_speedup`: a value `> 1.0` means the graph form runs faster than the equivalent immediate calls, which is the framework dividend. |
| 18 | +- **`VerifyChain_Box3x3`** — sweeps chain depths (configurable via `--framework-chain-depths`, default `1,4,16,64`) and reports per-N create / verify / first-process / steady-process timings, plus regression-derived `verify_per_node_ms`, `verify_intercept_ms`, and `first_process_overhead_ms`. |
| 19 | +- **`ParallelBranches_Box3x3`** — K = 4 independent Box3x3 nodes sharing one input image, compared against K back-to-back `vxuBox3x3` immediate calls. Reports `parallelism_speedup` and `parallelism_efficiency` (where 1.0 = perfect K-way parallelism). |
| 20 | +- **`Async_Single_Box3x3_x4`** — quantifies the per-call cost of `vxScheduleGraph` + `vxWaitGraph` vs `vxProcessGraph` on the same graph. Reports `async_overhead_ratio` (lower is better). |
| 21 | +- **`Async_Concurrent_Box3x3_x2`** — schedules two independent graphs concurrently and reports `concurrency_speedup` — direct evidence of whether the runtime overlaps independent work. |
| 22 | +- **Per-node `VX_NODE_PERFORMANCE` attribution** on both `GraphDividend_*` chains: emits `node_count`, `node_sum_ms`, `graph_perf_ms`, and `fusion_ratio` (`node_sum_ms / graph_perf_ms`). `≈ 1.0` = strict back-to-back execution, `> 1.0` = fusion / overlap detected, `≈ node_count` = the runtime reports whole-graph time on every node instead of attributing per-node performance. |
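The `VerifyChain_Box3x3` regression can be sketched as an ordinary least-squares line fit of verify time against chain depth. This is a minimal C++ sketch, not openvx-mark's code: the names `parseChainDepths`, `VerifyFit`, and `fitVerifyTimes` are hypothetical, and mapping the slope to `verify_per_node_ms` and the intercept to `verify_intercept_ms` is an assumption about how the regression is set up.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse a depth list such as "1,4,16,64"
// (the documented default for --framework-chain-depths).
// Empty tokens are skipped; std::stoi throws on non-numeric input.
std::vector<int> parseChainDepths(const std::string& csv) {
    std::vector<int> depths;
    std::stringstream ss(csv);
    std::string token;
    while (std::getline(ss, token, ',')) {
        if (!token.empty()) {
            depths.push_back(std::stoi(token));
        }
    }
    return depths;
}

struct VerifyFit {
    double per_node_ms;   // slope: assumed to map to verify_per_node_ms
    double intercept_ms;  // intercept: assumed to map to verify_intercept_ms
};

// Ordinary least-squares line fit of verify time (ms) against chain depth N.
VerifyFit fitVerifyTimes(const std::vector<int>& depths,
                         const std::vector<double>& verify_ms) {
    assert(depths.size() == verify_ms.size() && depths.size() >= 2);
    const double n = static_cast<double>(depths.size());
    double sum_x = 0.0, sum_y = 0.0, sum_xx = 0.0, sum_xy = 0.0;
    for (std::size_t i = 0; i < depths.size(); ++i) {
        const double x = static_cast<double>(depths[i]);
        const double y = verify_ms[i];
        sum_x += x;
        sum_y += y;
        sum_xx += x * x;
        sum_xy += x * y;
    }
    const double denom = n * sum_xx - sum_x * sum_x;
    assert(denom != 0.0);  // needs at least two distinct depths
    const double slope = (n * sum_xy - sum_x * sum_y) / denom;
    return {slope, (sum_y - slope * sum_x) / n};
}
```

With depths `1,4,16,64` and synthetic, perfectly linear verify times of 0.75, 1.5, 4.5, and 16.5 ms, the fit returns 0.25 ms per node and a 0.5 ms intercept.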
| 23 | + |
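The derived ratios in the framework benchmarks can be sketched as plain divisions over the measured times. Dividing speedup by K for `parallelism_efficiency` is the most direct reading of "1.0 = perfect K-way parallelism"; the other formulas are assumptions (in particular, whether `graph_speedup` is taken against the real- or virtual-intermediate graph is not stated), and all function names are hypothetical.

```cpp
#include <cassert>

// Assumed: immediate-mode total over real-intermediate graph time (> 1.0 = graph wins).
double graphSpeedup(double sum_immediate_ms, double graph_real_ms) {
    assert(graph_real_ms > 0.0);
    return sum_immediate_ms / graph_real_ms;
}

// Assumed: additional gain of virtual over real intermediates.
double virtualDividend(double graph_real_ms, double graph_virtual_ms) {
    assert(graph_virtual_ms > 0.0);
    return graph_real_ms / graph_virtual_ms;
}

// K back-to-back immediate calls vs one graph with K independent branches.
double parallelismSpeedup(double sequential_ms, double parallel_graph_ms) {
    assert(parallel_graph_ms > 0.0);
    return sequential_ms / parallel_graph_ms;
}

// Speedup normalised by K, so 1.0 means perfect K-way parallelism.
double parallelismEfficiency(double speedup, int k) {
    assert(k > 0);
    return speedup / k;
}

// Assumed: schedule+wait time relative to vxProcessGraph time (lower is better).
double asyncOverheadRatio(double schedule_wait_ms, double process_ms) {
    assert(process_ms > 0.0);
    return schedule_wait_ms / process_ms;
}

// Assumed: graphs run one after another vs scheduled together (> 1.0 = overlap).
double concurrencySpeedup(double serial_ms, double concurrent_ms) {
    assert(concurrent_ms > 0.0);
    return serial_ms / concurrent_ms;
}
```

For example, `concurrencySpeedup(1.0, 1.25)` returns 0.8: running the two graphs concurrently took longer than running them back to back, which is the small-resolution case described in the implementer notes.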
| 24 | +### Added — OpenVX Framework Score |
| 25 | + |
| 26 | +A new dimensionless headline number, computed as the **equal-weight geometric mean** of every `graph_speedup`, `virtual_dividend`, `parallelism_efficiency`, and `concurrency_speedup` value produced by the framework benchmarks. **`framework_score > 1.0` means the OpenVX graph framework adds aggregate value over a kernel-only baseline.** Lower-is-better metrics (such as `async_overhead_ratio`) and the scenario-specific `fusion_ratio` are intentionally excluded so that the score has a single monotonic interpretation. The score is emitted only when framework benchmarks are run. |
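The equal-weight geometric mean can be sketched by averaging logarithms, which avoids overflow when many ratios are multiplied and makes the equal weighting explicit. In this C++ sketch, `NamedMetric` and `frameworkScore` are hypothetical, and returning 0.0 when no eligible metric is present is an assumed sentinel (the changelog says the score is simply not emitted in that case).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal stand-in for a framework metric (name + value only).
struct NamedMetric {
    std::string name;
    double value;
};

// Equal-weight geometric mean of the score-eligible metrics, computed as
// exp(mean(log(x))). Writes the number of metrics used to *count.
// Returns 0.0 as an assumed sentinel when no eligible metric is present.
double frameworkScore(const std::vector<NamedMetric>& metrics, std::size_t* count) {
    static const char* const kEligible[] = {
        "graph_speedup", "virtual_dividend",
        "parallelism_efficiency", "concurrency_speedup"};
    double log_sum = 0.0;
    std::size_t used = 0;
    for (const NamedMetric& m : metrics) {
        bool eligible = false;
        for (const char* name : kEligible) {
            if (m.name == name) eligible = true;
        }
        if (!eligible) continue;  // e.g. fusion_ratio, async_overhead_ratio
        assert(m.value > 0.0);    // log (and the geometric mean) needs x > 0
        log_sum += std::log(m.value);
        ++used;
    }
    if (count != nullptr) *count = used;
    return used == 0 ? 0.0 : std::exp(log_sum / static_cast<double>(used));
}
```

Given `graph_speedup = 2.0`, `virtual_dividend = 8.0`, and an excluded `fusion_ratio`, the score is √(2 × 8) = 4.0 from 2 metrics.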
| 27 | + |
| 28 | +Surfaced everywhere the Vision Score appears: |
| 29 | + |
| 30 | +- Terminal summary: `OpenVX Framework Score: <x>x (geomean of <N> framework metrics)`. |
| 31 | +- JSON `scores.framework_score` and `scores.framework_metric_count`. |
| 32 | +- Markdown report's Composite Scores table plus a new dedicated **Framework Benchmarks** section listing every metric per scenario with its unit and direction. |
| 33 | +- Both the C++ `--compare` path and `scripts/compare_reports.py` add a Framework Score row to **Conformance & Scores** and a new **Framework Metrics Comparison** table whose ratio column is direction-aware (so `> 1.00` always means the second implementation is better). |
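The direction-aware ratio column can be sketched as a single branch on `higher_is_better`. The changelog only states the invariant (`> 1.00` means the second implementation is better); taking B/A for higher-is-better metrics and A/B for lower-is-better ones is one assumed way to satisfy it, and `directionAwareRatio` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical helper: ratio of implementation B (the second report) to
// implementation A, oriented so that > 1.0 always means B is better.
double directionAwareRatio(double a_value, double b_value, bool higher_is_better) {
    assert(a_value > 0.0 && b_value > 0.0);
    // Higher-is-better (e.g. graph_speedup): B / A.
    // Lower-is-better (e.g. async_overhead_ratio): A / B, so a smaller B scores > 1.0.
    return higher_is_better ? b_value / a_value : a_value / b_value;
}
```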
| 34 | + |
| 35 | +### Added — Plumbing |
| 36 | + |
| 37 | +- New `FrameworkMetric` struct: `{name, value, unit, higher_is_better}`. `BenchmarkResult` gains a `framework_metrics` vector (empty for kernel results — backward-compatible). |
| 38 | +- New `BenchmarkCase::framework_run` callback: framework benchmarks own their entire timing loop and return a populated `BenchmarkResult`. The existing 60-kernel codepath is untouched. |
| 39 | +- New CLI flag `--framework-chain-depths` for `verify_chain` depth sweeps. |
| 40 | +- New `--feature-set` values: `framework` (only) and `everything` (kernels + framework). |
| 41 | +- CI workflow runs framework benchmarks for every vendor (Khronos sample-impl, MIVisionX) in a dedicated step and posts the headline metrics to the GitHub Actions job summary. |
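The new plumbing types can be sketched as follows. The `FrameworkMetric` fields come from the changelog; everything else is a stripped-down assumption: the real `BenchmarkResult` carries more fields, the real `framework_run` signature is not documented here (this sketch uses a no-argument `std::function`), and `addMetric` and `makeExampleFrameworkCase` are hypothetical helpers.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Fields as listed in the changelog: {name, value, unit, higher_is_better}.
struct FrameworkMetric {
    std::string name;
    double value = 0.0;
    std::string unit;
    bool higher_is_better = true;
};

// Stripped-down stand-in; the real BenchmarkResult has many more fields.
struct BenchmarkResult {
    std::string name;
    std::vector<FrameworkMetric> framework_metrics;  // stays empty for kernel results
};

// Stripped-down stand-in with an assumed no-argument callback.
struct BenchmarkCase {
    std::string name;
    std::function<BenchmarkResult()> framework_run;  // unset for kernel cases
};

// Hypothetical helper: append a metric, rejecting unnamed ones.
void addMetric(BenchmarkResult& result, FrameworkMetric metric) {
    assert(!metric.name.empty());
    result.framework_metrics.push_back(std::move(metric));
}

// Hypothetical framework case: it owns its timing loop and returns a full result.
BenchmarkCase makeExampleFrameworkCase() {
    BenchmarkCase c;
    c.name = "Example_Framework_Case";
    c.framework_run = [] {
        BenchmarkResult r;
        r.name = "Example_Framework_Case";
        // A real case would build, run, and time its graphs here; 1.0 is a placeholder.
        addMetric(r, {"example_ratio", 1.0, "x", true});
        return r;
    };
    return c;
}
```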
| 42 | + |
| 43 | +### Changed |
| 44 | + |
| 45 | +- `BenchmarkRunner::runAll` dispatches to `framework_run` when set, with a pre-check for required kernels (so framework cases skip cleanly on implementations missing Box3x3 or another required kernel). |
| 46 | +- The README adds a Framework Benchmarks section, glossary entries for every framework metric, and a Framework Score entry; the example terminal summary is updated. |
| 47 | +- The JSON schema adds `scores.framework_score`, `scores.framework_metric_count`, and a per-result `framework_metrics` array. Existing kernel results emit an empty `framework_metrics` array. **No breaking change** for tools that consumed the previous schema. |
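The `runAll` dispatch and required-kernel pre-check can be sketched as below. `Result`, `Case`, the `kernel_run` placeholder for the unchanged kernel path, the `available_kernels` set, and the skip-reason text are all hypothetical; how openvx-mark actually detects kernel availability is not stated in this changelog.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Stripped-down, hypothetical stand-ins for BenchmarkResult / BenchmarkCase.
struct Result {
    std::string name;
    bool skipped = false;
    std::string skip_reason;
};

struct Case {
    std::string name;
    std::vector<std::string> required_kernels;  // e.g. {"Box3x3"}
    std::function<Result()> framework_run;      // set only for framework cases
    std::function<Result()> kernel_run;         // placeholder for the kernel path
};

// Framework cases are pre-checked for required kernels and skipped cleanly
// if any is missing; other cases go down the unchanged kernel path.
std::vector<Result> runAll(const std::vector<Case>& cases,
                           const std::set<std::string>& available_kernels) {
    std::vector<Result> results;
    for (const Case& c : cases) {
        assert(c.framework_run || c.kernel_run);
        if (!c.framework_run) {
            results.push_back(c.kernel_run());
            continue;
        }
        std::string missing;
        for (const std::string& k : c.required_kernels) {
            if (available_kernels.count(k) == 0) {
                missing = k;
                break;
            }
        }
        if (!missing.empty()) {
            Result r;
            r.name = c.name;
            r.skipped = true;
            r.skip_reason = "missing kernel: " + missing;
            results.push_back(r);
            continue;
        }
        results.push_back(c.framework_run());
    }
    return results;
}
```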
| 48 | + |
| 49 | +### Notes for implementers |
| 50 | + |
| 51 | +- `fusion_ratio` is implementation-quality-dependent: a value `≈ node_count` (e.g. `4.0` on a 4-node chain) usually means the runtime is reporting whole-graph time on every node. This is a useful cross-vendor signal in its own right, but `fusion_ratio` is intentionally excluded from the Framework Score because not every conformant runtime populates `VX_NODE_PERFORMANCE` cleanly. |
| 52 | +- `concurrency_speedup < 1.0` at small resolutions is expected and meaningful — it means async dispatch overhead exceeds concurrency gain at that work size. |
| 53 | +- Pipelined streaming via the optional `vx_khr_pipelining` extension is intentionally out of scope for v1; only standard OpenVX APIs are used. |
| 54 | + |
| 55 | +### v2 backlog (separate future PRs) |
| 56 | + |
| 57 | +- `vxMapImagePatch` / `vxUnmapImagePatch` round-trip cost (host ↔ device tax). |
| 58 | +- User-kernel dispatch tax via `vxAddUserKernel` no-op. |
| 59 | +- Context lifecycle stress (`vxCreateContext` / `vxReleaseContext` × N). |
| 60 | +- Determinism under load (single-graph CV% while K other graphs are scheduled). |
| 61 | +- NN / extension-gated benchmarks. |
| 62 | + |
| 63 | +See [`docs/framework-mark-plan.md`](docs/framework-mark-plan.md) for the full v1 design rationale. |
| 64 | + |
| 65 | +--- |
| 66 | + |
| 67 | +## Pre-1.0 |
| 68 | + |
| 69 | +Earlier work — the kernel-only suite, output verification, MIVisionX CI, and version-independent build — landed in PRs #1–#4 on `main`. There is no formal changelog entry for those releases; see git history. |