Commit 804ef7a

Add OpenVX Framework Score and framework metrics comparison
The framework benchmarks (PRs #2-#5) emit per-scenario metrics, but the report had no single number that summarised the framework dividend across benchmarks, and the comparison report did not surface framework metrics at all.

Add an OpenVX Framework Score to `CompositeScores`: the equal-weight geometric mean of the four "higher is better, dimensionless ratio" framework metrics (`graph_speedup`, `virtual_dividend`, `parallelism_efficiency`, `concurrency_speedup`) across all framework results. Lower-is-better metrics (verify cost, async overhead) are intentionally excluded so the score has a single monotonic interpretation.

Emit the score in the JSON `scores` section, in the Markdown composite-score table, and in the terminal summary; also add a "Framework Benchmarks" section to the Markdown report listing every framework metric per scenario.

Extend both the C++ `compareReports` and the Python `compare_reports.py` to add a Framework Score row to the Conformance & Scores table and a "Framework Metrics Comparison" section. The ratio column normalises direction so >1.00 always means the second implementation is better.

README adds glossary entries and updates the example terminal summary.

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 94090e1 commit 804ef7a

5 files changed

Lines changed: 275 additions & 5 deletions

README.md

Lines changed: 3 additions & 0 deletions
@@ -252,6 +252,7 @@ Interpreting `parallelism_efficiency`:
 =============================================================
 Summary: 156 total | 156 passed | 0 skipped | 0 failed
 OpenVX Vision Score: 1586.05 MP/s (156 benchmarks)
+OpenVX Framework Score: 4.872x (geomean of 18 framework metrics)
 vision Conformance: PASS (41/41)
 vision Top-5 Fastest:
 1. Not 26835.8 MP/s (graph, FHD)
@@ -281,6 +282,7 @@ Interpreting `parallelism_efficiency`:
 - **OpenVX Vision Score** — Geometric mean of MP/s across all passing graph-mode vision benchmarks
 - **Enhanced Vision Score** — Geometric mean when enhanced_vision benchmarks are included
 - **Category Sub-Scores** — Per-category geometric mean (pixelwise, filters, color, etc.)
+- **OpenVX Framework Score** — Equal-weight geometric mean (dimensionless, ×) of all `graph_speedup`, `virtual_dividend`, `parallelism_efficiency`, and `concurrency_speedup` values produced by the framework benchmarks. **>1.0 means the OpenVX graph framework adds aggregate value over a kernel-only baseline.** Lower-is-better metrics (e.g. `verify_per_node_ms`, `async_overhead_ratio`) are intentionally excluded so the score has a single monotonic interpretation. Only emitted when framework benchmarks are run (`--feature-set framework` or `--feature-set everything`).
 
 ### Conformance Summary
 
@@ -334,6 +336,7 @@ python3 scripts/compare_reports.py results_vendor_a/benchmark_results.json \
 | **Sustained Ratio** | `min_ns / median_ns`. Values near 1.0 indicate consistent performance; lower values suggest variance from caching, scheduling, or thermal effects. |
 | **Scaling Efficiency** | `(MP/s at high res) / (MP/s at low res)`. 1.0 = perfect scaling; values below 1.0 indicate memory or bandwidth bottlenecks at higher resolutions. |
 | **Vision Score** | Geometric mean of MP/s across all passing graph-mode vision benchmarks. Single-number summary for cross-vendor comparison. |
+| **Framework Score** | Equal-weight geometric mean (×, dimensionless) of all `graph_speedup`, `virtual_dividend`, `parallelism_efficiency`, and `concurrency_speedup` values produced by framework benchmarks. >1.0 means the OpenVX graph framework adds aggregate value over a kernel-only baseline. Only emitted when framework benchmarks are run. |
 | **Stability Warning** | Flagged when CV% exceeds the stability threshold (default: 15%). Indicates the result may not be reliable — increase iterations or reduce system load. |
 | **Conformance** | Whether all available kernels in a feature set produced valid graph-mode results. PASS = all kernels benchmarked successfully. |
 

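As a rough illustration of the Framework Score defined in the glossary entries above, the sketch below computes an equal-weight geometric mean over the four higher-is-better metric names. The input layout mirrors the `framework_metrics` entries (`name`, `value`) that `compare_reports.py` reads. The function name `framework_score`, the sample values, and the choice to skip non-positive values are assumptions made for illustration; the C++ implementation is not shown in this diff and may differ.

```python
import math

# Metric names come from the glossary entry; everything else is illustrative.
SCORED_METRICS = {
    "graph_speedup",
    "virtual_dividend",
    "parallelism_efficiency",
    "concurrency_speedup",
}


def framework_score(results):
    """Return (geomean, count) over the scored framework metrics.

    Assumes each result is a dict with a 'framework_metrics' list of
    {'name': str, 'value': float} entries. Non-positive values are skipped
    because log() is undefined for them (an assumption, not confirmed).
    """
    logs = []
    for result in results:
        for fm in result.get("framework_metrics", []):
            value = fm.get("value")
            if fm.get("name") in SCORED_METRICS and value is not None and value > 0:
                logs.append(math.log(value))
    if not logs:
        return 0.0, 0
    # Geometric mean computed as exp(mean(log(x))) for numerical stability.
    return math.exp(sum(logs) / len(logs)), len(logs)


sample = [
    {"framework_metrics": [
        {"name": "graph_speedup", "value": 2.0},
        {"name": "verify_per_node_ms", "value": 0.3},  # lower-is-better: ignored
    ]},
    {"framework_metrics": [{"name": "concurrency_speedup", "value": 8.0}]},
]
score, count = framework_score(sample)
print(f"{score:.3f}x (geomean of {count} framework metrics)")
```

With the two scored values 2.0 and 8.0 the geometric mean is 4.0, so the lower-is-better `verify_per_node_ms` entry has no effect on the result.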
include/benchmark_report.h

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,19 @@ struct CompositeScores {
     std::map<std::string, double> category_scores; // per-category geometric mean
     int vision_count = 0;
     int enhanced_count = 0;
+
+    // Framework Score: dimensionless geometric mean of "higher is better" framework
+    // metrics that capture how much value the OpenVX *graph framework* adds beyond
+    // raw kernel throughput. Aggregates:
+    //   - graph_speedup (graph_dividend benchmarks)
+    //   - virtual_dividend (graph_dividend benchmarks)
+    //   - parallelism_efficiency (parallel_branches benchmarks)
+    //   - concurrency_speedup (async_streaming benchmarks)
+    // A value >1.0 means the framework adds aggregate value over a kernel-only
+    // baseline. Lower-is-better metrics (verify cost, async overhead) are
+    // intentionally excluded so the score has a single, monotonic interpretation.
+    double framework_score = 0;
+    int framework_metric_count = 0;
 };
 
 // Conformance checking (Feature 7)

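The two new fields are also emitted in the JSON `scores` section, so a consumer has to cope with older reports that lack them. The sketch below shows one way to read them with zero defaults, in the same spirit as the `.get(..., 0)` lookups in `compare_reports.py`. The helper name `read_framework_score`, the flat `scores` object layout, and the `vision_score` key in the older sample are assumptions; the score and count values are taken from the README example.

```python
import json


def read_framework_score(report_json):
    """Return (score, count) from a report's 'scores' section, defaulting to zeros."""
    report = json.loads(report_json)
    scores = report.get("scores", {})
    return scores.get("framework_score", 0), scores.get("framework_metric_count", 0)


# Hypothetical report fragments: one written after this commit, one before it.
new_report = '{"scores": {"framework_score": 4.872, "framework_metric_count": 18}}'
old_report = '{"scores": {"vision_score": 1586.05}}'

for label, text in (("new", new_report), ("old", old_report)):
    score, count = read_framework_score(text)
    if count > 0:
        print(f"{label}: Framework Score {score:.3f}x ({count} metrics)")
    else:
        print(f"{label}: no framework metrics recorded")
```

Gating on the count rather than on the score keeps a report with no framework metrics distinct from one whose score happens to be very small.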
scripts/compare_reports.py

Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -147,6 +147,13 @@ def write_markdown(impl_names, result_maps, all_keys, output_path, reports, syst
     if enhanced_a > 0 or enhanced_b > 0:
         f.write(f'| Enhanced Vision Score (MP/s) | {enhanced_a:.2f} | {enhanced_b:.2f} |\n')
 
+    framework_a = scores[0].get('framework_score', 0) if len(scores) > 0 else 0
+    framework_b = scores[1].get('framework_score', 0) if len(scores) > 1 else 0
+    framework_count_a = scores[0].get('framework_metric_count', 0) if len(scores) > 0 else 0
+    framework_count_b = scores[1].get('framework_metric_count', 0) if len(scores) > 1 else 0
+    if framework_count_a > 0 or framework_count_b > 0:
+        f.write(f'| Framework Score (x, geomean) | {framework_a:.3f} | {framework_b:.3f} |\n')
+
     conformance_info = []
     for report in reports:
         conf_list = report.get('conformance', [])
@@ -189,6 +196,55 @@ def write_markdown(impl_names, result_maps, all_keys, output_path, reports, syst
         f.write(f'| {display} | {a_val:.2f} | {b_val:.2f} | {sign}{change:.1f} |\n')
     f.write('\n')
 
+    # --- Framework Metrics Comparison ---
+    # Group framework metrics by (benchmark name, resolution); union across reports.
+    fw_keys = {}  # key -> display
+    fw_metrics_by_key = {}  # key -> set(metric names)
+    per_side_metrics = [{}, {}]  # side -> key -> {metric_name: dict}
+    for side, rmap in enumerate(result_maps):
+        for (name, mode, resolution), r in rmap.items():
+            fms = r.get('framework_metrics', [])
+            if not fms:
+                continue
+            key = (name, resolution)
+            fw_keys[key] = f'{name} @ {resolution}'
+            fw_metrics_by_key.setdefault(key, set())
+            per_side_metrics[side].setdefault(key, {})
+            for fm in fms:
+                nm = fm.get('name')
+                if not nm:
+                    continue
+                fw_metrics_by_key[key].add(nm)
+                per_side_metrics[side][key][nm] = fm
+
+    if fw_keys:
+        f.write('## Framework Metrics Comparison\n\n')
+        f.write(f'> Per-scenario framework metrics (orchestration, scheduling, async, '
+                f'verification). Higher-is-better metrics show {impl_names[1]}/{impl_names[0]}; '
+                f'lower-is-better metrics show {impl_names[0]}/{impl_names[1]}. '
+                f'A ratio >1.00 always means {impl_names[1]} is better.\n\n')
+        f.write(f'| Benchmark @ Resolution | Metric | Unit | {impl_names[0]} | {impl_names[1]} | Ratio | Direction |\n')
+        f.write('|:---|:---|:---|---:|---:|---:|:---|\n')
+        for key in sorted(fw_keys.keys()):
+            display = fw_keys[key]
+            for nm in sorted(fw_metrics_by_key[key]):
+                a_fm = per_side_metrics[0].get(key, {}).get(nm)
+                b_fm = per_side_metrics[1].get(key, {}).get(nm)
+                higher_better = (a_fm or b_fm or {}).get('higher_is_better', True)
+                unit = (a_fm or b_fm or {}).get('unit', '') or '—'
+                a_val = a_fm.get('value') if a_fm else None
+                b_val = b_fm.get('value') if b_fm else None
+                a_str = f'{a_val:.3f}' if a_val is not None else '—'
+                b_str = f'{b_val:.3f}' if b_val is not None else '—'
+                if a_val and b_val and a_val > 0 and b_val > 0:
+                    ratio = (b_val / a_val) if higher_better else (a_val / b_val)
+                    ratio_str = f'{ratio:.2f}'
+                else:
+                    ratio_str = '—'
+                direction = 'higher is better' if higher_better else 'lower is better'
+                f.write(f'| {display} | `{nm}` | {unit} | {a_str} | {b_str} | {ratio_str} | {direction} |\n')
+        f.write('\n')
+
     # --- Build comparison rows (include all results, not just verified) ---
     comparison_rows = []
     for key in all_keys:

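The direction-normalised ratio used in the Framework Metrics Comparison table can be isolated into a small helper for experimentation. This is a sketch rather than part of the commit: `normalised_ratio` is a hypothetical name, and it mirrors the diff's behaviour of reporting no ratio when either value is missing or non-positive.

```python
def normalised_ratio(a_val, b_val, higher_is_better=True):
    """Return a ratio where > 1.0 means implementation B is better, or None.

    For higher-is-better metrics the ratio is B/A; for lower-is-better
    metrics it is inverted to A/B so the reading stays the same.
    """
    if a_val is None or b_val is None or a_val <= 0 or b_val <= 0:
        return None
    return b_val / a_val if higher_is_better else a_val / b_val


# B has the larger speedup, so B is better: prints 1.5
print(normalised_ratio(2.0, 3.0))
# B has the smaller cost on a lower-is-better metric, so B is better: prints 2.0
print(normalised_ratio(0.4, 0.2, higher_is_better=False))
# A metric missing from one report yields no ratio: prints None
print(normalised_ratio(1.2, None))
```

Inverting the quotient for lower-is-better metrics means a reader never has to consult the Direction column to decide which side won; that column only explains why the quotient was taken the way it was.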