The second framework scenario: time the cost of building and verifying
graphs of varying depth, plus the lazy-allocation tax paid on the very
first vxProcessGraph call.
For each chain depth N (configurable via --framework-chain-depths,
default 1,4,16,64), the benchmark rebuilds a fresh chain of N Box3x3
nodes and times four phases per N:
  n{N}_create_ms          vxCreateGraph + N node creations
  n{N}_verify_ms          vxVerifyGraph
  n{N}_first_process_ms   first vxProcessGraph (lazy alloc included)
  n{N}_steady_process_ms  median vxProcessGraph after warmup
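The four per-depth phases can be sketched roughly as follows. This is only an illustration in Python, not the benchmark's actual code: `time_ms`, `measure_depth`, and the `warmup`/`iters` defaults are hypothetical, and the `create`, `verify`, and `process` callables stand in for whatever wraps `vxCreateGraph` plus node creation, `vxVerifyGraph`, and `vxProcessGraph`.

```python
import statistics
import time
from typing import Callable, Dict


def time_ms(fn: Callable[[], None]) -> float:
    """Run fn once and return the elapsed wall-clock time in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


def measure_depth(
    n: int,
    create: Callable[[], None],   # assumed: builds the graph and its N nodes
    verify: Callable[[], None],   # assumed: wraps vxVerifyGraph
    process: Callable[[], None],  # assumed: wraps vxProcessGraph
    warmup: int = 3,              # hypothetical default
    iters: int = 10,              # hypothetical default
) -> Dict[str, float]:
    """Time the four phases for one chain depth and name them n{N}_*."""
    metrics = {
        f"n{n}_create_ms": time_ms(create),
        f"n{n}_verify_ms": time_ms(verify),
        # The first call is timed on its own so any one-shot cost shows up here.
        f"n{n}_first_process_ms": time_ms(process),
    }
    for _ in range(warmup):
        process()  # untimed warmup runs
    samples = [time_ms(process) for _ in range(iters)]
    metrics[f"n{n}_steady_process_ms"] = statistics.median(samples)
    return metrics
```

Timing the first `process` call separately from the warmed-up median is what makes the later first-minus-steady subtraction meaningful.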
A linear regression across the (N, verify_ms) samples then yields:
  verify_per_node_ms         per-node verify slope (ms/node)
  verify_intercept_ms        fixed verify cost
  first_process_overhead_ms  first - steady at deepest chain (the
                             one-shot tax: lazy alloc, kernel JIT,
                             target affinity selection, etc.)
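As a minimal sketch of how these three aggregates could be derived, assuming an ordinary least-squares fit (the commit says only "linear regression"), the Python below fits verify time against depth and subtracts steady from first at the deepest chain. The `fit_line` and `aggregate` helpers and the shape of the `per_depth` dictionary are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) samples of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all depths are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def aggregate(per_depth: Dict[int, Dict[str, float]]) -> Dict[str, float]:
    """Derive the three aggregate metrics from per-depth timings.

    per_depth maps chain depth N to a dict holding verify_ms,
    first_process_ms and steady_process_ms (a hypothetical layout).
    """
    depths = sorted(per_depth)
    slope, intercept = fit_line(depths, [per_depth[n]["verify_ms"] for n in depths])
    deepest = per_depth[depths[-1]]
    return {
        "verify_per_node_ms": slope,
        "verify_intercept_ms": intercept,
        "first_process_overhead_ms": (
            deepest["first_process_ms"] - deepest["steady_process_ms"]
        ),
    }
```

With only four depths in the default sweep, the fit is dominated by the deepest sample, so the slope is best read as an estimate rather than a precise per-node cost.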
These metrics characterize the OpenVX runtime's graph-compilation
behavior, which no per-kernel measurement can capture. They surface
implementation choices such as:
- whether verify cost is linear, super-linear, or has step
discontinuities (e.g. first call loads kernel modules)
- how much per-node overhead the validator/optimizer adds
- how aggressive lazy allocation is (a large
first_process_overhead_ms means the impl defers most setup until
actual execution)
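One informal way to tell linear from super-linear verify cost is to compare the slope between consecutive depths. The helper below is a hypothetical Python sketch, not part of the benchmark: roughly constant segment slopes suggest linear cost, steadily rising slopes suggest super-linear cost, and a single outlying segment hints at a step discontinuity.

```python
from typing import List, Sequence


def segment_slopes(depths: Sequence[int], verify_ms: Sequence[float]) -> List[float]:
    """Return the ms/node slope between each pair of consecutive depths.

    Depths must be distinct; samples are sorted by depth before differencing.
    """
    pairs = sorted(zip(depths, verify_ms))
    return [
        (y1 - y0) / (x1 - x0)
        for (x0, y0), (x1, y1) in zip(pairs, pairs[1:])
    ]
```

Real timings are noisy, so small differences between segment slopes should not be over-interpreted.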
The runner already pre-checks bc.required_kernels, so the case skips
cleanly on impls without Box3x3.
Smoke results on MIVisionX show ~0.027 ms per added Box3x3 node during
verify and ~12 ms first-process overhead at depth 64 -- previously
invisible in any per-kernel benchmark.
Out of scope:
- other chain shapes (only Box3x3 here; mixed-kernel verify chains
can be added later if useful)
- re-verify cost on parameter or dimension changes
- any heterogeneous-target scheduling effects (covered by PR #4)
Co-authored-by: Cursor <cursoragent@cursor.com>
|`VerifyChain_Box3x3`| Box3x3 × N (sweeps `--framework-chain-depths`, default 1, 4, 16, 64) | Graph build / verify cost vs N nodes; first-process lazy-alloc tax |
Each `GraphDividend_*` case times the same chain three ways and emits five metrics:
@@ -189,6 +191,20 @@ Each `GraphDividend_*` case times the same chain three ways and emits five metri
|`graph_speedup`| × |`sum_immediate_ms / graph_virtual_ms`. **>1 means the graph form beats summed immediate calls** — the headline framework dividend |
|`virtual_dividend`| × |`graph_real_ms / graph_virtual_ms`. **>1 means virtual intermediates help** (runtime did something useful with the freedom) |

`VerifyChain_Box3x3` rebuilds a chain of N Box3x3 nodes for each requested depth and reports per-N timings plus three aggregate metrics:

| Metric | Unit | Meaning |
|:---|:---|:---|
| `n{N}_create_ms` | ms | `vxCreateGraph` + N node creations at depth N |
| `n{N}_verify_ms` | ms | `vxVerifyGraph` cost at depth N |
| `n{N}_first_process_ms` | ms | First `vxProcessGraph` call (often pays a one-shot lazy-allocation / kernel-init tax) |
| `n{N}_steady_process_ms` | ms | Median `vxProcessGraph` cost after warmup |
| `verify_per_node_ms` | ms/node | Linear-regression slope of verify cost over N (the per-node verify tax) |
| `verify_intercept_ms` | ms | Linear-regression intercept (fixed verify cost independent of chain length) |
| `first_process_overhead_ms` | ms | `first_process_ms - steady_process_ms` at the deepest chain (the cost of the first execution beyond steady state) |

Use `--framework-chain-depths 1,4,16,64,256` to sweep custom depths (defaults to `1,4,16,64`).
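The flag takes a comma-separated list of depths. The benchmark's own parser is not shown here, so the following is only a Python sketch of how such a value could be parsed and validated; `parse_depths` and `build_parser` are hypothetical names, and rejecting depths below 1 is an assumption.

```python
import argparse
from typing import List


def parse_depths(text: str) -> List[int]:
    """Parse a value such as '1,4,16,64' into a list of positive integers."""
    depths = []
    for token in text.split(","):
        value = int(token.strip())  # a ValueError here is reported by argparse
        if value < 1:
            raise argparse.ArgumentTypeError(f"chain depth must be >= 1, got {value}")
        depths.append(value)
    return depths


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing only the chain-depth flag, with the documented default."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--framework-chain-depths",
        type=parse_depths,
        default=[1, 4, 16, 64],
    )
    return parser
```

For example, `build_parser().parse_args(["--framework-chain-depths", "1,4,16,64,256"])` yields a namespace whose `framework_chain_depths` attribute is the list of five depths.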