Commit 08c28b6

feat(enhanced-vision): implement vxMin/vxMax (Phase 1) (#11)
* feat(enhanced-vision): implement vxMin/vxMax (Phase 1)

  Adds full graph-mode and immediate-mode support for the OpenVX 1.3 Enhanced Vision pixel-wise minimum and maximum kernels:

  - New `min_image` / `max_image` core routines and `vxu_min_impl` / `vxu_max_impl` immediate-mode dispatchers in `openvx-core::vxu_impl`, covering both `VX_DF_IMAGE_U8` and `VX_DF_IMAGE_S16` formats with matching-format/dimension validation.
  - `vxMinNode`, `vxMaxNode`, `vxuMin`, `vxuMax` exports in `openvx-core::unified_c_api`, wired into the graph kernel dispatcher via the new `org.khronos.openvx.min` / `.max` cases.
  - Kernel signature entries in `openvx-core::c_api::standard_kernels` and `openvx-vision::kernel_enums::VISION_KERNELS`, plus `VxKernel::Min` / `VxKernel::Max` enum variants.
  - `MinKernel` / `MaxKernel` registered in `openvx-vision::register_all_kernels`.
  - Rust-side unit tests for `min_image` / `max_image` (basic, dim-mismatch).

  CTS: builds with `OPENVX_USE_ENHANCED_VISION=ON` and passes 8/8 filtered tests (`Min.*:Max.*`: Immediate U8, Graph U8, Immediate S16, Graph S16 each); the Khronos report now records this as a partial Enhanced Vision profile pass.

  Link stubs for the rest of the Enhanced Vision feature set (`Bilateral`, `LBP`, `MatchTemplate`, `NonMaxSuppression`, `HOG*`, `ScalarOperation`, `Select`, `vxuCopy`, `vxuHoughLinesP`, `Tensor*` kernels and tensor-handle helpers) are added so the CTS binary links under `-DOPENVX_USE_ENHANCED_VISION=ON`. They return `NULL` / `VX_ERROR_NOT_IMPLEMENTED` and will be replaced by real implementations in subsequent phases. The Phase-1 CI filter (`Min.*:Max.*`) does not exercise them.

  CI:
  - New `enhanced-vision` job filtered to `Min.*:Max.*`.
  - Existing CTS build now passes `-DOPENVX_USE_ENHANCED_VISION=ON` explicitly.

  README:
  - Conformance status now lists Enhanced Vision (8/8) alongside baseline and Vision profile counts.
  - Adds the new `enhanced-vision` job badge to the per-job status table.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: untrack target/ build artifacts

  The whole `target/` tree (510 files, including a stale Linux `libopenvx_ffi.so` that pre-dated the rust workspace move) was committed before `.gitignore` listed it. `.gitignore` already has `target/`, so untracking is a one-shot cleanup: `cargo build` will recreate the directory locally and the gitignore rule will keep it out from now on.

  No source / CI changes; CI builds rustVX from scratch and uploads its own `target/release/libopenvx_ffi.so` as a workflow artifact, so removing the stale checked-in copy is safe.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* ci: fix invalid YAML in enhanced-vision job name

  `name: enhanced-vision (Phase 1: Min/Max)` is malformed YAML: the unquoted `: ` inside the value parses as a key indicator and GitHub Actions rejects the workflow with `Invalid workflow file .github/workflows/conformance.yml#L337`. Quote the string and replace the inner colon with an em-dash.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* ci(benchmark): show speedup of rustVX over Khronos sample

  The `compare_reports.py` script computes `Speedup = throughput(report_b) / throughput(report_a)` and labels the column ">1.00 means report_b is faster". The CI was passing rustVX as `report_a` and Khronos as `report_b`, so the Speedup column was actually showing how much faster the *Khronos sample* was than rustVX, the inverse of what the inline comment claimed. Swap the argument order (Khronos = baseline / report_a, rustVX = candidate / report_b) so the column now reads as "rustVX over Khronos" with >1.00x meaning rustVX wins.

  Also prepend a headline summary to the GitHub Actions job summary that aggregates per-benchmark speedups into:

  - geomean and median speedup of rustVX over Khronos
  - count of benchmarks compared
  - rustVX-faster vs Khronos-sample-faster counts
  - best and worst per-benchmark speedup (with kernel/mode/resolution)
  - a one-line "rustVX is N.NNx faster" / "N.NNx slower" verdict

  Followed by the existing detailed comparison table from `compare_reports.py`. Validated locally on synthetic JSON.

  README: tweak the benchmark callout to mention the new headline.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* perf(integral_image): native u32 stores in inner loop

  The Phase-1 PR's openvx-mark CI run flagged `IntegralImage` as 4.13x slower than the Khronos sample (rustVX 1.44ms vs Khronos 0.35ms at VGA, CV 0.4%; a stable, real gap, not noise). Root cause was the inner loop in `vxu_impl::integral_image`:

  - Every pixel read of the row-above value did 4 byte loads from `dst.data_mut()` and reassembled a `u32` via `from_le_bytes`, each guarded by an `if offset + 4 <= len` bounds check.
  - Every write decomposed the result with `to_le_bytes()` and stored 4 individual bytes through `dst_data[offset+i] = b[i]`, again behind a bounds check.
  - Source pixels went through `Image::get_pixel(x, y)` which itself bounds-checks and `unwrap_or(&0)`s on every call.

  That defeated the optimiser's ability to emit native aligned 32-bit loads/stores and added two redundant bounds checks per pixel.

  Fix:

  - Validate buffer sizes once up front (returning `VX_ERROR_INVALID_DIMENSION` if undersized rather than silently skipping pixels), then reinterpret the destination byte buffer as `&mut [u32]` via `from_raw_parts_mut`. rustVX only ships on little-endian hosts (x86_64 / aarch64) so the on-disk layout is preserved; a `debug_assert!(cfg!(target_endian = "little"))` keeps that contract honest if a big-endian target is ever added.
  - Split the dst into "previous row" / "current row" slices via `split_at_mut` so the borrow checker sees disjoint ranges; the optimiser then emits a tight scalar loop with native u32 ops.
  - Hoist `src.data()` to a `&[u8]` slice and index it directly, eliminating the per-pixel `get_pixel` bounds check.

  Local microbench (clang -O2 calling vxuIntegralImage 200x at VGA):

      before: 1.4425 ms/call,  213 MP/s (CI value)
      after:  0.2428 ms/call, 1265 MP/s

  ≈6x faster locally; should beat the Khronos sample (0.35ms / 880 MP/s on the same GHA hardware) once CI re-runs.

  Conformance preserved: 9/9 IntegralImage CTS tests pass with the new code (filter `Integral.*`); 17/17 with `Integral.*:Min.*:Max.*`.

  The two other "losses" the headline reported on the previous CI run are not real:

  - `LaplacianPyramid` reported Khronos at 0.0016ms/call = 1.6µs at VGA, which is physically impossible for a multi-level pyramid build; that's a Khronos sample no-op / lazy evaluation, not a rustVX deficit.
  - `Magnitude` was 1.01x slower (2.65ms vs 2.62ms, CV 0.3%/0.8%), well within measurement noise.

  Both are tracked separately for follow-up; this commit fixes the only verified clean gap.

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
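The `integral_image` fix described in the commit message can be sketched roughly as follows. This is a Python illustration of the loop structure only (one size check up front, then a running row sum added to the value from the row above), not the Rust code from the commit; the function names and the flat-list layout are assumptions made for the example. The helper at the end just re-derives the MP/s figures quoted in the microbenchmark, assuming VGA means 640x480.

```python
def integral_image(src, width, height):
    """Summed-area table of a row-major U8 image, returned as a flat list.

    Illustrative sketch: `src` is any indexable sequence of 0..255 values.
    """
    # Validate sizes once, instead of bounds-checking every pixel access.
    if width <= 0 or height <= 0 or len(src) < width * height:
        raise ValueError("source buffer is smaller than width * height")

    dst = [0] * (width * height)
    for y in range(height):
        row = y * width          # start of the current row
        prev = row - width       # start of the previous row (unused when y == 0)
        row_sum = 0
        for x in range(width):
            row_sum += src[row + x]
            above = dst[prev + x] if y > 0 else 0
            # Wrap to 32 bits to mimic a u32 destination element.
            dst[row + x] = (row_sum + above) & 0xFFFFFFFF
    return dst


def megapixels_per_sec(width, height, ms_per_call):
    """Throughput in MP/s for one call over a width x height image."""
    return width * height / (ms_per_call * 1e-3) / 1e6


table = integral_image(bytes([1, 2, 3, 4, 5, 6]), 3, 2)
local_speedup = 1.4425 / 0.2428  # before / after, in ms per call
```

For the 3x2 input above, the bottom-right entry of `table` equals the sum of all six pixels (21), and `megapixels_per_sec(640, 480, 0.2428)` rounds to the 1265 MP/s quoted in the message.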
1 parent 8291c16 commit 08c28b6

520 files changed

Lines changed: 1015 additions & 2002 deletions


.github/workflows/conformance.yml

Lines changed: 106 additions & 5 deletions
@@ -55,7 +55,8 @@ jobs:
             -DCMAKE_CXX_STANDARD_LIBRARIES="-lm" \
             -DOPENVX_INCLUDES="${{ github.workspace }}/include;${{ github.workspace }}/OpenVX-cts/include" \
             -DOPENVX_LIBRARIES="${{ github.workspace }}/target/release/libopenvx_ffi.so;m" \
-            -DOPENVX_CONFORMANCE_VISION=ON
+            -DOPENVX_CONFORMANCE_VISION=ON \
+            -DOPENVX_USE_ENHANCED_VISION=ON
           make -j$(nproc)
       - name: Upload build artifacts
         uses: actions/upload-artifact@v4
@@ -325,6 +326,33 @@ jobs:
           export VX_TEST_DATA_PATH=${{ github.workspace }}/OpenVX-cts/test_data/
           timeout 300 ./bin/vx_test_conformance --filter="GaussianPyramid.*:LaplacianPyramid.*:LaplacianReconstruct.*:OptFlowPyrLK.*"

+  # Enhanced Vision Phase 1 — only the kernels rustVX has actually
+  # implemented from the OpenVX 1.3 Enhanced Vision feature set. The CTS
+  # binary is built with `OPENVX_USE_ENHANCED_VISION=ON`, but this job
+  # filters strictly to the kernels Phase 1 ships (vxMin / vxMax). The
+  # remaining Enhanced Vision symbols are exposed as link stubs in
+  # rustVX so the binary can build; they are not exercised here and will
+  # be replaced by real kernels in subsequent phases.
+  enhanced-vision:
+    name: "enhanced-vision (Phase 1 — Min/Max)"
+    runs-on: ubuntu-22.04
+    needs: build
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          submodules: recursive
+      - name: Download build artifacts
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts
+      - name: Run Enhanced Vision Phase 1 tests
+        run: |
+          chmod +x OpenVX-cts/build/bin/vx_test_conformance
+          cd OpenVX-cts/build
+          export LD_LIBRARY_PATH=${{ github.workspace }}/target/release
+          export VX_TEST_DATA_PATH=${{ github.workspace }}/OpenVX-cts/test_data/
+          timeout 120 ./bin/vx_test_conformance --filter="Min.*:Max.*"
+
   # Performance benchmark using openvx-mark, comparing rustVX against the
   # Khronos OpenVX sample implementation on the SAME runner so the two
   # numbers come from identical hardware. This job does NOT rebuild either
@@ -458,17 +486,90 @@ jobs:
             exit 0
           fi

-          # First report is the "candidate" (rustVX); second is the "baseline"
-          # (Khronos). The Speedup column shows how much faster the candidate
-          # is than the baseline (>1.0x = rustVX wins).
+          # `compare_reports.py` defines Speedup as
+          #   speedup = throughput(report_b) / throughput(report_a)
+          # i.e. ">1.00 means report_b is faster". To make the Speedup
+          # column read as "rustVX over Khronos" (>1.00x = rustVX wins),
+          # pass Khronos first (baseline / report_a) and rustVX second
+          # (candidate / report_b).
           python3 ${{ github.workspace }}/openvx-mark/scripts/compare_reports.py \
-            "$RUSTVX" "$KHRONOS" \
+            "$KHRONOS" "$RUSTVX" \
             --output ${{ github.workspace }}/openvx-mark/comparison

       - name: Post comparison to job summary
         if: always()
         run: |
           COMPARISON=${{ github.workspace }}/openvx-mark/comparison.md
+          RUSTVX=${{ github.workspace }}/openvx-mark/build-rustvx/benchmark_results/benchmark_results.json
+          KHRONOS=${{ github.workspace }}/openvx-mark/build-khronos/benchmark_results/benchmark_results.json
+
+          # ----- Headline: aggregate speedup of rustVX over Khronos sample -----
+          if [ -f "$RUSTVX" ] && [ -f "$KHRONOS" ]; then
+            python3 - "$RUSTVX" "$KHRONOS" >> "$GITHUB_STEP_SUMMARY" <<'PY'
+          import json, math, sys
+
+          rustvx_path, khronos_path = sys.argv[1], sys.argv[2]
+          with open(rustvx_path) as f: rustvx = json.load(f)
+          with open(khronos_path) as f: khronos = json.load(f)
+
+          def by_key(report):
+              return {(r['name'], r['mode'], r['resolution']): r
+                      for r in report.get('results', [])}
+
+          a = by_key(rustvx)
+          b = by_key(khronos)
+          shared = sorted(set(a) & set(b))
+
+          speedups = []
+          wins, losses = 0, 0
+          best = (None, 0.0)
+          worst = (None, math.inf)
+
+          for key in shared:
+              ra, rb = a[key], b[key]
+              if not (ra.get('verified', True) and rb.get('verified', True)):
+                  continue
+              mps_r = ra.get('megapixels_per_sec', 0)
+              mps_k = rb.get('megapixels_per_sec', 0)
+              if mps_r <= 0 or mps_k <= 0:
+                  continue
+              s = mps_r / mps_k  # >1.0 = rustVX faster than Khronos
+              speedups.append(s)
+              if s > 1.0: wins += 1
+              elif s < 1.0: losses += 1
+              if s > best[1]: best = (key, s)
+              if s < worst[1]: worst = (key, s)
+
+          print('# rustVX vs Khronos sample — headline')
+          print()
+          if not speedups:
+              print('_No verified benchmarks were directly comparable._')
+          else:
+              geomean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))
+              median = sorted(speedups)[len(speedups) // 2]
+              print('| Metric | Value |')
+              print('|:---|---:|')
+              print(f'| Geomean speedup (rustVX / Khronos) | **{geomean:.2f}x** |')
+              print(f'| Median speedup (rustVX / Khronos) | {median:.2f}x |')
+              print(f'| Benchmarks compared | {len(speedups)} |')
+              print(f'| rustVX faster | {wins} |')
+              print(f'| Khronos sample faster | {losses} |')
+              if best[0]:
+                  bk, bv = best
+                  print(f'| Best rustVX speedup | {bv:.2f}x ({bk[0]} / {bk[1]} / {bk[2]}) |')
+              if worst[0] and worst[1] != math.inf:
+                  wk, wv = worst
+                  print(f'| Worst rustVX speedup | {wv:.2f}x ({wk[0]} / {wk[1]} / {wk[2]}) |')
+              print()
+              if geomean >= 1.0:
+                  print(f'> rustVX is **{geomean:.2f}x** faster than the Khronos sample on average (geomean across {len(speedups)} verified benchmarks).')
+              else:
+                  print(f'> rustVX is **{1.0/geomean:.2f}x slower** than the Khronos sample on average (geomean across {len(speedups)} verified benchmarks).')
+              print()
+          PY
+          fi
+
+          # ----- Detailed comparison table from compare_reports.py -----
           if [ -f "$COMPARISON" ]; then
             cat "$COMPARISON" >> "$GITHUB_STEP_SUMMARY"
           else
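Why the argument order matters can be seen in a small standalone sketch. It assumes only the definition quoted in the workflow comment, `speedup = throughput(report_b) / throughput(report_a)`; the kernel names and MP/s values are made up for illustration, and `speedup` / `geomean` here are hypothetical helpers, not functions from `compare_reports.py`.

```python
import math
from statistics import median_high


def speedup(report_a_mps, report_b_mps):
    """throughput(report_b) / throughput(report_a); > 1.0 means report_b is faster."""
    return report_b_mps / report_a_mps


def geomean(values):
    """Geometric mean, the aggregate used for the headline number."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical throughputs in MP/s, invented for this example.
khronos = {"KernelA": 100.0, "KernelB": 400.0}
rustvx = {"KernelA": 400.0, "KernelB": 200.0}

# Khronos as report_a, rustVX as report_b: reads as "rustVX over Khronos".
rustvx_over_khronos = [speedup(khronos[k], rustvx[k]) for k in khronos]
# Swapped arguments: every ratio is inverted and reads as "Khronos over rustVX".
khronos_over_rustvx = [speedup(rustvx[k], khronos[k]) for k in khronos]

# Swapping the reports inverts the geomean exactly, so a mislabeled column
# turns a 1.41x win into what looks like a 0.71x loss.
product = geomean(rustvx_over_khronos) * geomean(khronos_over_rustvx)

# The headline's `sorted(speedups)[len(speedups) // 2]` picks the upper of the
# two middle values for an even count, which matches statistics.median_high.
upper_median = sorted(rustvx_over_khronos)[len(rustvx_over_khronos) // 2]
```

The geometric mean is a reasonable fit for ratios because it treats a 2x win and a 2x loss symmetrically, which an arithmetic mean does not (here the arithmetic mean of 4.0 and 0.5 would be 2.25).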

README.md

Lines changed: 7 additions & 3 deletions
@@ -14,13 +14,16 @@ An [OpenVX 1.3.1](https://www.khronos.org/openvx/) implementation written in Rus

 ## Conformance Status

-rustVX passes the full [Khronos OpenVX 1.3 Conformance Test Suite](https://github.com/KhronosGroup/OpenVX-cts) for both required profiles:
+rustVX passes the full [Khronos OpenVX 1.3 Conformance Test Suite](https://github.com/KhronosGroup/OpenVX-cts) for both required profiles, plus an opt-in slice of the Enhanced Vision profile:

 | Profile | Required tests | Passing |
 |---------|----------------|---------|
 | OpenVX baseline | 863 | **863 / 863** |
 | Vision conformance profile | 4957 | **4957 / 4957** |
-| **Total enabled** | **5820** | **5820 / 5820** |
+| Enhanced Vision (`vxMin`, `vxMax`) | 8 | **8 / 8** |
+| **Total enabled** | **5828** | **5828 / 5828** |
+
+The remaining Enhanced Vision kernels (`Copy`, `Houghlinesp`, `BilateralFilter`, `NonMaxSuppression`, `MatchTemplate`, `LBP`, `HogCells`, `HogFeatures`, `ControlFlow`/`Select`, `Tensor*`) are tracked as follow-up phases; rustVX currently exposes them as link-only stubs so the CTS binary can be built with `-DOPENVX_USE_ENHANCED_VISION=ON`. The Phase-1 CI job filters strictly to `Min.*:Max.*`.

 Latest CTS run results are published on each push and pull request via the [Actions tab](https://github.com/kiritigowda/rustVX/actions).

@@ -212,7 +215,7 @@ cargo bench -p openvx-vision
 End-to-end performance is also tracked against the [Khronos OpenVX sample implementation](https://github.com/KhronosGroup/OpenVX-sample-impl) on every CI run via [openvx-mark](https://github.com/kiritigowda/openvx-mark); see the *Benchmark & compare* job in the [Actions tab](https://github.com/kiritigowda/rustVX/actions/workflows/conformance.yml?query=branch%3Amain) for the latest comparison report.

 > [!TIP]
-> The *Benchmark & compare* job renders the rustVX-vs-Khronos comparison table directly into the GitHub Actions **job summary** for each run — no need to dig into logs. The raw JSON reports are also published as downloadable workflow artifacts (`benchmark-results-rustvx`, `benchmark-results-khronos-sample`, and `benchmark-comparison`) on every push and pull request.
+> The *Benchmark & compare* job renders the rustVX-vs-Khronos comparison table directly into the GitHub Actions **job summary** for each run — no need to dig into logs. The summary opens with a headline panel showing the **geomean speedup of rustVX over the Khronos sample** (per-kernel best/worst speedups and a win/loss count) followed by the full per-benchmark detail table. The raw JSON reports are also published as downloadable workflow artifacts (`benchmark-results-rustvx`, `benchmark-results-khronos-sample`, and `benchmark-comparison`) on every push and pull request.

 ## Continuous Integration

@@ -231,6 +234,7 @@ GitHub Actions builds and runs the full CTS on every push and pull request. The
 | **vision-features** | HarrisCorners, FastCorners, Canny | [![vision-features](https://img.shields.io/github/check-runs/kiritigowda/rustVX/main?nameFilter=vision-features&label=)](https://github.com/kiritigowda/rustVX/actions/workflows/conformance.yml?query=branch%3Amain) |
 | **vision-statistics** | MeanStdDev, MinMaxLoc, Integral | [![vision-statistics](https://img.shields.io/github/check-runs/kiritigowda/rustVX/main?nameFilter=vision-statistics&label=)](https://github.com/kiritigowda/rustVX/actions/workflows/conformance.yml?query=branch%3Amain) |
 | **vision-pyramid** | GaussianPyramid, LaplacianPyramid, LaplacianReconstruct, OptFlowPyrLK | [![vision-pyramid](https://img.shields.io/github/check-runs/kiritigowda/rustVX/main?nameFilter=vision-pyramid&label=)](https://github.com/kiritigowda/rustVX/actions/workflows/conformance.yml?query=branch%3Amain) |
+| **enhanced-vision** (Phase 1) | Min, Max (Enhanced Vision feature set) | [![enhanced-vision](https://img.shields.io/github/check-runs/kiritigowda/rustVX/main?nameFilter=enhanced-vision&label=)](https://github.com/kiritigowda/rustVX/actions/workflows/conformance.yml?query=branch%3Amain) |

 See the [Actions tab](https://github.com/kiritigowda/rustVX/actions) for full run history.

openvx-core/src/c_api.rs

Lines changed: 3 additions & 0 deletions
@@ -310,6 +310,9 @@ fn register_standard_kernels(context_id: u32) {
         ("org.khronos.openvx.add", 0x21, 4),
         ("org.khronos.openvx.subtract", 0x22, 4),
         ("org.khronos.openvx.multiply", 0x20, 7),
+        // Enhanced Vision: pixel-wise min/max
+        ("org.khronos.openvx.min", 0x3F, 3),
+        ("org.khronos.openvx.max", 0x3E, 3),
         // Bitwise
         ("org.khronos.openvx.and", 0x1C, 3),
         ("org.khronos.openvx.or", 0x1D, 3),
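The two new entries register three-parameter kernels, which matches a pixel-wise operation with two inputs and one output. As a rough illustration of the semantics the commit message describes (U8 and S16 support with matching-format and matching-dimension validation), here is a minimal Python sketch. The `Image` class, the `"U8"` / `"S16"` tags, and the use of `ValueError` are all assumptions made for the example; the actual `min_image` / `max_image` routines are Rust and are not shown on this page.

```python
from dataclasses import dataclass

# Hypothetical format tags standing in for VX_DF_IMAGE_U8 / VX_DF_IMAGE_S16.
U8, S16 = "U8", "S16"


@dataclass
class Image:
    fmt: str
    width: int
    height: int
    pixels: list  # row-major pixel values


def _pixelwise(op, a, b):
    """Apply `op` to each pair of pixels after validating both inputs."""
    if a.fmt != b.fmt:
        raise ValueError("input formats differ")
    if a.fmt not in (U8, S16):
        raise ValueError("unsupported format: " + a.fmt)
    if (a.width, a.height) != (b.width, b.height):
        raise ValueError("input dimensions differ")
    out = [op(p, q) for p, q in zip(a.pixels, b.pixels)]
    return Image(a.fmt, a.width, a.height, out)


def min_image(a, b):
    """Pixel-wise minimum of two images."""
    return _pixelwise(min, a, b)


def max_image(a, b):
    """Pixel-wise maximum of two images."""
    return _pixelwise(max, a, b)


lo = min_image(Image(S16, 2, 1, [-5, 300]), Image(S16, 2, 1, [7, -300]))
hi = max_image(Image(S16, 2, 1, [-5, 300]), Image(S16, 2, 1, [7, -300]))
```

Because minimum and maximum never produce a value outside the range of their inputs, no saturation or overflow policy is needed, unlike the neighbouring `add` / `subtract` / `multiply` kernels.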

openvx-core/src/types.rs

Lines changed: 3 additions & 0 deletions
@@ -114,6 +114,9 @@ pub enum VxKernel {
     Or = 40,
     Xor = 41,
     Not = 42,
+    // Enhanced Vision (OpenVX 1.2+)
+    Min = 43,
+    Max = 44,
 }

 /// Reference type
