Commit 709b7da

perf-improvements - box3x3, gaussian3x3, rgb_to_gray, add_images
perf: scalar path optimizations for box3x3, gaussian3x3, rgb_to_gray, add_images
2 parents 0f17b90 + 255c13d commit 709b7da

13 files changed

Lines changed: 1476 additions & 169 deletions

.github/workflows/conformance.yml

Lines changed: 8 additions & 3 deletions
@@ -75,13 +75,18 @@ jobs:
 x86_64|amd64)
 HAS_SSE2=false
 HAS_AVX2=false
+# openvx-core hosts the C-API kernel callbacks (vxAdd /
+# vxSubtract / vxBox3x3 / vxGaussian3x3 / vxColorConvert
+# → crate::simd_kernels). openvx-vision hosts the public
+# Rust-API SIMD kernels. Both crates need the matching
+# feature flag for the SIMD path to actually compile in.
 if echo "$FLAGS" | grep -qw sse2; then
-CARGO_FEATURES="$CARGO_FEATURES openvx-vision/sse2"
+CARGO_FEATURES="$CARGO_FEATURES openvx-core/sse2 openvx-vision/sse2"
 HAS_SSE2=true
 echo " + sse2 detected"
 fi
 if echo "$FLAGS" | grep -qw avx2; then
-CARGO_FEATURES="$CARGO_FEATURES openvx-vision/avx2"
+CARGO_FEATURES="$CARGO_FEATURES openvx-core/avx2 openvx-vision/avx2"
 HAS_AVX2=true
 echo " + avx2 detected"
 fi
@@ -96,7 +101,7 @@ jobs:
 fi
 ;;
 aarch64|arm64)
-CARGO_FEATURES="$CARGO_FEATURES openvx-vision/neon"
+CARGO_FEATURES="$CARGO_FEATURES openvx-core/neon openvx-vision/neon"
 echo " + neon (mandatory on aarch64)"
 ;;
 *)
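
Reviewer note: the feature-selection rule in the two workflow hunks above can be restated as a small pure function, which makes the "both crates get the matching flag" invariant easy to check. This is only a sketch: `cargo_features` is a hypothetical helper, not part of the repository. The whitespace-token match only approximates `grep -qw`. The `*)` arm's body and the lines between the two hunks are not shown in the diff, so the sketch assumes they add no SIMD features.

```rust
/// Hypothetical helper mirroring the workflow's shell logic: map a machine
/// architecture string and a whitespace-separated CPU flag list to the Cargo
/// features the CI step would append to CARGO_FEATURES.
pub fn cargo_features(arch: &str, cpu_flags: &str) -> Vec<&'static str> {
    // Whole-token comparison, approximating `grep -qw` on a flags line.
    let has = |flag: &str| cpu_flags.split_whitespace().any(|f| f == flag);
    let mut features: Vec<&'static str> = Vec::new();
    match arch {
        "x86_64" | "amd64" => {
            // Each detected flag enables the same feature on both crates.
            if has("sse2") {
                features.extend(["openvx-core/sse2", "openvx-vision/sse2"]);
            }
            if has("avx2") {
                features.extend(["openvx-core/avx2", "openvx-vision/avx2"]);
            }
        }
        "aarch64" | "arm64" => {
            // NEON is treated as mandatory on aarch64, so no flag check.
            features.extend(["openvx-core/neon", "openvx-vision/neon"]);
        }
        // Assumption: the `*)` arm (body not shown in the diff) adds nothing.
        _ => {}
    }
    features
}
```

Calling `cargo_features("x86_64", "fpu sse sse2 avx avx2")` yields the four `sse2`/`avx2` entries, while an unrecognised architecture yields an empty list.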

README.md

Lines changed: 24 additions & 4 deletions
@@ -74,19 +74,39 @@ The standard OpenVX 1.3 C headers are bundled in [`include/VX/`](include/VX/) an
 
 ### Cargo features
 
-The vision kernel crate exposes opt-in performance features:
+Both `openvx-core` (host of the C-API kernel callbacks the OpenVX graph executor invokes) and `openvx-vision` (host of the public Rust API kernels) expose a matching opt-in feature set:
 
 | Feature | Effect |
 |---------|--------|
 | `simd` | Enables architecture-neutral SIMD code paths |
 | `sse2` / `avx2` | x86_64 SIMD back-ends (imply `simd`) |
 | `neon` | AArch64 SIMD back-end (implies `simd`) |
-| `parallel` | Enables Rayon-based multi-threaded kernels |
+| `parallel` (`openvx-vision` only) | Enables Rayon-based multi-threaded kernels |
 
-Build with one or more features, e.g.:
+Build with the matching pair on each crate so the FFI graph path and the direct Rust API path both pick up the SIMD kernels:
 
 ```bash
-cargo build --release -p openvx-ffi --features "openvx-vision/avx2 openvx-vision/parallel"
+cargo build --release -p openvx-ffi \
+--features "openvx-core/sse2 openvx-core/avx2 openvx-vision/sse2 openvx-vision/avx2"
+```
+
+### Hardware acceleration
+
+Performance work targets **AMD Zen (Ryzen / EPYC, Zen 2+)** — that's what CI measures and what the *Benchmark & compare* numbers come from. Intel and ARM aren't penalised; the runtime dispatcher reads CPU **flags**, not vendor strings, so any host whose flags match the same gate runs the same path:
+
+- **AMD Zen 2+** (Ryzen 3000+, Threadripper 3000+, EPYC Rome / Milan / Genoa) → AVX2 kernels + `-C target-cpu=x86-64-v3` auto-vec.
+- **Intel Haswell and later** → same AVX2 path, parity with Zen.
+- **Older x86_64** (pre-AVX2) → SSE2 kernels + `-C target-cpu=x86-64-v2`.
+- **AArch64** (Apple Silicon, AWS Graviton, etc.) → NEON path.
+- **Anything else / no features** → scalar slice loop (still ~50× faster than the original per-pixel kernels).
+
+Dispatch lives in `openvx-core::simd_kernels` (FFI graph path) and `openvx-vision::x86_64_simd` (Rust API). CI auto-detects host flags; for a manual Zen-targeted build:
+
+```bash
+RUSTFLAGS="-C target-cpu=x86-64-v3" \
+cargo build --release -p openvx-ffi \
+--features "openvx-core/sse2 openvx-core/avx2 \
+openvx-vision/sse2 openvx-vision/avx2"
 ```
 
 ## Using rustVX from a C application
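
Reviewer note: the README's claim that dispatch keys on CPU flags rather than vendor strings can be illustrated with the standard library's detection macros. This is a minimal sketch of that idea, not the code in `simd_kernels` or `x86_64_simd`. The `SimdTier` enum and `detect_tier` function are hypothetical names introduced here, and the sketch ignores Cargo feature gating entirely.

```rust
/// Hypothetical tiers matching the README's bullet list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimdTier {
    Avx2,
    Sse2,
    Neon,
    Scalar,
}

/// Pick the widest tier the running CPU advertises. Only feature flags are
/// consulted, so an Intel and an AMD host with AVX2 land on the same tier.
pub fn detect_tier() -> SimdTier {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return SimdTier::Avx2;
        }
        if std::arch::is_x86_feature_detected!("sse2") {
            return SimdTier::Sse2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return SimdTier::Neon;
        }
    }
    // Any other architecture, or no matching flag: scalar slice loops.
    SimdTier::Scalar
}
```

On an x86_64 host this returns `Avx2` or `Sse2` depending on the CPU; the result is a property of the machine it runs on, not of the build.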

openvx-core/Cargo.toml

Lines changed: 11 additions & 0 deletions
@@ -18,3 +18,14 @@ once_cell = { workspace = true }
 [features]
 default = []
 c-api = []
+# SIMD acceleration. Mirrors openvx-vision's feature set so the FFI
+# build can pass `openvx-core/sse2 openvx-core/avx2` (or
+# `openvx-core/neon` on aarch64) and have the C-API kernel callbacks
+# (vxAdd, vxSubtract, vxBox3x3, vxGaussian3x3, vxColorConvert) pick
+# up the SIMD-fast paths in `crate::simd_kernels` at runtime via
+# `is_x86_feature_detected!`. When none of these features are set,
+# the kernel callbacks fall back to the existing tight scalar loops.
+simd = []
+sse2 = ["simd"]
+avx2 = ["simd"]
+neon = ["simd"]
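
Reviewer note: the comment above describes two gates, a compile-time Cargo feature and a runtime `is_x86_feature_detected!` check, with a scalar loop as the fallback. The sketch below shows how those gates typically combine for an image-add kernel. It is not the contents of `simd_kernels`: the function names (`add_u8`, `add_u8_scalar`, `add_u8_avx2`) are hypothetical, and the saturating overflow policy is an assumption chosen for illustration.

```rust
/// Scalar fallback: saturating per-byte add over whole slices.
fn add_u8_scalar(a: &[u8], b: &[u8], out: &mut [u8]) {
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x.saturating_add(y);
    }
}

/// AVX2 path, compiled in only when the `avx2` Cargo feature is enabled.
#[cfg(all(feature = "avx2", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn add_u8_avx2(a: &[u8], b: &[u8], out: &mut [u8]) {
    use std::arch::x86_64::{
        __m256i, _mm256_adds_epu8, _mm256_loadu_si256, _mm256_storeu_si256,
    };
    let n = out.len();
    let mut i = 0;
    // Process 32 bytes per iteration with unaligned loads and stores.
    while i + 32 <= n {
        let va = _mm256_loadu_si256(a.as_ptr().add(i) as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr().add(i) as *const __m256i);
        let sum = _mm256_adds_epu8(va, vb); // unsigned saturating add
        _mm256_storeu_si256(out.as_mut_ptr().add(i) as *mut __m256i, sum);
        i += 32;
    }
    // Tail of fewer than 32 bytes goes through the scalar loop.
    add_u8_scalar(&a[i..], &b[i..], &mut out[i..]);
}

/// Entry point: feature gate at compile time, CPU check at run time.
pub fn add_u8(a: &[u8], b: &[u8], out: &mut [u8]) {
    assert!(a.len() == out.len() && b.len() == out.len());
    #[cfg(all(feature = "avx2", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 was just detected and all slices have equal length.
            unsafe { add_u8_avx2(a, b, out) };
            return;
        }
    }
    // No feature compiled in, or the CPU lacks AVX2: tight scalar loop.
    add_u8_scalar(a, b, out);
}
```

Built without the `avx2` feature, only the scalar path exists in the binary, which matches the fallback behaviour the Cargo.toml comment describes.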

openvx-core/src/lib.rs

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@ pub mod c_api;
 pub mod c_api_data;
 pub mod context;
 pub mod reference;
+pub mod simd_kernels;
 pub mod types;
 pub mod unified_c_api;
 pub mod vxu_impl;
