Commit ecc868f

docs: add hardware-acceleration section calling out AMD Ryzen / EPYC

The Cargo features table now covers both crates that ship SIMD kernels (openvx-core for the FFI graph executor's vxAdd / vxSubtract / vxBox3x3 / vxGaussian3x3 / vxColorConvert callbacks, and openvx-vision for the direct Rust-API kernels). The example build command is updated to use the matching pair on both crates so the FFI graph path and the Rust API path both pick up the SIMD kernels.

A new "Hardware acceleration" section calls out that rustVX is tuned and validated on AMD Ryzen / EPYC (Zen 2+, i.e. Ryzen 3000 and newer, plus the matching EPYC Rome / Milan / Genoa parts). This is the same family GitHub's Linux runners run on, and the silicon every benchmark number in this README's "Benchmark & compare" job summary was measured on. The note also documents:

* The runtime AVX2 -> SSE2 -> scalar dispatch via is_x86_feature_detected!, in openvx-core::simd_kernels (FFI path) and openvx-vision::x86_64_simd (Rust path).
* The recommended `RUSTFLAGS=-C target-cpu=x86-64-v3` plus matching --features pair for a manual build, matching exactly what the CI workflow emits when it auto-detects an AVX2 host.
* That the same fast paths run at parity on any AVX2-capable Intel CPU (Haswell+); AMD is called out because it's the validation silicon, not the only supported silicon.
* Fallbacks: SSE2-only x86_64 hosts pin to x86-64-v2, AArch64 hosts take the neon path, and feature-less builds use the slice-iter scalar path.

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 392fe88 commit ecc868f

1 file changed

Lines changed: 29 additions & 4 deletions

README.md

````diff
@@ -74,21 +74,46 @@ The standard OpenVX 1.3 C headers are bundled in [`include/VX/`](include/VX/) an
 
 ### Cargo features
 
-The vision kernel crate exposes opt-in performance features:
+Both `openvx-core` (host of the C-API kernel callbacks the OpenVX graph executor invokes) and `openvx-vision` (host of the public Rust API kernels) expose a matching opt-in feature set:
 
 | Feature | Effect |
 |---------|--------|
 | `simd` | Enables architecture-neutral SIMD code paths |
 | `sse2` / `avx2` | x86_64 SIMD back-ends (imply `simd`) |
 | `neon` | AArch64 SIMD back-end (implies `simd`) |
-| `parallel` | Enables Rayon-based multi-threaded kernels |
+| `parallel` (`openvx-vision` only) | Enables Rayon-based multi-threaded kernels |
 
-Build with one or more features, e.g.:
+Build with the matching pair on each crate so the FFI graph path and the direct Rust API path both pick up the SIMD kernels:
 
 ```bash
-cargo build --release -p openvx-ffi --features "openvx-vision/avx2 openvx-vision/parallel"
+cargo build --release -p openvx-ffi \
+    --features "openvx-core/sse2 openvx-core/avx2 openvx-vision/sse2 openvx-vision/avx2"
 ```
 
+### Hardware acceleration
+
+rustVX is **tuned and validated on modern AMD x86_64 silicon**, specifically AMD Ryzen (Zen 2+, *3000-series and newer*) and the matching AMD EPYC (Rome / Milan / Genoa) server parts. CI runs against the AMD EPYC 7763 (Milan) and EPYC 9V74 (Genoa) hosts in GitHub's Linux runner pool, and every published benchmark number in the *Benchmark & compare* job summary comes from those AMD CPUs.
+
+#### Dispatch
+
+Runtime SIMD selection happens via `is_x86_feature_detected!` inside `openvx-core::simd_kernels` (FFI graph path: `vxAdd`, `vxSubtract`, `vxBox3x3`, `vxGaussian3x3`, `vxColorConvert`) and `openvx-vision::x86_64_simd` (direct Rust API). The ordering on x86_64 is **AVX2 → SSE2 → tight scalar slice loop**. Builds without the SIMD features (or non-x86_64 / non-aarch64 targets) compile only the scalar fallback, which is itself slice-iter-based and ~50× faster than naive per-pixel kernel implementations.
+
+#### Recommended build for AMD Ryzen / EPYC
+
+On any Zen 2+ host (Ryzen 3000, 4000, 5000, 7000, 9000; Threadripper 3000+; EPYC Rome, Milan, Genoa), the project's CI workflow auto-detects host CPU flags and emits the right combination, but for a manual build the recipe is:
+
+```bash
+RUSTFLAGS="-C target-cpu=x86-64-v3" \
+cargo build --release -p openvx-ffi \
+    --features "openvx-core/sse2 openvx-core/avx2 \
+    openvx-vision/sse2 openvx-vision/avx2"
+```
+
+`-C target-cpu=x86-64-v3` is the portable microarch level that matches every Zen 2+ Ryzen / EPYC and every Intel Haswell+. It gives the compiler licence to auto-vectorize with SSE4.2 / AVX / AVX2 / BMI1+2 / FMA / F16C in the rest of the workspace, on top of the hand-tuned intrinsic kernels gated by the `sse2` / `avx2` Cargo features. This is the exact configuration the `.github/workflows/conformance.yml` build job uses for the published benchmark numbers.
+
+> [!NOTE]
+> The same fast paths run on any AVX2-capable Intel CPU (Haswell and later) at parity: the `is_x86_feature_detected!` dispatch doesn't read the vendor string, only the feature flags. AMD Ryzen / EPYC is called out here because it's the silicon the CI runs on and the silicon every benchmark figure in this README and the *Benchmark & compare* job summary was measured on. On older x86_64 silicon (pre-AVX2), the build automatically selects the SSE2 path and `-C target-cpu=x86-64-v2`. AArch64 hosts (Apple Silicon, AWS Graviton, etc.) take the `openvx-*/neon` path with no `target-cpu` override.
+
 ## Using rustVX from a C application
 
 `libopenvx_ffi` exports the full `vx*` / `vxu*` symbol set defined by the standard OpenVX headers, so existing OpenVX code links against it with no source changes. A minimal example:
````
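
Returning to the Dispatch section added by this commit: the AVX2 → SSE2 → scalar ordering can be illustrated with a small Rust sketch built on `is_x86_feature_detected!`. This is not rustVX's code. `SimdPath`, `select_path`, `add_u8`, and the `add_u8_*` helpers are hypothetical names, and a saturating `u8` add merely stands in for a real kernel.

```rust
// Illustrative only: hypothetical names, not the rustVX implementation.

/// The code path a call would take on the current CPU.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimdPath {
    Avx2,
    Sse2,
    Scalar,
}

/// Runtime selection in the order AVX2 -> SSE2 -> scalar.
/// Only CPU feature flags are consulted, never the vendor string.
pub fn select_path() -> SimdPath {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return SimdPath::Avx2;
        }
        if is_x86_feature_detected!("sse2") {
            return SimdPath::Sse2;
        }
    }
    SimdPath::Scalar
}

/// Scalar fallback: a slice-iterator loop with no per-pixel indexing.
fn add_u8_scalar(a: &[u8], b: &[u8], out: &mut [u8]) {
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x.saturating_add(y);
    }
}

/// AVX2 path: 32 bytes per iteration, scalar loop for the tail.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_u8_avx2(a: &[u8], b: &[u8], out: &mut [u8]) {
    use std::arch::x86_64::*;
    let n = out.len();
    let mut i = 0;
    while i + 32 <= n {
        unsafe {
            let va = _mm256_loadu_si256(a.as_ptr().add(i) as *const __m256i);
            let vb = _mm256_loadu_si256(b.as_ptr().add(i) as *const __m256i);
            let sum = _mm256_adds_epu8(va, vb);
            _mm256_storeu_si256(out.as_mut_ptr().add(i) as *mut __m256i, sum);
        }
        i += 32;
    }
    add_u8_scalar(&a[i..], &b[i..], &mut out[i..]);
}

/// SSE2 path: 16 bytes per iteration, scalar loop for the tail.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2")]
unsafe fn add_u8_sse2(a: &[u8], b: &[u8], out: &mut [u8]) {
    use std::arch::x86_64::*;
    let n = out.len();
    let mut i = 0;
    while i + 16 <= n {
        unsafe {
            let va = _mm_loadu_si128(a.as_ptr().add(i) as *const __m128i);
            let vb = _mm_loadu_si128(b.as_ptr().add(i) as *const __m128i);
            let sum = _mm_adds_epu8(va, vb);
            _mm_storeu_si128(out.as_mut_ptr().add(i) as *mut __m128i, sum);
        }
        i += 16;
    }
    add_u8_scalar(&a[i..], &b[i..], &mut out[i..]);
}

/// Public entry point: checks lengths once, then dispatches.
pub fn add_u8(a: &[u8], b: &[u8], out: &mut [u8]) {
    assert!(a.len() == b.len() && a.len() == out.len());
    match select_path() {
        // Safety: each SIMD path is entered only after its feature was detected.
        #[cfg(target_arch = "x86_64")]
        SimdPath::Avx2 => unsafe { add_u8_avx2(a, b, out) },
        #[cfg(target_arch = "x86_64")]
        SimdPath::Sse2 => unsafe { add_u8_sse2(a, b, out) },
        _ => add_u8_scalar(a, b, out),
    }
}
```

In this sketch the SIMD functions exist only on `x86_64` builds, so every other target compiles just the scalar loop. A real crate would presumably also gate them behind the `sse2` / `avx2` Cargo features, which the sketch leaves out so that it compiles standalone.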

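
The build choices this commit documents can be summarized as a small lookup: `x86-64-v3` with both feature pairs on AVX2 hosts, `x86-64-v2` with the SSE2 path on pre-AVX2 x86_64, and `neon` with no `target-cpu` override on AArch64. The Rust sketch below is hypothetical. `Host`, `BuildRecipe`, `recipe_for`, and `command_line` are invented for illustration and are not part of rustVX or its CI workflow. The exact feature strings for the SSE2-only and NEON cases are assumptions extrapolated from the AVX2 command in the README.

```rust
// Illustrative only: hypothetical types, not part of rustVX or its CI.

/// Coarse description of the build host.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Host {
    /// x86_64 CPU; `avx2` says whether the AVX2 flag is present.
    X86_64 { avx2: bool },
    Aarch64,
    Other,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BuildRecipe {
    /// Value for `-C target-cpu=...`, or `None` for no override.
    pub target_cpu: Option<&'static str>,
    /// Space-separated Cargo features (may be empty).
    pub features: &'static str,
}

/// Maps a host description to the recipe described in the README text.
pub fn recipe_for(host: Host) -> BuildRecipe {
    match host {
        Host::X86_64 { avx2: true } => BuildRecipe {
            target_cpu: Some("x86-64-v3"),
            features: "openvx-core/sse2 openvx-core/avx2 openvx-vision/sse2 openvx-vision/avx2",
        },
        // Assumed feature string. Note that x86-64-v2 itself requires SSSE3,
        // SSE4.1, SSE4.2 and POPCNT, so it does not fit a CPU with SSE2 only.
        Host::X86_64 { avx2: false } => BuildRecipe {
            target_cpu: Some("x86-64-v2"),
            features: "openvx-core/sse2 openvx-vision/sse2",
        },
        // Assumed expansion of the README's `openvx-*/neon` shorthand.
        Host::Aarch64 => BuildRecipe {
            target_cpu: None,
            features: "openvx-core/neon openvx-vision/neon",
        },
        // Feature-less build: scalar path only.
        Host::Other => BuildRecipe {
            target_cpu: None,
            features: "",
        },
    }
}

/// Renders the recipe as a single shell command line.
pub fn command_line(recipe: &BuildRecipe) -> String {
    let mut cmd = String::new();
    if let Some(cpu) = recipe.target_cpu {
        cmd.push_str(&format!("RUSTFLAGS=\"-C target-cpu={}\" ", cpu));
    }
    cmd.push_str("cargo build --release -p openvx-ffi");
    if !recipe.features.is_empty() {
        cmd.push_str(&format!(" --features \"{}\"", recipe.features));
    }
    cmd
}
```

One caveat worth keeping in mind when adapting this: the `x86-64-v2` level assumes SSE4.2-class hardware, so a processor that truly offers nothing beyond SSE2 would need the baseline `x86-64` target instead.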