v1.1: Vision 42/42 + Enhanced Vision 19/19 across openvx-mark, opencv-mark, and rustVX #77

Workflow file for this run

name: CI
on:
  push:
    branches: [main]
  # Run CI on pull requests targeting any base branch, not just main.
  # This keeps stacked PR workflows covered (a PR's base may be another
  # feature branch, e.g. an umbrella branch or a previous PR in a stack).
  pull_request:
# Auto-cancel superseded runs on the same ref so a rapid push series
# (e.g. force-push during PR review) doesn't queue 3+ stale runs and
# starve the GitHub Actions runner pool. main pushes are exempt: we
# always want a clean signal on main.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
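# Illustration only (not part of the workflow): a minimal Python sketch of
# how the two concurrency expressions above evaluate for a given ref. The
# function names and the PR number 123 are hypothetical; for pull_request
# events GitHub sets github.ref to refs/pull/<number>/merge.

```python
def concurrency_group(workflow: str, ref: str) -> str:
    """Mirror of the expression `${{ github.workflow }}-${{ github.ref }}`."""
    return f"{workflow}-{ref}"


def cancel_in_progress(ref: str) -> bool:
    """Mirror of the expression `${{ github.ref != 'refs/heads/main' }}`."""
    return ref != "refs/heads/main"


if __name__ == "__main__":
    # 123 is a made-up pull request number used only for this demo.
    for ref in ("refs/heads/main", "refs/pull/123/merge"):
        print(concurrency_group("CI", ref), "cancel:", cancel_in_progress(ref))
```

# Runs on main share one group but never cancel each other; runs on any
# other ref cancel the older run in the same group.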
# ============================================================================
# Architecture
#
# Phase 1 (parallel) — four independent build jobs:
# * Three OpenVX-impl jobs (MIVisionX, Khronos sample, rustVX). Each:
# 1. Builds the implementation from source.
# 2. Stages a self-contained artifact: <impl>-stage/lib + <impl>-stage/include.
# 3. Builds openvx-mark against the just-built impl.
# 4. Runs a quick smoke benchmark as a "local unit test" — catches
# build-link breakage and missing-symbol issues immediately,
# scoped to the specific impl, without waiting for the slower
# comparison job downstream.
# 5. Uploads the staged artifact for the comparison job to consume.
# * One OpenCV-baseline job (opencv-mark companion binary). Differs from
# the OpenVX jobs because OpenCV is apt-installable and opencv-mark has
# no OpenVX dependency — see the build-opencv job below for the shape.
# Stages its smoke JSON directly (no impl tarball needed).
#
# Per-impl feature-set policy
# ---------------------------
# Not every impl ships the full OpenVX 1.3.1 conformance surface, so each
# bench is scoped to the feature sets that impl actually implements:
#
# * MIVisionX — `vision,framework`. AMD's runtime exports the 42
# Vision Conformance kernels but **does NOT export
# most of the 19 Enhanced Vision APIs** (Bilateral-
# Filter, HOG*, Tensor*, Select, ScalarOperation,
# etc.). With `enhanced_vision` enabled, the per-
# benchmark dlsym shim in openvx_optional_apis.h
# would dutifully report 19 SKIPPED rows, which is
# accurate but uninformative noise on every run —
# so we omit it.
# * Khronos sample — `vision,enhanced_vision,framework`. CTS-conformant
# reference impl; ships both profiles.
# * rustVX — `vision,enhanced_vision,framework`. CTS-conformant
# for Vision (5923/5923) and Enhanced Vision
# (1235/1235) per the rustVX README.
# * opencv-mark — `vision,enhanced_vision` (no `framework`; cv:: has
# no graph runtime to measure). All 79 + 19 = 98
# OpenCV-side benchmarks run.
#
# Phase 2 (single job, depends on all four Phase-1 jobs) — comparison.
# 1. Downloads all three OpenVX impl artifacts onto a single runner;
# apt-installs OpenCV on that same runner.
# 2. Builds openvx-mark × 3 (one per OpenVX impl) so all binaries link
# against the same openvx-mark source tree at the same commit.
# Builds opencv-mark from the same source tree.
# 3. Runs the full benchmark against each impl using that impl's
# feature-set policy (above). Same hardware = fair cross-vendor
# comparison. `compare_reports.py` joins by (name, mode, resolution)
# and silently drops rows not on both sides, so enhanced_vision
# rows naturally appear in pairs where both impls produced them
# (Khronos↔OpenCV, rustVX↔OpenCV, Khronos↔rustVX) and are absent
# from MIVisionX↔* pairs.
# 4. Generates six pairwise comparison reports:
# OpenVX-vs-OpenVX:
# * MIVisionX vs Khronos sample
# * MIVisionX vs rustVX
# * Khronos sample vs rustVX
# OpenVX-vs-OpenCV (the "does adopting OpenVX pay off?" trio):
# * MIVisionX vs OpenCV
# * Khronos sample vs OpenCV
# * rustVX vs OpenCV
# 5. Posts each report to the job summary and uploads as an artifact.
#
# Inspired by the layered build/perf-gate design in rustVX's conformance CI:
# https://github.com/kiritigowda/rustVX/blob/main/.github/workflows/conformance.yml
# ============================================================================
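# Illustration only: a small Python sketch of the per-impl feature-set
# policy and the six pairwise reports described above. It works at the
# feature-set level as an approximation; the real join in
# `compare_reports.py` is per row, and the dictionary and function names
# here are assumptions made for the example.

```python
from itertools import combinations

# Feature sets per implementation, copied from the policy comment above.
FEATURES = {
    "MIVisionX": {"vision", "framework"},
    "Khronos sample": {"vision", "enhanced_vision", "framework"},
    "rustVX": {"vision", "enhanced_vision", "framework"},
    "OpenCV": {"vision", "enhanced_vision"},
}


def pairwise_shared_features(features: dict) -> dict:
    """Return {(left, right): shared feature sets} for every unordered pair."""
    return {
        (a, b): features[a] & features[b]
        for a, b in combinations(features, 2)
    }


if __name__ == "__main__":
    for (a, b), shared in pairwise_shared_features(FEATURES).items():
        print(f"{a} vs {b}: {', '.join(sorted(shared))}")
```

# Four implementations give six unordered pairs. Only pairs without
# MIVisionX share `enhanced_vision`, and only OpenVX-vs-OpenVX pairs
# share `framework`.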
jobs:
# --------------------------------------------------------------------------
# Phase 1 — MIVisionX (AMD OpenVX, CPU backend)
# --------------------------------------------------------------------------
build-mivisionx:
name: Build MIVisionX (CPU) + smoke test
runs-on: ubuntu-22.04
steps:
- name: Checkout openvx-mark
uses: actions/checkout@v4
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential cmake git python3
# Why -DCMAKE_CXX_FLAGS_RELEASE override (the "optimized kernels" knob):
#
# MIVisionX's amd_openvx/openvx/ago/ago_haf_cpu_*.cpp files contain
# hand-written AVX2 intrinsics (_mm256_*) for the CPU-side "Hardware
# Acceleration Functions" — these are the OPTIMIZED kernel paths.
# However, MIVisionX's own top-level CMakeLists.txt appends ONLY
# `-msse4.2` to CMAKE_CXX_FLAGS, with no -mavx2/-mfma and no
# __attribute__((target("avx2"))) on any function. With just -msse4.2
# the compiler can still emit the AVX2 intrinsics in those specific
# call sites, but it CANNOT auto-vectorise the surrounding scalar /
# loop code beyond SSE4.2, can't use FMA, can't use BMI/BMI2 — so
# the per-kernel dispatch glue, address arithmetic, and any kernel
# code that's not hand-vectorised stays at SSE4.2 throughput. That's
# the "base kernel" path the umbrella PR description points at.
#
# By overriding CMAKE_CXX_FLAGS_RELEASE we get -O3 -DNDEBUG plus
# x86-64-v3 (= SSE4.2 + AVX + AVX2 + BMI + BMI2 + FMA + LZCNT + POPCNT),
# a conservative, portable x86-64 baseline that GCC has accepted as a
# -march target since GCC 11. GitHub Actions Ubuntu 22.04 runners use Intel
# Xeon or AMD EPYC CPUs, which all support x86-64-v3.
#
# MIVisionX still appends `-msse4.2` to CMAKE_CXX_FLAGS (we don't
# override CMAKE_CXX_FLAGS, only the per-config Release variant),
# so the final compile line carries both `-msse4.2` and
# "-O3 -DNDEBUG -march=x86-64-v3". -march sets the code-gen ceiling;
# the duplicate -msse4.2 is redundant but harmless.
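# Illustration only: a hedged Python sketch of how one could check that a
# Linux runner reports the x86-64-v3 features named above. It covers only
# the subset listed in this comment block, not the full x86-64-v3
# definition, and the helper name is hypothetical. Linux reports LZCNT as
# "abm" in /proc/cpuinfo.

```python
# Subset of x86-64-v3 features named in the comment above, spelled the way
# Linux lists them on the "flags" line of /proc/cpuinfo.
REQUIRED_FLAGS = {"sse4_2", "avx", "avx2", "bmi1", "bmi2", "fma", "abm", "popcnt"}


def missing_v3_flags(cpuinfo_text: str) -> set:
    """Return the required flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            present = set(line.split(":", 1)[1].split())
            return REQUIRED_FLAGS - present
    # No flags line found (e.g. non-x86 host): report everything as missing.
    return set(REQUIRED_FLAGS)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            missing = missing_v3_flags(fh.read())
    except OSError:
        missing = set(REQUIRED_FLAGS)
    print("missing:", sorted(missing) if missing else "none")
```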
- name: Build MIVisionX (CPU backend, optimized)
run: |
set -euo pipefail
git clone --depth 1 --branch develop \
https://github.com/ROCm/MIVisionX.git /tmp/mivisionx-src
mkdir -p /tmp/mivisionx-src/build
cd /tmp/mivisionx-src/build
cmake \
-DBACKEND=CPU \
-DNEURAL_NET=OFF \
-DLOOM=OFF \
-DMIGRAPHX=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_FLAGS_RELEASE="-O3 -DNDEBUG -march=x86-64-v3" \
-DCMAKE_INSTALL_PREFIX=/tmp/mivisionx-install \
..
make -j$(nproc)
make install
# Sanity-print the compile flags the generated make rules use.
# This surfaces in CI logs so a reviewer can confirm AVX2 made it
# into the build (look for `-march=x86-64-v3` in the grep output).
grep -h 'CXX_FLAGS' CMakeFiles/openvx.dir/flags.make 2>/dev/null \
| head -2 || true
- name: Stage MIVisionX artifact
id: stage
run: |
set -euo pipefail
mkdir -p mivisionx-stage/lib mivisionx-stage/include
LIB_SRC=$(dirname "$(find /tmp/mivisionx-install -name 'libopenvx.so' | head -1)")
echo "MIVisionX libraries discovered in: $LIB_SRC"
# Copy ALL libopenvx* / libvxu* entries (libopenvx.so symlink,
# libopenvx.so.1 SONAME symlink, libopenvx.so.X.Y.Z real file)
# preserving symlinks (-P) so ld.so can follow the SONAME chain.
# Without versioned files the linker reports
# "libopenvx.so.1: cannot open shared object file".
find "$LIB_SRC" -maxdepth 1 -name 'libopenvx*' -exec cp -P {} mivisionx-stage/lib/ \;
find "$LIB_SRC" -maxdepth 1 -name 'libvxu*' -exec cp -P {} mivisionx-stage/lib/ \;
cp -r /tmp/mivisionx-install/include/mivisionx/. mivisionx-stage/include/
echo "--- staged lib ---"
ls -la mivisionx-stage/lib
echo "--- staged include (top-level) ---"
ls -la mivisionx-stage/include
{
echo "lib_dir=$(pwd)/mivisionx-stage/lib"
echo "include_dir=$(pwd)/mivisionx-stage/include"
} >> "$GITHUB_OUTPUT"
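# Illustration only: a Python sketch of the symlink-preserving copy that
# the `find ... -exec cp -P` lines in the staging step perform. The
# function name and the 1.3.0 version suffix are hypothetical;
# `shutil.copy(..., follow_symlinks=False)` recreates a link rather than
# copying its target.

```python
import os
import shutil
import tempfile


def stage_libs(src_dir: str, dst_dir: str, prefixes=("libopenvx", "libvxu")) -> list:
    """Copy matching entries from src_dir to dst_dir, keeping symlinks as symlinks."""
    os.makedirs(dst_dir, exist_ok=True)
    staged = []
    for name in sorted(os.listdir(src_dir)):
        if name.startswith(prefixes):
            # follow_symlinks=False: a symlink is recreated as a symlink,
            # a regular file is copied as usual (like `cp -P`).
            shutil.copy(os.path.join(src_dir, name),
                        os.path.join(dst_dir, name),
                        follow_symlinks=False)
            staged.append(name)
    return staged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = os.path.join(tmp, "install"), os.path.join(tmp, "stage")
        os.makedirs(src)
        # Hypothetical version number, for illustration only.
        with open(os.path.join(src, "libopenvx.so.1.3.0"), "wb") as fh:
            fh.write(b"\x7fELF")
        os.symlink("libopenvx.so.1.3.0", os.path.join(src, "libopenvx.so.1"))
        os.symlink("libopenvx.so.1", os.path.join(src, "libopenvx.so"))
        print(stage_libs(src, dst))
        print(os.readlink(os.path.join(dst, "libopenvx.so")))
```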
- name: Build openvx-mark (smoke)
run: |
set -euo pipefail
mkdir -p build-smoke
cd build-smoke
cmake \
-DCMAKE_BUILD_TYPE=Release \
-DOPENVX_INCLUDES=${{ steps.stage.outputs.include_dir }} \
-DOPENVX_LIB_DIR=${{ steps.stage.outputs.lib_dir }} \
..
cmake --build . -j$(nproc)
# Smoke covers the `vision` + `framework` feature sets only.
# MIVisionX's runtime exports the 42 Vision Conformance kernels
# but does NOT export most of the 19 Enhanced Vision APIs
# (BilateralFilter, HOG*, Tensor*, Select, ScalarOperation, etc.).
# With `enhanced_vision` enabled, the per-benchmark dlsym shim in
# openvx_optional_apis.h would dutifully report 19 SKIPPED rows
# on every run — accurate but uninformative noise. The Khronos
# sample, rustVX, and opencv-mark smoke jobs DO exercise
# `enhanced_vision` because those impls actually ship it.
- name: Run smoke benchmark (vision + framework, VGA × 5 iters, single-threaded)
# Smoke is advisory — if a specific impl crashes inside a
# specific kernel the artifact upload (which the compare job
# depends on) must still happen so vendor-vs-vendor signal
# isn't lost.
continue-on-error: true
run: |
set -eo pipefail
cd build-smoke
export LD_LIBRARY_PATH=${{ steps.stage.outputs.lib_dir }}:${LD_LIBRARY_PATH:-}
# Timer self-test up front so a sloppy runner clock fails
# loud before we trust a smoke timing number.
./openvx-mark --validate-timing
# `--threads 1` matches the Phase 2 compare config — same
# apples-to-apples threading policy on smoke and full bench
# so smoke timings are interpretable as a coarse preview.
./openvx-mark --feature-set vision,framework \
--resolution VGA --iterations 5 --warmup 1 --threads 1 \
--output-dir smoke-results
- name: Upload MIVisionX artifact
if: always()
uses: actions/upload-artifact@v4
with:
name: impl-mivisionx
path: mivisionx-stage/
retention-days: 1
- name: Upload MIVisionX smoke results
if: always()
uses: actions/upload-artifact@v4
with:
name: smoke-results-mivisionx
path: build-smoke/smoke-results/
if-no-files-found: ignore
# --------------------------------------------------------------------------
# Phase 1 — Khronos OpenVX sample implementation
# --------------------------------------------------------------------------
build-khronos-sample:
name: Build Khronos sample + smoke test
runs-on: ubuntu-22.04
steps:
- name: Checkout openvx-mark
uses: actions/checkout@v4
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential cmake git python3
# Khronos sample is a reference impl (no SIMD intrinsics), so
# most of the perf budget rides on whatever compiler auto-vec the
# build picks up. Build.py honours CFLAGS / CXXFLAGS from the
# environment, so we use those to upgrade the compile baseline
# to x86-64-v3 (= AVX2 + FMA + BMI2 + LZCNT + POPCNT), matching
# what the MIVisionX build above gets. This does not claim the
# sample becomes "competitive" (it's a reference impl), only
# that it's measured at the SAME compile baseline as
# MIVisionX so the cross-impl comparison isn't contaminated by
# one side getting better auto-vec than the other.
- name: Build Khronos OpenVX sample (Release, x86-64-v3)
run: |
set -euo pipefail
git clone --recursive --depth 1 \
https://github.com/KhronosGroup/OpenVX-sample-impl.git /tmp/khronos-src
cd /tmp/khronos-src
export CFLAGS="-O3 -march=x86-64-v3 ${CFLAGS:-}"
export CXXFLAGS="-O3 -march=x86-64-v3 ${CXXFLAGS:-}"
echo "CFLAGS = ${CFLAGS}"
echo "CXXFLAGS= ${CXXFLAGS}"
python3 Build.py --os=Linux --arch=64 --conf=Release
- name: Stage Khronos sample artifact
id: stage
run: |
set -euo pipefail
mkdir -p khronos-stage/lib khronos-stage/include
LIB_SRC=$(dirname "$(find /tmp/khronos-src -name 'libopenvx.so' -not -path '*/build/*' | head -1)")
echo "Khronos libraries discovered in: $LIB_SRC"
# Same approach as MIVisionX: copy all libopenvx* / libvxu* entries
# preserving symlinks so ld.so can follow the SONAME chain.
find "$LIB_SRC" -maxdepth 1 -name 'libopenvx*' -exec cp -P {} khronos-stage/lib/ \;
find "$LIB_SRC" -maxdepth 1 -name 'libvxu*' -exec cp -P {} khronos-stage/lib/ \;
cp -r /tmp/khronos-src/api-docs/include/. khronos-stage/include/
echo "--- staged lib ---"
ls -la khronos-stage/lib
echo "--- staged include (top-level) ---"
ls -la khronos-stage/include
{
echo "lib_dir=$(pwd)/khronos-stage/lib"
echo "include_dir=$(pwd)/khronos-stage/include"
} >> "$GITHUB_OUTPUT"
- name: Build openvx-mark (smoke)
run: |
set -euo pipefail
mkdir -p build-smoke
cd build-smoke
cmake \
-DCMAKE_BUILD_TYPE=Release \
-DOPENVX_INCLUDES=${{ steps.stage.outputs.include_dir }} \
-DOPENVX_LIB_DIR=${{ steps.stage.outputs.lib_dir }} \
..
cmake --build . -j$(nproc)
# Khronos sample is a CTS-conformant reference impl that ships
# both the Vision (42 kernels) and Enhanced Vision (19 kernels)
# profiles, so we exercise `vision,enhanced_vision,framework` at
# smoke time. `continue-on-error: true` keeps the artifact upload
# alive if any specific kernel crashes mid-run; the comparison job
# downstream handles whichever JSON files actually got produced.
- name: Run smoke benchmark (vision + enhanced_vision + framework, VGA × 5 iters)
continue-on-error: true
run: |
set -eo pipefail
cd build-smoke
export LD_LIBRARY_PATH=${{ steps.stage.outputs.lib_dir }}:${LD_LIBRARY_PATH:-}
./openvx-mark --validate-timing
./openvx-mark --feature-set vision,enhanced_vision,framework \
--resolution VGA --iterations 5 --warmup 1 --threads 1 \
--output-dir smoke-results
- name: Upload Khronos sample artifact
if: always()
uses: actions/upload-artifact@v4
with:
name: impl-khronos-sample
path: khronos-stage/
retention-days: 1
- name: Upload Khronos sample smoke results
if: always()
uses: actions/upload-artifact@v4
with:
name: smoke-results-khronos-sample
path: build-smoke/smoke-results/
if-no-files-found: ignore
# --------------------------------------------------------------------------
# Phase 1 — rustVX (Rust OpenVX implementation)
#
# rustVX ships a single libopenvx_ffi.so that exports the full vx*/vxu*
# symbol set. openvx-mark's CMake uses find_library(NAMES openvx) and
# find_library(NAMES vxu) — so we symlink the two classic Khronos lib
# names to the FFI .so during staging, without modifying rustVX's own
# build output.
#
# SIMD config: AVX2 + `-C target-cpu=x86-64-v3`, matching what rustVX's
# own CI ships. We deliberately skip the alignment-pad RUSTFLAGS used in
# rustVX's PR-vs-main perf gate — those exist to make rustVX-vs-rustVX
# bench numbers invariant to .text shifts, which is irrelevant for the
# vendor-vs-vendor comparison this workflow runs.
# --------------------------------------------------------------------------
build-rustvx:
name: Build rustVX + smoke test
runs-on: ubuntu-22.04
steps:
- name: Checkout openvx-mark
uses: actions/checkout@v4
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential cmake git
- name: Install Rust toolchain
run: |
set -euo pipefail
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \
| sh -s -- -y --default-toolchain stable
source "$HOME/.cargo/env"
rustc --version
cargo --version
- name: Build rustVX (release, AVX2)
run: |
set -euo pipefail
source "$HOME/.cargo/env"
git clone --depth 1 \
https://github.com/kiritigowda/rustVX.git /tmp/rustvx-src
cd /tmp/rustvx-src
case "$(uname -m)" in
x86_64|amd64)
FEATURES="openvx-core/sse2 openvx-core/avx2 openvx-vision/sse2 openvx-vision/avx2"
export RUSTFLAGS="-C target-cpu=x86-64-v3"
;;
aarch64|arm64)
FEATURES="openvx-core/neon openvx-vision/neon"
export RUSTFLAGS=""
;;
*)
FEATURES=""
export RUSTFLAGS=""
;;
esac
echo "Architecture : $(uname -m)"
echo "Cargo features: ${FEATURES:-<none>}"
echo "RUSTFLAGS : ${RUSTFLAGS:-<none>}"
if [ -n "$FEATURES" ]; then
cargo build --release -p openvx-ffi --features "$FEATURES"
else
cargo build --release -p openvx-ffi
fi
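# Illustration only: the architecture `case` statement in the build step
# above, restated as a Python sketch. The feature strings and RUSTFLAGS
# are copied from that step; the function names are hypothetical.

```python
import platform


def rustvx_build_config(machine: str) -> tuple:
    """Map a `uname -m` string to (cargo features, RUSTFLAGS)."""
    if machine in ("x86_64", "amd64"):
        return (["openvx-core/sse2", "openvx-core/avx2",
                 "openvx-vision/sse2", "openvx-vision/avx2"],
                "-C target-cpu=x86-64-v3")
    if machine in ("aarch64", "arm64"):
        return (["openvx-core/neon", "openvx-vision/neon"], "")
    # Unknown architecture: build without SIMD features or extra flags.
    return ([], "")


def cargo_command(machine: str) -> list:
    """Build the cargo argv, adding --features only when there are any."""
    features, _rustflags = rustvx_build_config(machine)
    cmd = ["cargo", "build", "--release", "-p", "openvx-ffi"]
    if features:
        cmd += ["--features", " ".join(features)]
    return cmd


if __name__ == "__main__":
    machine = platform.machine()
    print(machine, rustvx_build_config(machine)[1] or "<none>")
    print(" ".join(cargo_command(machine)))
```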
- name: Stage rustVX artifact (with libopenvx / libvxu symlinks)
id: stage
run: |
set -euo pipefail
mkdir -p rustvx-stage/lib rustvx-stage/include
cp /tmp/rustvx-src/target/release/libopenvx_ffi.so rustvx-stage/lib/
# Classic Khronos library names so openvx-mark's find_library picks
# them up. upload-artifact@v4 packs artifacts as zip and may store
# these links as regular file copies; either way the comparison job
# downstream finds libopenvx.so and libvxu.so under these names.
(
cd rustvx-stage/lib
ln -sf libopenvx_ffi.so libopenvx.so
ln -sf libopenvx_ffi.so libvxu.so
)
cp -r /tmp/rustvx-src/include/. rustvx-stage/include/
echo "--- staged lib ---"
ls -la rustvx-stage/lib
echo "--- staged include (top-level) ---"
ls -la rustvx-stage/include
{
echo "lib_dir=$(pwd)/rustvx-stage/lib"
echo "include_dir=$(pwd)/rustvx-stage/include"
} >> "$GITHUB_OUTPUT"
- name: Build openvx-mark (smoke)
run: |
set -euo pipefail
mkdir -p build-smoke
cd build-smoke
cmake \
-DCMAKE_BUILD_TYPE=Release \
-DOPENVX_INCLUDES=${{ steps.stage.outputs.include_dir }} \
-DOPENVX_LIB_DIR=${{ steps.stage.outputs.lib_dir }} \
..
cmake --build . -j$(nproc)
# rustVX is CTS-conformant for both Vision (5923/5923) and
# Enhanced Vision (1235/1235), so we exercise the full
# `vision,enhanced_vision,framework` surface at smoke time. This
# is the impl that gives the headline "all 19 enhanced_vision
# kernels produce real measurements" cell in the comparison
# table — every other OpenVX backend either omits the profile
# (MIVisionX) or has known per-kernel quirks.
- name: Run smoke benchmark (vision + enhanced_vision + framework, VGA × 5 iters)
continue-on-error: true
run: |
set -eo pipefail
cd build-smoke
export LD_LIBRARY_PATH=${{ steps.stage.outputs.lib_dir }}:${LD_LIBRARY_PATH:-}
./openvx-mark --validate-timing
./openvx-mark --feature-set vision,enhanced_vision,framework \
--resolution VGA --iterations 5 --warmup 1 --threads 1 \
--output-dir smoke-results
- name: Upload rustVX artifact
if: always()
uses: actions/upload-artifact@v4
with:
name: impl-rustvx
path: rustvx-stage/
retention-days: 1
- name: Upload rustVX smoke results
if: always()
uses: actions/upload-artifact@v4
with:
name: smoke-results-rustvx
path: build-smoke/smoke-results/
if-no-files-found: ignore
# --------------------------------------------------------------------------
# Phase 1 — OpenCV baseline (companion binary `opencv-mark`)
#
# OpenCV is the de facto vision baseline. This job exists so we can answer
# "does adopting OpenVX actually pay off vs the cv:: code I already have?"
# at the per-kernel level, on the same CI hardware as every OpenVX impl.
#
# Differs from the OpenVX impl jobs in two ways:
# 1. OpenCV is apt-installable (no from-source build), so this job is
# much shorter — install, configure parent CMake, build, smoke.
# 2. There is no impl-tarball staging step. opencv-mark IS the binary
# that runs the OpenCV-side measurements; there is no separate
# "link openvx-mark against this libopenvx.so" rebuild downstream.
# The Phase 2 comparison job re-runs opencv-mark itself (after a
# fresh apt-install of OpenCV) for strict same-runner fairness vs
# the per-impl benches — see compare job's `Build & bench
# opencv-mark` step.
#
# The smoke run here is fast feedback only (catches build/link breakage
# in <1 min on every PR); the comparison-grade FHD × 20 iter benchmark
# lives in Phase 2 alongside the OpenVX impl benches.
# --------------------------------------------------------------------------
build-opencv:
name: Build opencv-mark (OpenCV baseline) + smoke test
runs-on: ubuntu-22.04
steps:
- name: Checkout openvx-mark
uses: actions/checkout@v4
- name: Install dependencies (OpenCV 4 from apt)
run: |
sudo apt-get update
sudo apt-get install -y build-essential cmake git python3 \
libopencv-dev
# Sanity-print the OpenCV version that pkg-config sees so
# comparison reports later can be cross-referenced against
# exactly this version string.
pkg-config --modversion opencv4 || true
- name: Configure & build opencv-mark
run: |
set -euo pipefail
mkdir -p build-opencv
cd build-opencv
# Parent CMake auto-includes opencv-mark/ when OpenCV is found.
# No OPENVX_* flags needed — opencv-mark has no OpenVX dep.
cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build . --target opencv-mark -j$(nproc)
# Fail loudly if the binary somehow didn't get produced (e.g.
# OpenCV detection silently no-op'd). This is the exact failure
# mode that PR #1's first CI run was missing.
test -x opencv-mark/opencv-mark \
|| { echo "ERROR: opencv-mark binary not built — OpenCV likely not detected by CMake"; exit 1; }
# `--help` doubles as a version probe — it prints the opencv-mark
# version line and the linked OpenCV version up top. PR1's CLI
# does not implement a dedicated `--version` flag yet.
./opencv-mark/opencv-mark --help | head -3
# Same shape as the OpenVX-impl smokes (VGA × 5 iters, 1 warmup)
# so timing noise stays comparable. Not continue-on-error —
# opencv-mark has no impl-side quirks to tolerate; if a kernel
# breaks here it's our bug.
#
# Feature-set: `vision,enhanced_vision`. opencv-mark has 1:1
# coverage of both profiles (42 vision + 19 enhanced = 61
# kernels) — that's the entire OpenCV-side surface this CI
# exercises. `framework` is intentionally omitted (OpenCV has
# no graph runtime to measure; the framework benches that
# depend on `vxProcessGraph` semantics are OpenVX-only).
- name: Run smoke benchmark (vision + enhanced_vision, VGA × 5 iters)
run: |
set -eo pipefail
cd build-opencv
# Timer self-test up front — same gate that runs in the
# Phase 2 compare job. Catches a borked runner clock at
# smoke time so we don't waste a full FHD bench cycle.
./opencv-mark/opencv-mark --validate-timing
# `--threads 1` for symmetry with the smokes that run
# against single-threaded OpenVX impls — keeps the smoke
# comparable in shape to the cross-impl ones, even though
# the smoke itself is just a "did it build & did it run?"
# check, not a perf claim.
./opencv-mark/opencv-mark --feature-set vision,enhanced_vision \
--resolution VGA --iterations 5 --warmup 1 --threads 1 \
--output-dir smoke-results
- name: Upload opencv-mark smoke results
if: always()
uses: actions/upload-artifact@v4
with:
name: smoke-results-opencv
path: build-opencv/smoke-results/
if-no-files-found: ignore
# --------------------------------------------------------------------------
# Phase 2 — Pairwise comparison
#
# Pulls all three OpenVX implementation artifacts onto the same runner,
# plus apt-installs OpenCV, so every benchmark is exercised on identical
# hardware. Builds openvx-mark once per OpenVX impl (against this commit's
# source tree, not pre-built artifacts — keeps the comparison binary
# identical apart from the linked OpenVX lib), builds opencv-mark from
# the same source tree, runs the full feature-set bench against each,
# and emits six pairwise comparison reports:
#
# OpenVX-vs-OpenVX (3):
# * MIVisionX over Khronos sample — AMD over reference
# * MIVisionX over rustVX — AMD over Rust impl
# * rustVX over Khronos sample — Rust impl over reference
#
# OpenVX-vs-OpenCV (3) — "does adopting OpenVX pay off?":
# * MIVisionX over OpenCV — best-tuned OpenVX vs cv::
# * Khronos sample over OpenCV — reference OpenVX vs cv::
# * rustVX over OpenCV — Rust OpenVX vs cv::
#
# `if: always()` + per-download `continue-on-error` + per-bench
# `if: always() && steps.detect...` so a single failed build still
# surfaces the comparison signal for whichever other impls are
# available, instead of losing all visibility.
# --------------------------------------------------------------------------
compare:
name: Pairwise comparison (MIVisionX, Khronos, rustVX, OpenCV)
runs-on: ubuntu-22.04
needs:
- build-mivisionx
- build-khronos-sample
- build-rustvx
- build-opencv
if: always()
steps:
- name: Checkout openvx-mark
uses: actions/checkout@v4
- name: Install dependencies
run: |
sudo apt-get update
# libopencv-dev is needed so the Phase 2 `Build & bench
# opencv-mark` step can re-link opencv-mark on this runner.
# Strictly same-hardware fairness vs the per-impl benches.
sudo apt-get install -y build-essential cmake git python3 \
libopencv-dev
pkg-config --modversion opencv4 || true
- name: Download MIVisionX artifact
uses: actions/download-artifact@v4
with:
name: impl-mivisionx
path: ${{ github.workspace }}/impl/mivisionx
continue-on-error: true
- name: Download Khronos sample artifact
uses: actions/download-artifact@v4
with:
name: impl-khronos-sample
path: ${{ github.workspace }}/impl/khronos
continue-on-error: true
- name: Download rustVX artifact
uses: actions/download-artifact@v4
with:
name: impl-rustvx
path: ${{ github.workspace }}/impl/rustvx
continue-on-error: true
- name: Detect available implementations
id: detect
run: |
set -euo pipefail
for impl in mivisionx khronos rustvx; do
lib="${{ github.workspace }}/impl/$impl/lib/libopenvx.so"
if [ -e "$lib" ]; then
echo "$impl: AVAILABLE ($lib)"
chmod -R u+rwX "${{ github.workspace }}/impl/$impl/lib"
echo "${impl}=true" >> "$GITHUB_OUTPUT"
else
echo "$impl: MISSING (artifact download failed or build job did not produce it)"
echo "${impl}=false" >> "$GITHUB_OUTPUT"
fi
done
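# Illustration only: the detect step above, restated as a Python sketch.
# It checks for lib/libopenvx.so per impl and formats the key=value lines
# that the shell loop appends to $GITHUB_OUTPUT. Function names are
# hypothetical, and the sketch skips the chmod the real step performs.

```python
import os
import tempfile

IMPLS = ("mivisionx", "khronos", "rustvx")


def detect_impls(workspace: str, impls=IMPLS) -> dict:
    """Return {impl: True/False} depending on whether lib/libopenvx.so exists."""
    found = {}
    for impl in impls:
        lib = os.path.join(workspace, "impl", impl, "lib", "libopenvx.so")
        # os.path.exists follows symlinks, like `[ -e "$lib" ]` in the step above.
        found[impl] = os.path.exists(lib)
    return found


def as_output_lines(found: dict) -> list:
    """Format results as key=value lines in the $GITHUB_OUTPUT style."""
    return [f"{impl}={'true' if ok else 'false'}" for impl, ok in found.items()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ws:
        # Simulate a run where only the rustVX artifact was downloaded.
        lib_dir = os.path.join(ws, "impl", "rustvx", "lib")
        os.makedirs(lib_dir)
        open(os.path.join(lib_dir, "libopenvx.so"), "wb").close()
        print("\n".join(as_output_lines(detect_impls(ws))))
```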
# ----- Per-impl build + benchmark (FHD, 20 iter, 5 warmup) -----
#
# Each per-impl bench uses `if: always() && steps.detect...` because
# GitHub Actions treats any explicit `if:` without `always()` as
# implicit `success()`, meaning a crash in the MIVisionX bench would
# skip the Khronos / rustVX bench steps entirely and we'd lose all
# comparison signal. With `always()` the three benches stay
# independent and the comparison steps later in this job handle
# whichever JSON files actually got produced.
#
# `--threads 1` is passed EXPLICITLY (it's also the default — but
# we want the CI compare config to be self-documenting). Rationale:
#
# * MIVisionX CPU backend, Khronos sample, and rustVX are all
# fundamentally single-threaded per kernel — none of them have
# an internal thread pool on the CPU path.
# * OpenCV, by contrast, will happily spawn nproc threads via
# TBB/OpenMP if left at its default. Without the `--threads 1`
# pin, the OpenCV side would get an unfair (nproc)x parallelism
# boost just from defaults — the comparison would no longer be
# "OpenVX kernel vs OpenCV kernel" but "1-thread OpenVX vs
# n-thread OpenCV". `--threads 1` calls cv::setNumThreads(1)
# for opencv-mark and sets OMP_NUM_THREADS=1 in the env for
# anything OpenMP-using downstream.
#
# Feature set is per-impl (see the architecture comment block
# at the top of this file for the full policy):
# * MIVisionX — `vision,framework` (no enhanced_vision;
# AMD's runtime doesn't export the APIs)
# * Khronos sample — `vision,enhanced_vision,framework`
# * rustVX — `vision,enhanced_vision,framework`
# * opencv-mark — `vision,enhanced_vision` (no framework;
# OpenCV has no graph runtime to measure)
# `compare_reports.py` joins by (name, mode, resolution) and
# silently drops rows not on both sides, so enhanced_vision
# rows naturally appear in pairs where both impls produced them
# (Khronos↔OpenCV, rustVX↔OpenCV, Khronos↔rustVX) and are absent
# from MIVisionX↔* pairs.
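# Illustration only: a minimal Python sketch of the inner-join behaviour
# described above (join by (name, mode, resolution), drop rows present on
# one side only). The row schema, the `median_ms` field, and the kernel
# names in the demo are assumptions; the actual `compare_reports.py` may
# differ.

```python
def row_key(row: dict) -> tuple:
    """Join key described in the comment above."""
    return (row["name"], row["mode"], row["resolution"])


def join_reports(left: list, right: list) -> list:
    """Inner join: keep only rows whose key appears in both reports."""
    right_by_key = {row_key(r): r for r in right}
    joined = []
    for row in left:
        other = right_by_key.get(row_key(row))
        if other is None:
            continue  # one-sided row, silently dropped
        joined.append({
            "key": row_key(row),
            "left_ms": row["median_ms"],    # hypothetical field name
            "right_ms": other["median_ms"],
        })
    return joined


if __name__ == "__main__":
    # Made-up rows and timings, for illustration only.
    mivisionx = [
        {"name": "Box3x3", "mode": "graph", "resolution": "FHD", "median_ms": 1.0},
    ]
    opencv = [
        {"name": "Box3x3", "mode": "graph", "resolution": "FHD", "median_ms": 2.0},
        {"name": "BilateralFilter", "mode": "graph", "resolution": "FHD", "median_ms": 9.0},
    ]
    for row in join_reports(mivisionx, opencv):
        print(row)
```

# In the demo only the Box3x3 row survives; the BilateralFilter row has no
# MIVisionX counterpart and is dropped.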
- name: Build & bench against MIVisionX (single-threaded, FHD × 20)
if: always() && steps.detect.outputs.mivisionx == 'true'
run: |
set -euo pipefail
mkdir -p build-mivisionx
cd build-mivisionx
cmake \
-DCMAKE_BUILD_TYPE=Release \
-DOPENVX_INCLUDES=${{ github.workspace }}/impl/mivisionx/include \
-DOPENVX_LIB_DIR=${{ github.workspace }}/impl/mivisionx/lib \
..
cmake --build . -j$(nproc)
export LD_LIBRARY_PATH=${{ github.workspace }}/impl/mivisionx/lib:${LD_LIBRARY_PATH:-}
# Timer self-test first — gates the rest of the bench. If the
# runner clock is sloppy, our timing numbers are meaningless
# and we'd rather know about it now than ship bad data.
./openvx-mark --validate-timing
./openvx-mark --feature-set vision,framework \
--resolution FHD --iterations 20 --warmup 5 --threads 1 \
--output-dir results
# Sentinel-set dump for cross-impl numerical verification —
# see scripts/cross_verify_outputs.py. Runs the kernel set
# ONCE (no timing, no warmup) so it's cheap, then the
# downstream verify step compares this dump against the
# OpenCV dump for correctness.
./openvx-mark --dump-outputs dump-mivisionx --seed 42
- name: Build & bench against Khronos sample (single-threaded, FHD × 20)
if: always() && steps.detect.outputs.khronos == 'true'
# `continue-on-error: true` so a crash inside a single
# enhanced_vision kernel (the reference impl has known
# per-kernel quirks under heavy use) doesn't fail the whole job.
# `openvx-mark` only writes its JSON at end-of-run, so a crash
# loses this impl's results, but the remaining bench, verify,
# and upload steps still run and report on the other impls.
continue-on-error: true
run: |
set -eo pipefail
mkdir -p build-khronos
cd build-khronos
cmake \
-DCMAKE_BUILD_TYPE=Release \
-DOPENVX_INCLUDES=${{ github.workspace }}/impl/khronos/include \
-DOPENVX_LIB_DIR=${{ github.workspace }}/impl/khronos/lib \
..
cmake --build . -j$(nproc)
export LD_LIBRARY_PATH=${{ github.workspace }}/impl/khronos/lib:${LD_LIBRARY_PATH:-}
./openvx-mark --validate-timing
./openvx-mark --feature-set vision,enhanced_vision,framework \
--resolution FHD --iterations 20 --warmup 5 --threads 1 \
--output-dir results
./openvx-mark --dump-outputs dump-khronos --seed 42 || true
- name: Build & bench against rustVX (single-threaded, FHD × 20)
if: always() && steps.detect.outputs.rustvx == 'true'
# rustVX is CTS-conformant for both Vision (5923/5923) and
# Enhanced Vision (1235/1235), so all 42 + 19 kernels should
# actually produce real measurements here. This row is the
# headline cell for "what does a fully-conformant OpenVX impl
# look like vs OpenCV on the same hardware?".
# `continue-on-error: true` is a belt-and-suspenders safety
# in case any one kernel surfaces a regression mid-bench —
# the artifact upload (which downstream comparisons depend
# on) must still happen.
continue-on-error: true
run: |
set -eo pipefail
mkdir -p build-rustvx
cd build-rustvx
cmake \
-DCMAKE_BUILD_TYPE=Release \
-DOPENVX_INCLUDES=${{ github.workspace }}/impl/rustvx/include \
-DOPENVX_LIB_DIR=${{ github.workspace }}/impl/rustvx/lib \
..
cmake --build . -j$(nproc)
export LD_LIBRARY_PATH=${{ github.workspace }}/impl/rustvx/lib:${LD_LIBRARY_PATH:-}
./openvx-mark --validate-timing
./openvx-mark --feature-set vision,enhanced_vision,framework \
--resolution FHD --iterations 20 --warmup 5 --threads 1 \
--output-dir results
./openvx-mark --dump-outputs dump-rustvx --seed 42 || true
# opencv-mark has no OpenVX dependency, so no OPENVX_* flags and no
# detect-step gate — it only needs `libopencv-dev` (already installed
# above). Same FHD × 20 iter × 5 warmup × --threads 1 shape as the
# OpenVX benches so per-kernel speedups are directly comparable.
#
# Feature-set is `vision,enhanced_vision` — opencv-mark has 1:1
# coverage of both profiles (79 + 19 = 98 OpenCV-side benchmarks
# total). `framework` is intentionally omitted because OpenCV has
# no graph runtime to measure (the framework benches that depend
# on `vxProcessGraph` / virtual-image fusion semantics are
# OpenVX-only by design). `compare_reports.py` ignores rows that
# only exist on one side, so framework rows naturally don't
# appear in OpenCV pairwise tables.
- name: Build & bench opencv-mark (single-threaded, FHD × 20)
if: always()
id: bench_opencv
run: |
set -euo pipefail
mkdir -p build-opencv-bench
cd build-opencv-bench
cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build . --target opencv-mark -j$(nproc)
test -x opencv-mark/opencv-mark \
|| { echo "ERROR: opencv-mark not built — OpenCV detection failed in compare job"; exit 1; }
./opencv-mark/opencv-mark --validate-timing
./opencv-mark/opencv-mark --feature-set vision,enhanced_vision \
--resolution FHD --iterations 20 --warmup 5 --threads 1 \
--output-dir results
./opencv-mark/opencv-mark --dump-outputs dump-opencv --seed 42
# ----- Cross-impl numerical verification -----
#
# We have one dump-* directory per impl that produced a build.
# Run scripts/cross_verify_outputs.py for each (opencv, openvx)
# pair so a reviewer can see at a glance whether MIVisionX,
# Khronos sample, and rustVX agree with OpenCV at the pixel
# level — proves the timing comparison rows below are honest
# apples-to-apples and not "OpenCV is faster because it's
# silently computing the wrong thing".
#
# The verifier exits non-zero on any kernel exceeding its
# per-kernel tolerance; we collect all three reports into the
# step summary first, then raise a warning annotation at the end
# if any report failed (the step itself stays green). That way a
# single divergence on one impl doesn't hide the other two impls'
# results.
- name: Cross-impl output verification (OpenCV ↔ each OpenVX impl)
if: always()
run: |
set -euo pipefail
# numpy is the only Python dep — used by the verifier for
# array compare + PSNR. apt's python3-numpy on ubuntu-22.04
# is fine and avoids a pip wheel download.
sudo apt-get install -y python3-numpy
mkdir -p comparisons
OPENCV_DUMP=build-opencv-bench/dump-opencv
{
echo ""
echo "---"
echo ""
echo "## Cross-impl numerical verification"
echo ""
echo "Sentinel kernel suite (VGA × 1 run, no timing) dumped by"
echo "\`--dump-outputs\` on each binary; \`scripts/cross_verify_outputs.py\`"
echo "loads both dumps and computes max-abs-diff + PSNR + exact-%"
echo "per kernel. Tolerances are tuned per kernel (see \`RULES\` in"
echo "the script). The \`_input_u8\` row checks that inputs are"
echo "byte-identical; the other rows check that kernels agree within tolerance."
echo ""
} >> "$GITHUB_STEP_SUMMARY"
OVERALL=0
for impl in mivisionx khronos rustvx; do
VX_DUMP="build-${impl}/dump-${impl}"
if [ ! -d "$OPENCV_DUMP" ] || [ ! -d "$VX_DUMP" ]; then
echo "skipping verify for $impl: missing dump dir ($VX_DUMP or $OPENCV_DUMP)"
echo "_Skipped \`$impl\` verify — dump directory missing._" >> "$GITHUB_STEP_SUMMARY"
continue
fi
set +e
python3 scripts/cross_verify_outputs.py \
"$OPENCV_DUMP" "$VX_DUMP" \
--left-label "OpenCV" --right-label "${impl}" \
--json comparisons/cross-verify-${impl}.json \
>> "$GITHUB_STEP_SUMMARY"
rc=$?
set -e
if [ "$rc" -ne 0 ]; then OVERALL=1; fi
echo "" >> "$GITHUB_STEP_SUMMARY"
done
# Surface OVERALL as a step-level warning annotation. The job
# stays green on a divergence (so reviewers still see the timing
# comparison); the per-impl reports are already in the step
# summary, and the raw dumps are uploaded as an artifact below.
if [ "$OVERALL" -ne 0 ]; then
echo "::warning::Cross-impl verification flagged ≥1 divergence — see job summary"
fi
# ----- Pairwise comparisons -----
#
# Each comparison is oriented as "<candidate> over <baseline>" so
# the speedup column reads as `candidate / baseline` (>1.00x =
# candidate is faster). The orientation is deliberate:
#
# OpenVX-vs-OpenVX trio — "how much faster is the more-tuned
# impl than the reference":
# * MIVisionX over Khronos sample (AMD over reference)
# * MIVisionX over rustVX (AMD over Rust impl)
# * rustVX over Khronos sample (Rust impl over reference)
#
# OpenVX-vs-OpenCV trio — "does adopting OpenVX pay off vs cv::":
# * MIVisionX over OpenCV
# * Khronos sample over OpenCV
# * rustVX over OpenCV
#
# Mechanically, `scripts/compare_reports.py` computes
# speedup = throughput(arg2) / throughput(arg1)
# so the candidate is passed as the SECOND positional arg.
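#
# Worked example with made-up numbers (illustrative only, not measured
# results; the MP/s unit is an assumption): if arg1 (baseline) reports
# 200 MP/s for a kernel and arg2 (candidate) reports 300 MP/s, then
#     speedup = 300 / 200 = 1.50x
# and the row reads as "candidate is 1.5x faster than baseline".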
#
# The step does two things:
# 1. Runs `compare_reports.py` once per pair to produce a
# per-kernel detail .md in comparisons/. These also become
# the `benchmark-comparisons` artifact for downstream tools.
# 2. Invokes `scripts/ci_pairwise_summary.py` once to render
# an organized GitHub Step Summary — TL;DR speedup matrix
# at top, two grouped headline tables, and the per-kernel
# detail tables collapsed inside <details> blocks. See the
# script docstring for the config schema; this used to be a
# ~115-line bash + inline-Python block and rendered ~600
# lines into the summary.
- name: Pairwise comparisons
if: always()
run: |
set -euo pipefail
mkdir -p comparisons
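# Per-impl JSON report paths (parallel arrays keyed by impl id).
# LABELS is not read by the loop below; the rendered labels come
# from the JSON config further down, which must stay in sync with
# PATHS.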
IDS=(mivisionx khronos rustvx opencv)
PATHS=(
"build-mivisionx/results/benchmark_results.json"
"build-khronos/results/benchmark_results.json"
"build-rustvx/results/benchmark_results.json"
"build-opencv-bench/results/benchmark_results.json"
)
LABELS=(
"MIVisionX (AMD OpenVX)"
"Khronos sample"
"rustVX"
"OpenCV"
)
# The 6 pairs, "<candidate> <baseline>". Order matches the
# rendered summary table order: OpenVX-vs-OpenCV (headline
# question) first, then OpenVX-vs-OpenVX.
PAIRS=(
"mivisionx opencv"
"khronos opencv"
"rustvx opencv"
"mivisionx khronos"
"mivisionx rustvx"
"rustvx khronos"
)
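# Example of how one entry expands in the loop below:
# "mivisionx opencv" gives CAND=mivisionx and BASE=opencv, so
# compare_reports.py receives the OpenCV report as arg1 (baseline)
# and the MIVisionX report as arg2 (candidate), with
# --output comparisons/mivisionx-over-opencv.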
# Phase 1 — per-kernel detail .md per pair where both inputs
# exist. Missing-input pairs are skipped here with a log line;
# the summary script renders a friendly "_Detail missing_" note
# for them inside the collapsed <details> block.
path_of() {
for i in "${!IDS[@]}"; do
if [ "${IDS[$i]}" = "$1" ]; then echo "${PATHS[$i]}"; return; fi
done
}
for pair in "${PAIRS[@]}"; do
read -r CAND BASE <<< "$pair"
CAND_PATH=$(path_of "$CAND")
BASE_PATH=$(path_of "$BASE")
OUT="comparisons/${CAND}-over-${BASE}"
if [ -f "$CAND_PATH" ] && [ -f "$BASE_PATH" ]; then
python3 scripts/compare_reports.py "$BASE_PATH" "$CAND_PATH" --output "$OUT"
else
echo "Skipping detail for ${CAND}-over-${BASE}: missing ${CAND_PATH} or ${BASE_PATH}"
fi
done
# Phase 2 — render the organized step summary. The config
# below is the only place pair-grouping & intent text lives;
# the helper handles matrix rendering, headline tables, and
# the collapsed <details> blocks.
cat > /tmp/pairwise-config.json <<'JSON'
{
"reports": {
"mivisionx": {"label": "MIVisionX (AMD OpenVX)", "path": "build-mivisionx/results/benchmark_results.json"},
"khronos": {"label": "Khronos sample", "path": "build-khronos/results/benchmark_results.json"},
"rustvx": {"label": "rustVX", "path": "build-rustvx/results/benchmark_results.json"},
"opencv": {"label": "OpenCV", "path": "build-opencv-bench/results/benchmark_results.json"}
},
"groups": [
{
"title": "OpenVX-vs-OpenCV — does adopting OpenVX pay off vs cv::?",
"intent": "Speedup reads as `<OpenVX impl> / OpenCV`. Values >1.00x mean adopting that OpenVX impl pays off vs writing the equivalent directly in OpenCV — the headline question this comparison phase exists to answer. Ordered most-tuned (MIVisionX) → reference (Khronos sample) → Rust impl (rustVX) so the table walks the realistic best→worst range of the trade-off.",
"pairs": [["mivisionx", "opencv"], ["khronos", "opencv"], ["rustvx", "opencv"]]
},
{
"title": "OpenVX-vs-OpenVX — cross-implementation",
"intent": "Speedup reads as `<candidate> / <baseline>`. MIVisionX (AMD, most-tuned) compared against the Khronos sample (reference) and rustVX, then rustVX vs Khronos sample (Rust impl over reference).",
"pairs": [["mivisionx", "khronos"], ["mivisionx", "rustvx"], ["rustvx", "khronos"]]
}
],
"detail_dir": "comparisons"
}
JSON
python3 scripts/ci_pairwise_summary.py --config /tmp/pairwise-config.json \
>> "$GITHUB_STEP_SUMMARY"
echo "--- comparison artifacts ---"
ls -la comparisons/ || true
- name: Upload per-impl benchmark results
if: always()
uses: actions/upload-artifact@v4
with:
name: benchmark-results
path: |
build-mivisionx/results/
build-khronos/results/
build-rustvx/results/
build-opencv-bench/results/
if-no-files-found: ignore
- name: Upload pairwise comparisons
if: always()
uses: actions/upload-artifact@v4
with:
name: benchmark-comparisons
path: comparisons/
if-no-files-found: ignore
# Sentinel kernel dumps — uploaded so a reviewer can re-run
# `scripts/cross_verify_outputs.py` locally against any pair
# without re-running the whole CI build, and so the raw .bin
# files are inspectable after the fact for any divergence the
# verifier flagged.
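#
# Local re-run sketch, mirroring the invocation in the verification
# step above. It assumes the `cross-verify-dumps` artifact was
# unpacked at the repo root with the paths below intact (adjust if
# your download lands elsewhere):
#   python3 scripts/cross_verify_outputs.py \
#     build-opencv-bench/dump-opencv build-rustvx/dump-rustvx \
#     --left-label "OpenCV" --right-label "rustvx"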
- name: Upload sentinel output dumps
if: always()
uses: actions/upload-artifact@v4
with:
name: cross-verify-dumps
path: |
build-mivisionx/dump-mivisionx/
build-khronos/dump-khronos/
build-rustvx/dump-rustvx/
build-opencv-bench/dump-opencv/
if-no-files-found: ignore