v1.1: Vision 42/42 + Enhanced Vision 19/19 across openvx-mark, opencv-mark, and rustVX #77

Workflow file for this run

	name: CI

	on:
	push:
	branches: [main]
	# Run CI on pull requests targeting any base branch, not just main.
	# This keeps stacked PR workflows covered (a PR's base may be another
	# feature branch, e.g. an umbrella branch or a previous PR in a stack).
	pull_request:

	# Auto-cancel superseded runs on the same ref so a rapid push series
	# (e.g. force-push during PR review) doesn't queue 3+ stale runs and
	# starve the GitHub Actions runner pool. main pushes are exempt — we
	# always want a clean signal on main.
	concurrency:
	group: ${{ github.workflow }}-${{ github.ref }}
	cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
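	#
	# Worked example of how the two expressions above evaluate. The PR
	# number 123 is hypothetical; `github.workflow` resolves to this
	# file's `name:`, i.e. "CI".
	#
	#   pull_request run for PR 123:
	#     group              = CI-refs/pull/123/merge
	#     cancel-in-progress = true   (ref is not refs/heads/main)
	#   push to main:
	#     group              = CI-refs/heads/main
	#     cancel-in-progress = false  (an in-progress run on main is not cancelled)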

	# ============================================================================
	# Architecture
	#
	# Phase 1 (parallel) — four independent build jobs:
	# * Three OpenVX-impl jobs (MIVisionX, Khronos sample, rustVX). Each:
	# 1. Builds the implementation from source.
	# 2. Stages a self-contained artifact: <impl>-stage/lib + <impl>-stage/include.
	# 3. Builds openvx-mark against the just-built impl.
	# 4. Runs a quick smoke benchmark as a "local unit test" — catches
	# build-link breakage and missing-symbol issues immediately,
	# scoped to the specific impl, without waiting for the slower
	# comparison job downstream.
	# 5. Uploads the staged artifact for the comparison job to consume.
	# * One OpenCV-baseline job (opencv-mark companion binary). Differs from
	# the OpenVX jobs because OpenCV is apt-installable and opencv-mark has
	# no OpenVX dependency — see the build-opencv job below for the shape.
	# Stages its smoke JSON directly (no impl tarball needed).
	#
	# Per-impl feature-set policy
	# ---------------------------
	# Not every impl ships the full OpenVX 1.3.1 conformance surface, so each
	# bench is scoped to the feature sets that impl actually implements:
	#
	# * MIVisionX — `vision,framework`. AMD's runtime exports the 42
	# Vision Conformance kernels but **does NOT export
	# most of the 19 Enhanced Vision APIs** (Bilateral-
	# Filter, HOG, Tensor, Select, ScalarOperation,
	# etc.). With `enhanced_vision` enabled, the per-
	# benchmark dlsym shim in openvx_optional_apis.h
	# would dutifully report 19 SKIPPED rows, which is
	# accurate but uninformative noise on every run —
	# so we omit it.
	# * Khronos sample — `vision,enhanced_vision,framework`. CTS-conformant
	# reference impl; ships both profiles.
	# * rustVX — `vision,enhanced_vision,framework`. CTS-conformant
	# for Vision (5923/5923) and Enhanced Vision
	# (1235/1235) per the rustVX README.
	# * opencv-mark — `vision,enhanced_vision` (no `framework`; cv:: has
	# no graph runtime to measure). All 79 + 19 = 98
	# OpenCV-side benchmarks run.
	#
	# Phase 2 (single job, depends on all four Phase-1 jobs) — comparison.
	# 1. Downloads all three OpenVX impl artifacts onto a single runner;
	# apt-installs OpenCV on that same runner.
	# 2. Builds openvx-mark × 3 (one per OpenVX impl) so all binaries link
	# against the same openvx-mark source tree at the same commit.
	# Builds opencv-mark from the same source tree.
	# 3. Runs the full benchmark against each impl using that impl's
	# feature-set policy (above). Same hardware = fair cross-vendor
	# comparison. `compare_reports.py` joins by (name, mode, resolution)
	# and silently drops rows not on both sides, so enhanced_vision
	# rows naturally appear in pairs where both impls produced them
	# (Khronos↔OpenCV, rustVX↔OpenCV, Khronos↔rustVX) and are absent
	# from MIVisionX↔* pairs.
	# 4. Generates six pairwise comparison reports:
	# OpenVX-vs-OpenVX:
	# * MIVisionX vs Khronos sample
	# * MIVisionX vs rustVX
	# * Khronos sample vs rustVX
	# OpenVX-vs-OpenCV (the "does adopting OpenVX pay off?" trio):
	# * MIVisionX vs OpenCV
	# * Khronos sample vs OpenCV
	# * rustVX vs OpenCV
	# 5. Posts each report to the job summary and uploads as an artifact.
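	#
	# Worked example of the join described in step 3. The rows are
	# illustrative only: the kernel names are real OpenVX kernels, but the
	# `graph` mode label is an assumption about the report schema.
	#
	#   Khronos report   : (Box3x3, graph, FHD), (BilateralFilter, graph, FHD)
	#   MIVisionX report : (Box3x3, graph, FHD)
	#   MIVisionX vs Khronos table -> Box3x3 only. The BilateralFilter row
	#   has no MIVisionX counterpart, so it is dropped without a warning.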
	#
	# Inspired by the layered build/perf-gate design in rustVX's conformance CI:
	# https://github.com/kiritigowda/rustVX/blob/main/.github/workflows/conformance.yml
	# ============================================================================

	jobs:
	# --------------------------------------------------------------------------
	# Phase 1 — MIVisionX (AMD OpenVX, CPU backend)
	# --------------------------------------------------------------------------
	build-mivisionx:
	name: Build MIVisionX (CPU) + smoke test
	runs-on: ubuntu-22.04
	steps:
	- name: Checkout openvx-mark
	uses: actions/checkout@v4

	- name: Install dependencies
	run: |
	sudo apt-get update
	sudo apt-get install -y build-essential cmake git python3

	# Why -DCMAKE_CXX_FLAGS_RELEASE override (the "optimized kernels" knob):
	#
	# MIVisionX's amd_openvx/openvx/ago/ago_haf_cpu_*.cpp files contain
	# hand-written AVX2 intrinsics (_mm256_*) for the CPU-side "Hardware
	# Acceleration Functions" — these are the OPTIMIZED kernel paths.
	# However, MIVisionX's own top-level CMakeLists.txt appends ONLY
	# `-msse4.2` to CMAKE_CXX_FLAGS, with no -mavx2/-mfma and no
	# __attribute__((target("avx2"))) on any function. With just -msse4.2
	# the compiler can still emit the AVX2 intrinsics in those specific
	# call sites, but it CANNOT auto-vectorise the surrounding scalar /
	# loop code beyond SSE4.2, can't use FMA, can't use BMI/BMI2 — so
	# the per-kernel dispatch glue, address arithmetic, and any kernel
	# code that's not hand-vectorised stays at SSE4.2 throughput. That's
	# the "base kernel" path the umbrella PR description points at.
	#
	# By overriding CMAKE_CXX_FLAGS_RELEASE we get -O3 -DNDEBUG plus
	# x86-64-v3 (= SSE4.2 + AVX + AVX2 + BMI + BMI2 + FMA + LZCNT + POPCNT),
	# a conservative, portable x86-64 microarchitecture level that GCC
	# has accepted as a -march target since GCC 11. GitHub Actions Ubuntu
	# 22.04 runners use Intel Xeon or AMD EPYC CPUs, which all support
	# x86-64-v3.
	#
	# MIVisionX still appends `-msse4.2` to CMAKE_CXX_FLAGS (we don't
	# override CMAKE_CXX_FLAGS, only the per-config Release variant),
	# so the final compile line carries both `-msse4.2` and
	# "-O3 -DNDEBUG -march=x86-64-v3". -march=x86-64-v3 already implies
	# SSE4.2, so the extra -msse4.2 is redundant but harmless.
	- name: Build MIVisionX (CPU backend, optimized)
	run: |
	set -euo pipefail
	git clone --depth 1 --branch develop \
	https://github.com/ROCm/MIVisionX.git /tmp/mivisionx-src
	mkdir -p /tmp/mivisionx-src/build
	cd /tmp/mivisionx-src/build
	cmake \
	-DBACKEND=CPU \
	-DNEURAL_NET=OFF \
	-DLOOM=OFF \
	-DMIGRAPHX=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DCMAKE_CXX_FLAGS_RELEASE="-O3 -DNDEBUG -march=x86-64-v3" \
	-DCMAKE_INSTALL_PREFIX=/tmp/mivisionx-install \
	..
	make -j$(nproc)
	make install
	# Sanity-print the actual compile flags the make rules used —
	# surfaces in CI logs so a reviewer can confirm AVX2 made it
	# into the build (look for `-march=x86-64-v3` in the cmake echo).
	grep -h 'CXX_FLAGS' CMakeFiles/openvx.dir/flags.make 2>/dev/null \
	| head -2 || true

	- name: Stage MIVisionX artifact
	id: stage
	run: |
	set -euo pipefail
	mkdir -p mivisionx-stage/lib mivisionx-stage/include
	LIB_SRC=$(dirname "$(find /tmp/mivisionx-install -name 'libopenvx.so' | head -1)")
	echo "MIVisionX libraries discovered in: $LIB_SRC"
	# Copy ALL libopenvx* / libvxu* entries (libopenvx.so symlink,
	# libopenvx.so.1 SONAME symlink, libopenvx.so.X.Y.Z real file)
	# preserving symlinks (-P) so ld.so can follow the SONAME chain.
	# Without versioned files the linker reports
	# "libopenvx.so.1: cannot open shared object file".
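	# To confirm which name the loader will look for, the SONAME can be
	# read from the built library (a local debugging hint, not run in CI):
	#   readelf -d "$LIB_SRC/libopenvx.so" | grep SONAME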
	find "$LIB_SRC" -maxdepth 1 -name 'libopenvx*' -exec cp -P {} mivisionx-stage/lib/ \;
	find "$LIB_SRC" -maxdepth 1 -name 'libvxu*' -exec cp -P {} mivisionx-stage/lib/ \;
	cp -r /tmp/mivisionx-install/include/mivisionx/. mivisionx-stage/include/
	echo "--- staged lib ---"
	ls -la mivisionx-stage/lib
	echo "--- staged include (top-level) ---"
	ls -la mivisionx-stage/include
	{
	echo "lib_dir=$(pwd)/mivisionx-stage/lib"
	echo "include_dir=$(pwd)/mivisionx-stage/include"
	} >> "$GITHUB_OUTPUT"

	- name: Build openvx-mark (smoke)
	run: |
	set -euo pipefail
	mkdir -p build-smoke
	cd build-smoke
	cmake \
	-DCMAKE_BUILD_TYPE=Release \
	-DOPENVX_INCLUDES=${{ steps.stage.outputs.include_dir }} \
	-DOPENVX_LIB_DIR=${{ steps.stage.outputs.lib_dir }} \
	..
	cmake --build . -j$(nproc)

	# Smoke covers the `vision` + `framework` feature sets only.
	# MIVisionX's runtime exports the 42 Vision Conformance kernels
	# but does NOT export most of the 19 Enhanced Vision APIs
	# (BilateralFilter, HOG, Tensor, Select, ScalarOperation, etc.).
	# With `enhanced_vision` enabled, the per-benchmark dlsym shim in
	# openvx_optional_apis.h would dutifully report 19 SKIPPED rows
	# on every run — accurate but uninformative noise. The Khronos
	# sample, rustVX, and opencv-mark smoke jobs DO exercise
	# `enhanced_vision` because those impls actually ship it.
	- name: Run smoke benchmark (vision + framework, VGA × 5 iters, single-threaded)
	# Smoke is advisory — if a specific impl crashes inside a
	# specific kernel the artifact upload (which the compare job
	# depends on) must still happen so vendor-vs-vendor signal
	# isn't lost.
	continue-on-error: true
	run: |
	set -eo pipefail
	cd build-smoke
	export LD_LIBRARY_PATH=${{ steps.stage.outputs.lib_dir }}:${LD_LIBRARY_PATH:-}
	# Timer self-test up front so a sloppy runner clock fails
	# loud before we trust a smoke timing number.
	./openvx-mark --validate-timing
	# `--threads 1` matches the Phase 2 compare config — same
	# apples-to-apples threading policy on smoke and full bench
	# so smoke timings are interpretable as a coarse preview.
	./openvx-mark --feature-set vision,framework \
	--resolution VGA --iterations 5 --warmup 1 --threads 1 \
	--output-dir smoke-results

	- name: Upload MIVisionX artifact
	if: always()
	uses: actions/upload-artifact@v4
	with:
	name: impl-mivisionx
	path: mivisionx-stage/
	retention-days: 1

	- name: Upload MIVisionX smoke results
	if: always()
	uses: actions/upload-artifact@v4
	with:
	name: smoke-results-mivisionx
	path: build-smoke/smoke-results/
	if-no-files-found: ignore

	# --------------------------------------------------------------------------
	# Phase 1 — Khronos OpenVX sample implementation
	# --------------------------------------------------------------------------
	build-khronos-sample:
	name: Build Khronos sample + smoke test
	runs-on: ubuntu-22.04
	steps:
	- name: Checkout openvx-mark
	uses: actions/checkout@v4

	- name: Install dependencies
	run: |
	sudo apt-get update
	sudo apt-get install -y build-essential cmake git python3

	# Khronos sample is a reference impl (no SIMD intrinsics), so
	# most of the perf budget rides on whatever compiler auto-vec the
	# build picks up. Build.py honours CFLAGS / CXXFLAGS from the
	# environment, so we use those to upgrade the compile baseline
	# to x86-64-v3 (= AVX2 + FMA + BMI2 + LZCNT + POPCNT), matching
	# what the MIVisionX build above gets. This is not a claim that
	# the sample becomes "competitive" (it is a reference impl), only
	# that it is measured at the SAME compile baseline as MIVisionX,
	# so the cross-impl comparison isn't contaminated by one side
	# getting better auto-vec than the other.
	- name: Build Khronos OpenVX sample (Release, x86-64-v3)
	run: |
	set -euo pipefail
	git clone --recursive --depth 1 \
	https://github.com/KhronosGroup/OpenVX-sample-impl.git /tmp/khronos-src
	cd /tmp/khronos-src
	export CFLAGS="-O3 -march=x86-64-v3 ${CFLAGS:-}"
	export CXXFLAGS="-O3 -march=x86-64-v3 ${CXXFLAGS:-}"
	echo "CFLAGS = ${CFLAGS}"
	echo "CXXFLAGS= ${CXXFLAGS}"
	python3 Build.py --os=Linux --arch=64 --conf=Release

	- name: Stage Khronos sample artifact
	id: stage
	run: |
	set -euo pipefail
	mkdir -p khronos-stage/lib khronos-stage/include
	LIB_SRC=$(dirname "$(find /tmp/khronos-src -name 'libopenvx.so' -not -path '*/build/*' | head -1)")
	echo "Khronos libraries discovered in: $LIB_SRC"
	# Same approach as MIVisionX: copy all libopenvx* / libvxu* entries
	# preserving symlinks so ld.so can follow the SONAME chain.
	find "$LIB_SRC" -maxdepth 1 -name 'libopenvx*' -exec cp -P {} khronos-stage/lib/ \;
	find "$LIB_SRC" -maxdepth 1 -name 'libvxu*' -exec cp -P {} khronos-stage/lib/ \;
	cp -r /tmp/khronos-src/api-docs/include/. khronos-stage/include/
	echo "--- staged lib ---"
	ls -la khronos-stage/lib
	echo "--- staged include (top-level) ---"
	ls -la khronos-stage/include
	{
	echo "lib_dir=$(pwd)/khronos-stage/lib"
	echo "include_dir=$(pwd)/khronos-stage/include"
	} >> "$GITHUB_OUTPUT"

	- name: Build openvx-mark (smoke)
	run: |
	set -euo pipefail
	mkdir -p build-smoke
	cd build-smoke
	cmake \
	-DCMAKE_BUILD_TYPE=Release \
	-DOPENVX_INCLUDES=${{ steps.stage.outputs.include_dir }} \
	-DOPENVX_LIB_DIR=${{ steps.stage.outputs.lib_dir }} \
	..
	cmake --build . -j$(nproc)

	# Khronos sample is a CTS-conformant reference impl that ships
	# both the Vision (42 kernels) and Enhanced Vision (19 kernels)
	# profiles, so we exercise `vision,enhanced_vision,framework` at
	# smoke time. `continue-on-error: true` keeps the artifact upload
	# alive if any specific kernel crashes mid-run; the comparison job
	# downstream handles whichever JSON files actually got produced.
	- name: Run smoke benchmark (vision + enhanced_vision + framework, VGA × 5 iters)
	continue-on-error: true
	run: |
	set -eo pipefail
	cd build-smoke
	export LD_LIBRARY_PATH=${{ steps.stage.outputs.lib_dir }}:${LD_LIBRARY_PATH:-}
	./openvx-mark --validate-timing
	./openvx-mark --feature-set vision,enhanced_vision,framework \
	--resolution VGA --iterations 5 --warmup 1 --threads 1 \
	--output-dir smoke-results

	- name: Upload Khronos sample artifact
	if: always()
	uses: actions/upload-artifact@v4
	with:
	name: impl-khronos-sample
	path: khronos-stage/
	retention-days: 1

	- name: Upload Khronos sample smoke results
	if: always()
	uses: actions/upload-artifact@v4
	with:
	name: smoke-results-khronos-sample
	path: build-smoke/smoke-results/
	if-no-files-found: ignore

	# --------------------------------------------------------------------------
	# Phase 1 — rustVX (Rust OpenVX implementation)
	#
	# rustVX ships a single libopenvx_ffi.so that exports the full vx/vxu
	# symbol set. openvx-mark's CMake uses find_library(NAMES openvx) and
	# find_library(NAMES vxu) — so we symlink the two classic Khronos lib
	# names to the FFI .so during staging, without modifying rustVX's own
	# build output.
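	#
	# A sketch of why the two symlinks are enough, assuming openvx-mark
	# passes OPENVX_LIB_DIR as a search hint (the result variable names
	# OPENVX_LIB and VXU_LIB are hypothetical):
	#   find_library(OPENVX_LIB NAMES openvx HINTS ${OPENVX_LIB_DIR})  # resolves libopenvx.so
	#   find_library(VXU_LIB    NAMES vxu    HINTS ${OPENVX_LIB_DIR})  # resolves libvxu.so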
	#
	# SIMD config: AVX2 + `-C target-cpu=x86-64-v3`, matching what rustVX's
	# own CI ships. We deliberately skip the alignment-pad RUSTFLAGS used in
	# rustVX's PR-vs-main perf gate — those exist to make rustVX-vs-rustVX
	# bench numbers invariant to .text shifts, which is irrelevant for the
	# vendor-vs-vendor comparison this workflow runs.
	# --------------------------------------------------------------------------
	build-rustvx:
	name: Build rustVX + smoke test
	runs-on: ubuntu-22.04
	steps:
	- name: Checkout openvx-mark
	uses: actions/checkout@v4

	- name: Install dependencies
	run: |
	sudo apt-get update
	sudo apt-get install -y build-essential cmake git

	- name: Install Rust toolchain
	run: |
	set -euo pipefail
	curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \
	| sh -s -- -y --default-toolchain stable
	source "$HOME/.cargo/env"
	rustc --version
	cargo --version

	- name: Build rustVX (release, AVX2)
	run: |
	set -euo pipefail
	source "$HOME/.cargo/env"
	git clone --depth 1 \
	https://github.com/kiritigowda/rustVX.git /tmp/rustvx-src
	cd /tmp/rustvx-src
	case "$(uname -m)" in
	x86_64|amd64)
	FEATURES="openvx-core/sse2 openvx-core/avx2 openvx-vision/sse2 openvx-vision/avx2"
	export RUSTFLAGS="-C target-cpu=x86-64-v3"
	;;
	aarch64|arm64)
	FEATURES="openvx-core/neon openvx-vision/neon"
	export RUSTFLAGS=""
	;;
	*)
	FEATURES=""
	export RUSTFLAGS=""
	;;
	esac
	echo "Architecture : $(uname -m)"
	echo "Cargo features: ${FEATURES:-<none>}"
	echo "RUSTFLAGS : ${RUSTFLAGS:-<none>}"
	if [ -n "$FEATURES" ]; then
	cargo build --release -p openvx-ffi --features "$FEATURES"
	else
	cargo build --release -p openvx-ffi
	fi

	- name: Stage rustVX artifact (with libopenvx / libvxu symlinks)
	id: stage
	run: |
	set -euo pipefail
	mkdir -p rustvx-stage/lib rustvx-stage/include
	cp /tmp/rustvx-src/target/release/libopenvx_ffi.so rustvx-stage/lib/
	# Classic Khronos library names so openvx-mark's find_library picks
	# them up. Symlinks survive upload-artifact@v4 (it preserves them
	# within tar), so the comparison job downstream sees the same.
	(
	cd rustvx-stage/lib
	ln -sf libopenvx_ffi.so libopenvx.so
	ln -sf libopenvx_ffi.so libvxu.so
	)
	cp -r /tmp/rustvx-src/include/. rustvx-stage/include/
	echo "--- staged lib ---"
	ls -la rustvx-stage/lib
	echo "--- staged include (top-level) ---"
	ls -la rustvx-stage/include
	{
	echo "lib_dir=$(pwd)/rustvx-stage/lib"
	echo "include_dir=$(pwd)/rustvx-stage/include"
	} >> "$GITHUB_OUTPUT"

	- name: Build openvx-mark (smoke)
	run: |
	set -euo pipefail
	mkdir -p build-smoke
	cd build-smoke
	cmake \
	-DCMAKE_BUILD_TYPE=Release \
	-DOPENVX_INCLUDES=${{ steps.stage.outputs.include_dir }} \
	-DOPENVX_LIB_DIR=${{ steps.stage.outputs.lib_dir }} \
	..
	cmake --build . -j$(nproc)

	# rustVX is CTS-conformant for both Vision (5923/5923) and
	# Enhanced Vision (1235/1235), so we exercise the full
	# `vision,enhanced_vision,framework` surface at smoke time. This
	# is the impl that gives the headline "all 19 enhanced_vision
	# kernels produce real measurements" cell in the comparison
	# table — every other OpenVX backend either omits the profile
	# (MIVisionX) or has known per-kernel quirks.
	- name: Run smoke benchmark (vision + enhanced_vision + framework, VGA × 5 iters)
	continue-on-error: true
	run: |
	set -eo pipefail
	cd build-smoke
	export LD_LIBRARY_PATH=${{ steps.stage.outputs.lib_dir }}:${LD_LIBRARY_PATH:-}
	./openvx-mark --validate-timing
	./openvx-mark --feature-set vision,enhanced_vision,framework \
	--resolution VGA --iterations 5 --warmup 1 --threads 1 \
	--output-dir smoke-results

	- name: Upload rustVX artifact
	if: always()
	uses: actions/upload-artifact@v4
	with:
	name: impl-rustvx
	path: rustvx-stage/
	retention-days: 1

	- name: Upload rustVX smoke results
	if: always()
	uses: actions/upload-artifact@v4
	with:
	name: smoke-results-rustvx
	path: build-smoke/smoke-results/
	if-no-files-found: ignore

	# --------------------------------------------------------------------------
	# Phase 1 — OpenCV baseline (companion binary `opencv-mark`)
	#
	# OpenCV is the de facto vision baseline. This job exists so we can answer
	# "does adopting OpenVX actually pay off vs the cv:: code I already have?"
	# at the per-kernel level, on the same CI hardware as every OpenVX impl.
	#
	# Differs from the OpenVX impl jobs in two ways:
	# 1. OpenCV is apt-installable (no from-source build), so this job is
	# much shorter — install, configure parent CMake, build, smoke.
	# 2. There is no impl-tarball staging step. opencv-mark IS the binary
	# that runs the OpenCV-side measurements; there is no separate
	# "link openvx-mark against this libopenvx.so" rebuild downstream.
	# The Phase 2 comparison job re-runs opencv-mark itself (after a
	# fresh apt-install of OpenCV) for strict same-runner fairness vs
	# the per-impl benches — see compare job's `Build & bench
	# opencv-mark` step.
	#
	# The smoke run here is fast feedback only (catches build/link breakage
	# in <1 min on every PR); the comparison-grade FHD × 20 iter benchmark
	# lives in Phase 2 alongside the OpenVX impl benches.
	# --------------------------------------------------------------------------
	build-opencv:
	name: Build opencv-mark (OpenCV baseline) + smoke test
	runs-on: ubuntu-22.04
	steps:
	- name: Checkout openvx-mark
	uses: actions/checkout@v4

	- name: Install dependencies (OpenCV 4 from apt)
	run: |
	sudo apt-get update
	sudo apt-get install -y build-essential cmake git python3 \
	libopencv-dev
	# Sanity-print the OpenCV version that pkg-config sees so
	# comparison reports later can be cross-referenced against
	# exactly this version string.
	pkg-config --modversion opencv4 || true

	- name: Configure & build opencv-mark
	run: |
	set -euo pipefail
	mkdir -p build-opencv
	cd build-opencv
	# Parent CMake auto-includes opencv-mark/ when OpenCV is found.
	# No OPENVX_* flags needed — opencv-mark has no OpenVX dep.
	cmake -DCMAKE_BUILD_TYPE=Release ..
	cmake --build . --target opencv-mark -j$(nproc)
	# Fail loudly if the binary somehow didn't get produced (e.g.
	# OpenCV detection silently no-op'd). This is the exact failure
	# mode that PR #1's first CI run failed to catch.
	test -x opencv-mark/opencv-mark \
	|| { echo "ERROR: opencv-mark binary not built — OpenCV likely not detected by CMake"; exit 1; }
	# `--help` doubles as a version probe — it prints the opencv-mark
	# version line and the linked OpenCV version up top. PR1's CLI
	# does not implement a dedicated `--version` flag yet.
	./opencv-mark/opencv-mark --help | head -3

	# Same shape as the OpenVX-impl smokes (VGA × 5 iters, 1 warmup)
	# so timing noise stays comparable. Not continue-on-error —
	# opencv-mark has no impl-side quirks to tolerate; if a kernel
	# breaks here it's our bug.
	#
	# Feature-set: `vision,enhanced_vision`. opencv-mark has 1:1
	# coverage of both profiles (42 vision + 19 enhanced = 61
	# kernels) — that's the entire OpenCV-side surface this CI
	# exercises. `framework` is intentionally omitted (OpenCV has
	# no graph runtime to measure; the framework benches that
	# depend on `vxProcessGraph` semantics are OpenVX-only).
	- name: Run smoke benchmark (vision + enhanced_vision, VGA × 5 iters)
	run: |
	set -eo pipefail
	cd build-opencv
	# Timer self-test up front — same gate that runs in the
	# Phase 2 compare job. Catches a borked runner clock at
	# smoke time so we don't waste a full FHD bench cycle.
	./opencv-mark/opencv-mark --validate-timing
	# `--threads 1` for symmetry with the smokes that run
	# against single-threaded OpenVX impls — keeps the smoke
	# comparable in shape to the cross-impl ones, even though
	# the smoke itself is just a "did it build & did it run?"
	# check, not a perf claim.
	./opencv-mark/opencv-mark --feature-set vision,enhanced_vision \
	--resolution VGA --iterations 5 --warmup 1 --threads 1 \
	--output-dir smoke-results

	- name: Upload opencv-mark smoke results
	if: always()
	uses: actions/upload-artifact@v4
	with:
	name: smoke-results-opencv
	path: build-opencv/smoke-results/
	if-no-files-found: ignore

	# --------------------------------------------------------------------------
	# Phase 2 — Pairwise comparison
	#
	# Pulls all three OpenVX implementation artifacts onto the same runner,
	# plus apt-installs OpenCV, so every benchmark is exercised on identical
	# hardware. Builds openvx-mark once per OpenVX impl (against this commit's
	# source tree, not pre-built artifacts — keeps the comparison binary
	# identical apart from the linked OpenVX lib), builds opencv-mark from
	# the same source tree, runs the full feature-set bench against each,
	# and emits six pairwise comparison reports:
	#
	# OpenVX-vs-OpenVX (3):
	# * MIVisionX over Khronos sample — AMD over reference
	# * MIVisionX over rustVX — AMD over Rust impl
	# * rustVX over Khronos sample — Rust impl over reference
	#
	# OpenVX-vs-OpenCV (3) — "does adopting OpenVX pay off?":
	# * MIVisionX over OpenCV — best-tuned OpenVX vs cv::
	# * Khronos sample over OpenCV — reference OpenVX vs cv::
	# * rustVX over OpenCV — Rust OpenVX vs cv::
	#
	# The job combines `if: always()`, per-download `continue-on-error`,
	# and per-bench `if: always() && steps.detect...` so that a single
	# failed build still surfaces the comparison signal for whichever
	# other impls are available, instead of losing all visibility.
	# --------------------------------------------------------------------------
	compare:
	name: Pairwise comparison (MIVisionX, Khronos, rustVX, OpenCV)
	runs-on: ubuntu-22.04
	needs:
	- build-mivisionx
	- build-khronos-sample
	- build-rustvx
	- build-opencv
	if: always()
	steps:
	- name: Checkout openvx-mark
	uses: actions/checkout@v4

	- name: Install dependencies
	run: |
	sudo apt-get update
	# libopencv-dev is needed so the Phase 2 `Build & bench
	# opencv-mark` step can re-link opencv-mark on this runner.
	# Strictly same-hardware fairness vs the per-impl benches.
	sudo apt-get install -y build-essential cmake git python3 \
	libopencv-dev
	pkg-config --modversion opencv4 || true

	- name: Download MIVisionX artifact
	uses: actions/download-artifact@v4
	with:
	name: impl-mivisionx
	path: ${{ github.workspace }}/impl/mivisionx
	continue-on-error: true

	- name: Download Khronos sample artifact
	uses: actions/download-artifact@v4
	with:
	name: impl-khronos-sample
	path: ${{ github.workspace }}/impl/khronos
	continue-on-error: true

	- name: Download rustVX artifact
	uses: actions/download-artifact@v4
	with:
	name: impl-rustvx
	path: ${{ github.workspace }}/impl/rustvx
	continue-on-error: true

	- name: Detect available implementations
	id: detect
	run: |
	set -euo pipefail
	for impl in mivisionx khronos rustvx; do
	lib="${{ github.workspace }}/impl/$impl/lib/libopenvx.so"
	if [ -e "$lib" ]; then
	echo "$impl: AVAILABLE ($lib)"
	chmod -R u+rwX "${{ github.workspace }}/impl/$impl/lib"
	echo "${impl}=true" >> "$GITHUB_OUTPUT"
	else
	echo "$impl: MISSING (artifact download failed or build job did not produce it)"
	echo "${impl}=false" >> "$GITHUB_OUTPUT"
	fi
	done

	# ----- Per-impl build + benchmark (FHD, 20 iter, 5 warmup) -----
	#
	# Each per-impl bench uses `if: always() && steps.detect...` because
	# GitHub Actions implicitly ANDs `success()` into any `if:` that
	# contains no status-check function such as `always()`, meaning a
	# crash in the MIVisionX bench would skip the Khronos / rustVX bench
	# steps entirely and we'd lose all comparison signal. With `always()`
	# the three benches stay independent and the comparison steps
	# downstream handle whichever JSON files actually got produced.
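	#
	# A minimal sketch of the difference (step names are hypothetical):
	#
	#   - name: Skipped when any earlier step failed
	#     if: steps.detect.outputs.khronos == 'true'
	#   - name: Runs even when an earlier step failed
	#     if: always() && steps.detect.outputs.khronos == 'true'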
	#
	# `--threads 1` is passed EXPLICITLY (it's also the default — but
	# we want the CI compare config to be self-documenting). Rationale:
	#
	# * MIVisionX CPU backend, Khronos sample, and rustVX are all
	# fundamentally single-threaded per kernel — none of them have
	# an internal thread pool on the CPU path.
	# * OpenCV, by contrast, will happily spawn nproc threads via
	# TBB/OpenMP if left at its default. Without the `--threads 1`
	# pin, the OpenCV side would get an unfair (nproc)x parallelism
	# boost just from defaults — the comparison would no longer be
	# "OpenVX kernel vs OpenCV kernel" but "1-thread OpenVX vs
	# n-thread OpenCV". `--threads 1` calls cv::setNumThreads(1)
	# for opencv-mark and sets OMP_NUM_THREADS=1 in the env for
	# anything OpenMP-using downstream.
	#
	# Feature set is per-impl (see the architecture comment block
	# at the top of this file for the full policy):
	# * MIVisionX — `vision,framework` (no enhanced_vision;
	# AMD's runtime doesn't export the APIs)
	# * Khronos sample — `vision,enhanced_vision,framework`
	# * rustVX — `vision,enhanced_vision,framework`
	# * opencv-mark — `vision,enhanced_vision` (no framework;
	# OpenCV has no graph runtime to measure)
	# `compare_reports.py` joins by (name, mode, resolution) and
	# silently drops rows not on both sides, so enhanced_vision
	# rows naturally appear in pairs where both impls produced them
	# (Khronos↔OpenCV, rustVX↔OpenCV, Khronos↔rustVX) and are absent
	# from MIVisionX↔* pairs.
	- name: Build & bench against MIVisionX (single-threaded, FHD × 20)
	if: always() && steps.detect.outputs.mivisionx == 'true'
	run: |
	set -euo pipefail
	mkdir -p build-mivisionx
	cd build-mivisionx
	cmake \
	-DCMAKE_BUILD_TYPE=Release \
	-DOPENVX_INCLUDES=${{ github.workspace }}/impl/mivisionx/include \
	-DOPENVX_LIB_DIR=${{ github.workspace }}/impl/mivisionx/lib \
	..
	cmake --build . -j$(nproc)
	export LD_LIBRARY_PATH=${{ github.workspace }}/impl/mivisionx/lib:${LD_LIBRARY_PATH:-}
	# Timer self-test first — gates the rest of the bench. If the
	# runner clock is sloppy, our timing numbers are meaningless
	# and we'd rather know about it now than ship bad data.
	./openvx-mark --validate-timing
	./openvx-mark --feature-set vision,framework \
	--resolution FHD --iterations 20 --warmup 5 --threads 1 \
	--output-dir results
	# Sentinel-set dump for cross-impl numerical verification —
	# see scripts/cross_verify_outputs.py. Runs the kernel set
	# ONCE (no timing, no warmup) so it's cheap, then the
	# downstream verify step compares this dump against the
	# OpenCV dump for correctness.
	./openvx-mark --dump-outputs dump-mivisionx --seed 42

	- name: Build & bench against Khronos sample (single-threaded, FHD × 20)
	if: always() && steps.detect.outputs.khronos == 'true'
	# `continue-on-error: true` so a crash inside a single
	# enhanced_vision kernel (the reference impl has known per-
	# kernel quirks under heavy use) doesn't take out the
	# comparison signal for whichever kernels did complete.
	# `openvx-mark` only writes its JSON at end-of-run, but the
	# surrounding job steps still upload artifacts as long as we
	# reach them.
	continue-on-error: true
	run: |
	set -eo pipefail
	mkdir -p build-khronos
	cd build-khronos
	cmake \
	-DCMAKE_BUILD_TYPE=Release \
	-DOPENVX_INCLUDES=${{ github.workspace }}/impl/khronos/include \
	-DOPENVX_LIB_DIR=${{ github.workspace }}/impl/khronos/lib \
	..
	cmake --build . -j$(nproc)
	export LD_LIBRARY_PATH=${{ github.workspace }}/impl/khronos/lib:${LD_LIBRARY_PATH:-}
	./openvx-mark --validate-timing
	./openvx-mark --feature-set vision,enhanced_vision,framework \
	--resolution FHD --iterations 20 --warmup 5 --threads 1 \
	--output-dir results
	./openvx-mark --dump-outputs dump-khronos --seed 42 || true

	- name: Build & bench against rustVX (single-threaded, FHD × 20)
	if: always() && steps.detect.outputs.rustvx == 'true'
	# rustVX is CTS-conformant for both Vision (5923/5923) and
	# Enhanced Vision (1235/1235), so all 42 + 19 kernels should
	# actually produce real measurements here. This row is the
	# headline cell for "what does a fully-conformant OpenVX impl
	# look like vs OpenCV on the same hardware?".
	# `continue-on-error: true` is a belt-and-suspenders safety
	# in case any one kernel surfaces a regression mid-bench —
	# the artifact upload (which downstream comparisons depend
	# on) must still happen.
	continue-on-error: true
	run: |
	set -eo pipefail
	mkdir -p build-rustvx
	cd build-rustvx
	cmake \
	-DCMAKE_BUILD_TYPE=Release \
	-DOPENVX_INCLUDES=${{ github.workspace }}/impl/rustvx/include \
	-DOPENVX_LIB_DIR=${{ github.workspace }}/impl/rustvx/lib \
	..
	cmake --build . -j$(nproc)
	export LD_LIBRARY_PATH=${{ github.workspace }}/impl/rustvx/lib:${LD_LIBRARY_PATH:-}
	./openvx-mark --validate-timing
	./openvx-mark --feature-set vision,enhanced_vision,framework \
	--resolution FHD --iterations 20 --warmup 5 --threads 1 \
	--output-dir results
	./openvx-mark --dump-outputs dump-rustvx --seed 42 || true

	# opencv-mark has no OpenVX dependency, so no OPENVX_* flags and no
	# detect-step gate — it only needs `libopencv-dev` (already installed
	# above). Same FHD × 20 iter × 5 warmup × --threads 1 shape as the
	# OpenVX benches so per-kernel speedups are directly comparable.
	#
	# Feature-set is `vision,enhanced_vision` — opencv-mark has 1:1
	# coverage of both profiles (79 + 19 = 98 OpenCV-side benchmarks
	# total). `framework` is intentionally omitted because OpenCV has
	# no graph runtime to measure (the framework benches that depend
	# on `vxProcessGraph` / virtual-image fusion semantics are
	# OpenVX-only by design). `compare_reports.py` ignores rows that
	# only exist on one side, so framework rows naturally don't
	# appear in OpenCV pairwise tables.
	- name: Build & bench opencv-mark (single-threaded, FHD × 20)
	if: always()
	id: bench_opencv
	run: |
	set -euo pipefail
	mkdir -p build-opencv-bench
	cd build-opencv-bench
	cmake -DCMAKE_BUILD_TYPE=Release ..
	cmake --build . --target opencv-mark -j$(nproc)
	test -x opencv-mark/opencv-mark \
	|| { echo "ERROR: opencv-mark not built — OpenCV detection failed in compare job"; exit 1; }
	./opencv-mark/opencv-mark --validate-timing
	./opencv-mark/opencv-mark --feature-set vision,enhanced_vision \
	--resolution FHD --iterations 20 --warmup 5 --threads 1 \
	--output-dir results
	./opencv-mark/opencv-mark --dump-outputs dump-opencv --seed 42

	# ----- Cross-impl numerical verification -----
	#
	# We have one dump-* directory per impl that produced a build.
	# Run scripts/cross_verify_outputs.py for each (opencv, openvx)
	# pair so a reviewer can see at a glance whether MIVisionX,
	# Khronos sample, and rustVX agree with OpenCV at the pixel
	# level — proves the timing comparison rows below are honest
	# apples-to-apples and not "OpenCV is faster because it's
	# silently computing the wrong thing".
	#
	# The verifier exits non-zero on any kernel exceeding its
	# per-kernel tolerance; we collect all three reports into the
	# step summary first, then emit a `::warning::` annotation at
	# the end if any report failed (the step itself stays green).
	# That way a single divergence on one impl doesn't hide the
	# other two impls' results.
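	#
	# For orientation, the three reported metrics are assumed here
	# to follow their conventional definitions for 8-bit outputs
	# (the script and its `RULES` table are authoritative; this is
	# only a sketch of what the columns mean):
	#   max-abs-diff = max_i |a_i - b_i|
	#   PSNR         = 10 * log10(255^2 / MSE),  MSE = mean_i (a_i - b_i)^2
	#   exact-%      = 100 * count(a_i == b_i) / N
	# Under these definitions, identical buffers give max-abs-diff 0
	# and exact-% 100, with PSNR undefined (MSE = 0).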
	- name: Cross-impl output verification (OpenCV ↔ each OpenVX impl)
	if: always()
	run: |
	set -euo pipefail
	# numpy is the only Python dep — used by the verifier for
	# array compare + PSNR. apt's python3-numpy on ubuntu-22.04
	# is fine and avoids a pip wheel download.
	sudo apt-get install -y python3-numpy
	mkdir -p comparisons

	OPENCV_DUMP=build-opencv-bench/dump-opencv
	{
	echo ""
	echo "---"
	echo ""
	echo "## Cross-impl numerical verification"
	echo ""
	echo "Sentinel kernel suite (VGA × 1 run, no timing) dumped by"
	echo "\`--dump-outputs\` on each binary; \`scripts/cross_verify_outputs.py\`"
	echo "loads both dumps and computes max-abs-diff + PSNR + exact-%"
	echo "per kernel. Tolerances are tuned per kernel (see \`RULES\` in"
	echo "the script). The \`_input_u8\` row checks that inputs are"
	echo "byte-identical; the per-kernel rows check semantic equivalence."
	echo ""
	} >> "$GITHUB_STEP_SUMMARY"

	OVERALL=0
	for impl in mivisionx khronos rustvx; do
	VX_DUMP="build-${impl}/dump-${impl}"
	if [ ! -d "$OPENCV_DUMP" ] || [ ! -d "$VX_DUMP" ]; then
	echo "skipping verify for $impl: missing dump dir ($VX_DUMP or $OPENCV_DUMP)"
	echo "_Skipped \`$impl\` verify — dump directory missing._" >> "$GITHUB_STEP_SUMMARY"
	continue
	fi
	set +e
	python3 scripts/cross_verify_outputs.py \
	"$OPENCV_DUMP" "$VX_DUMP" \
	--left-label "OpenCV" --right-label "${impl}" \
	--json comparisons/cross-verify-${impl}.json \
	>> "$GITHUB_STEP_SUMMARY"
	rc=$?
	set -e
	if [ "$rc" -ne 0 ]; then OVERALL=1; fi
	echo "" >> "$GITHUB_STEP_SUMMARY"
	done

	# Surface OVERALL as a step-level warning: the job stays
	# green on a divergence (so reviewers still see the timing
	# comparison), but the run is annotated and the raw dumps
	# are uploaded as an artifact below.
	if [ "$OVERALL" -ne 0 ]; then
	echo "::warning::Cross-impl verification flagged ≥1 divergence — see job summary"
	fi

	# ----- Pairwise comparisons -----
	#
	# Each comparison is oriented as "<candidate> over <baseline>" so
	# the speedup column reads as `candidate / baseline` (>1.00x =
	# candidate is faster). The orientation is deliberate:
	#
	# OpenVX-vs-OpenVX trio — "how much faster is the more-tuned
	# impl than the reference":
	# * MIVisionX over Khronos sample (AMD over reference)
	# * MIVisionX over rustVX (AMD over Rust impl)
	# * rustVX over Khronos sample (Rust impl over reference)
	#
	# OpenVX-vs-OpenCV trio — "does adopting OpenVX pay off vs cv::":
	# * MIVisionX over OpenCV
	# * Khronos sample over OpenCV
	# * rustVX over OpenCV
	#
	# Mechanically, `scripts/compare_reports.py` computes
	# speedup = throughput(arg2) / throughput(arg1)
	# so the candidate is passed as the SECOND positional arg.
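	#
	# Worked example with hypothetical numbers (not measured): if
	# arg1 (baseline) reports a throughput of 200 and arg2
	# (candidate) reports 300 in the same units, then
	# speedup = 300 / 200 = 1.50x
	# and the candidate is faster. Swapping the two arguments
	# would report 200 / 300 ≈ 0.67x for the same data.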
	#
	# The step does two things:
	# 1. Runs `compare_reports.py` once per pair to produce a
	# per-kernel detail .md in comparisons/. These also become
	# the `benchmark-comparisons` artifact for downstream tools.
	# 2. Invokes `scripts/ci_pairwise_summary.py` once to render
	# an organized GitHub Step Summary — TL;DR speedup matrix
	# at top, two grouped headline tables, and the per-kernel
	# detail tables collapsed inside <details> blocks. See the
	# script docstring for the config schema; this used to be a
	# ~115-line bash + inline-Python block and rendered ~600
	# lines into the summary.
	- name: Pairwise comparisons
	if: always()
	run: |
	set -euo pipefail
	mkdir -p comparisons

	# Per-impl JSON report paths (parallel arrays keyed by impl id).
	IDS=(mivisionx khronos rustvx opencv)
	PATHS=(
	"build-mivisionx/results/benchmark_results.json"
	"build-khronos/results/benchmark_results.json"
	"build-rustvx/results/benchmark_results.json"
	"build-opencv-bench/results/benchmark_results.json"
	)
	LABELS=(
	"MIVisionX (AMD OpenVX)"
	"Khronos sample"
	"rustVX"
	"OpenCV"
	)

	# The 6 pairs, "<candidate> <baseline>". Order matches the
	# rendered summary table order: OpenVX-vs-OpenCV (headline
	# question) first, then OpenVX-vs-OpenVX.
	PAIRS=(
	"mivisionx opencv"
	"khronos opencv"
	"rustvx opencv"
	"mivisionx khronos"
	"mivisionx rustvx"
	"rustvx khronos"
	)

	# Phase 1: per-kernel detail .md per pair where both inputs
	# exist. Missing-input pairs are skipped here with a log
	# line; the summary script renders a friendly
	# "_Detail missing_" note for them inside the collapsed
	# <details> block.
	path_of() {
	for i in "${!IDS[@]}"; do
	if [ "${IDS[$i]}" = "$1" ]; then echo "${PATHS[$i]}"; return; fi
	done
	}
	for pair in "${PAIRS[@]}"; do
	read -r CAND BASE <<< "$pair"
	CAND_PATH=$(path_of "$CAND")
	BASE_PATH=$(path_of "$BASE")
	OUT="comparisons/${CAND}-over-${BASE}"
	if [ -f "$CAND_PATH" ] && [ -f "$BASE_PATH" ]; then
	python3 scripts/compare_reports.py "$BASE_PATH" "$CAND_PATH" --output "$OUT"
	else
	echo "Skipping detail for ${CAND}-over-${BASE}: missing ${CAND_PATH} or ${BASE_PATH}"
	fi
	done

	# Phase 2 — render the organized step summary. The config
	# below is the only place pair-grouping & intent text lives;
	# the helper handles matrix rendering, headline tables, and
	# the collapsed <details> blocks.
	cat > /tmp/pairwise-config.json <<'JSON'
	{
	"reports": {
	"mivisionx": {"label": "MIVisionX (AMD OpenVX)", "path": "build-mivisionx/results/benchmark_results.json"},
	"khronos": {"label": "Khronos sample", "path": "build-khronos/results/benchmark_results.json"},
	"rustvx": {"label": "rustVX", "path": "build-rustvx/results/benchmark_results.json"},
	"opencv": {"label": "OpenCV", "path": "build-opencv-bench/results/benchmark_results.json"}
	},
	"groups": [
	{
	"title": "OpenVX-vs-OpenCV — does adopting OpenVX pay off vs cv::?",
	"intent": "Speedup reads as `<OpenVX impl> / OpenCV`. Values >1.00x mean adopting that OpenVX impl pays off vs writing the equivalent directly in OpenCV — the headline question this comparison phase exists to answer. Ordered most-tuned (MIVisionX) → reference (Khronos sample) → Rust impl (rustVX) so the table walks the realistic best→worst range of the trade-off.",
	"pairs": [["mivisionx", "opencv"], ["khronos", "opencv"], ["rustvx", "opencv"]]
	},
	{
	"title": "OpenVX-vs-OpenVX — cross-implementation",
	"intent": "Speedup reads as `<candidate> / <baseline>`. MIVisionX (AMD, most-tuned) compared against the Khronos sample and rustVX, then rustVX vs Khronos sample (Rust impl over reference).",
	"pairs": [["mivisionx", "khronos"], ["mivisionx", "rustvx"], ["rustvx", "khronos"]]
	}
	],
	"detail_dir": "comparisons"
	}
	JSON
	python3 scripts/ci_pairwise_summary.py --config /tmp/pairwise-config.json \
	>> "$GITHUB_STEP_SUMMARY"

	echo "--- comparison artifacts ---"
	ls -la comparisons/ || true

	- name: Upload per-impl benchmark results
	if: always()
	uses: actions/upload-artifact@v4
	with:
	name: benchmark-results
	path: |
	build-mivisionx/results/
	build-khronos/results/
	build-rustvx/results/
	build-opencv-bench/results/
	if-no-files-found: ignore

	- name: Upload pairwise comparisons
	if: always()
	uses: actions/upload-artifact@v4
	with:
	name: benchmark-comparisons
	path: comparisons/
	if-no-files-found: ignore

	# Sentinel kernel dumps — uploaded so a reviewer can re-run
	# `scripts/cross_verify_outputs.py` locally against any pair
	# without re-running the whole CI build, and so the raw .bin
	# files are inspectable after the fact for any divergence the
	# verifier flagged.
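	#
	# A possible local re-run, as a sketch: `<run-id>` is a
	# placeholder, and the build-*/dump-* layout inside the
	# downloaded artifact is an assumption. The flags are the ones
	# this workflow passes in the verification step above.
	#   gh run download <run-id> -n cross-verify-dumps -D dumps
	#   python3 scripts/cross_verify_outputs.py \
	#     dumps/build-opencv-bench/dump-opencv \
	#     dumps/build-rustvx/dump-rustvx \
	#     --left-label "OpenCV" --right-label "rustvx" \
	#     --json cross-verify-rustvx.json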
	- name: Upload sentinel output dumps
	if: always()
	uses: actions/upload-artifact@v4
	with:
	name: cross-verify-dumps
	path: |
	build-mivisionx/dump-mivisionx/
	build-khronos/dump-khronos/
	build-rustvx/dump-rustvx/
	build-opencv-bench/dump-opencv/
	if-no-files-found: ignore
