This document defines a production-grade CI/CD design for compute-kernel-style repositories where correctness, determinism, and hardware compatibility are first-class release gates.
```text
Commit
  -> Static Analysis
  -> Deterministic Build Matrix
  -> CPU + GPU Test Matrix
  -> Security / Supply Chain
  -> Numerical Validation
  -> Performance Regression Validation
  -> Artifact Packaging + Attestation
  -> Release + Registry Publish
  -> Production Telemetry Hooks
```
| Workflow | Trigger | Purpose |
|---|---|---|
| `ci.yml` | PR + push | Fast quality checks, unit tests, package build |
| `hpc-matrix.yml` | PR + push + nightly | Expanded matrix with PyTorch/CUDA compatibility checks |
| `gpu-hardware.yml` | nightly + manual + release candidate | Hardware validation on self-hosted GPU runners |
| `security.yml` | PR + push + weekly | SAST, dependency audit, secret scan |
| `benchmark.yml` | PR + nightly | Performance baselines and regression thresholds |
| `release.yml` | tags | Build, verify, and publish release artifacts |
| `docs.yml` | docs changes + release | Build and publish documentation |
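As a sketch of how the triggers above map onto workflow files, a nightly workflow such as `hpc-matrix.yml` might declare (the branch filter and cron schedule are illustrative assumptions):

```yaml
# Illustrative trigger block for hpc-matrix.yml; schedule time is an assumption.
name: hpc-matrix
on:
  pull_request:
  push:
    branches: [main]
  schedule:
    - cron: "0 3 * * *"   # nightly deep-matrix run
  workflow_dispatch:       # allow manual runs
```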
Determinism is enforced by:
- Pinning lock files and toolchain versions.
- Building wheels in isolated environments (`python -m build`).
- Verifying metadata (`twine check`).
- Capturing SBOM and provenance attestations.
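A minimal sketch of a deterministic build job covering these steps (job and lock-file names are assumptions):

```yaml
# Sketch of an isolated, pinned wheel build; requirements-build.lock is a hypothetical pin file.
build-wheel:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: "3.11"
    - name: Install pinned build tooling
      run: python -m pip install -r requirements-build.lock
    - name: Build sdist + wheel in an isolated environment
      run: python -m build
    - name: Verify distribution metadata
      run: twine check dist/*
```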
| Axis | Values |
|---|---|
| Python | 3.9, 3.10, 3.11, 3.12 |
| PyTorch | Supported minor versions |
| CUDA | 11.8, 12.1, 12.4 |
| GPU arch | sm_80, sm_86, sm_89, sm_90 |
| OS | ubuntu-latest, self-hosted GPU Linux |
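The matrix axes above can be expressed as a `strategy` block; a sketch, assuming unsupported Python/CUDA combinations are pruned with `exclude` entries not shown here:

```yaml
# Sketch of the expanded test matrix; exclusion rules for unsupported combos are assumptions.
strategy:
  fail-fast: false
  matrix:
    python: ["3.9", "3.10", "3.11", "3.12"]
    cuda: ["11.8", "12.1", "12.4"]
```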
```shell
ruff check .
ruff format --check .
mypy kernels implementations
```
- Unit tests with coverage threshold.
- Integration tests for runtime loading and fallback behavior.
- Numerical equivalence checks against reference implementations.
- Run kernel smoke tests on A100 / H100 / RTX-class runners.
- Enforce CPU fallback tests in every PR.
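A sketch of routing smoke tests to architecture-labeled self-hosted runners (runner labels, test paths, and markers are assumptions):

```yaml
# Sketch of hardware smoke tests fanned out by GPU class; labels are hypothetical.
gpu-smoke:
  strategy:
    matrix:
      gpu: [a100, h100, rtx]
  runs-on: [self-hosted, linux, "${{ matrix.gpu }}"]
  steps:
    - uses: actions/checkout@v4
    - name: Kernel smoke tests on real hardware
      run: pytest tests/smoke -m gpu
```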
- CodeQL
- `pip-audit`
- `safety`
- `gitleaks`
- Optional Semgrep policy pack
- Pytest benchmark suite with historical comparison.
- Fail CI if median latency regresses beyond a configured threshold (default 5%).
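With pytest-benchmark, the 5% median gate can be sketched as a compare-fail step (baseline storage via the plugin's saved runs is assumed to be in place):

```yaml
# Sketch of a regression gate using pytest-benchmark's compare options;
# the benchmarks/ path is an assumption.
- name: Run benchmarks against stored baseline
  run: |
    pytest benchmarks/ --benchmark-only \
      --benchmark-compare \
      --benchmark-compare-fail=median:5%
```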
Release jobs should execute only after all mandatory checks pass:
- Lint / type / tests
- Security scans
- Benchmark regression check
- Docs build
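The gating above maps naturally onto job dependencies; a sketch, where the job names in `needs` are assumptions standing in for the mandatory checks:

```yaml
# Sketch of release gating: publish runs only when every mandatory check succeeds.
publish:
  needs: [lint, test, security, benchmark, docs]   # hypothetical job names
  if: startsWith(github.ref, 'refs/tags/')
  runs-on: ubuntu-latest
```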
On tag:
- Build source + wheel distributions.
- Generate SBOM (`syft`) and vulnerability report (`grype` or `trivy`).
- Publish to PyPI.
- Publish container image.
- Attach benchmark + security artifacts to GitHub release.
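A sketch of the on-tag packaging steps using the `syft` and `grype` CLIs (output filename and severity threshold are assumptions):

```yaml
# Sketch of SBOM + scan + publish steps; sbom.spdx.json and the fail-on level are assumptions.
- name: Generate SBOM
  run: syft dir:. -o spdx-json > sbom.spdx.json
- name: Scan SBOM for vulnerabilities
  run: grype sbom:sbom.spdx.json --fail-on high
- name: Publish to PyPI
  run: twine upload dist/*
```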
If release validation fails for correctness or benchmark thresholds:
- Mark release candidate as failed.
- Prevent publication jobs from running.
- Emit structured summary and incident artifact.
For large OSS adoption:
- Split fast and slow workflows; protect PR latency.
- Use distributed/self-hosted GPU pools by architecture label.
- Cache Python deps, build layers, and benchmark baselines.
- Nightly deep validation for expensive fuzz + hardware tests.
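Dependency caching from the scaling notes above can be sketched with `actions/cache` (the cache path and key expression are assumptions tied to the hypothetical lock files):

```yaml
# Sketch of pip dependency caching; key hashes hypothetical *.lock files.
- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: pip-${{ runner.os }}-${{ hashFiles('**/requirements*.lock') }}
```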