This document defines a production-grade CI/CD topology for compute-kernel style repositories where determinism, security, hardware compatibility, and release repeatability are first-class gates.
Developer
-> Pull Request / Push
-> Layer 1: Pre-commit Validation
-> Layer 2: Static Analysis
-> Layer 3: Dependency & Supply Chain
-> Layer 4: Build Matrix
-> Layer 5: Kernel Compilation
-> Layer 6: Test Matrix
-> Layer 7: Performance
-> Layer 8: Security
-> Layer 9: Packaging
-> Layer 10: Release
- lint-python
- lint-cpp
- format-check
- import-order-check
- precommit-hooks
- commit-message-lint
- secrets-scan
Primary tools: ruff, black, clang-format, pre-commit, commitlint, gitleaks.
- mypy-type-check
- pylint-analysis
- code-complexity-check
- dead-code-detection
- docstring-coverage
- codeql-analysis
Primary tools: mypy, pylint, radon, vulture, codeql.
- dependency-vulnerability-scan
- license-compliance
- dependency-update-check
- sbom-generation
Primary tools: pip-audit, safety, syft, dependabot.
- build-linux
- build-macos
- build-python-3.9
- build-python-3.10
- build-python-3.11
- build-cuda-11
- build-cuda-12
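The build jobs above can be read as a cross-product of OS, Python, and CUDA axes. The following is a minimal sketch of that expansion, not the repository's actual matrix definition. The `None` entry for CPU-only builds and the rule that CUDA builds run only on Linux are assumptions of this example.

```python
from itertools import product

# Axis values mirror the job names listed above.
OPERATING_SYSTEMS = ["linux", "macos"]
PYTHON_VERSIONS = ["3.9", "3.10", "3.11"]
# Assumption: None stands for a CPU-only build alongside the CUDA builds.
CUDA_VERSIONS = [None, "11", "12"]


def expand_build_matrix():
    """Return every allowed (os, python, cuda) combination as a dict."""
    matrix = []
    for os_name, python, cuda in product(
        OPERATING_SYSTEMS, PYTHON_VERSIONS, CUDA_VERSIONS
    ):
        # Assumption: CUDA builds are scheduled on Linux runners only.
        if cuda is not None and os_name != "linux":
            continue
        matrix.append({"os": os_name, "python": python, "cuda": cuda})
    return matrix


if __name__ == "__main__":
    for entry in expand_build_matrix():
        print(entry)
```

Generating the matrix from explicit axes keeps exclusions in one place, so adding a Python or CUDA version is a one-line change.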
- compile-cuda-kernels
- compile-cpu-kernels
- compile-ptx
- compile-avx-optimizations
- unit-tests
- integration-tests
- api-contract-tests
- gpu-runtime-tests
- numerical-accuracy-tests
- memory-safety-tests
Primary tools: pytest, compute-sanitizer (the successor to cuda-memcheck, which was removed in CUDA 12), ASAN, valgrind.
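A numerical-accuracy test typically compares kernel output against a higher-precision reference under a tolerance. The sketch below shows one way to express that check with the standard library only. The function names, the default tolerance of `1e-5`, and the absolute floor are illustrative assumptions, not values taken from this repository.

```python
import math


def max_relative_error(actual, reference, abs_floor=1e-12):
    """Return the largest element-wise relative error between two sequences.

    abs_floor (an assumed value) prevents division by zero when a
    reference element is exactly 0.
    """
    if len(actual) != len(reference):
        raise ValueError("actual and reference must have the same length")
    worst = 0.0
    for a, r in zip(actual, reference):
        # Any NaN or infinity counts as a failure; without this guard a NaN
        # would be silently ignored by max().
        if not (math.isfinite(a) and math.isfinite(r)):
            return math.inf
        worst = max(worst, abs(a - r) / max(abs(r), abs_floor))
    return worst


def within_tolerance(actual, reference, rel_tol=1e-5):
    """True when every element is within rel_tol of the reference."""
    return max_relative_error(actual, reference) <= rel_tol
```

In a pytest suite this would usually be wrapped as `assert within_tolerance(kernel_output, reference_output)`, with the tolerance chosen per dtype.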
- benchmark-kernels
- performance-regression-detection
- latency-analysis
- memory-bandwidth-tests
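Primary tools: pytest-benchmark, Nsight Systems and Nsight Compute (nvprof does not support GPUs of compute capability 8.0 and newer, such as the A100), Airspeed Velocity.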
- container-vulnerability-scan
- binary-vulnerability-scan
- fuzz-testing
Primary tools: trivy, grype, libFuzzer, hypothesis.
- build-python-wheels
- build-docker-images
- publish-release
Primary outputs: signed PyPI wheels, signed OCI images, compiled kernels, benchmark reports, SBOM + provenance attestations.
repo/
├── kernels/
├── tests/
│ ├── unit/
│ ├── integration/
│ ├── gpu/
│ └── benchmarks/
├── benchmarks/
├── docker/
├── scripts/
├── .github/workflows/
└── pyproject.toml
The following files in this repository implement core parts of the architecture:
| Existing workflow | Layer coverage | Notes |
|---|---|---|
| `.github/workflows/ci.yml` | 1, 2, 4, 6 | Fast lint/type/test/build gate |
| `.github/workflows/hpc-matrix.yml` | 4, 5, 6 | Expanded Python/Torch/CUDA compatibility |
| `.github/workflows/gpu-hardware.yml` | 5, 6, 7 | Hardware validation on self-hosted GPU runners |
| `.github/workflows/security.yml` | 1, 2, 3, 8 | CodeQL, dependency review, secret scanning |
| `.github/workflows/benchmark.yml` | 7 | Regression checks using benchmark artifacts |
| `.github/workflows/dependency-health.yml` | 3 | Dependency graph and vulnerability health |
| `.github/workflows/dependency-canary.yml` | 3, 4 | Canary runs against latest dependencies |
| `.github/workflows/release.yml` | 9, 10 | Build + publish releases |
| `.github/workflows/docs.yml` | release support | Documentation validation and publication |
Use artifact-backed baseline comparison:
- Store a baseline benchmark artifact from `main`.
- Run PR benchmarks on the same hardware class.
- Compare with a fixed threshold (default: 5% slowdown).
- Fail PR when threshold is exceeded.
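The comparison steps above can be sketched as a small gate script. This is an illustration under stated assumptions: the flat JSON layout (benchmark name mapped to mean runtime in seconds) and the function names `find_regressions` and `gate` are hypothetical, not the format of any particular benchmarking tool.

```python
import json

DEFAULT_THRESHOLD = 0.05  # 5% slowdown, the default named above


def find_regressions(baseline, candidate, threshold=DEFAULT_THRESHOLD):
    """Return {benchmark: fractional slowdown} for entries over the threshold.

    Both arguments map a benchmark name to its mean runtime in seconds.
    That flat layout is an assumption of this sketch.
    """
    regressions = {}
    for name, base_time in baseline.items():
        new_time = candidate.get(name)
        if new_time is None or base_time <= 0:
            # Missing or degenerate entries cannot be compared; skip them.
            continue
        slowdown = (new_time - base_time) / base_time
        if slowdown > threshold:
            regressions[name] = slowdown
    return regressions


def gate(baseline_path, candidate_path, threshold=DEFAULT_THRESHOLD):
    """Load two JSON result files and return an exit code (0 means pass)."""
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(candidate_path) as f:
        candidate = json.load(f)
    regressions = find_regressions(baseline, candidate, threshold)
    for name, slowdown in sorted(regressions.items()):
        print(f"REGRESSION {name}: {slowdown:.1%} slower than baseline")
    return 1 if regressions else 0
```

A CI step could call `sys.exit(gate("baseline.json", "pr.json"))` after downloading the baseline artifact; both file names are placeholders.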
Recommended dedicated GPU runner pools:
- Self-hosted RTX 4090
- Self-hosted A100
- Self-hosted H100
Alternative cloud-backed runner pools:
- RunPod
- Lambda Labs
- GCP GPU nodes
- Cache aggressively: Python deps, build outputs, ccache, benchmark baselines.
- Shard expensive tests across runners.
- Keep PR checks fast; move deep validation to nightly/release candidates.
- Use reusable workflows to avoid duplication across matrix dimensions.
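Sharding expensive tests works best when every runner computes the same assignment independently. The sketch below shows one possible scheme based on a stable hash of the test ID. The function names and the idea of passing a shard index per runner are assumptions of this example, not an existing plugin's API.

```python
import hashlib


def shard_for(test_id, total_shards):
    """Map a test ID to a shard number in [0, total_shards).

    SHA-256 is used instead of the built-in hash() because string hashes
    are randomized per process, which would break agreement across runners.
    """
    digest = hashlib.sha256(test_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total_shards


def select_shard(test_ids, shard_index, total_shards):
    """Return the subset of test_ids assigned to shard_index, in input order."""
    if not 0 <= shard_index < total_shards:
        raise ValueError("shard_index must be in [0, total_shards)")
    return [t for t in test_ids if shard_for(t, total_shards) == shard_index]
```

Hash-based assignment is stable when tests are added or removed, at the cost of ignoring test duration; balancing by recorded runtime is a common refinement.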
- Signed commits + signed tags.
- SBOM generation per build.
- Artifact signing with Sigstore/Cosign.
- SLSA provenance attestations for release bundles.
Recommended minimum gate:
`pytest --cov=kernels --cov=implementations --cov-fail-under=85`

Raise the threshold for critical packages over time as flaky suites are removed.
For kernel-heavy production systems, add:
- Kernel fuzzing
- PTX verification
- ABI compatibility tests
- GPU driver compatibility matrix
- Binary reproducibility checks
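Of the checks listed above, binary reproducibility is straightforward to prototype: build twice from the same commit and compare per-file digests. The sketch below assumes the two builds are unpacked into separate directories. The function names are hypothetical, and it does not normalize embedded timestamps or archive metadata, which a real check would need to handle.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_builds(first_root, second_root):
    """Return sorted relative paths that differ or exist in only one build."""
    first = digest_tree(first_root)
    second = digest_tree(second_root)
    return sorted(
        name
        for name in first.keys() | second.keys()
        if first.get(name) != second.get(name)
    )
```

An empty result from `diff_builds` means the two output trees are byte-identical; any listed path is a candidate for non-deterministic build input.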