Commit 17abaa1

Merge branch 'main' into fix/attestation-ux-bad-install
2 parents edb7265 + e2cbf32 commit 17abaa1

28 files changed, with 2245 additions and 1785 deletions.

CHANGELOG.md

Lines changed: 21 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -2,6 +2,27 @@
22

33
All notable changes to this project will be documented in this file.
44

5+
## [0.8.16] - 2026-03-05
6+
7+
### Bug Fixes
8+
9+
- *(evidence)* Use nvcr image in HPA GPU test manifest by [@yuanchen8911](https://github.com/yuanchen8911)
10+
11+
## [0.8.15] - 2026-03-05
12+
13+
### Bug Fixes
14+
15+
- *(verify)* Return FAILED and non-zero exit when bundle has verifica… by [@lockwobr](https://github.com/lockwobr)
16+
17+
### New Features
18+
19+
- *(ci)* Support /ok-to-test for fork PRs by [@mchmarny](https://github.com/mchmarny)
20+
- *(recipes)* Add GB200 EKS recipe overlays, fix HPA multi-arch, add DRA evidence and deploy mitigations by [@yuanchen8911](https://github.com/yuanchen8911)
21+
22+
### Other Tasks
23+
24+
- *(validator)* Remove redundant operator-health deployment check by [@xdu31](https://github.com/xdu31)
25+
526
## [0.8.14] - 2026-03-05
627

728
### Bug Fixes

RELEASING.md

Lines changed: 86 additions & 168 deletions
Original file line number | Diff line number | Diff line change
@@ -1,87 +1,87 @@
11
# Release Process
22

3-
This document outlines the release process for NVIDIA AI Cluster Runtime (AICR). For contribution guidelines, see [CONTRIBUTING.md](CONTRIBUTING.md).
3+
This document describes when, why, and how AICR releases are made. For contribution guidelines, see [CONTRIBUTING.md](CONTRIBUTING.md).
44

5-
## Prerequisites
5+
## Cadence
66

7-
- Repository admin access with write permissions
8-
- Understanding of semantic versioning (vMAJOR.MINOR.PATCH)
9-
- Access to GitHub Actions workflows
10-
- [git-cliff](https://git-cliff.org/) installed (run `make tools-setup` to install)
7+
Releases follow a **bi-weekly cadence**, aligned with sprint boundaries. A new release is cut at the conclusion of each 2-week sprint.
118

12-
## Release Methods
9+
| Release Type | When | Version Bump | Decision |
10+
|-------------|------|-------------|----------|
11+
| Sprint release | End of each 2-week sprint | `patch` or `minor` | Maintainer determines bump type based on changes landed |
12+
| Hotfix | Between sprints, as needed | `patch` | Any maintainer can initiate for critical fixes |
13+
| Major | Planned | `major` | Requires team agreement and advance communication |
1314

14-
### Method 1: Version Bump (Recommended)
15+
## What Goes Into a Release
1516

16-
Use Makefile targets for standard releases:
17+
A release includes everything merged to `main` since the last tag. There is no cherry-picking or feature branching for releases — if it's on `main`, it ships.
1718

18-
```bash
19-
make bump-patch # v1.2.3 → v1.2.4
20-
make bump-minor # v1.2.3 → v1.3.0
21-
make bump-major # v1.2.3 → v2.0.0
22-
```
19+
**Before cutting a release, verify:**
2320

24-
**What happens automatically**:
21+
- All CI checks pass on `main` (`make qualify`)
22+
- No known regressions from the current sprint
23+
- Breaking changes use `feat!:` or `fix!:` commit prefix (drives changelog and signals consumers)
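
One way to check the last item before tagging is to filter commit subjects for the `!` marker. The sketch below is illustrative only: `list_breaking` is a hypothetical helper, not something the AICR Makefile is known to provide, and the pattern assumes Conventional Commits-style subjects with an optional scope.

```bash
# Hypothetical helper (not part of the repository): reads commit subjects on
# stdin and prints only those flagged as breaking with the "!" suffix,
# e.g. "feat!: ..." or "fix(api)!: ...".
list_breaking() {
  # "|| true" keeps the function from failing when nothing matches.
  grep -E '^[a-z]+(\([^)]*\))?!:' || true
}

# Example usage against the commits since the most recent tag:
#   git log "$(git describe --tags --abbrev=0)..HEAD" --pretty=format:%s | list_breaking
```

If the helper prints anything, the release probably needs more than a `patch` bump.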
2524

26-
1. Validates working directory is clean with no unpushed commits
27-
2. Calculates the new version based on bump type
28-
3. Generates/updates `CHANGELOG.md` using [git-cliff](https://git-cliff.org/)
29-
4. Commits the changelog update
30-
5. Creates an annotated tag
31-
6. Pushes both commit and tag to origin
32-
7. Triggers release workflows (see [Workflow Pipeline](#workflow-pipeline))
25+
## Quality Gates
3326

34-
**Note**: Manual edits to `CHANGELOG.md` (e.g., corrections to previous releases) are preserved. The bump process prepends new entries without overwriting existing content.
27+
Every release must pass these automated gates before artifacts are published:
3528

36-
### Method 2: Manual Tag (Advanced)
29+
- Unit tests with race detector
30+
- golangci-lint + yamllint
31+
- License header verification
32+
- Trivy vulnerability scan
33+
- E2E tests on Kind cluster
3734

38-
For cases where you need more control over the release process:
35+
If any gate fails, the release pipeline stops. Fix forward on `main` and cut a new tag.
3936

40-
1. **Ensure main is ready**:
41-
```bash
42-
git checkout main
43-
git pull origin main
44-
make qualify # All checks must pass
45-
```
37+
## How to Release
4638

47-
2. **Generate changelog manually** (optional):
48-
```bash
49-
git-cliff --tag v1.2.3 -o CHANGELOG.md
50-
git add CHANGELOG.md
51-
git commit -m "chore: update CHANGELOG for v1.2.3"
52-
git push origin main
53-
```
39+
### Standard Release (recommended)
5440

55-
3. **Create and push a version tag**:
56-
```bash
57-
git tag -a v1.2.3 -m "Release v1.2.3"
58-
git push origin v1.2.3
59-
```
41+
```bash
42+
git checkout main
43+
git pull origin main
44+
make qualify # Verify locally before releasing
6045

61-
4. **Automatic workflows trigger** (via `on-tag.yaml`)
46+
make bump-patch # v1.2.3 -> v1.2.4
47+
# or
48+
make bump-minor # v1.2.3 -> v1.3.0
49+
```
6250

63-
### Method 3: Manual Workflow Trigger
51+
This automatically validates that the working tree is clean, generates the changelog, commits it, creates the tag, pushes both, and triggers the release pipeline.
6452

65-
For rebuilding from existing tags or emergency releases:
53+
### Manual Tag
6654

67-
1. Navigate to **Actions** → **On Tag Release**
68-
2. Click **Run workflow**
69-
3. Enter the existing tag (e.g., `v1.2.3`)
70-
4. Click **Run workflow**
55+
For more control:
7156

72-
This is useful when you need to re-run the release pipeline without creating a new tag.
57+
```bash
58+
git checkout main && git pull origin main
59+
make qualify
60+
git-cliff --tag v1.2.3 -o CHANGELOG.md # Optional: generate changelog
61+
git add CHANGELOG.md && git commit -m "chore: update CHANGELOG for v1.2.3"
62+
git tag -a v1.2.3 -m "Release v1.2.3"
63+
git push origin main v1.2.3
64+
```
7365

74-
## Workflow Pipeline
66+
### Re-run Existing Release
67+
68+
To rebuild artifacts from an existing tag without creating a new one: **Actions** > **On Tag Release** > **Run workflow** > enter the tag.
69+
70+
## Hotfix Procedure
71+
72+
For critical fixes between sprints:
73+
74+
1. Fix on `main` first (PR, review, merge as normal)
75+
2. Cut a patch release: `make bump-patch`
76+
3. For patching older release lines (rare): cherry-pick from `main` onto a hotfix branch, tag manually
77+
78+
## Release Pipeline
7579

7680
```
77-
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
78-
│ Tag Push │───▶│ Go CI │───▶│ Build │───▶│ Attest │───▶│ Deploy │
79-
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
80-
tests + binaries + SBOM + Demo Deploy
81-
lint images provenance (example)
81+
Tag Push --> CI (tests + lint) --> Build (binaries + images) --> Attest (SBOM + provenance) --> Deploy (demo)
8282
```
8383

84-
## Released Components
84+
## Released Artifacts
8585

8686
### Binaries
8787

@@ -101,148 +101,66 @@ Published to GitHub Container Registry (`ghcr.io/nvidia/`):
101101
| `aicr` | `nvcr.io/nvidia/cuda:13.1.0-runtime-ubuntu24.04` | CLI with CUDA runtime |
102102
| `aicrd` | `gcr.io/distroless/static:nonroot` | Minimal API server |
103103

104-
Tags: `latest`, `v1.2.3`
104+
Tags: `latest`, `vX.Y.Z`
105105

106-
### Supply Chain Artifacts
106+
### Supply Chain
107107

108108
Every release includes:
109109

110-
- **SLSA Build Level 3 Provenance**: Verifiable build attestations
111-
- **SBOM**: Software Bill of Materials (SPDX format)
112-
- **Sigstore Signatures**: Keyless signing via Fulcio + Rekor
113-
- **Checksums**: SHA256 checksums for all binaries
110+
- **SLSA Build Level 3 Provenance** — verifiable build attestations
111+
- **SBOM**: Software Bill of Materials (SPDX format)
112+
- **Sigstore Signatures** — keyless signing via Fulcio + Rekor
113+
- **Checksums** — SHA256 for all binaries
114114

115-
## Quality Gates
116-
117-
All releases must pass:
115+
## Versioning
118116

119-
- **Unit tests**: With race detector enabled
120-
- **Linting**: golangci-lint + yamllint
121-
- **License headers**: All source files verified
122-
- **Security scan**: Trivy vulnerability scan
117+
- **Semantic versioning**: `vMAJOR.MINOR.PATCH`
118+
- **Pre-releases**: `v1.2.3-rc1`, `v1.2.3-beta1` (automatically marked in GitHub)
119+
- **Breaking changes**: Increment MAJOR version
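
The bump arithmetic behind these rules can be sketched as a small shell function. This is an assumption-laden illustration, not the logic the `make bump-*` targets actually run: `next_version` is a hypothetical name, and it only handles plain `vMAJOR.MINOR.PATCH` input (no pre-release suffix).

```bash
# Hypothetical helper (not part of the repository): prints the next version
# for a given bump type. Assumes input of the form vMAJOR.MINOR.PATCH.
next_version() {
  local current="${1#v}" bump="$2" major minor patch
  # Split "1.2.3" on dots into its three numeric components.
  IFS=. read -r major minor patch <<EOF
$current
EOF
  case "$bump" in
    major) major=$((major + 1)); minor=0; patch=0 ;;  # breaking change
    minor) minor=$((minor + 1)); patch=0 ;;           # new feature
    patch) patch=$((patch + 1)) ;;                    # bug fix
    *) echo "unknown bump type: $bump" >&2; return 1 ;;
  esac
  printf 'v%s.%s.%s\n' "$major" "$minor" "$patch"
}

next_version v1.2.3 patch   # v1.2.4
next_version v1.2.3 minor   # v1.3.0
next_version v1.2.3 major   # v2.0.0
```

Lower-order components reset to zero on a `minor` or `major` bump, which matches the `v1.2.3 -> v1.3.0` example in the release instructions.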
123120

124121
## Verification
125122

126-
### Verify Container Attestations
123+
### Container Attestations
127124

128125
```bash
129-
# Get latest release tag
130126
export TAG=$(curl -s https://api.github.com/repos/NVIDIA/aicr/releases/latest | jq -r '.tag_name')
131127

132-
# Verify with GitHub CLI (recommended)
128+
# GitHub CLI
133129
gh attestation verify oci://ghcr.io/nvidia/aicr:${TAG} --owner nvidia
134130
gh attestation verify oci://ghcr.io/nvidia/aicrd:${TAG} --owner nvidia
135131

136-
# Verify with Cosign
132+
# Cosign
137133
cosign verify-attestation \
138134
--type spdxjson \
139135
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
140136
--certificate-identity-regexp 'https://github.com/NVIDIA/aicr/.github/workflows/.*' \
141137
ghcr.io/nvidia/aicr:${TAG}
142138
```
143139

144-
### Verify Binary Checksums
140+
### Binary Checksums
145141

146142
```bash
147-
# Download checksums file from GitHub Release
148143
curl -sL "https://github.com/NVIDIA/aicr/releases/download/${TAG}/aicr_checksums.txt" -o checksums.txt
149-
150-
# Verify downloaded binary
151144
sha256sum -c checksums.txt --ignore-missing
152145
```
153146

154-
### Pull and Test Images
155-
156-
```bash
157-
# Pull container images
158-
docker pull ghcr.io/nvidia/aicr:${TAG}
159-
docker pull ghcr.io/nvidia/aicrd:${TAG}
160-
161-
# Test CLI
162-
docker run --rm ghcr.io/nvidia/aicr:${TAG} --version
163-
164-
# Test API server
165-
docker run --rm -p 8080:8080 ghcr.io/nvidia/aicrd:${TAG} &
166-
curl http://localhost:8080/health
167-
```
168-
169-
## Version Management
170-
171-
- **Semantic versioning**: `vMAJOR.MINOR.PATCH`
172-
- **Pre-releases**: `v1.2.3-rc1`, `v1.2.3-beta1` (automatically marked in GitHub)
173-
- **Breaking changes**: Increment MAJOR version
147+
## Demo Deployment
174148

175-
## Demo Cloud Run Deployment
149+
> **Note**: Demonstration only — not a production service. Self-host `aicrd` for production use. See [API Server Documentation](docs/contributor/api-server.md).
176150
177-
> **Note**: This is a **demonstration deployment** for testing and development purposes only. It is not a production service. Users should self-host the `aicrd` API server in their own infrastructure for production use. See [API Server Documentation](docs/contributor/api-server.md) for deployment guidance.
151+
The `aicrd` API server demo deploys to Google Cloud Run on successful release (project: `eidosx`, region: `us-west1`, auth: Workload Identity Federation).
178152

179-
The `aicrd` API server demo is automatically deployed to Google Cloud Run on successful release:
180-
181-
- **Project**: `eidosx`
182-
- **Region**: `us-west1`
183-
- **Service**: `api`
184-
- **Authentication**: Workload Identity Federation (keyless)
153+
## Troubleshooting
185154

186-
This demo deployment only occurs if the build step succeeds and serves as an example of how to deploy the API server.
155+
| Problem | Action |
156+
|---------|--------|
157+
| Tests fail during release | Fix on `main`, cut new tag |
158+
| Lint errors | Run `make lint` locally before releasing |
159+
| Image push failure | Check GHCR permissions |
160+
| Need to rebuild | Use manual workflow trigger with existing tag |
187161

188-
## Troubleshooting
162+
## Prerequisites
189163

190-
### Failed Release
191-
192-
1. Check **Actions** → **On Tag Release** for error logs
193-
2. Common issues:
194-
- Tests failing: Fix and create new tag
195-
- Lint errors: Run `make lint` locally first
196-
- Image push failures: Check GHCR permissions
197-
198-
### Rebuild Existing Release
199-
200-
Use manual workflow trigger with the existing tag. No need to delete and recreate tags.
201-
202-
## Emergency Hotfix Procedure
203-
204-
For urgent fixes:
205-
206-
1. **Fix in main first**:
207-
```bash
208-
git checkout main
209-
git checkout -b fix/critical-issue
210-
# Apply fix, create PR to main, merge
211-
```
212-
213-
2. **Create hotfix release**:
214-
```bash
215-
git checkout main
216-
git pull origin main
217-
make bump-patch # Generates changelog, tags, and pushes
218-
```
219-
220-
3. **For patching older releases** (rare):
221-
```bash
222-
git checkout v1.2.3
223-
git checkout -b hotfix/v1.2.4
224-
git cherry-pick <commit-hash-from-main>
225-
git-cliff --tag v1.2.4 --unreleased --prepend CHANGELOG.md
226-
git add CHANGELOG.md
227-
git commit -m "chore: update CHANGELOG for v1.2.4"
228-
git tag -a v1.2.4 -m "Release v1.2.4"
229-
git push origin hotfix/v1.2.4 v1.2.4
230-
```
231-
232-
## Release Checklist
233-
234-
Before running `make bump-*`:
235-
236-
- [ ] All CI checks pass on main (`make qualify`)
237-
- [ ] Working directory is clean (no uncommitted changes)
238-
- [ ] All commits are pushed to origin
239-
- [ ] Breaking changes documented in commit messages (use `feat!:` or `fix!:` prefix)
240-
- [ ] Version bump type is correct (major for breaking, minor for features, patch for fixes)
241-
242-
After release:
243-
244-
- [ ] GitHub Release created with changelog
245-
- [ ] Container images available in GHCR
246-
- [ ] Attestations verifiable
247-
- [ ] Demo Cloud Run deployment successful (optional)
248-
- [ ] Announce release (if applicable)
164+
- Repository admin access with write permissions
165+
- Access to GitHub Actions workflows
166+
- [git-cliff](https://git-cliff.org/) installed (`make tools-setup`)
