Skip to content

Commit 9145aeb

Browse files
committed
Merge remote-tracking branch 'origin/main' into jerm-dro/script-middleware
# Conflicts: # pkg/vmcp/cli/serve.go
2 parents 8ed5634 + dc74588 commit 9145aeb

753 files changed

Lines changed: 83957 additions & 22836 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.claude/rules/operator.md

Lines changed: 49 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -39,12 +39,24 @@ See `cmd/thv-operator/DESIGN.md` for detailed decision guidelines.
3939
- Always run `task crdref-gen` from `cmd/thv-operator/` after CRD changes to regenerate API docs (uses relative paths)
4040
- Use `envtest` for integration testing, not real clusters
4141
- Chainsaw tests require a real Kubernetes cluster
42-
- Status updates require a separate client patch (`r.Status().Update()`)
42+
- Status writes must go through `controllerutil.MutateAndPatchStatus` — see the Status Writes section below
4343

4444
## Status Condition Parity
4545

4646
When adding a status condition to one CRD type, check all parallel types (e.g., `MCPServer` and `VirtualMCPServer`) for the same condition. Conditions that warn about misconfiguration or unsupported states should be consistent across types that share the same feature set — a gap means one type silently accepts invalid config that the other rejects.
4747

48+
## Status Writes
49+
50+
Use `controllerutil.MutateAndPatchStatus` for every status write — not `r.Status().Update` or inline `client.Status().Patch` (see #4633). The helper's doc comment is the authoritative spec.
51+
52+
When adding a status-write call site, check three things:
53+
54+
1. **Caller holds a freshly-`Get`ted object.** Reconciler-start writers do; writers that iterate `List` results (e.g., deletion-path fan-out in `MCPGroupReconciler`) do not and need a fresh `Get` before calling the helper.
55+
2. **Caller is the sole owner of the entire `Status.Conditions` array.** Per-condition-type ownership is NOT enough. JSON merge-patch replaces the array wholesale for CRDs (the `+listType=map` marker is only honored by strategic-merge-patch), so any concurrent writer whose Patch lands between this caller's Get and Patch — on any condition type, not just the ones this caller touches — will be erased. A fresh `Get` narrows the TOCTOU window but does not eliminate it. If two code paths must write conditions on the same CRD (e.g., operator reconciler + in-pod `K8sReporter`), fix at the design level: consolidate to a single owner, or move one writer to a dedicated status field outside the array.
56+
3. **Scalar fields the writer touches are not co-owned.** A stale-computed value different from the caller's snapshot will overwrite the live value — the helper cannot defend against this.
57+
58+
Do not use `MutateAndPatchStatus` for spec or metadata writes — those require optimistic locking (`client.MergeFromWithOptions(..., MergeFromWithOptimisticLock{})`). See #4767.
59+
4860
## Key Operator Commands
4961

5062
```bash
@@ -55,3 +67,39 @@ task operator-test # Run unit tests
5567
task operator-e2e-test # Run e2e tests
5668
task crdref-gen # Generate CRD API docs (run from cmd/thv-operator/)
5769
```
70+
71+
## Spec / metadata patching
72+
73+
Never use `r.Update` on a CR spec or metadata: `Update` is a full PUT,
74+
so any field our local copy does not track (e.g. `spec.authzConfig`
75+
written by a separate authorization controller) gets zeroed on every
76+
reconcile.
77+
78+
Use `controllerutil.MutateAndPatchSpec` instead. The helper wraps an
79+
optimistic-lock merge patch: the body only contains fields the caller
80+
changed, and `MergeFromWithOptimisticLock` sends `resourceVersion` as a
81+
precondition, so if the server moved between our Get and Patch the
82+
apiserver returns 409 and controller-runtime requeues with a fresh Get.
83+
84+
This is what protects `metadata.finalizers`. Merge-patch has no
85+
array-append semantics — arrays are replaced wholesale — so when our
86+
diff includes `finalizers` (e.g. an `AddFinalizer` call) it must have
87+
been computed from an up-to-date snapshot. The 409 + requeue is what
88+
guarantees that: any concurrent finalizer added by another controller
89+
fails our precondition, and the next reconcile observes it via a fresh
90+
Get before recomputing the diff.
91+
92+
```go
93+
if err := ctrlutil.MutateAndPatchSpec(ctx, r.Client, mcpServer, func(m *mcpv1beta1.MCPServer) {
94+
controllerutil.AddFinalizer(m, MCPServerFinalizerName)
95+
}); err != nil {
96+
return ctrl.Result{}, err
97+
}
98+
```
99+
100+
Expect 409s as routine log noise once the external controller lands —
101+
the guard doing its job, not a bug.
102+
103+
Status-subresource patching uses the sibling helper
104+
`controllerutil.MutateAndPatchStatus` (see the "Status Writes" section
105+
above).

.claude/skills/implement-story/SKILL.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -170,7 +170,7 @@ After changes that affect generated artifacts, run the appropriate tasks:
170170

171171
| Change Type | Regeneration Command |
172172
|-------------|---------------------|
173-
| CRD type definitions (`api/v1alpha1/*_types.go`) | `task operator-manifests operator-generate` |
173+
| CRD type definitions (`api/v1beta1/*_types.go`) | `task operator-manifests operator-generate` |
174174
| Mock interfaces | `task gen` |
175175
| CLI commands or API endpoints | `task docs` |
176176
| Helm chart values | `task helm-docs` |

.claude/skills/toolhive-release/references/WORKFLOW-REFERENCE.md

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -68,11 +68,13 @@ Detailed documentation of the ToolHive release workflow chain.
6868
- Version in commit message matches VERSION file
6969
3. Check if tag already exists (skip if so)
7070
4. Create annotated git tag `vX.Y.Z`
71-
5. Push tag using `RELEASE_TOKEN` (PAT required to trigger downstream workflows)
71+
5. Push tag using a GitHub App installation token (required to trigger downstream workflows; `GITHUB_TOKEN`-authored events do not)
7272
6. Create GitHub Release with auto-generated notes
7373

7474
**Requirements**:
75-
- `RELEASE_TOKEN` secret (PAT with repo access)
75+
- GitHub App installed on the repo with `contents: write` permission
76+
- `RELEASE_APP_CLIENT_ID` repository **variable** (the app's Client ID)
77+
- `RELEASE_APP_PRIVATE_KEY` repository **secret** (the app's private key in PEM)
7678

7779
## Workflow 3: releaser.yml
7880

@@ -162,7 +164,8 @@ If a previous Create Release PR run failed after creating the branch but before
162164
### Tag creation fails
163165

164166
- Tag may already exist: `git tag | grep vX.Y.Z`
165-
- RELEASE_TOKEN may be expired or lack permissions
167+
- Release GitHub App may be uninstalled, or the `RELEASE_APP_CLIENT_ID` variable / `RELEASE_APP_PRIVATE_KEY` secret may be missing or stale
168+
- App may lack `contents: write` permission on the repo
166169

167170
### Releaser workflow fails
168171

.codespellrc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,3 @@
11
[codespell]
2-
ignore-words-list = NotIn,notin,AfterAll,ND,aks,deriver,te,clientA,AtMost,atmost
2+
ignore-words-list = NotIn,notin,AfterAll,ND,aks,deriver,te,clientA,AtMost,atmost,convertIn
33
skip = *.svg,*.mod,*.sum

.github/pull_request_template.md

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,25 @@ describe exactly what you tested below the checkbox.
4242
- [ ] Linting (`task lint-fix`)
4343
- [ ] Manual testing (describe below)
4444

45+
## API Compatibility
46+
47+
<!--
48+
The CRD Schema Compatibility check guards the v1beta1 operator API.
49+
If the check flags this PR as Incompatible and the break is intentional,
50+
apply the `api-break-allowed` label and describe below:
51+
52+
1. Which fields, types, or CRDs are changing.
53+
2. Why the break is unavoidable.
54+
3. The user-facing migration path (what cluster admins need to do).
55+
56+
See CONTRIBUTING.md → "API Stability" for the full rubric. Coordinate
57+
with maintainers before applying the label.
58+
59+
Remove this section entirely if the PR does not touch operator API surface.
60+
-->
61+
62+
- [ ] This PR does not break the `v1beta1` API, OR the `api-break-allowed` label is applied and the migration guidance is described above.
63+
4564
## Changes
4665

4766
<!--
Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
name: API Compatibility
2+
3+
# No-op companion to api-compat.yml. Its sole purpose is to satisfy the
4+
# required `CRD Schema Compatibility` status check on PRs that don't touch
5+
# any operator API surface. Without this companion, such PRs deadlock:
6+
# branch protection requires the check, the real workflow's path filter
7+
# prevents it from firing, and GitHub shows the required status as
8+
# "expected — waiting to be reported" forever.
9+
#
10+
# The workflow `name:` and job `name:` intentionally mirror api-compat.yml
11+
# so the check-run context string matches (`CRD Schema Compatibility`).
12+
# GitHub's branch protection treats a successful report from either
13+
# workflow as satisfying the requirement.
14+
#
15+
# The `paths-ignore` list is the exact inverse of api-compat.yml's
16+
# `paths:` include list. Keep them in sync: a path that moves from one
17+
# list needs to move from the other, or PRs touching that path will
18+
# either run both workflows (double-count) or neither (deadlock returns).
19+
20+
on:
21+
pull_request:
22+
paths-ignore:
23+
- 'cmd/thv-operator/api/**'
24+
- 'deploy/charts/operator-crds/files/crds/**'
25+
- '.github/workflows/api-compat*.yml'
26+
27+
permissions:
28+
contents: read
29+
30+
jobs:
31+
crd-schema-check:
32+
name: CRD Schema Compatibility
33+
runs-on: ubuntu-latest
34+
timeout-minutes: 2
35+
steps:
36+
- name: No API surface changes
37+
run: echo "This PR does not touch operator API surface; no compatibility check needed."

.github/workflows/api-compat.yml

Lines changed: 177 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,177 @@
1+
name: API Compatibility
2+
3+
# This workflow guards the stability of the v1beta1 operator API surface.
4+
#
5+
# A breaking CRD schema change (field removal, type change, required-field
6+
# addition, etc.) fails this check and blocks the PR. If the break is
7+
# intentional — almost exclusively for graduation to v1beta2 — apply the
8+
# `api-break-allowed` label to skip the check. See CONTRIBUTING.md → "API
9+
# Stability" for the full rubric.
10+
11+
on:
12+
pull_request:
13+
# Include `labeled` and `unlabeled` so applying or removing
14+
# `api-break-allowed` triggers a fresh workflow run. Without these,
15+
# re-running the job from the UI uses the original event payload
16+
# (which still has the old label set) and the skip condition misfires.
17+
# Re-evaluating on `unlabeled` closes the gap where a user could
18+
# apply the label, watch the check skip, then remove the label and
19+
# merge without the check ever running against the current state.
20+
types: [opened, synchronize, reopened, labeled, unlabeled]
21+
paths:
22+
- 'cmd/thv-operator/api/**'
23+
# files/crds is the source of truth — controller-gen emits here, and
24+
# crd-helm-wrapper copies from here into templates/. Any drift in
25+
# templates/ is caught by operator-ci.yml's generate-crds job, so
26+
# watching templates/ would be redundant. values.yaml and the
27+
# crd-helm-wrapper only affect Helm conditionals and annotations the
28+
# checker ignores, so they can't change what we compare.
29+
- 'deploy/charts/operator-crds/files/crds/**'
30+
# Self-exercise when either workflow file (real or no-op companion)
31+
# changes. The companion file reports the same required check on
32+
# PRs that don't touch the api surface; see api-compat-noop.yml.
33+
- '.github/workflows/api-compat*.yml'
34+
35+
permissions:
36+
contents: read
37+
38+
jobs:
39+
crd-schema-check:
40+
name: CRD Schema Compatibility
41+
runs-on: ubuntu-latest
42+
# Skip the check entirely when `api-break-allowed` is applied — a
43+
# required check that is skipped (rather than failed) counts as passing
44+
# for branch protection, so this is the escape hatch for intentional
45+
# breaks. Do not remove the label guard without a replacement path.
46+
if: ${{ !contains(github.event.pull_request.labels.*.name, 'api-break-allowed') }}
47+
# Expected runtime is ~1 minute (checkout + go setup + git fetch tag +
48+
# go install + per-CRD checker loop). 10 minutes is a cheap upper
49+
# bound that protects against a hung go install or git fetch.
50+
timeout-minutes: 10
51+
steps:
52+
- name: Checkout PR HEAD
53+
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
54+
55+
- name: Set up Go
56+
uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6
57+
with:
58+
go-version: 'stable'
59+
cache: true
60+
61+
- name: Resolve baseline tag
62+
id: baseline
63+
env:
64+
GH_TOKEN: ${{ github.token }}
65+
run: |
66+
set -euo pipefail
67+
68+
# Baseline is the most recent release tag. Tags are immutable, so
69+
# comparing against the tag gives us a stable, released reference
70+
# without needing to render the Helm chart or pull from OCI.
71+
# Falling back to origin/main would silently compare against an
72+
# already-broken baseline once a break lands on main.
73+
LATEST_TAG="$(gh release list --repo "$GITHUB_REPOSITORY" --limit 1 --json tagName --jq '.[0].tagName')"
74+
if [ -z "$LATEST_TAG" ]; then
75+
echo "::error::No releases found for $GITHUB_REPOSITORY; cannot establish an API compatibility baseline."
76+
exit 1
77+
fi
78+
79+
# Fetch just the tag, shallow — no need to unshallow the repo.
80+
git fetch origin "refs/tags/$LATEST_TAG:refs/tags/$LATEST_TAG" --depth=1
81+
echo "tag=$LATEST_TAG" >> "$GITHUB_OUTPUT"
82+
83+
- name: Install crd-schema-checker
84+
# SHA-pinned: openshift/crd-schema-checker has no release tags at the
85+
# time of writing, so @latest is the only other option. Pinning makes
86+
# CI deterministic and mitigates supply-chain risk (upstream compromise
87+
# would otherwise execute attacker code on the runner with GITHUB_TOKEN
88+
# in env). Bump via a deliberate PR after verifying the new output
89+
# locally. SHA pinned on 2026-04-21.
90+
run: go install github.com/openshift/crd-schema-checker/cmd/crd-schema-checker@3fee146022bfe6f4adf84998de35d7267b864bef
91+
92+
- name: Check CRD schema compatibility
93+
id: checker
94+
env:
95+
# Route step outputs through env vars so bash quotes them instead
96+
# of the runner substituting them directly into the script body.
97+
# Defense-in-depth against a future edit that routes a
98+
# PR-controlled string through these outputs.
99+
BASELINE_TAG: ${{ steps.baseline.outputs.tag }}
100+
run: |
101+
set -euo pipefail
102+
103+
# NoBools and NoMaps are OpenShift API-style conventions, not
104+
# compat-breaking rules. They fire on fields we legitimately use
105+
# (e.g. embeddingservers.spec.modelCache.enabled) and drown out
106+
# real findings. Re-enable only if upstream clarifies breaking-
107+
# change semantics for them.
108+
DISABLED_VALIDATORS="NoBools,NoMaps"
109+
110+
CRD_DIR="deploy/charts/operator-crds/files/crds"
111+
mkdir -p /tmp/api-compat
112+
: > /tmp/api-compat/output.txt
113+
114+
OVERALL_EXIT=0
115+
116+
# Detect CRD files removed between baseline and HEAD — a removed
117+
# CRD is a break that the checker can't report (it needs both
118+
# inputs present). Compare the set of filenames directly.
119+
BASELINE_FILES=$(git ls-tree --name-only "$BASELINE_TAG" -- "$CRD_DIR/" | sed "s|$CRD_DIR/||" | sort)
120+
HEAD_FILES=$(ls "$CRD_DIR" | sort)
121+
REMOVED=$(comm -23 <(echo "$BASELINE_FILES") <(echo "$HEAD_FILES") || true)
122+
if [ -n "$REMOVED" ]; then
123+
{
124+
echo "ERROR: CRD files removed from HEAD (present at $BASELINE_TAG):"
125+
echo "$REMOVED" | sed 's/^/ - /'
126+
} | tee -a /tmp/api-compat/output.txt
127+
OVERALL_EXIT=1
128+
fi
129+
130+
# For each CRD present on HEAD, fetch the baseline version from the
131+
# tag and run the checker. New CRDs (HEAD-only) are additive and
132+
# skipped — note that in the output so reviewers see the full
133+
# inventory.
134+
for crd in "$CRD_DIR"/*.yaml; do
135+
fname=$(basename "$crd")
136+
rel="$CRD_DIR/$fname"
137+
if ! git show "$BASELINE_TAG:$rel" > /tmp/api-compat/baseline.yaml 2>/dev/null; then
138+
echo " (new CRD on HEAD, skipping: $fname)" >> /tmp/api-compat/output.txt
139+
continue
140+
fi
141+
set +e
142+
crd-schema-checker check-manifests \
143+
--existing-crd-filename /tmp/api-compat/baseline.yaml \
144+
--new-crd-filename "$crd" \
145+
--disabled-validators="$DISABLED_VALIDATORS" \
146+
>> /tmp/api-compat/output.txt 2>&1
147+
RC=$?
148+
set -e
149+
[ "$RC" -ne 0 ] && OVERALL_EXIT=1
150+
done
151+
152+
# Surface the combined output in the step log too, not only in the
153+
# summary — some reviewers check the raw log first.
154+
cat /tmp/api-compat/output.txt
155+
156+
if [ "$OVERALL_EXIT" -eq 0 ]; then
157+
STATUS="Compatible"
158+
else
159+
STATUS="Incompatible or Unknown"
160+
fi
161+
162+
{
163+
echo "## API Compatibility — CRD Schema Check"
164+
echo ""
165+
echo "**Baseline**: $BASELINE_TAG"
166+
echo "**Status**: $STATUS"
167+
echo ""
168+
echo "<details><summary>crd-schema-checker output</summary>"
169+
echo ""
170+
echo '```'
171+
cat /tmp/api-compat/output.txt
172+
echo '```'
173+
echo ""
174+
echo "</details>"
175+
} >> "$GITHUB_STEP_SUMMARY"
176+
177+
exit "$OVERALL_EXIT"

.github/workflows/claude.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -59,7 +59,7 @@ jobs:
5959

6060
- name: Run Claude Code
6161
id: claude
62-
uses: anthropics/claude-code-action@b47fd721da662d48c5680e154ad16a73ed74d2e0 # v1
62+
uses: anthropics/claude-code-action@567fe954a4527e81f132d87d1bdbcc94f7737434 # v1
6363
with:
6464
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
6565
# Security: Restrict tools to prevent arbitrary code execution.

.github/workflows/create-release-pr.yml

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,13 @@ jobs:
3030
name: Create Release PR
3131
runs-on: ubuntu-latest
3232
steps:
33+
- name: Generate release app token
34+
id: app-token
35+
uses: actions/create-github-app-token@1b10c78c7865c340bc4f6099eb2f838309f1e8c3 # v3.1.1
36+
with:
37+
client-id: ${{ vars.RELEASE_APP_CLIENT_ID }}
38+
private-key: ${{ secrets.RELEASE_APP_PRIVATE_KEY }}
39+
3340
- name: Checkout
3441
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
3542

@@ -71,9 +78,9 @@ jobs:
7178
id: release
7279
uses: stacklok/releaseo@80e8d8131d41cf8763254d02360f2c5ce9b7c0df # v0.0.4
7380
with:
74-
releaseo_version: v0.0.3
81+
releaseo_version: v0.0.4
7582
bump_type: ${{ inputs.bump_type }}
76-
token: ${{ secrets.RELEASE_TOKEN }}
83+
token: ${{ steps.app-token.outputs.token }}
7784
version_files: |
7885
- file: deploy/charts/operator-crds/Chart.yaml
7986
path: version

0 commit comments

Comments
 (0)