Patch MCPServer spec instead of Update#4914
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4914 +/- ##
==========================================
+ Coverage 69.01% 69.02% +0.01%
==========================================
Files 554 554
Lines 73056 73075 +19
==========================================
+ Hits 50416 50439 +23
+ Misses 19640 19623 -17
- Partials 3000 3013 +13 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
dee0e05 to
0513e4e
Compare
0513e4e to
3187fdc
Compare
Relates to #4633. A shared helper that collapses the "DeepCopy → mutate → r.Status().Patch(MergeFrom(original))" idiom to a single call so remaining r.Status().Update sites can migrate without each one re-implementing the DeepCopy-before-mutate discipline by hand. Status writes deliberately use a plain merge patch, not an optimistic-lock one: the operator and the runtime reporter write disjoint status fields on every reconcile and must coexist without forcing a 409 on every overlap. Spec and metadata writes still require optimistic locking — see #4767 (tracking) / #4914 (MCPServer migration). The helper does not make every multi-writer pattern safe. The Caller contract in the doc comment spells out two footguns it cannot defend against: - JSON merge-patch replaces arrays wholesale for CRDs, so a writer to Status.Conditions must be the sole owner of the entire array. Any concurrent writer whose Patch lands between this caller's Get and Patch — on any condition type, including ones this caller does not touch — will be erased. A fresh Get narrows but does not eliminate the TOCTOU window. - A scalar re-computed from a stale snapshot that differs from the live value will overwrite a concurrent writer's update. The codified checklist for new call sites lives in .claude/rules/operator.md. Operational safeguards in the helper itself: - No-op mutations (empty merge-patch body) short-circuit before the wire call; the apiserver runs admission and audit for every PATCH regardless of body content, so steady-state reconcilers must not generate {} traffic. - A nil obj returns a descriptive error rather than panicking in the downstream type assertion. The helper lives in cmd/thv-operator/pkg/controllerutil alongside the existing controller helpers. It may move to a shared location later if a non-operator caller needs it. Pure addition — no call-site changes in this PR. Tests (cmd/thv-operator/pkg/controllerutil/status_test.go) cover: - Happy path and DeepCopy isolation. - No-op mutate skips the wire call. - Disjoint-writer preservation: with a stale snapshot, a second writer owning disjoint scalar fields survives the patch. - Stale snapshot clobbers conditions from another writer — guards the documented Caller contract so the behaviour stays load- bearing against future changes. - Stale scalar computation: re-assigning the read value is a no-op at the wire level (concurrent writer preserved); assigning a differing value overwrites live state. - Nil obj is rejected with a descriptive error, no PATCH issued. - Error propagation: apiserver failures from Status().Patch are returned unchanged for the controller's requeue decision.
Relates to #4633. A shared helper that collapses the "DeepCopy → mutate → r.Status().Patch(MergeFrom(original))" idiom to a single call so remaining r.Status().Update sites can migrate without each one re-implementing the DeepCopy-before-mutate discipline by hand. Status writes deliberately use a plain merge patch, not an optimistic-lock one: the operator and the runtime reporter write disjoint status fields on every reconcile and must coexist without forcing a 409 on every overlap. Spec and metadata writes still require optimistic locking — see #4767 (tracking) / #4914 (MCPServer migration). The helper does not make every multi-writer pattern safe. The Caller contract in the doc comment spells out two footguns it cannot defend against: - JSON merge-patch replaces arrays wholesale for CRDs, so a writer to Status.Conditions must be the sole owner of the entire array. Any concurrent writer whose Patch lands between this caller's Get and Patch — on any condition type, including ones this caller does not touch — will be erased. A fresh Get narrows but does not eliminate the TOCTOU window. - A scalar re-computed from a stale snapshot that differs from the live value will overwrite a concurrent writer's update. The codified checklist for new call sites lives in .claude/rules/operator.md. Operational safeguards in the helper itself: - No-op mutations (empty merge-patch body) short-circuit before the wire call; the apiserver runs admission and audit for every PATCH regardless of body content, so steady-state reconcilers must not generate {} traffic. - A nil obj returns a descriptive error rather than panicking in the downstream type assertion. The helper lives in cmd/thv-operator/pkg/controllerutil alongside the existing controller helpers. It may move to a shared location later if a non-operator caller needs it. Pure addition — no call-site changes in this PR. Tests (cmd/thv-operator/pkg/controllerutil/status_test.go) cover: - Happy path and DeepCopy isolation. - No-op mutate skips the wire call. - Disjoint-writer preservation: with a stale snapshot, a second writer owning disjoint scalar fields survives the patch. - Stale snapshot clobbers conditions from another writer — guards the documented Caller contract so the behaviour stays load- bearing against future changes. - Stale scalar computation: re-assigning the read value is a no-op at the wire level (concurrent writer preserved); assigning a differing value overwrites live state. - Nil obj is rejected with a descriptive error, no PATCH issued. - Error propagation: apiserver failures from Status().Patch are returned unchanged for the controller's requeue decision.
3a9a6d3 to
1c137de
Compare
Relates to #4633. A shared helper that collapses the "DeepCopy → mutate → r.Status().Patch(MergeFrom(original))" idiom to a single call so remaining r.Status().Update sites can migrate without each one re-implementing the DeepCopy-before-mutate discipline by hand. Status writes deliberately use a plain merge patch, not an optimistic-lock one: the operator and the runtime reporter write disjoint status fields on every reconcile and must coexist without forcing a 409 on every overlap. Spec and metadata writes still require optimistic locking — see #4767 (tracking) / #4914 (MCPServer migration). The helper does not make every multi-writer pattern safe. The Caller contract in the doc comment spells out two footguns it cannot defend against: - JSON merge-patch replaces arrays wholesale for CRDs, so a writer to Status.Conditions must be the sole owner of the entire array. Any concurrent writer whose Patch lands between this caller's Get and Patch — on any condition type, including ones this caller does not touch — will be erased. A fresh Get narrows but does not eliminate the TOCTOU window. - A scalar re-computed from a stale snapshot that differs from the live value will overwrite a concurrent writer's update. The codified checklist for new call sites lives in .claude/rules/operator.md. Operational safeguards in the helper itself: - No-op mutations (empty merge-patch body) short-circuit before the wire call; the apiserver runs admission and audit for every PATCH regardless of body content, so steady-state reconcilers must not generate {} traffic. - A nil obj returns a descriptive error rather than panicking in the downstream type assertion. The helper lives in cmd/thv-operator/pkg/controllerutil alongside the existing controller helpers. It may move to a shared location later if a non-operator caller needs it. Pure addition — no call-site changes in this PR. Tests (cmd/thv-operator/pkg/controllerutil/status_test.go) cover: - Happy path and DeepCopy isolation. - No-op mutate skips the wire call. - Disjoint-writer preservation: with a stale snapshot, a second writer owning disjoint scalar fields survives the patch. - Stale snapshot clobbers conditions from another writer — guards the documented Caller contract so the behaviour stays load- bearing against future changes. - Stale scalar computation: re-assigning the read value is a no-op at the wire level (concurrent writer preserved); assigning a differing value overwrites live state. - Nil obj is rejected with a descriptive error, no PATCH issued. - Error propagation: apiserver failures from Status().Patch are returned unchanged for the controller's requeue decision.
There was a problem hiding this comment.
Multi-Agent Consensus Review
Agents consulted: kubernetes-expert, go-expert-developer, code-reviewer, toolhive-expert
Consensus Summary
| # | Finding | Consensus | Severity | Action |
|---|---|---|---|---|
| 2 | Other MCPServer-writing reconcilers still Update spec | 8/10 | HIGH | Discuss scope |
| 3 | Verbatim 4-line rationale comment duplicated 3× → helper | 7/10 | MEDIUM | Fix |
| 4 | Hardcoded finalizer string; other CRDs have exported constants | 7/10 | MEDIUM | Fix |
| 5 | Rule text misstates why optimistic lock defends finalizers | 7/10 | MEDIUM | Fix |
| 6 | envtest finalizer-add case can degenerate into a no-op | 7/10 | LOW | Fix |
| 7 | patchRecordingClient silently discards patch.Data() errors |
7/10 | LOW | Fix |
| 8 | envtest cleanupServer swallows errors; uses non-lock MergeFrom |
7/10 | LOW | Fix |
Overall
The controller migration itself is correct: DeepCopy → mutate → Patch(MergeFromWithOptions(orig, MergeFromWithOptimisticLock{})) is the right tool for the "external controller owns a disjoint spec field" problem, the finalizer-defense reasoning is sound, and MergeFromWithOptimisticLock strictly improves on Update for this use case.
The PR scope closes the clobber hazard in mcpserver_controller.go, but toolconfig_controller.go:167 and mcpexternalauthconfig_controller.go:186 still call r.Update(ctx, &server) on MCPServer — so an external SSA writer of spec.authzConfig can still lose its write on any MCPToolConfig or MCPExternalAuthConfig reconcile.
The medium findings are polish: a DRY helper for the 3-site copy-paste pattern, an exported MCPServerFinalizerName constant to match every other CRD, and a correction to how the rule file explains the optimistic lock's role in defending finalizers.
Documentation
The rule file addition in .claude/rules/operator.md is load-bearing for the convention being introduced. See finding #5 — the "merge-patch has no array-merge semantics" paragraph conflates two independent mechanisms. The optimistic lock does not change merge-patch's array-replacement behavior; it forces a requeue on staleness, which then triggers a fresh Get that observes the concurrently-added finalizer.
Generated with Claude Code
Relates to #4633. A shared helper that collapses the "DeepCopy → mutate → r.Status().Patch(MergeFrom(original))" idiom to a single call so remaining r.Status().Update sites can migrate without each one re-implementing the DeepCopy-before-mutate discipline by hand. Status writes deliberately use a plain merge patch, not an optimistic-lock one: the operator and the runtime reporter write disjoint status fields on every reconcile and must coexist without forcing a 409 on every overlap. Spec and metadata writes still require optimistic locking — see #4767 (tracking) / #4914 (MCPServer migration). The helper does not make every multi-writer pattern safe. The Caller contract in the doc comment spells out two footguns it cannot defend against: - JSON merge-patch replaces arrays wholesale for CRDs, so a writer to Status.Conditions must be the sole owner of the entire array. Any concurrent writer whose Patch lands between this caller's Get and Patch — on any condition type, including ones this caller does not touch — will be erased. A fresh Get narrows but does not eliminate the TOCTOU window. - A scalar re-computed from a stale snapshot that differs from the live value will overwrite a concurrent writer's update. The codified checklist for new call sites lives in .claude/rules/operator.md. Operational safeguards in the helper itself: - No-op mutations (empty merge-patch body) short-circuit before the wire call; the apiserver runs admission and audit for every PATCH regardless of body content, so steady-state reconcilers must not generate {} traffic. - A nil obj returns a descriptive error rather than panicking in the downstream type assertion. The helper lives in cmd/thv-operator/pkg/controllerutil alongside the existing controller helpers. It may move to a shared location later if a non-operator caller needs it. Pure addition — no call-site changes in this PR. Tests (cmd/thv-operator/pkg/controllerutil/status_test.go) cover: - Happy path and DeepCopy isolation. - No-op mutate skips the wire call. - Disjoint-writer preservation: with a stale snapshot, a second writer owning disjoint scalar fields survives the patch. - Stale snapshot clobbers conditions from another writer — guards the documented Caller contract so the behaviour stays load- bearing against future changes. - Stale scalar computation: re-assigning the read value is a no-op at the wire level (concurrent writer preserved); assigning a differing value overwrites live state. - Nil obj is rejected with a descriptive error, no PATCH issued. - Error propagation: apiserver failures from Status().Patch are returned unchanged for the controller's requeue decision. Co-authored-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
1c137de to
daf9140
Compare
Fixes #4767. The controller writes finalizers, finalizer removal, and the restart-processed annotation via r.Update. Update is a full PUT, so any spec field the operator does not track — most importantly spec.authzConfig, which a separate authorization controller will soon own — is zeroed on every reconcile. Replace the three Update call sites with an optimistic-lock merge patch. The merge-patch body carries only fields the caller changed, so untouched fields never hit the wire and cannot be clobbered. MergeFromWithOptimisticLock sends resourceVersion as a precondition, giving 409-on-collision semantics for concurrent writers and defending metadata.finalizers (which has no array-merge semantics under merge-patch) against wholesale replacement when another controller is mid-flight adding its own entry. Tests: - Envtest suite writes spec.authzConfig out-of-band and asserts it survives both the finalizer-add reconcile and the restart-annotation reconcile. - Unit suite uses a patch-recording client to assert each migrated call site emits a body carrying the resourceVersion precondition — a deterministic wire-level signal that MergeFromWithOptimisticLock is in effect. A regression to plain MergeFrom would drop the precondition and fail the assertion independent of the higher-level survival test. Also: - .claude/rules/operator.md: new "Spec / metadata patching" section documenting the pattern for future CR writes. Status patching is a separate follow-up (#4633). - Rename the mock-client flag failOnMCPServerUpdate → failOnMCPServerWrite; it now intercepts both Update and Patch on MCPServer, so the name matches reality.
daf9140 to
0cc5777
Compare
* Introduce MutateAndPatchSpec and adopt across spec-patch sites The inline DeepCopy + Patch(MergeFromWithOptimisticLock) pattern from #4914 landed at five MCPServer spec-write sites, each carrying the same four-line rationale comment. Extract it into a MutateAndPatchSpec[T] generic helper in cmd/thv-operator/pkg/controllerutil, siblinged with the existing MutateAndPatchStatus. The helper mirrors its sibling exactly -- same reflection-based nil-guard, same DeepCopy-then-mutate-then-Patch flow -- with two deliberate differences: - Uses MergeFromWithOptions(original, MergeFromWithOptimisticLock{}) so concurrent writers get 409-and-requeue instead of silent clobber. This is the property that defends spec.authzConfig, which the forthcoming authorization controller will own, from being zeroed on every reconcile. - No no-op short-circuit. MergeFromWithOptimisticLock always emits metadata.resourceVersion into the body, so the status helper's "body == {}" check never fires; and every current spec call site carries a real mutation. All five inline sites (mcpserver_controller.go finalizer add/remove and restart-annotation stamp; toolconfig_controller.go and mcpexternalauthconfig_controller.go config-hash stamps) adopt the helper. Pure refactor at the call sites -- no behavior change. Tests mirror the status helper's shape: happy-path + optimistic-lock wire signal, DeepCopy isolation, 409 Conflict propagation, nil-obj rejection, and disjoint-spec-field preservation (the regression guard that would fire if the helper were ever swapped back to r.Update). Existing TestMCPServerSpecPatchesAreOptimisticLock and the mcpserver_spec_patch_integration_test.go envtest remain green. The operator-rules section on spec/metadata patching is updated to reference the helper as the canonical pattern. * Address MoE review feedback on MutateAndPatchSpec Three of five review findings applied (two skipped with rebuttal in the PR thread): - Add TestMutateAndPatchSpec_NoOpMutateStillPatches. The existing AppliesMutationWithOptimisticLock test asserts resourceVersion is present in the body when a patch is issued; it cannot detect a short-circuit regression that issues zero patches. Pin the documented divergence from MutateAndPatchStatus (which DOES short-circuit) with a direct no-op-mutate wire-level assertion. - Fix the godoc "Typical usage" example in both patch.go and status.go to use the ctrlutil alias. operator.md already uses this alias, and every real caller aliases the package to avoid colliding with controller-runtime's own controllerutil package. patch.go is updated to match; status.go is brought along for symmetry. - Document that obj is mutated before Patch is issued, so on error the caller's in-memory copy is post-mutation. Callers must re-fetch rather than retrying in place. Added to both helpers. The standard reconciler return-and-requeue pattern is the correct retry path; the sentence names it explicitly so a future caller reading the doc does not invent an in-place retry loop. Also tightened the "don't use this for status" cross-reference in status.go to name the sibling MutateAndPatchSpec directly.
Summary
r.Update.Updateis a full PUT, so any spec field the operator does not track — most importantlyspec.authzConfig, which an external operator will soon own via server-side apply — is zeroed on every reconcile.Updatecall sites with an optimistic-lock merge patch (client.MergeFromWithOptions(orig, client.MergeFromWithOptimisticLock{})). The merge-patch body carries only fields the caller changed, so untouched fields never hit the wire and cannot be clobbered.MergeFromWithOptimisticLocksendsresourceVersionas a precondition, giving 409-on-collision semantics for concurrent writers and defendingmetadata.finalizers(which has no array-merge semantics under merge-patch) against wholesale replacement when another controller is mid-flight adding its own entry.Fixes #4767.
Type of change
Test plan
task test)task lint-fix)Unit tests:
cmd/thv-operator/controllers/mcpserver_spec_patch_test.gouses a patch-recording client wrapper to assert that each of the three migrated call sites (AddFinalizer, RemoveFinalizer, RestartAnnotation) emits a merge-patch body carrying theresourceVersionprecondition — a deterministic wire-level signal thatMergeFromWithOptimisticLockis in effect. A regression to plainMergeFromwould drop the precondition and fail the assertion independent of the higher-level survival test.Envtest integration:
cmd/thv-operator/test-integration/mcp-server/mcpserver_spec_patch_integration_test.gocreates an MCPServer, writesspec.authzConfigout-of-band from a second client, and asserts the field survives both the finalizer-add reconcile and the restart-annotation reconcile. Run viatask operator-test-integration.Manual:
kubectl applyan MCPServer,kubectl patch --type=mergeto setspec.authzConfig, wait for reconcile,kubectl get mcpserver -o yaml—spec.authzConfigpersists across reconciles.Does this introduce a user-facing change?
No.
Implementation plan
Approved implementation plan
First PR of the
r.Update→r.Patchmigration tracked in #4767 and #4633. This PR covers Track A (#4767): the three MCPServer spec writes that must move to an optimistic-lock merge patch before an external operator starts writingspec.authzConfigvia SSA. Status-subresource migration (#4633) is the separate Track B, starting in PR 2.Key design decisions:
MergeFromWithOptimisticLock{}for MCPServer spec patches — preserves conflict-detection parity withUpdate, forces requeue on concurrent SSA instead of silent clobber, defends againstmetadata.finalizersarray replacement.Special notes for reviewers
cmd/thv-operator/controllers/mcpserver_controller.go:196(RemoveFinalizer),:212(AddFinalizer), and:768(restart annotation). Each follows the sameDeepCopy → mutate → Patch(MergeFromWithOptions)idiom.mcpserver_restart_test.gorenames the mock-client flagfailOnMCPServerUpdate→failOnMCPServerWritebecause it now intercepts bothUpdateandPatchon MCPServer..claude/rules/operator.mddocumenting the pattern for future CR writes.Generated with Claude Code