Surface StorageVersionMigrator behind chart feature flag (opt-in)#5418
Merged
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5418 +/- ##
==========================================
+ Coverage 68.76% 68.77% +0.01%
==========================================
Files 629 629
Lines 63937 63937
==========================================
+ Hits 43964 43972 +8
+ Misses 16721 16712 -9
- Partials 3252 3253 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Adds an opt-in chart value (`operator.features.storageVersionMigrator`, default `false`) and wires `TOOLHIVE_ENABLE_STORAGE_VERSION_MIGRATOR` into the operator deployment. helm-docs README regenerated. The controller (PR-A, #5362) and the opt-in labels + CI guard (PR-B, #5391) are on main. This PR is the chart-side surface that lets operators opt in via `helm install --set` or values.yaml. Reference and upgrade-guide docs are intentionally separate — they land in a follow-up PR. Closes #4969. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ff9a780 to
cf9c977
Compare
JAORMX
approved these changes
Jun 2, 2026
This was referenced Jun 3, 2026
ChrisJBurns
added a commit
that referenced
this pull request
Jun 11, 2026
* Add storage-version migration docs + upgrade-guide walkthrough End-to-end user documentation for the StorageVersionMigrator controller now that the chart surface (PR #5418) is in main. Reference doc — docs/operator/storage-version-migration.md: - Describes the actual shipped mechanism: plain Get+Update on the main resource, per-CR conflict retry (max 3), RequeueAfter sentinel on the conflict path. - Admission-policy compatibility section: webhooks fire on every Update before the bytes-equality elision check, so policies (Kyverno/Gatekeeper/OPA) that reject same-spec round-trip Updates will prevent the migrator from converging. - ⚠ Skip-a-version upgrade trap section: clusters that bypass an intermediate release that runs the migrator will hit a helm upgrade failure at the version-removal release; kube-storage-version-migrator documented as the recovery path. - Label contract reflects the no-escape-hatch rule from PR-B — every storage-version root type must carry the migrate marker. - RBAC list matches what's actually on main. Upgrade-guide walkthrough — docs/operator/upgrade-guide/: - Reproducible kind-cluster end-to-end test of the v1alpha1→v1beta1 graduation, verifying storedVersions converges to [v1beta1] on all 12 graduated CRDs after enabling the migrator. - CR fixtures for v1alpha1 and v1beta1 of all 12 graduated kinds. Supersedes #5011, which carried earlier-draft versions that described the pre-review SSA-on-/status mechanism, the removed exclude marker, and the old env-var name. Part of #4969. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Scope upgrade walkthrough to the 12 graduated CRDs explicitly Addresses six review findings on PR #5451: F1 (HIGH): The chart ships 13 labeled CRDs, not 12. The 13th (mcpwebhookconfigs) is single-version v1alpha1 and not part of the v1alpha1 → v1beta1 graduation this walkthrough demonstrates. Adds a scope note in the intro and introduces an explicit GRADUATED_CRDS array in Step 2 used by every verification loop in steps 8–11 so the loops don't drift. F2 (HIGH): Step 11's bulk patch loop previously iterated every toolhive CRD. For mcpwebhookconfigs the jq filter "map(select( .name != "v1alpha1"))" produces an empty array and the apiserver rejects the patch. Scoping the loop to GRADUATED_CRDS fixes this — only multi-version CRDs are touched. F3 (MED): Step 8's baseline "every line ends with [v1alpha1,v1beta1]" was false for the 13th CRD which reads [v1alpha1]. Loop now iterates GRADUATED_CRDS only; the scope note explains why the 13th is excluded. F4 (MED): Step 9's convergence loop hardcoded "12" and silently exited if convergence never landed. Loop count is now derived from ${#GRADUATED_CRDS[@]} so a future PR adding a graduated CRD won't drift, and a post-loop conditional surfaces an actionable timeout message naming exactly what didn't converge. F5 (LOW): The companion-reading bullet described the reference doc as covering "label contract, opt-out, mechanism." The doc explicitly removed the exclude marker in PR-B; it's an opt-in model. Reworded. F6 (LOW): Steps 5 and 9 selected a pod via "kubectl get pods | grep | head -1", which can read the still-Terminating old pod after a helm upgrade and show the pre-upgrade env value. Replaced with a Deployment-spec jsonpath read that avoids the race entirely. Step 10 still uses pod selection (logs aren't readable off a Deployment) but with --field-selector=status.phase=Running. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Enable migrator flag explicitly in upgrade step 9 Step 9 set only the three image values on the helm upgrade, so the StorageVersionMigrator never turned on: the chart defaults operator.features.storageVersionMigrator to false, and helm upgrade does not reuse the previous release's --set values, so the env var rendered as false and storedVersions stayed stuck at [v1alpha1, v1beta1]. Following the guide verbatim never converged the CRDs. Add --set operator.features.storageVersionMigrator=true to step 9 and note why it is required. Verified end-to-end against a local kind cluster: all 12 graduated CRDs converge to [v1beta1] with the flag set. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Exposes the StorageVersionMigrator controller (PR-A, #5362) to operator users via a new helm chart feature flag.
The feature stays opt-in via the chart value (default
false). Admins who want auto-migration setoperator.features.storageVersionMigrator: trueat install or upgrade time. A future release can flip the default if/when we want the feature on by default — that flip is intentionally a separate decision.User-facing reference docs and the upgrade-guide walkthrough are deliberately not in this PR — they land separately.
Part of #4969. Depends on #5362 (merged) and #5391 (merged).
Medium level
operator.features.storageVersionMigratorvalue indeploy/charts/operator/values.yaml, defaulting tofalse. helm-docs comment explains what the feature does — no temporal claims about when the default might change.deployment.yamlwiresTOOLHIVE_ENABLE_STORAGE_VERSION_MIGRATORenv var from the chart value.deploy/charts/operator/README.mdregenerated bytask helm-docsto reflect the new value.cmd/thv-operator/app/app.go'sisStorageVersionMigratorEnabled()staysfalse— chart-level opt-in is sufficient for helm-installed operators, and keeping the operator-code default unchanged matches PR-A's posture.Low level
deploy/charts/operator/values.yamloperator.features.storageVersionMigrator: falsevalue (+6 lines)deploy/charts/operator/templates/deployment.yamlTOOLHIVE_ENABLE_STORAGE_VERSION_MIGRATOR(+2 lines)deploy/charts/operator/README.mdType of change
Test plan
task buildcleanhelm lint deploy/charts/operatorcleantask helm-docsproduces no further diff after the value is addedTOOLHIVE_ENABLE_STORAGE_VERSION_MIGRATORenv var wiring matches the constant the controller reads incmd/thv-operator/app/app.goWhy default
falseOpt-in posture for one release:
helm upgradeget a behavioural change they didn't anticipate.A future release will flip the chart default to
true. That flip is intentionally a separate decision.Follow-ups (separate PRs / issues)
app/app.go(paired with the chart-default flip).if .Values.operator.features.storageVersionMigrator).MigrationStuckCondition + Warning event for better SRE visibility.crs_processed_total,reconcile_duration_seconds,cache_size,consecutive_conflict_passes).Generated with Claude Code