Skip to content

Surface StorageVersionMigrator behind chart feature flag (opt-in)#5418

Merged
ChrisJBurns merged 1 commit into
mainfrom
chris/svm-chart-surface
Jun 2, 2026
Merged

Surface StorageVersionMigrator behind chart feature flag (opt-in)#5418
ChrisJBurns merged 1 commit into
mainfrom
chris/svm-chart-surface

Conversation

@ChrisJBurns

@ChrisJBurns ChrisJBurns commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

Summary

Exposes the StorageVersionMigrator controller (PR-A, #5362) to operator users via a new helm chart feature flag.

The feature stays opt-in via the chart value (default false). Admins who want auto-migration set operator.features.storageVersionMigrator: true at install or upgrade time. A future release can flip the default if/when we want the feature on by default — that flip is intentionally a separate decision.

User-facing reference docs and the upgrade-guide walkthrough are deliberately not in this PR — they land separately.

Part of #4969. Depends on #5362 (merged) and #5391 (merged).

Medium level
  • New operator.features.storageVersionMigrator value in deploy/charts/operator/values.yaml, defaulting to false. helm-docs comment explains what the feature does — no temporal claims about when the default might change.
  • deployment.yaml wires TOOLHIVE_ENABLE_STORAGE_VERSION_MIGRATOR env var from the chart value.
  • deploy/charts/operator/README.md regenerated by task helm-docs to reflect the new value.
  • No operator-code change. The default in cmd/thv-operator/app/app.go's isStorageVersionMigratorEnabled() stays false — chart-level opt-in is sufficient for helm-installed operators, and keeping the operator-code default unchanged matches PR-A's posture.
Low level
File Change
deploy/charts/operator/values.yaml New operator.features.storageVersionMigrator: false value (+6 lines)
deploy/charts/operator/templates/deployment.yaml New env-var block wiring TOOLHIVE_ENABLE_STORAGE_VERSION_MIGRATOR (+2 lines)
deploy/charts/operator/README.md helm-docs regeneration (+2 / -1)

Type of change

  • New feature (additive — extends an existing env-var-driven feature flag with a chart-level surface)
  • Bug fix
  • Breaking change
  • Refactoring
  • Documentation
  • Other

Test plan

  • task build clean
  • helm lint deploy/charts/operator clean
  • task helm-docs produces no further diff after the value is added
  • TOOLHIVE_ENABLE_STORAGE_VERSION_MIGRATOR env var wiring matches the constant the controller reads in cmd/thv-operator/app/app.go

Why default false

Opt-in posture for one release:

  • Lets operators try the feature on staging before flipping their production charts.
  • Avoids the situation where users on helm upgrade get a behavioural change they didn't anticipate.
  • Defers the "feature must be on for the version-removal release to be safe" coupling to a deliberate later flip, when the upgrade story is more visible to users.

A future release will flip the chart default to true. That flip is intentionally a separate decision.

Follow-ups (separate PRs / issues)

  • User-facing reference docs and upgrade-guide walkthrough (will land in a follow-up PR).
  • Operator-code default flip in app/app.go (paired with the chart-default flip).
  • Chart-conditional RBAC gating (template the migrator's RBAC rules behind if .Values.operator.features.storageVersionMigrator).
  • CRD-level MigrationStuck Condition + Warning event for better SRE visibility.
  • Full Prometheus metrics suite (crs_processed_total, reconcile_duration_seconds, cache_size, consecutive_conflict_passes).

Generated with Claude Code

@github-actions github-actions Bot added the size/L Large PR: 600-999 lines changed label Jun 1, 2026
@codecov

codecov Bot commented Jun 1, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.77%. Comparing base (05ca226) to head (cf9c977).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5418      +/-   ##
==========================================
+ Coverage   68.76%   68.77%   +0.01%     
==========================================
  Files         629      629              
  Lines       63937    63937              
==========================================
+ Hits        43964    43972       +8     
+ Misses      16721    16712       -9     
- Partials     3252     3253       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Adds an opt-in chart value (`operator.features.storageVersionMigrator`,
default `false`) and wires `TOOLHIVE_ENABLE_STORAGE_VERSION_MIGRATOR`
into the operator deployment. helm-docs README regenerated.

The controller (PR-A, #5362) and the opt-in labels + CI guard
(PR-B, #5391) are on main. This PR is the chart-side surface that
lets operators opt in via `helm install --set` or values.yaml.

Reference and upgrade-guide docs are intentionally separate — they
land in a follow-up PR.

Closes #4969.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ChrisJBurns ChrisJBurns force-pushed the chris/svm-chart-surface branch from ff9a780 to cf9c977 Compare June 1, 2026 17:27
@github-actions github-actions Bot added size/XS Extra small PR: < 100 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 1, 2026
@ChrisJBurns ChrisJBurns changed the title Surface StorageVersionMigrator behind chart feature flag (opt-in) + docs Surface StorageVersionMigrator behind chart feature flag (opt-in) Jun 1, 2026
@github-actions github-actions Bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Jun 1, 2026
@ChrisJBurns ChrisJBurns merged commit 974aab5 into main Jun 2, 2026
43 checks passed
@ChrisJBurns ChrisJBurns deleted the chris/svm-chart-surface branch June 2, 2026 13:13
ChrisJBurns added a commit that referenced this pull request Jun 11, 2026
* Add storage-version migration docs + upgrade-guide walkthrough

End-to-end user documentation for the StorageVersionMigrator
controller now that the chart surface (PR #5418) is in main.

Reference doc — docs/operator/storage-version-migration.md:
- Describes the actual shipped mechanism: plain Get+Update on the
  main resource, per-CR conflict retry (max 3), RequeueAfter
  sentinel on the conflict path.
- Admission-policy compatibility section: webhooks fire on every
  Update before the bytes-equality elision check, so policies
  (Kyverno/Gatekeeper/OPA) that reject same-spec round-trip
  Updates will prevent the migrator from converging.
- ⚠ Skip-a-version upgrade trap section: clusters that bypass an
  intermediate release that runs the migrator will hit a helm
  upgrade failure at the version-removal release;
  kube-storage-version-migrator documented as the recovery path.
- Label contract reflects the no-escape-hatch rule from PR-B —
  every storage-version root type must carry the migrate marker.
- RBAC list matches what's actually on main.

Upgrade-guide walkthrough — docs/operator/upgrade-guide/:
- Reproducible kind-cluster end-to-end test of the v1alpha1→v1beta1
  graduation, verifying storedVersions converges to [v1beta1] on
  all 12 graduated CRDs after enabling the migrator.
- CR fixtures for v1alpha1 and v1beta1 of all 12 graduated kinds.

Supersedes #5011, which carried earlier-draft versions that
described the pre-review SSA-on-/status mechanism, the removed
exclude marker, and the old env-var name.

Part of #4969.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Scope upgrade walkthrough to the 12 graduated CRDs explicitly

Addresses six review findings on PR #5451:

F1 (HIGH): The chart ships 13 labeled CRDs, not 12. The 13th
(mcpwebhookconfigs) is single-version v1alpha1 and not part of
the v1alpha1 → v1beta1 graduation this walkthrough demonstrates.
Adds a scope note in the intro and introduces an explicit
GRADUATED_CRDS array in Step 2 used by every verification loop
in steps 8–11 so the loops don't drift.

F2 (HIGH): Step 11's bulk patch loop previously iterated every
toolhive CRD. For mcpwebhookconfigs the jq filter "map(select(
.name != "v1alpha1"))" produces an empty array and the apiserver
rejects the patch. Scoping the loop to GRADUATED_CRDS fixes
this — only multi-version CRDs are touched.

F3 (MED): Step 8's baseline "every line ends with
[v1alpha1,v1beta1]" was false for the 13th CRD which reads
[v1alpha1]. Loop now iterates GRADUATED_CRDS only; the scope
note explains why the 13th is excluded.

F4 (MED): Step 9's convergence loop hardcoded "12" and silently
exited if convergence never landed. Loop count is now derived
from ${#GRADUATED_CRDS[@]} so a future PR adding a graduated CRD
won't drift, and a post-loop conditional surfaces an actionable
timeout message naming exactly what didn't converge.

F5 (LOW): The companion-reading bullet described the reference
doc as covering "label contract, opt-out, mechanism." The doc
explicitly removed the exclude marker in PR-B; it's an opt-in
model. Reworded.

F6 (LOW): Steps 5 and 9 selected a pod via "kubectl get pods |
grep | head -1", which can read the still-Terminating old pod
after a helm upgrade and show the pre-upgrade env value.
Replaced with a Deployment-spec jsonpath read that avoids the
race entirely. Step 10 still uses pod selection (logs aren't
readable off a Deployment) but with --field-selector=status.phase=Running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Enable migrator flag explicitly in upgrade step 9

Step 9 set only the three image values on the helm upgrade, so the
StorageVersionMigrator never turned on: the chart defaults
operator.features.storageVersionMigrator to false, and helm upgrade does
not reuse the previous release's --set values, so the env var rendered as
false and storedVersions stayed stuck at [v1alpha1, v1beta1]. Following
the guide verbatim never converged the CRDs.

Add --set operator.features.storageVersionMigrator=true to step 9 and note
why it is required. Verified end-to-end against a local kind cluster: all
12 graduated CRDs converge to [v1beta1] with the flag set.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XS Extra small PR: < 100 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants