Which component does this relate to?
Manager bootstrap (cmd/main.go) — specifically the ctrl.GetConfigOrDie() call when the operator runs against a remote cluster with a kubeconfig mounted from a Secret/ConfigMap. Affects any deployment topology where the operator's kubeconfig is rotated by an external system (e.g. a Gardener-based runtime cluster, EKS IRSA renewal) rather than baked into the image or the in-cluster ServiceAccount.
What is the reason for this feature request or change?
When metal-operator-controller-manager is deployed against a remote cluster, it loads its kubeconfig once at startup via:
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{...})
The resulting rest.Config is held by the manager for the process lifetime. client-go's transport handles two of the three rotation types transparently, provided the credentials are referenced by path rather than embedded:
| Rotation type | Form | Handled? |
|---|---|---|
| Bearer token | tokenFile: (path) | ✅ transport re-reads BearerTokenFile periodically (the token is cached for about a minute) |
| CA certificate | certificate-authority: (path) | ✅ since client-go v0.36, ClientsAllowCARotation (Beta, default-on) reloads from the file at most every 5 min (KEP-4222, atomicTransportHolder) |
| CA certificate | certificate-authority-data: (bytes) | ❌ frozen: embedded bytes are excluded by the feature gate's len(c.TLS.CAData) == 0 check |
| API server endpoint | server: <url> | ❌ no reload mechanism exists: rest.Config.Host is a string, baked into every *http.Transport |
The third type (endpoint change) is the only one with no handling at all today. It happens during runtime cluster rebuilds, control-plane zone migrations, and DR scenarios. The operator silently keeps using the stale endpoint until someone deletes the pod manually.
A secondary issue affects deployments whose kubeconfig embeds the CA as certificate-authority-data: bytes rather than a path reference; those deployments don't benefit from ClientsAllowCARotation and also need a manual restart on CA rotation.
Empirical evidence from a production deployment (metal-operator-controller-manager image sha-715247e, client-go v0.36.1):
- The operator's kubeconfig is provided as a ConfigMap referencing the CA + token via paths, so token + CA rotation already work without a restart. Pod uptime: 5 days, 0 restarts, actively reconciling.
- The same kubeconfig hardcodes clusters[0].cluster.server, which would require manual intervention if the API server endpoint ever changes.
So the fix is narrower than "implement credential reload": the goal is specifically to detect kubeconfig-on-disk changes that client-go cannot pick up automatically (endpoint, embedded-bytes CA) and to recover gracefully.
Describe the feature
Add an opt-in watcher that detects changes to a configurable kubeconfig path (and any credential mount directories it references), and triggers a graceful manager shutdown so the pod restarts with fresh configuration.
Concretely:
- New flag --watch-kubeconfig (default false, to preserve existing behaviour).
- New flag --kubeconfig-watch-paths (comma-separated; when empty, only the directory containing KUBECONFIG is watched) so deployers with split mounts (e.g. ConfigMap kubeconfig + Secret-mounted credentials) can list both directories.
- New internal package internal/kubeconfigwatcher that uses fsnotify to watch each directory, tracks the resolved symlink targets of each watched file, and signals via a channel when any target changes. This mirrors the proven pattern in mcm-provider-ironcore-metal/pkg/client/provider.go:132-169. Kubernetes secret/configmap mounts are updated through ..data symlink swaps, so directory watching with target comparison is required; file-level watches miss the events.
- On change: log the event, cancel the manager's root context, allow mgr.Start() to return, and os.Exit(0). Kubelet recreates the pod with a fresh kubeconfig.
- Set LeaderElectionReleaseOnCancel: true so the leader lease is released cleanly during shutdown, which collapses the new pod's lease-acquisition wait from ~15 s to ~0 s. The option is currently commented out at cmd/main.go:397; it is safe to enable because main does no post-shutdown work.
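The symlink-target comparison in the list above can be sketched with the standard library alone. This is a minimal illustration, not the proposed internal/kubeconfigwatcher package: the names snapshotTargets, targetsChanged and demoSwap are hypothetical, the ..gen1/..gen2 directories stand in for the kubelet's timestamped directories, and the fsnotify event loop that would run the comparison on each directory event is left out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// snapshotTargets resolves every path through its symlinks and records the
// final target. Unresolvable paths map to "" so that a file appearing or
// disappearing also registers as a change.
func snapshotTargets(paths []string) map[string]string {
	snap := make(map[string]string, len(paths))
	for _, p := range paths {
		target, err := filepath.EvalSymlinks(p)
		if err != nil {
			target = ""
		}
		snap[p] = target
	}
	return snap
}

// targetsChanged reports whether any watched path now resolves differently.
func targetsChanged(prev, cur map[string]string) bool {
	if len(prev) != len(cur) {
		return true
	}
	for p, t := range prev {
		if c, ok := cur[p]; !ok || c != t {
			return true
		}
	}
	return false
}

// demoSwap imitates an atomic mount update in a temp directory: the visible
// file is a symlink into ..data, and ..data is re-pointed at a new directory.
func demoSwap() (bool, error) {
	dir, err := os.MkdirTemp("", "kubeconfig-mount")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	for _, gen := range []string{"..gen1", "..gen2"} {
		if err := os.Mkdir(filepath.Join(dir, gen), 0o755); err != nil {
			return false, err
		}
		if err := os.WriteFile(filepath.Join(dir, gen, "kubeconfig"), []byte(gen), 0o600); err != nil {
			return false, err
		}
	}
	data := filepath.Join(dir, "..data")
	file := filepath.Join(dir, "kubeconfig")
	if err := os.Symlink("..gen1", data); err != nil {
		return false, err
	}
	if err := os.Symlink(filepath.Join("..data", "kubeconfig"), file); err != nil {
		return false, err
	}
	before := snapshotTargets([]string{file})

	// Re-point ..data by renaming a freshly created symlink over it.
	tmp := filepath.Join(dir, "..data_tmp")
	if err := os.Symlink("..gen2", tmp); err != nil {
		return false, err
	}
	if err := os.Rename(tmp, data); err != nil {
		return false, err
	}
	return targetsChanged(before, snapshotTargets([]string{file})), nil
}

func main() {
	changed, err := demoSwap()
	fmt.Println("target changed:", changed, "err:", err)
}
```

The visible file path never changes during the swap, which is why the comparison is made on resolved targets rather than on the path or on file-level events.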
End-to-end downtime per rotation event: ~10–30 s (reconcile drain + image-cache pod start + cache warm-up). Comparable to any other operator that handles credential rotation by restart.
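The shutdown sequence from the list above (change signal, cancel the root context, let the manager return, then exit) might look roughly like the sketch below. It uses only the standard library: runUntilChange and demo are hypothetical names, and fakeStart stands in for mgr.Start, which is assumed to block until its context is cancelled. In cmd/main.go the caller would log the event and call os.Exit(0) when the first return value is true.

```go
package main

import (
	"context"
	"fmt"
)

// runUntilChange runs start with a cancellable child context and cancels it
// as soon as a value arrives on changed. The bool result reports whether the
// shutdown was triggered by a change signal.
func runUntilChange(parent context.Context, changed <-chan struct{}, start func(context.Context) error) (bool, error) {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()

	triggered := make(chan struct{})
	go func() {
		select {
		case <-changed:
			close(triggered) // record the reason before cancelling
			cancel()
		case <-ctx.Done():
			// start returned on its own or the parent was cancelled
		}
	}()

	err := start(ctx)
	select {
	case <-triggered:
		return true, err
	default:
		return false, err
	}
}

// demo simulates one rotation event with a stand-in for the manager.
func demo() (bool, error) {
	changed := make(chan struct{}, 1)
	changed <- struct{}{} // pretend the watcher observed a change

	// fakeStart stands in for mgr.Start: it blocks until its context ends.
	fakeStart := func(ctx context.Context) error {
		<-ctx.Done()
		return nil
	}
	return runUntilChange(context.Background(), changed, fakeStart)
}

func main() {
	restart, err := demo()
	fmt.Println("restart requested:", restart, "err:", err)
}
```

Closing triggered before calling cancel means the caller can reliably tell a rotation-driven shutdown apart from an ordinary one once start has returned.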
Proposed API or behavior changes
No CRD changes. Two new cmd/main.go flags:
--watch-kubeconfig bool, default false
If true, watch the kubeconfig (and any directories listed in
--kubeconfig-watch-paths) for changes and gracefully shut down on
detected change so kubelet restarts the pod with fresh credentials.
--kubeconfig-watch-paths string, default ""
Comma-separated list of additional directories to watch for credential
files referenced by the kubeconfig (e.g. CA bundle, token file). If
empty, only the directory containing the kubeconfig is watched.
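As a rough illustration of how the two flags could combine into a list of directories to watch, here is a sketch using the standard flag package. The helper parseWatchFlags and the watchOptions type are hypothetical, the kubeconfig path is passed in explicitly rather than read from KUBECONFIG, and the real cmd/main.go may register its flags differently.

```go
package main

import (
	"flag"
	"fmt"
	"path/filepath"
	"strings"
)

// watchOptions is a hypothetical container for the parsed flag values.
type watchOptions struct {
	Enabled bool
	Dirs    []string // de-duplicated directories to watch
}

// parseWatchFlags parses the two proposed flags and derives the watch list:
// the kubeconfig's own directory first, then any extra directories.
func parseWatchFlags(args []string, kubeconfigPath string) (watchOptions, error) {
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	enabled := fs.Bool("watch-kubeconfig", false,
		"watch the kubeconfig and shut down gracefully on change")
	extra := fs.String("kubeconfig-watch-paths", "",
		"comma-separated list of additional directories to watch")
	if err := fs.Parse(args); err != nil {
		return watchOptions{}, err
	}

	opts := watchOptions{Enabled: *enabled}
	if !opts.Enabled {
		return opts, nil // off by default: existing behaviour is unchanged
	}

	seen := map[string]bool{}
	add := func(dir string) {
		dir = filepath.Clean(dir)
		if !seen[dir] {
			seen[dir] = true
			opts.Dirs = append(opts.Dirs, dir)
		}
	}
	add(filepath.Dir(kubeconfigPath))
	for _, dir := range strings.Split(*extra, ",") {
		if dir = strings.TrimSpace(dir); dir != "" {
			add(dir)
		}
	}
	return opts, nil
}

func main() {
	opts, err := parseWatchFlags(
		[]string{"--watch-kubeconfig", "--kubeconfig-watch-paths=/etc/creds"},
		"/etc/kubeconfig/config",
	)
	fmt.Println(opts.Enabled, opts.Dirs, err)
}
```

Always including the kubeconfig's directory keeps the empty-default case working without extra configuration, while split mounts only need to list the credential directory.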
Deployment-side recommendation that the README / chart can document:
- Prefer kubeconfig path form (certificate-authority: + tokenFile:) over embedded-bytes form. With path form on client-go ≥ v0.36, ClientsAllowCARotation handles CA rotation automatically; no watcher needed for that case.
Alternatives considered
- In-process manager rebuild (cancel inner ctx, build a new manager, re-register all controllers + webhooks + indexers + runnables). Saves ~5 s of downtime versus restart but adds significant code (~500 LOC) and many failure modes (leaked goroutines, webhook port re-bind contention, partial re-registration). Webhook listener still has to tear down and rebind on the same port, so the gap reduction over restart is small. Not worth the complexity.
- Hot-swap client.Client only (literal mcm-provider-ironcore-metal pattern). Insufficient: controller-runtime's caches and informers hold transports built from the original rest.Config. Swapping the user-facing client doesn't replace those, so endpoint changes wouldn't be picked up by mgr.GetCache()-backed reads or ongoing watches. Works for MCM because MCM doesn't run a controller-runtime manager.
- Periodic re-read of the kubeconfig instead of fsnotify. Polling adds detection latency for no code-complexity win; fsnotify is already a transitive dependency via controller-runtime's pkg/certwatcher.
- Helm chart annotation-driven reload (e.g. stakater/reloader, no Go code change). Sufficient for deployments where the kubeconfig ConfigMap/Secret is updated via Helm rollouts that the reloader can observe, but doesn't help when credentials are rotated out-of-band by an external controller (e.g. Gardener token-requestor writing directly to the Secret). A Go-side watcher complements rather than competes with this.
Additional context
- mcm-provider-ironcore-metal/pkg/client/provider.go:132-169 (proven in production).
- client-go CA rotation feature gate: KEP-4222, ClientsAllowCARotation (Beta + default-on in v1.36+).