Reload kubeconfig on disk changes for remote-cluster manager (CA / API server endpoint rotation without manual restart)

**Which component does this relate to?**

Manager bootstrap (`cmd/main.go`) — specifically the `ctrl.GetConfigOrDie()` call when the operator runs against a remote cluster with a kubeconfig mounted from a Secret/ConfigMap. Affects any deployment topology where the operator's kubeconfig is rotated by an external system (e.g. a Gardener-based runtime cluster, EKS IRSA renewal) rather than baked into the image or the in-cluster ServiceAccount.

**What is the reason for this feature request or change?**

When `metal-operator-controller-manager` is deployed against a remote cluster, it loads its kubeconfig once at startup via:

```go
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{...})
```

The resulting `rest.Config` is held by the manager for the process lifetime. `client-go`'s transport transparently handles two of the three rotation types (bearer token and CA certificate), but only when the kubeconfig references them by path:

| Rotation type | Form | Handled? |
|---|---|---|
| Bearer token | `tokenFile:` (path) | ✅ transport re-reads `BearerTokenFile` periodically |
| CA certificate | `certificate-authority:` (path) | ✅ since `client-go v0.36`, `ClientsAllowCARotation` (Beta, default-on) reloads from the file at most every 5 min ([KEP-4222](https://kep.k8s.io/4222), [`atomicTransportHolder`](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/transport/ca_rotation.go)) |
| CA certificate | `certificate-authority-data:` (bytes) | ❌ frozen — embedded bytes excluded by the feature gate's `len(c.TLS.CAData) == 0` check |
| **API server endpoint** | `server: <url>` | ❌ **no reload mechanism exists** — `rest.Config.Host` is a string, baked into every `*http.Transport` |

The endpoint change is the only case with no reload mechanism in any kubeconfig form. It occurs during runtime cluster rebuilds, control-plane zone migrations, and DR scenarios. The operator silently keeps using the stale endpoint until someone deletes the pod manually.

A secondary issue affects deployments whose kubeconfig embeds the CA as `certificate-authority-data:` bytes rather than a path reference; those deployments don't benefit from `ClientsAllowCARotation` and also need a manual restart on CA rotation.

Empirical evidence from a production deployment (`metal-operator-controller-manager` image `sha-715247e`, `client-go v0.36.1`):

- The operator's kubeconfig is provided as a `ConfigMap` referencing the CA + token via paths, so token + CA rotation already work without a restart. Pod uptime: 5 days, 0 restarts, actively reconciling.
- The same kubeconfig hardcodes `clusters[0].cluster.server`, which would require manual intervention if the API server endpoint ever changes.

So the fix is narrower than "implement credential reload" — it's specifically **detect kubeconfig-on-disk changes that `client-go` cannot pick up automatically (endpoint, embedded-bytes CA), and recover gracefully.**

**Describe the feature**

Add an opt-in watcher that detects changes to a configurable kubeconfig path (and any credential mount directories it references), and triggers a graceful manager shutdown so the pod restarts with fresh configuration.

Concretely:

1. New flag `--watch-kubeconfig` (default `false`, off-by-default to preserve existing behaviour).
2. New flag `--kubeconfig-watch-paths` (comma-separated, default empty). The directory containing `KUBECONFIG` is always watched; this flag adds further directories so deployers with split mounts (e.g. ConfigMap kubeconfig + Secret-mounted credentials) can cover both.
3. New internal package `internal/kubeconfigwatcher` that uses `fsnotify` to watch each directory, tracks the resolved symlink targets of each watched file, and signals via a channel when any target changes. This mirrors the proven pattern in [`mcm-provider-ironcore-metal/pkg/client/provider.go:132-169`](https://github.com/ironcore-dev/machine-controller-manager-provider-ironcore-metal/blob/main/pkg/client/provider.go#L132-L169) (Kubernetes secret/configmap mounts use `..data` symlink swaps, so directory watching with target comparison is required — file-level watches miss the events).
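4. On change: log the event, cancel the manager's root context, allow `mgr.Start()` to return, and `os.Exit(0)`. Kubelet then restarts the container (Deployments run with `restartPolicy: Always`), and the new process loads the fresh kubeconfig from the already-updated mount.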
5. Set `LeaderElectionReleaseOnCancel: true` so the leader lease is released cleanly during shutdown — collapses the new pod's lease-acquisition wait from ~15 s to ~0 s. Currently commented out at `cmd/main.go:397`; safe to enable because `main` does no post-shutdown work.

End-to-end downtime per rotation event: ~10–30 s (reconcile drain + image-cache pod start + cache warm-up). Comparable to any other operator that handles credential rotation by restart.

**Proposed API or behavior changes**

No CRD changes. Two new `cmd/main.go` flags:

```text
--watch-kubeconfig                        bool, default false
    If true, watch the kubeconfig (and any directories listed in
    --kubeconfig-watch-paths) for changes and gracefully shut down on
    detected change so kubelet restarts the pod with fresh credentials.

--kubeconfig-watch-paths                  string, default ""
    Comma-separated list of additional directories to watch for credential
    files referenced by the kubeconfig (e.g. CA bundle, token file). If
    empty, only the directory containing the kubeconfig is watched.
```
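A rough sketch of how the two flags could be parsed and turned into a watch list, assuming the semantics described above. `watchDirs` is a hypothetical helper, the argument list and paths are invented for the demo, and the usage strings are abbreviated.

```go
package main

import (
	"flag"
	"fmt"
	"path/filepath"
	"strings"
)

// watchDirs returns the directory containing the kubeconfig, followed by any
// extra directories from the comma-separated list. Empty entries and
// duplicates are dropped.
func watchDirs(kubeconfigPath, extra string) []string {
	dirs := []string{filepath.Dir(kubeconfigPath)}
	seen := map[string]bool{dirs[0]: true}
	for _, p := range strings.Split(extra, ",") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		p = filepath.Clean(p)
		if !seen[p] {
			seen[p] = true
			dirs = append(dirs, p)
		}
	}
	return dirs
}

func main() {
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	watch := fs.Bool("watch-kubeconfig", false,
		"watch the kubeconfig and shut down gracefully on change")
	paths := fs.String("kubeconfig-watch-paths", "",
		"comma-separated list of additional directories to watch")

	// Example arguments; a real binary would parse os.Args[1:].
	args := []string{
		"--watch-kubeconfig",
		"--kubeconfig-watch-paths=/var/run/secrets/remote, /etc/remote-ca",
	}
	if err := fs.Parse(args); err != nil {
		fmt.Println("flag error:", err)
		return
	}

	if !*watch {
		fmt.Println("kubeconfig watching disabled")
		return
	}
	// Hypothetical kubeconfig location, normally taken from KUBECONFIG.
	fmt.Println("watching:", watchDirs("/etc/kubeconfig/config", *paths))
}
```

Keeping the kubeconfig's own directory first and unconditional means an empty `--kubeconfig-watch-paths` still yields a usable watch list.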

Deployment-side recommendation that the README / chart can document:

- Prefer kubeconfig **path form** (`certificate-authority:` + `tokenFile:`) over embedded-bytes form. With path form on `client-go ≥ v0.36`, `ClientsAllowCARotation` handles CA rotation automatically; no watcher needed for that case.

**Alternatives considered**

1. **In-process manager rebuild** (cancel inner ctx, build a new manager, re-register all controllers + webhooks + indexers + runnables). Saves ~5 s of downtime versus restart but adds significant code (~500 LOC) and many failure modes (leaked goroutines, webhook port re-bind contention, partial re-registration). Webhook listener still has to tear down and rebind on the same port, so the gap reduction over restart is small. Not worth the complexity.
2. **Hot-swap `client.Client` only** (literal mcm-provider-ironcore-metal pattern). Insufficient: controller-runtime's caches and informers hold transports built from the original `rest.Config`. Swapping the user-facing client doesn't replace those, so endpoint changes wouldn't be picked up by `mgr.GetCache()`-backed reads or ongoing watches. Works for MCM because MCM doesn't run a controller-runtime manager.
3. **Periodic re-read** of the kubeconfig instead of fsnotify. Polling adds detection latency for no code-complexity win; `fsnotify` is already a transitive dependency via controller-runtime's `pkg/certwatcher`.
4. **Helm chart annotation-driven reload** (e.g. `stakater/reloader`, no Go code change). Sufficient for deployments where the kubeconfig ConfigMap/Secret is updated via helm rollouts that the reloader can observe, but doesn't help when credentials are rotated out-of-band by an external controller (e.g. Gardener token-requestor writing directly to the Secret). A Go-side watcher complements rather than competes with this.

**Additional context**

- Reference implementation pattern: [`mcm-provider-ironcore-metal/pkg/client/provider.go:132-169`](https://github.com/ironcore-dev/machine-controller-manager-provider-ironcore-metal/blob/main/pkg/client/provider.go#L132-L169) (proven in production).
- `client-go` CA rotation feature gate: [KEP-4222](https://kep.k8s.io/4222), `ClientsAllowCARotation` (Beta and default-on in Kubernetes v1.36+, i.e. `client-go v0.36+`).
- Related upstream PR (different scope, closed): [#178 — Add kustomization for managing remote clusters](https://github.com/ironcore-dev/metal-operator/pull/178).

