@@ -282,6 +282,71 @@ When `tls.enabled=true`, the chart creates a cert-manager `Certificate` resource
282282and mounts the resulting Secret. The `issuerName` defaults to the janitor's existing
283283self-signed Issuer, reusing NVSentinel's existing PKI infrastructure.
284284
285+ # ## Certificate Rotation
286+
287+ cert-manager automatically rotates certificates before expiry. The current reference
288+ implementation loads certificates once at startup — neither the server's TLS cert
289+ nor the client's CA bundle is reloaded at runtime. This means a cert rotation by
290+ cert-manager would not take effect until the affected pod restarts.
291+
292+ For SA tokens, rotation is already handled : the client interceptor re-reads the
293+ token file on every gRPC call, so kubelet's automatic token refresh is picked up
294+ immediately.
295+
296+ # ### Server-Side Cert Rotation
297+
298+ The janitor-provider loads its TLS key pair once via `tls.LoadX509KeyPair()` in
299+ ` main.go` . When cert-manager rotates the server certificate, the provider continues
300+ serving the old cert until restarted.
301+
302+ **Recommended fix**: Use the `certwatcher` package from `controller-runtime`, which
303+ watches certificate files for changes and hot-reloads them via the
304+ ` tls.Config.GetCertificate` callback. This is the same pattern already used in the
305+ janitor for webhook and metrics TLS :
306+
307+ ` ` ` go
308+ // Existing pattern in janitor/main.go for webhook certs:
309+ watcher, err := certwatcher.New(certPath, keyPath)
310+ mgr.Add(watcher) // starts filesystem watch
311+
312+ tlsConfig := &tls.Config{
313+ GetCertificate: watcher.GetCertificate,
314+ MinVersion: tls.VersionTLS12,
315+ }
316+ ` ` `
317+
318+ The janitor-provider does not use `controller-runtime`'s manager, so it would need
319+ to either :
320+
321+ 1. Start the `certwatcher` goroutine manually (it implements `Runnable` with a
322+ ` Start(ctx)` method)
323+ 2. Use a standalone filesystem watcher (e.g., `fsnotify`) to reload the key pair
324+
325+ Option 1 is preferred since it reuses the same well-tested package.
326+
327+ # ### Client-Side CA Bundle Rotation
328+
329+ The janitor controller loads the CA bundle once in `NewCSPProviderDialOptions()` and
330+ builds a static `x509.CertPool`. If the CA itself rotates (e.g., the self-signed
331+ Issuer regenerates its CA key), the client will reject the new server cert because
332+ its cert pool is stale.
333+
334+ In practice, CA rotation is infrequent — cert-manager rotates leaf certificates
335+ but the CA key typically remains stable. However, for robustness :
336+
337+ **Option A — Periodic reload**: A background goroutine re-reads the CA bundle file
338+ on an interval (e.g., every 5 minutes) and swaps the `RootCAs` cert pool atomically.
339+
340+ **Option B — File watcher**: Use `fsnotify` or `certwatcher` to watch the CA bundle
341+ file and reload on change.
342+
343+ **Option C — Accept restart-on-CA-rotation**: Since CA rotation is rare and
344+ typically planned, accept that the janitor pod needs a restart when the CA changes.
345+ This is the simplest approach and may be sufficient for most deployments.
346+
347+ The reference implementation currently follows option C implicitly. Options A or B
348+ can be added if CA rotation without downtime becomes a requirement.
349+
285350# ## Test Compatibility
286351
287352Existing unit tests use `bufconn` with insecure credentials. The `CSPProviderInsecure`
0 commit comments