@@ -282,55 +282,52 @@ self-signed Issuer, reusing NVSentinel's existing PKI infrastructure.
282282
283283# ## Certificate Rotation
284284
285- cert-manager automatically rotates certificates before expiry. The current reference
286- implementation loads certificates once at startup — neither the server's TLS cert
287- nor the client's CA bundle is reloaded at runtime. This means a cert rotation by
288- cert-manager would not take effect until the affected pod restarts.
285+ cert-manager automatically rotates leaf certificates before expiry. The design needs
286+ to account for how each component picks up rotated credentials without manual
287+ intervention.
289288
290- For SA tokens, rotation is already handled : the client interceptor re-reads the
291- token file on every gRPC call, so kubelet's automatic token refresh is picked up
292- immediately.
289+ ** SA tokens** are the simplest case: because the client interceptor re-reads the
290+ token file on every gRPC call, kubelet's automatic token refresh is picked up
291+ immediately with no additional machinery .
293292
294- # ### Server-Side Cert Rotation
293+ **TLS certificates** require more consideration. A naive implementation that loads
294+ certs once at startup would serve stale credentials after cert-manager rotates the
295+ certificate, requiring a pod restart to pick up the new cert. The approaches below
296+ avoid that.
295297
296- The janitor-provider loads its TLS key pair once via `tls.LoadX509KeyPair()` in
297- ` main.go` . When cert-manager rotates the server certificate, the provider continues
298- serving the old cert until restarted.
298+ # ### Server-Side Cert Rotation
299299
300- **Recommended fix**: Use the `certwatcher` package from `controller-runtime`, which
301- watches certificate files for changes and hot-reloads them via the
302- ` tls.Config.GetCertificate ` callback. This is the same pattern already used in the
303- janitor for webhook and metrics TLS :
300+ The janitor-provider should use the `certwatcher` package from `controller-runtime`
301+ to watch certificate files and hot-reload them via the `tls.Config.GetCertificate`
302+ callback. This is the same pattern the janitor already uses for webhook and metrics
303+ TLS :
304304
305305` ` ` go
306- // Existing pattern in janitor/main.go for webhook certs:
307306watcher, err := certwatcher.New(certPath, keyPath)
308- mgr.Add(watcher ) // starts filesystem watch
307+ go watcher.Start(ctx ) // starts filesystem watch
309308
310309tlsConfig := &tls.Config{
311310 GetCertificate: watcher.GetCertificate,
312311 MinVersion: tls.VersionTLS12,
313312}
314313` ` `
315314
316- The janitor-provider does not use `controller-runtime`'s manager, so it would need
317- to either :
318-
319- 1. Start the `certwatcher` goroutine manually (it implements `Runnable` with a
320- ` Start(ctx)` method)
321- 2. Use a standalone filesystem watcher (e.g., `fsnotify`) to reload the key pair
315+ Since the janitor-provider does not use `controller-runtime`'s manager, the
316+ ` certwatcher` goroutine would be started manually (it implements `Runnable` with a
317+ ` Start(ctx)` method). This reuses a well-tested package already in the dependency
318+ tree.
322319
323- Option 1 is preferred since it reuses the same well-tested package.
320+ An alternative is a standalone `fsnotify` watcher, but `certwatcher` is preferred
321+ for consistency with the janitor's existing pattern.
324322
325323# ### Client-Side CA Bundle Rotation
326324
327- The janitor controller loads the CA bundle once in `NewCSPProviderDialOptions()` and
328- builds a static `x509.CertPool`. If the CA itself rotates (e.g., the self-signed
329- Issuer regenerates its CA key), the client will reject the new server cert because
330- its cert pool is stale.
325+ The client needs to trust the CA that signed the server's certificate. If the CA
326+ itself rotates (e.g., the self-signed Issuer regenerates its CA key), the client
327+ must pick up the new CA bundle or it will reject the new server cert.
331328
332- In practice, CA rotation is infrequent — cert-manager rotates leaf certificates
333- but the CA key typically remains stable. However, for robustness :
329+ In practice, CA rotation is infrequent — cert-manager rotates leaf certificates but
330+ the CA key typically remains stable. Three approaches for handling CA bundle updates :
334331
335332**Option A — Periodic reload**: A background goroutine re-reads the CA bundle file
336333on an interval (e.g., every 5 minutes) and swaps the `RootCAs` cert pool atomically.
@@ -342,8 +339,9 @@ file and reload on change.
342339typically planned, accept that the janitor pod needs a restart when the CA changes.
343340This is the simplest approach and may be sufficient for most deployments.
344341
345- The reference implementation currently follows option C implicitly. Options A or B
346- can be added if CA rotation without downtime becomes a requirement.
342+ This design recommends **Option C** as the initial approach, given that CA rotation
343+ is an infrequent, planned event. Options A or B can be adopted later if zero-downtime
344+ CA rotation becomes a requirement.
347345
348346# ## Test Compatibility
349347
0 commit comments