Skip to content

Prepare GKE node pool for bootstrap certificate rotation (expires 2026-12-05) #2827

@curtismchale

Description

@curtismchale

Background

Google notified us that the GKE node pool in proudcity-1184 (cluster proudcity, zone us-central1-a) has a node bootstrap leaf certificate expiring 2026-12-05 17:05:08 UTC.

If the cert expires:

  • New nodes can no longer register to the cluster.
  • Existing kubelet client/server certs (rotated every 10 days using the bootstrap cert) will eventually expire — within 1 year for shielded nodes, 5 for non-shielded.

GKE will automatically recreate the node pool ~30 days before expiry (~2026-11-05) if we do nothing — but at their timing, not ours.

Plan

Schedule a manual node pool upgrade (same version → same version is enough to regenerate the leaf cert) during a low-traffic maintenance window before 2026-11-05.

Before the upgrade, add disruption protection so the rolling node drain doesn't take services offline:

  1. Apply per-tenant PodDisruptionBudgets for prod WP tenantsconfig/prod-pdbs.yml (already drafted; 103 PDBs, maxUnavailable: 1, mirrors elasticsearch-pdbs.yml).
  2. Scale api/proudcitycityapi from 1 → 2 replicas with a PDB for the upgrade window — config/api-proudcitycityapi-scale-patch.yml (already drafted). The public city API is currently a single replica.
  3. Confirm api/proudcityfeeds (replicas=0) is intentionally off before the upgrade — easy time to notice if it shouldn't be.
  4. Leave alone: the 22 single-replica *redis pods in prod (WP falls back to DB on a brief cache cold start), kube-system/* (Google manages), cert-manager/*, jenkins.

Diagnostic + upgrade commands

# Identify affected node pools
gcloud container clusters describe proudcity \
  --zone us-central1-a --project proudcity-1184

# Apply PDBs + temporary scale-up
kubectl apply -f config/prod-pdbs.yml
kubectl apply -f config/api-proudcitycityapi-scale-patch.yml

# Verify PDBs in place
kubectl get pdb -A

# Trigger the node pool upgrade (use current cluster version)
gcloud container node-pools upgrade <pool-name> \
  --cluster=proudcity --zone=us-central1-a \
  --cluster-version=<current-version>

# After upgrade completes, revert proudcitycityapi if desired
kubectl scale deployment proudcitycityapi -n api --replicas=1

Acceptance

  • Confirm proudcityfeeds replicas=0 is intentional.
  • Apply config/prod-pdbs.yml and verify with kubectl get pdb -n prod.
  • Apply config/api-proudcitycityapi-scale-patch.yml.
  • Run node pool upgrade in maintenance window before 2026-11-05.
  • Confirm new cert expiry pushed forward (check MIG instance template).
  • Decide whether to leave proudcitycityapi at 2 replicas permanently or revert to 1.

References

  • Google email (2026-05-27) — bootstrap leaf cert expiry notice.
  • GKE node leaf cert rotation docs
  • Existing PDB pattern: proudcity-kubernetes/config/elasticsearch-pdbs.yml (added in PCD265).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions