Background
Google notified us that the GKE node pool in proudcity-1184 (cluster proudcity, zone us-central1-a) has a node bootstrap leaf certificate expiring 2026-12-05 17:05:08 UTC.
If the cert expires:
- New nodes can no longer register to the cluster.
- Existing kubelet client/server certs (rotated every 10 days using the bootstrap cert) will eventually expire: within 1 year for shielded nodes, 5 years for non-shielded nodes.
GKE will automatically recreate the node pool ~30 days before expiry (~2026-11-05) if we do nothing, but on Google's schedule, not ours.
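The ~2026-11-05 date follows from subtracting the ~30-day lead time from the expiry timestamp in the notice. A minimal sketch of that arithmetic, assuming GNU date (the -d flag) and treating the 30 days as approximate:

```shell
# Sketch: derive the approximate automatic-recreation date from the expiry
# timestamp in Google's notice. Assumes GNU date; lead_days is approximate.
set -eu

expiry='2026-12-05 17:05:08 UTC'   # expiry from the notice
lead_days=30                       # approximate lead time before expiry

# Work in epoch seconds so the subtraction is unambiguous.
expiry_epoch=$(date -u -d "$expiry" +%s)
recreate_epoch=$(( expiry_epoch - lead_days * 86400 ))
recreate_date=$(date -u -d "@${recreate_epoch}" +%F)

echo "automatic recreation on or around: ${recreate_date}"
```

The manual upgrade window needs to land before the printed date.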
Plan
Schedule a manual node pool upgrade (same version → same version is enough to regenerate the leaf cert) during a low-traffic maintenance window before 2026-11-05.
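A sketch of how the same-version upgrade command could be assembled for review before the window. It only prints the command and runs nothing against GKE. It assumes node pool upgrades go through gcloud container clusters upgrade with --node-pool; the pool name and version in the last line are hypothetical placeholders.

```shell
# Sketch: build the same-version node pool upgrade command for review.
# Nothing here talks to GKE; the command is only printed.

build_upgrade_cmd() {
  # $1 = node pool name, $2 = version the pool is already running
  printf 'gcloud container clusters upgrade proudcity --node-pool=%s --cluster-version=%s --zone=us-central1-a --project=proudcity-1184\n' "$1" "$2"
}

# Look up the pool's current version first (requires gcloud and cluster access):
#   gcloud container node-pools describe <pool-name> \
#     --cluster=proudcity --zone=us-central1-a --project=proudcity-1184 \
#     --format='value(version)'

# Hypothetical pool name and version, for illustration only.
build_upgrade_cmd example-pool 1.2.3-gke.4
```

Passing the pool's current version as the target is what makes this a same-version upgrade.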
Before the upgrade, add disruption protection so the rolling node drain doesn't take services offline:
- Apply per-tenant PodDisruptionBudgets for prod WP tenants: config/prod-pdbs.yml (already drafted; 103 PDBs, maxUnavailable: 1, mirrors elasticsearch-pdbs.yml).
- Scale api/proudcitycityapi from 1 → 2 replicas with a PDB for the upgrade window: config/api-proudcitycityapi-scale-patch.yml (already drafted). The public city API is currently a single replica.
- Confirm api/proudcityfeeds (replicas=0) is intentionally off before the upgrade; this is an easy time to notice if it shouldn't be.
- Leave alone: the 22 single-replica *redis pods in prod (WP falls back to DB on a brief cache cold start), kube-system/* (Google manages), cert-manager/*, jenkins.
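For reference, a sketch of what one per-tenant PDB of the kind described above might look like. This is not the drafted config/prod-pdbs.yml: the resource name, the tenant name, and the app selector label are assumptions for illustration, and the real file may select pods differently.

```shell
# Sketch: print one PodDisruptionBudget manifest with maxUnavailable: 1.
# The "-pdb" name suffix and the "app" label are assumptions, not taken
# from config/prod-pdbs.yml.

render_pdb() {
  # $1 = tenant name (hypothetical), $2 = namespace
  cat <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: $1-pdb
  namespace: $2
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: $1
EOF
}

render_pdb example-tenant prod
```

Piping the output into kubectl apply --dry-run=client -f - checks a manifest without changing the cluster. Under standard PDB semantics, maxUnavailable: 1 still permits evicting the only pod of a single-replica workload, so it limits concurrent disruption rather than preventing it.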
Diagnostic + upgrade commands
# Identify affected node pools
gcloud container clusters describe proudcity \
  --zone us-central1-a --project proudcity-1184
# Apply PDBs + temporary scale-up
kubectl apply -f config/prod-pdbs.yml
kubectl apply -f config/api-proudcitycityapi-scale-patch.yml
# Verify PDBs in place
kubectl get pdb -A
# Trigger the node pool upgrade (use current cluster version)
gcloud container clusters upgrade proudcity \
  --node-pool=<pool-name> \
  --zone=us-central1-a --project=proudcity-1184 \
  --cluster-version=<current-version>
# After upgrade completes, revert proudcitycityapi if desired
kubectl scale deployment proudcitycityapi -n api --replicas=1
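Beyond confirming that the PDBs exist, it may help to check that none of them currently allows zero disruptions, since such a PDB can stall or delay a node drain. A minimal sketch: the filter reads NAMESPACE NAME ALLOWED rows on stdin, and the sample rows use hypothetical PDB names.

```shell
# Sketch: list PDBs whose status.disruptionsAllowed is 0.
# Input: whitespace-separated rows "NAMESPACE NAME ALLOWED" with a header line.

blocked_pdbs() {
  # Skip the header row; print namespace/name where the third column is 0.
  awk 'NR > 1 && $3 == 0 { print $1 "/" $2 }'
}

# Against the live cluster (requires kubectl access):
#   kubectl get pdb -A \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed \
#     | blocked_pdbs

# Sample rows with hypothetical names; prints "prod/tenant-b".
printf '%s\n' 'NS NAME ALLOWED' 'prod tenant-a 1' 'prod tenant-b 0' | blocked_pdbs
```

An empty result suggests every PDB has headroom for at least one eviction at the time of the check.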
References
- Google email (2026-05-27) — bootstrap leaf cert expiry notice.
- GKE node leaf cert rotation docs
- Existing PDB pattern: proudcity-kubernetes/config/elasticsearch-pdbs.yml (added in PCD265).
Acceptance
- Confirm proudcityfeeds replicas=0 is intentional.
- Apply config/prod-pdbs.yml and verify with kubectl get pdb -n prod.
- Apply config/api-proudcitycityapi-scale-patch.yml.
- Decide whether to keep proudcitycityapi at 2 replicas permanently or revert to 1.