Commit ffcb500
The chart's postgres-auth-preflight init container (.kubernetes/chart/templates/deployment.yaml, lines 70-156) installs psql and jq via 'apk add', which assumes the base image (mcr.microsoft.com/azure-cli:latest) is Alpine. The image is now Mariner-based, so 'apk' is missing: the init container exits with 127 (command not found) -> CrashLoopBackOff -> CRUD pod never starts.
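To illustrate the failure mode described above, here is a minimal sketch of how a POSIX shell reports a missing command. The command name is a made-up placeholder standing in for 'apk' on a non-Alpine image; it is not taken from the chart.

```shell
#!/bin/sh
# Run a command that does not exist, mirroring 'apk add' on an image without apk.
# 'no-such-package-manager' is a hypothetical placeholder name.
status=0
sh -c 'no-such-package-manager add jq' 2>/dev/null || status=$?

# POSIX shells return 127 when the command cannot be found.
echo "exit status: $status"
```

An init container whose script fails this way is restarted by the kubelet, which is what surfaces as CrashLoopBackOff.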
This triggered an outage during PR #1090 (Pattern A Helm takeover): the legacy live pod was still running from a cached Alpine layer (5h+ uptime), but the freshly pulled image failed when Helm's rolling update created a new ReplicaSet. Recovery required manually stripping initContainers from the live Deployment and suspending the HelmRelease.
Fix: flip preflight.postgresAuth.enabled from true to false in the HelmRelease values for crud-service. Safe because BaseRepository.check_pool_health self-recovers from transient pool init errors per commit 811fdbe (#911 / PR #1087) - the preflight gate is no longer load-bearing.
Follow-up issues to file: (1) chart fix to support multi-distro package install (apk/tdnf/apt-get) or pin an Alpine-tagged image; (2) ADR-017 addendum documenting the prune-vs-Helm-adopt race that hit during this incident.
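Follow-up (1) could look roughly like the sketch below, which picks whichever of apk, tdnf, or apt-get is present before installing. This is not the chart's actual script: the function names (has_cmd, detect_pkg_manager, install_packages) are hypothetical, and the package names in the usage comment are assumptions, since they differ between distros.

```shell
#!/bin/sh
# Sketch only: multi-distro package install for an init container.

# Seam around 'command -v' so detection can be exercised without the real binaries.
has_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print the first supported package manager found; return 1 if none is present.
detect_pkg_manager() {
  for pm in apk tdnf apt-get; do
    if has_cmd "$pm"; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}

# Install the given packages with whichever manager was detected.
install_packages() {
  pm=$(detect_pkg_manager) || {
    echo "no supported package manager found" >&2
    return 1
  }
  case "$pm" in
    apk)     apk add --no-cache "$@" ;;
    tdnf)    tdnf install -y "$@" ;;
    apt-get) apt-get update && apt-get install -y "$@" ;;
  esac
}

# Usage (package names are assumptions and vary per distro):
#   install_packages postgresql-client jq
```

Pinning an Alpine-tagged image, the other option named in (1), avoids the detection logic entirely at the cost of tracking a specific tag.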
Verified: after manual recovery, the cluster is serving CRUD /health with 200 OK and all 26 agents return 200 OK. This PR brings GitOps state in sync with the live cluster so the HelmRelease can be unsuspended without regression.
1 parent 058fe32 commit ffcb500
1 file changed
Lines changed: 6 additions & 1 deletion
Diff content not captured. Hunk summary: original line 61 removed; new lines 61-66 added (unchanged context: original lines 58-60 and 62-64).