Commit 9df66e8
fix(backup): relax MinIO liveness probe to stop daily SIGKILL restarts (#133)
The rig-prd-backup MinIO sets no timeoutSeconds/failureThreshold, so Kubernetes
applies the defaults (1s / 3). MinIO can't reliably answer /minio/health/live
within 1s during Ceph-RBD I/O spikes, so the kubelet SIGKILLs it (exit 137,
reason Error) ~daily -- 60 restarts in 31 days. A backup target killed mid-write
risks failing a nightly backup run.
Set timeoutSeconds: 5, failureThreshold: 5 (period stays 30s -> ~150s of
sustained unresponsiveness before a kill). Memory left at 256Mi/512Mi: the pod
is never OOMKilled and sits at ~371Mi, so a limit bump isn't justified.

1 parent aa63c5a commit 9df66e8
1 file changed
Lines changed: 7 additions & 0 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 52-54 | 52-54 | (unchanged context) |
| | 55-61 | + (7 added lines) |
| 55-57 | 62-64 | (unchanged context) |
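The content of the seven added lines is not preserved here. As a rough sketch only, a liveness probe consistent with the commit message might look like the fragment below. The path, period, timeout, and failure threshold come from the message; the `httpGet` handler, the port `9000` (MinIO's usual API port), and the placement under a container spec are assumptions, and the real change may instead live in Helm values or another layout.

```yaml
# Hypothetical container-spec fragment; not the actual diff.
livenessProbe:
  httpGet:
    path: /minio/health/live   # endpoint named in the commit message
    port: 9000                 # assumption: MinIO's default API port
  periodSeconds: 30            # unchanged, per the commit message
  timeoutSeconds: 5            # was unset (Kubernetes default: 1)
  failureThreshold: 5          # was unset (Kubernetes default: 3)
```

With these values the kubelet restarts the container only after 5 consecutive failed probes spaced 30s apart, which is where the ~150s figure in the message comes from.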