Commit 412afda

hub: optional startupProbe for slow-startup migration windows
Adds an optional hub.startupProbe (disabled by default to preserve existing behaviour). When enabled, kubelet gives Hub the full failureThreshold × periodSeconds window to become Ready before the liveness/readiness probes kick in.

Default tuning when enabled: 30 × 5s = 150s. Sized for the heaviest real-world Hub migration observed in the earthly-internal dogfood cluster on 2026-05-26 (lunar v175 → v182, ~52s start time including the sqlapi role creation).

Without this, the existing liveness probe (initialDelaySeconds=0, periodSeconds=5, failureThreshold=3, so 3 × 5s = 15s) killed the pod mid-migration, requiring an out-of-band kubectl patch to recover.

Backward compatible: existing installs see no change unless they explicitly set hub.startupProbe.enabled=true.
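As a minimal sketch of opting in, an override file along these lines should be enough. The file name is hypothetical and not part of this commit; the keys and default values are the ones this change adds to charts/lunar/values.yaml.

```yaml
# values-startup-probe.yaml (hypothetical file name, not part of this commit)
hub:
  startupProbe:
    enabled: true          # off by default; this is the only key that must change
    # The two values below repeat the chart defaults purely to show the arithmetic.
    periodSeconds: 5       # probe /health every 5s
    failureThreshold: 30   # 30 failures × 5s = 150s startup window
```

Under these settings the ~52s migration described above fits well inside the 150s window, whereas the liveness probe alone tolerated only 15s. The file would typically be passed to Helm with `-f` on install or upgrade.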
1 parent 41126bc commit 412afda

3 files changed

Lines changed: 40 additions & 0 deletions

charts/lunar/CHANGELOG.md

Lines changed: 13 additions & 0 deletions
@@ -9,6 +9,19 @@ History starts at 1.0.0 (the snippet→script rename and ghcr.io
 switchover); earlier 0.x versions had no production users. For 0.x
 history see `git log -- charts/lunar/`.

+## [Unreleased]
+
+### Added
+
+- **Optional `hub.startupProbe`.** New optional probe (disabled by default
+  to preserve current behaviour) that gives Hub a longer window to come
+  Ready before liveness/readiness kick in. Useful when Hub takes longer
+  than `livenessProbe.failureThreshold × periodSeconds` to start — e.g.
+  on a cold start with a substantial Postgres schema migration.
+  Default tuning when enabled: `30 × 5s = 150s` startup window. Set
+  `hub.startupProbe.enabled: true` and tune `failureThreshold` /
+  `periodSeconds` to taste.
+
 ## [2.0.0] - 2026-05-20

 ### Breaking

charts/lunar/templates/hub-deployment.yaml

Lines changed: 11 additions & 0 deletions
@@ -183,6 +183,17 @@ spec:
             failureThreshold: {{ .failureThreshold }}
           {{- end }}
           {{- end }}
+          {{- if .Values.hub.startupProbe.enabled }}
+          {{- with .Values.hub.startupProbe }}
+          startupProbe:
+            httpGet:
+              path: /health
+              port: hub-health
+            initialDelaySeconds: {{ .initialDelaySeconds }}
+            periodSeconds: {{ .periodSeconds }}
+            failureThreshold: {{ .failureThreshold }}
+          {{- end }}
+          {{- end }}
           ports:
             - name: hub-grpc
               containerPort: 8000
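For orientation, with `hub.startupProbe.enabled=true` and the other defaults from values.yaml left untouched, the template block above should expand to roughly the fragment below. This is a hand-expanded sketch, not captured `helm template` output, and it is shown without the surrounding container indentation.

```yaml
# Expected container-level fragment when the probe is enabled with defaults.
startupProbe:
  httpGet:
    path: /health          # endpoint named in the template
    port: hub-health       # named container port referenced by the template
  initialDelaySeconds: 0   # from hub.startupProbe.initialDelaySeconds
  periodSeconds: 5         # from hub.startupProbe.periodSeconds
  failureThreshold: 30     # from hub.startupProbe.failureThreshold
```

With `enabled: false` (the default), the outer `if` is false and no `startupProbe` key is emitted at all, which is what keeps existing installs unchanged.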

charts/lunar/values.yaml

Lines changed: 16 additions & 0 deletions
@@ -265,6 +265,22 @@ hub:
     periodSeconds: 5
     failureThreshold: 3

+  # Optional startup probe (disabled by default to preserve existing
+  # behaviour). Enable when Hub takes longer to become Ready than the
+  # liveness probe's failureThreshold × periodSeconds tolerates — e.g.
+  # on a cold start with a substantial Postgres schema migration. While
+  # the startupProbe is running, kubelet suppresses liveness/readiness
+  # probe failures, so Hub gets `failureThreshold × periodSeconds`
+  # seconds total before kubelet starts considering it unhealthy.
+  #
+  # Default tuning when enabled is 30 × 5s = 150s. Raise
+  # `failureThreshold` for larger databases with longer migrations.
+  startupProbe:
+    enabled: false
+    initialDelaySeconds: 0
+    periodSeconds: 5
+    failureThreshold: 30
+
   labels: {}
   annotations: {}
   podAnnotations: {}
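The comment's advice to raise `failureThreshold` for longer migrations could look like the sketch below. The value 60 is purely illustrative and was not measured against any real migration; only the key names come from the chart.

```yaml
# Hypothetical override for a larger database; 60 is an illustrative value.
hub:
  startupProbe:
    enabled: true
    initialDelaySeconds: 0
    periodSeconds: 5
    failureThreshold: 60   # 60 × 5s = 300s startup window
```

In Kubernetes, liveness and readiness checks do not start until the startup probe has succeeded once, so a larger threshold only raises the upper bound on startup time and does not slow down a Hub that comes up quickly. If the probe never succeeds within the window, kubelet kills the container and the pod's restart policy applies.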
