| Status | Active |
|---|---|
| Owner | HyperFleet Platform Team |
| Last Updated | 2026-03-20 |
This document defines the standard contract for health and readiness endpoints across all HyperFleet components (API, Sentinel, Adapters).
- Consistency: All components expose the same endpoints on the same ports
- Kubernetes Integration: Proper liveness and readiness probe configuration
- Observability: Metrics endpoint for Prometheus scraping
All HyperFleet components MUST use the following configuration:
| Port | Endpoint | Purpose | Probe Type |
|---|---|---|---|
| 8080 | `/healthz` | Liveness: is the process alive? | livenessProbe |
| 8080 | `/readyz` | Readiness: can it receive traffic? Can a rolling update proceed? | readinessProbe |
| 9090 | `/metrics` | Prometheus metrics | ServiceMonitor |
Purpose (`/healthz`): Indicates whether the application is running.
| Status | Meaning | Kubernetes Action |
|---|---|---|
| `200 OK` | Application is alive | None |
| `503 Service Unavailable` | Application is unhealthy | Restart the container (after `failureThreshold` consecutive failures) |
Response Body:
```json
{ "status": "ok" }
```

Or on failure:

```json
{ "status": "error", "message": "out of memory" }
```

What to Check:
- Application can respond to HTTP requests (implicitly verified by the probe itself)
- No fatal internal state (e.g., unrecoverable panic, deadlock)
What NOT to Check:
- External dependencies (database, API, broker)
- Downstream service availability
Rationale: Liveness probes should only verify the process itself is healthy. Checking external dependencies can cause cascading restarts during infrastructure issues. If a dependency is down, the pod should remain running but marked as not ready (via /readyz).
Purpose (`/readyz`):
- Indicates whether the application is ready to receive traffic.
- Controls Pod replacement and rollout concurrency during a rolling update.
| Status | Meaning | Kubernetes Action |
|---|---|---|
| `200 OK` | Ready to receive traffic; a rolling update may proceed | Add to Service endpoints |
| `503 Service Unavailable` | Not ready | Remove from Service endpoints |
For components that do not serve traffic behind a Service (for example, broker consumers), `/readyz` has no effect on request routing, but it still matters during a rolling update:
- A Pod that is NotReady:
  - counts as Unavailable
  - blocks Kubernetes from terminating old Pods (depending on `maxUnavailable`)
- A Pod that becomes Ready:
  - is considered a valid replacement
  - allows Kubernetes to scale down old consumers
➡️ Readiness controls when Kubernetes is allowed to reduce old consumers.
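To make the interaction with `maxUnavailable` concrete, here is a deliberately simplified model. It is not the actual Deployment controller algorithm, which also accounts for `maxSurge`, `minReadySeconds`, and other details; it only captures the rule that Kubernetes keeps at least `replicas - maxUnavailable` Pods available, and only Ready Pods count as available. The function name and parameters are illustrative.

```go
package main

// removableOldPods returns how many Ready old Pods may be terminated without
// dropping below the availability floor of replicas - maxUnavailable.
// Only Ready Pods count as available; NotReady new Pods contribute nothing.
func removableOldPods(replicas, maxUnavailable, readyOld, readyNew int) int {
	minAvailable := replicas - maxUnavailable
	removable := readyOld + readyNew - minAvailable
	if removable < 0 {
		return 0
	}
	if removable > readyOld {
		return readyOld // cannot remove more old Pods than exist
	}
	return removable
}
```

With `replicas: 3` and `maxUnavailable: 0`, `removableOldPods(3, 0, 3, 0)` is `0`: a new Pod stuck at `503` on `/readyz` stalls the rollout, and the old consumers keep running. As soon as one new Pod reports `200`, `removableOldPods(3, 0, 3, 1)` is `1` and one old Pod may be terminated.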
Response Body:
```json
{
  "status": "ok",
  "checks": {
    "config": "ok",
    "broker": "ok",
    "api": "ok"
  }
}
```

Or on failure:

```json
{
  "status": "error",
  "checks": {
    "config": "ok",
    "broker": "error",
    "api": "ok"
  },
  "message": "broker connection failed"
}
```

What to Check:
- Configuration loaded successfully
- Required connections established (broker, API client)
- Startup initialization complete
Component-Specific Checks:
| Component | Readiness Checks |
|---|---|
| API | Database connection, configuration loaded |
| Sentinel | HyperFleet API reachable, broker connected, configuration loaded |
| Adapters | Broker subscription active, HyperFleet API reachable, configuration loaded |
Purpose (`/metrics`): Expose application metrics for Prometheus scraping.
Response: Prometheus text format (OpenMetrics compatible)
Required Metrics: See component-specific documentation:
- Sentinel Operator Guide - Sentinel metrics
- Adapter Metrics - Adapter metrics
| Probe | initialDelaySeconds | periodSeconds | timeoutSeconds | failureThreshold |
|---|---|---|---|---|
| Liveness | 15 | 20 | 5 | 3 |
| Readiness | 5 | 10 | 3 | 3 |
```yaml
observability:
  healthPort: 8080
  metricsPort: 9090
  probes:
    liveness:
      path: /healthz
      initialDelaySeconds: 15
      periodSeconds: 20
      timeoutSeconds: 5
      failureThreshold: 3
    readiness:
      path: /readyz
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 3
      failureThreshold: 3
```

```yaml
# Pod-level field (spec.terminationGracePeriodSeconds)
terminationGracePeriodSeconds: 30
# Container-level fields (spec.containers[].livenessProbe / readinessProbe)
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
```

Startup Sequence:
- Start HTTP servers for health and metrics endpoints immediately
- `/healthz` returns `200 OK` as soon as the process starts
- `/readyz` returns `503 Service Unavailable` until initialization completes
- Once all readiness checks pass, `/readyz` returns `200 OK`

Shutdown Sequence:
- On `SIGTERM`, set `/readyz` to return `503 Service Unavailable`
- Kubernetes removes the Pod from Service endpoints
- Graceful shutdown completes in-flight work
- Exit cleanly
For complete shutdown specifications, timeout configuration, and code examples, see Graceful Shutdown Standard.