| 1 | +--- |
| 2 | +aliases: |
| 3 | + - /en/docs3-v2/golang-sdk/tutorial/governance/monitor/probe/ |
| 4 | + - /en/docs3-v3/golang-sdk/tutorial/governance/monitor/probe/ |
| 5 | +description: "Dubbo-Go Kubernetes Probe (liveness / readiness / startup) user manual" |
| 6 | +title: Kubernetes Lifecycle Probe |
| 7 | +type: docs |
| 8 | +weight: 3 |
| 9 | +--- |
| 10 | + |
| 11 | +# Dubbo-Go Kubernetes Lifecycle Probe |
| 12 | + |
| 13 | +Dubbo-Go provides a built-in **Kubernetes HTTP Probe service** that supports: |
| 14 | + |
| 15 | +* ✅ `liveness` |
| 16 | +* ✅ `readiness` |
| 17 | +* ✅ `startup` |
| 18 | + |
| 19 | +The probe service runs on an independent HTTP port and supports: |
| 20 | + |
| 21 | +* Custom health check logic |
| 22 | +* Optional alignment with Dubbo internal lifecycle state |
| 23 | +* Restart risk control (liveness is not bound to complex internal logic by default) |
| 24 | + |
| 25 | +For a complete runnable example, see: |
| 26 | + |
| 27 | +> [https://github.com/apache/dubbo-go-samples/tree/main/metrics](https://github.com/apache/dubbo-go-samples/tree/main/metrics) |
| 28 | + |
| 29 | +--- |
| 30 | + |
| 31 | +# 1. Design Goals |
| 32 | + |
| 33 | +| Goal | Description | |
| 34 | +| ------------------- | -------------------------------------------------------- | |
| 35 | +| Extensibility | Supports custom health check callbacks | |
| 36 | +| Risk Control | Liveness is not bound to complex internal logic by default | |
| 37 | +| Lifecycle Alignment | Readiness and startup can align with Dubbo lifecycle | |
| 38 | +| Independent Port | Isolated from business service port | |
| 39 | + |
| 40 | +--- |
| 41 | + |
| 42 | +# 2. Default Behavior |
| 43 | + |
| 44 | +When the probe service is enabled, it exposes its endpoints on the following port by default: |
| 45 | + |
| 46 | +``` |
| 47 | +Port: 22222 |
| 48 | +``` |
| 49 | + |
| 50 | +The following paths are available: |
| 51 | + |
| 52 | +| Endpoint | Description | |
| 53 | +| ------------ | ------------------------- | |
| 54 | +| GET /live | Process liveness check | |
| 55 | +| GET /ready | Service readiness check | |
| 56 | +| GET /startup | Application startup check | |
| 57 | + |
| 58 | +--- |
| 59 | + |
| 60 | +## Response Rules |
| 61 | + |
| 62 | +| Condition | HTTP Status Code | |
| 63 | +| --------------- | ---------------- | |
| 64 | +| All checks pass | 200 | |
| 65 | +| Any check fails | 503 | |
| 66 | + |
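The response rule above can be sketched with the Go standard library alone. This is only an illustration of the 200/503 mapping, not Dubbo-Go's implementation; `checkFunc`, `probeHandler`, `statusFor`, `passing`, and `failing` are hypothetical names introduced for this example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// checkFunc is a hypothetical health-check signature used only in this sketch.
type checkFunc func(ctx context.Context) error

// Two sample checks: one that always passes and one that always fails.
var (
	passing checkFunc = func(context.Context) error { return nil }
	failing checkFunc = func(context.Context) error { return errors.New("dependency unavailable") }
)

// probeHandler answers 200 when the check passes and 503 when it fails.
func probeHandler(check checkFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := check(r.Context()); err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	}
}

// statusFor invokes the handler in-process and returns the recorded status code.
func statusFor(check checkFunc) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/ready", nil)
	probeHandler(check)(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(passing)) // 200
	fmt.Println(statusFor(failing)) // 503
}
```

Kubernetes treats any HTTP status from 200 to 399 as success, so returning 503 on failure is enough for the kubelet to count the probe as failed.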
| 67 | +--- |
| 68 | + |
| 69 | +# 3. Configuration |
| 70 | + |
| 71 | +Dubbo-Go supports both **New API (recommended)** and **Old API (YAML)** configuration styles. |
| 72 | + |
| 73 | +--- |
| 74 | + |
| 75 | +## 3.1 New API Configuration (Recommended) |
| 76 | + |
| 77 | +```go |
| 78 | +ins, err := dubbo.NewInstance( |
| 79 | + dubbo.WithMetrics( |
| 80 | + metrics.WithProbeEnabled(), |
| 81 | + metrics.WithProbePort(22222), |
| 82 | + metrics.WithProbeLivenessPath("/live"), |
| 83 | + metrics.WithProbeReadinessPath("/ready"), |
| 84 | + metrics.WithProbeStartupPath("/startup"), |
| 85 | + metrics.WithProbeUseInternalState(true), |
| 86 | + ), |
| 87 | +) |
| 88 | +``` |
| 89 | + |
| 90 | +--- |
| 91 | + |
| 92 | +## Available Options |
| 93 | + |
| 94 | +| Option | Description | |
| 95 | +| ------------------------------- | ------------------------------------- | |
| 96 | +| WithProbeEnabled() | Enable Probe | |
| 97 | +| WithProbePort(int) | Set Probe HTTP port | |
| 98 | +| WithProbeLivenessPath(string) | Set liveness path | |
| 99 | +| WithProbeReadinessPath(string) | Set readiness path | |
| 100 | +| WithProbeStartupPath(string) | Set startup path | |
| 101 | +| WithProbeUseInternalState(bool) | Enable internal lifecycle state check | |
| 102 | + |
| 103 | +--- |
| 104 | + |
| 105 | +## 3.2 Old API YAML Configuration |
| 106 | + |
| 107 | +```yaml |
| 108 | +metrics: |
| 109 | + probe: |
| 110 | + enabled: true |
| 111 | + port: 22222 |
| 112 | + liveness-path: "/live" |
| 113 | + readiness-path: "/ready" |
| 114 | + startup-path: "/startup" |
| 115 | + use-internal-state: true |
| 116 | +``` |
| 117 | + |
| 118 | +--- |
| 119 | + |
| 120 | +## Configuration Fields |
| 121 | + |
| 122 | +| Field | Description | |
| 123 | +| ------------------ | ------------------------------------------ | |
| 124 | +| enabled | Enable probe service | |
| 125 | +| port | HTTP port | |
| 126 | +| liveness-path | Liveness endpoint path | |
| 127 | +| readiness-path | Readiness endpoint path | |
| 128 | +| startup-path | Startup endpoint path | |
| 129 | +| use-internal-state | Whether probes also check Dubbo's internal lifecycle state | |
| 130 | + |
| 131 | +--- |
| 132 | + |
| 133 | +# 4. Internal Lifecycle State (UseInternalState) |
| 134 | + |
| 135 | +When: |
| 136 | + |
| 137 | +```yaml |
| 138 | +use-internal-state: true |
| 139 | +``` |
| 140 | + |
| 141 | +The probe service also evaluates Dubbo's internal lifecycle state for the readiness and startup probes. |
| 142 | + |
| 143 | +--- |
| 144 | + |
| 145 | +## Internal State Mechanism |
| 146 | + |
| 147 | +| Probe Type | Depends On | |
| 148 | +| ---------- | -------------------------------------- | |
| 149 | +| readiness | `probe.SetReady(true/false)` | |
| 150 | +| startup | `probe.SetStartupComplete(true/false)` | |
| 151 | + |
| 152 | +--- |
| 153 | + |
| 154 | +## Default Behavior |
| 155 | + |
| 156 | +* When `Server.Serve()` executes successfully: |
| 157 | + |
| 158 | + * ready = true |
| 159 | + * startup = true |
| 160 | + |
| 161 | +* During graceful shutdown: |
| 162 | + |
| 163 | + * ready = false |
| 164 | + |
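The default transitions listed above can be pictured as two flags that the framework flips at lifecycle events. The sketch below is a self-contained model under that assumption, not Dubbo-Go's source; the `lifecycle` type and its methods are hypothetical stand-ins for what `probe.SetReady` and `probe.SetStartupComplete` drive.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// lifecycle is a hypothetical model of the internal state behind /ready and /startup.
type lifecycle struct {
	ready   atomic.Bool
	startup atomic.Bool
}

// onServeSucceeded mirrors the documented default: ready = true, startup = true.
func (l *lifecycle) onServeSucceeded() {
	l.startup.Store(true)
	l.ready.Store(true)
}

// onGracefulShutdown mirrors the documented default: ready = false.
func (l *lifecycle) onGracefulShutdown() {
	l.ready.Store(false)
}

// code maps a flag to the HTTP status a probe would return.
func code(ok bool) int {
	if ok {
		return 200
	}
	return 503
}

// statuses returns the status codes for /ready and /startup.
func (l *lifecycle) statuses() (ready, startup int) {
	return code(l.ready.Load()), code(l.startup.Load())
}

func main() {
	l := &lifecycle{}
	fmt.Println(l.statuses()) // 503 503 (before the server is serving)

	l.onServeSucceeded()
	fmt.Println(l.statuses()) // 200 200

	l.onGracefulShutdown()
	fmt.Println(l.statuses()) // 503 200 (traffic drains, startup stays complete)
}
```

Atomic flags keep the probe handlers lock-free, which matters because the kubelet polls these endpoints on every `periodSeconds` tick.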
| 165 | +--- |
| 166 | + |
| 167 | +## When Set to false |
| 168 | + |
| 169 | +If: |
| 170 | + |
| 171 | +```yaml |
| 172 | +use-internal-state: false |
| 173 | +``` |
| 174 | + |
| 175 | +The probe result is **fully determined by user-registered callbacks**. |
| 176 | + |
| 177 | +--- |
| 178 | + |
| 179 | +# 5. Custom Health Checks (Recommended) |
| 180 | + |
| 181 | +You can extend probe logic by registering callbacks. |
| 182 | + |
| 183 | +```go |
| 184 | +import "dubbo.apache.org/dubbo-go/v3/metrics/probe" |
| 185 | + |
| 186 | +// Liveness example |
| 187 | +probe.RegisterLiveness("db", func(ctx context.Context) error { |
| 188 | + // check database connectivity |
| 189 | + return nil |
| 190 | +}) |
| 191 | + |
| 192 | +// Readiness example |
| 193 | +probe.RegisterReadiness("cache", func(ctx context.Context) error { |
| 194 | + // check downstream dependency |
| 195 | + return nil |
| 196 | +}) |
| 197 | + |
| 198 | +// Startup example |
| 199 | +probe.RegisterStartup("warmup", func(ctx context.Context) error { |
| 200 | + // check warmup completion |
| 201 | + return nil |
| 202 | +}) |
| 203 | +``` |
| 204 | + |
| 205 | +--- |
| 206 | + |
| 207 | +## Execution Logic |
| 208 | + |
| 209 | +* All checks registered for the requested probe type are executed. |
| 210 | +* If any check returns an error, the probe returns HTTP 503. |
| 211 | +* If every check passes, the probe returns HTTP 200. |
| 212 | + |
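The execution logic above can be sketched as a loop that runs every check and joins the failures. This is a minimal illustration that assumes checks run sequentially and never short-circuit; `namedCheck`, `runAll`, `statusFor`, and `demo` are hypothetical names, and the real scheduling inside Dubbo-Go may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// checkFunc and namedCheck are hypothetical types used only in this sketch.
type checkFunc func(ctx context.Context) error

type namedCheck struct {
	name string
	fn   checkFunc
}

// runAll executes every check without short-circuiting and joins the failures.
// It returns nil when all checks pass.
func runAll(ctx context.Context, checks []namedCheck) error {
	var errs []error
	for _, c := range checks {
		if err := c.fn(ctx); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", c.name, err))
		}
	}
	return errors.Join(errs...)
}

// statusFor maps the aggregated result to the documented status codes.
func statusFor(err error) int {
	if err != nil {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// demo runs three checks, the second of which fails, and reports how many
// checks ran and which status the probe would return.
func demo() (executed, status int) {
	count := func(result error) checkFunc {
		return func(context.Context) error {
			executed++
			return result
		}
	}
	checks := []namedCheck{
		{name: "cache", fn: count(nil)},
		{name: "db", fn: count(errors.New("connection refused"))},
		{name: "registry", fn: count(nil)},
	}
	err := runAll(context.Background(), checks)
	return executed, statusFor(err)
}

func main() {
	executed, status := demo()
	fmt.Printf("executed=%d status=%d\n", executed, status) // executed=3 status=503
}
```

Joining the errors instead of returning the first one keeps every failing check name in the response body, which makes a 503 easier to diagnose.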
| 213 | +--- |
| 214 | + |
| 215 | +# 6. Semantic Recommendations |
| 216 | + |
| 217 | +## Liveness |
| 218 | + |
| 219 | +Recommended usage: |
| 220 | + |
| 221 | +* Detect process crashes |
| 222 | +* Detect fatal core dependency failure |
| 223 | + |
| 224 | +⚠️ Once `failureThreshold` consecutive checks fail, Kubernetes restarts the container. |
| 225 | + |
| 226 | +--- |
| 227 | + |
| 228 | +## Readiness |
| 229 | + |
| 230 | +Readiness checks may depend on: |
| 231 | + |
| 232 | +* Service registry state |
| 233 | +* Database |
| 234 | +* Redis |
| 235 | +* Downstream RPC |
| 236 | +* Local cache |
| 237 | + |
| 238 | +Controls whether traffic is routed to the Pod. |
| 239 | + |
| 240 | +--- |
| 241 | + |
| 242 | +## Startup |
| 243 | + |
| 244 | +Suitable for: |
| 245 | + |
| 246 | +* Cold start handling |
| 247 | +* Warm-up logic |
| 248 | +* Data loading |
| 249 | +* Model initialization |
| 250 | + |
| 251 | +Kubernetes holds off liveness and readiness probes until the startup probe succeeds, which prevents premature restarts during slow initialization. |
| 252 | + |
| 253 | +--- |
| 254 | + |
| 255 | +# 7. Kubernetes Configuration Example |
| 256 | + |
| 257 | +```yaml |
| 258 | +livenessProbe: |
| 259 | + httpGet: |
| 260 | + path: /live |
| 261 | + port: 22222 |
| 262 | + initialDelaySeconds: 15 |
| 263 | + periodSeconds: 10 |
| 264 | + timeoutSeconds: 2 |
| 265 | + failureThreshold: 3 |
| 266 | + |
| 267 | +readinessProbe: |
| 268 | + httpGet: |
| 269 | + path: /ready |
| 270 | + port: 22222 |
| 271 | + initialDelaySeconds: 5 |
| 272 | + periodSeconds: 5 |
| 273 | + timeoutSeconds: 2 |
| 274 | + failureThreshold: 2 |
| 275 | + |
| 276 | +startupProbe: |
| 277 | + httpGet: |
| 278 | + path: /startup |
| 279 | + port: 22222 |
| 280 | + periodSeconds: 5 |
| 281 | + timeoutSeconds: 2 |
| 282 | + failureThreshold: 25 # 120s startup budget => ceil(120 / 5) + 1 |
| 283 | +``` |
| 284 | + |
| 285 | +--- |
| 286 | + |
| 287 | +# 8. Example Usage |
| 288 | + |
| 289 | +Example path: |
| 290 | + |
| 291 | +``` |
| 292 | +metrics/probe/ |
| 293 | +``` |
| 294 | + |
| 295 | +--- |
| 296 | + |
| 297 | +## Run Locally |
| 298 | + |
| 299 | +```bash |
| 300 | +go run ./metrics/probe/go-server/cmd/main.go |
| 301 | +``` |
| 302 | + |
| 303 | +--- |
| 304 | + |
| 305 | +## Monitor Probe Status in Real Time |
| 306 | + |
| 307 | +```bash |
| 308 | +watch -n 1 ' |
| 309 | +for p in live ready startup; do |
| 310 | + url="http://127.0.0.1:22222/$p" |
| 311 | + |
| 312 | + body=$(curl -sS --max-time 2 "$url" 2>&1) |
| 313 | + code=$(curl -s -o /dev/null --max-time 2 -w "%{http_code}" "$url" 2>/dev/null) |
| 314 | + |
| 315 | + printf "%-8s [%s] %s\n" "$p" "$code" "$body" |
| 316 | +done |
| 317 | +' |
| 318 | +``` |
| 319 | + |
| 320 | +--- |
| 321 | + |
| 322 | +## Expected Behavior |
| 323 | + |
| 324 | +| Phase | /live | /ready | /startup | |
| 325 | +| ---------------- | ----- | ------ | -------- | |
| 326 | +| Just started | 200 | 503 | 503 | |
| 327 | +| Warm-up phase | 200 | 503 | 503 | |
| 328 | +| Warm-up complete | 200 | 200 | 200 | |
| 329 | + |
| 330 | +--- |
| 331 | + |
| 332 | +# 9. Production Best Practices |
| 333 | + |
| 334 | +## Recommended Starting Values |
| 335 | + |
| 336 | +| Probe Type | Recommended Values | Notes | |
| 337 | +| ---------- | -------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- | |
| 338 | +| liveness | `initialDelaySeconds: 10-30`, `periodSeconds: 10`, `timeoutSeconds: 1-3`, `failureThreshold: 3` | Use only for process survival and unrecoverable failures, not for databases, registries, or Redis | |
| 339 | +| readiness | `initialDelaySeconds: 2-5`, `periodSeconds: 5`, `timeoutSeconds: 1-3`, `failureThreshold: 2-3` | Remove traffic quickly when dependencies fail, and recover quickly after they return | |
| 340 | +| startup | `periodSeconds: 5-10`, `timeoutSeconds: 1-3`, `failureThreshold = ceil(maxStartupSeconds / periodSeconds) + 1` | Budget for the longest cold-start, warm-up, and config-loading path | |
| 341 | + |
| 342 | +For example, if the application may need up to `120s` to start and `periodSeconds: 5` is used: |
| 343 | + |
| 344 | +```text |
| 345 | +failureThreshold = ceil(120 / 5) + 1 = 25 |
| 346 | +``` |
| 347 | + |
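The same budget calculation can be written as a small helper, which is handy when manifests are generated from code. The function name `startupFailureThreshold` is hypothetical; the sketch only encodes the formula above using integer arithmetic.

```go
package main

import (
	"errors"
	"fmt"
)

// startupFailureThreshold returns ceil(maxStartupSeconds / periodSeconds) + 1.
// It rejects non-positive periods and negative budgets.
func startupFailureThreshold(maxStartupSeconds, periodSeconds int) (int, error) {
	if periodSeconds <= 0 {
		return 0, errors.New("periodSeconds must be positive")
	}
	if maxStartupSeconds < 0 {
		return 0, errors.New("maxStartupSeconds must not be negative")
	}
	// Integer ceiling division: (a + b - 1) / b for a >= 0, b > 0.
	ceil := (maxStartupSeconds + periodSeconds - 1) / periodSeconds
	return ceil + 1, nil
}

func main() {
	n, err := startupFailureThreshold(120, 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 25
}
```

The extra `+ 1` leaves one spare probe period so that an application finishing right at the budget limit is not restarted.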
| 348 | +## Operational Guidance |
| 349 | + |
| 350 | +* Keep `liveness` simple and reserve it for failures that require a restart |
| 351 | +* Put service registry, database, Redis, and downstream RPC checks in `readiness` |
| 352 | +* Let `startup` absorb slow initialization instead of inflating `liveness.initialDelaySeconds` |
| 353 | +* In microservice clusters, enable `use-internal-state: true` and combine it with `probe.SetReady(...)` for proactive traffic draining |