fix(active-checks): use HTTP/1.1 for health check probes with automatic version negotiation#176
fix(active-checks): use HTTP/1.1 for health check probes with automatic version negotiation#176
Conversation
Previously, active health check probes sent HTTP/1.0 requests, which caused upstream servers that only support HTTP/1.1 to respond with 426 Upgrade Required, incorrectly marking healthy targets as unhealthy. Switch the default probe request from HTTP/1.0 to HTTP/1.1 with Connection: close header. HTTP/1.1 is backward-compatible with all servers that accept HTTP/1.0, so this change requires no configuration updates from users. Refactored run_single_check() into focused helpers: - build_http_headers(): builds and caches serialized header string - establish_connection(): TCP connect + optional TLS handshake - probe_http(): sends HTTP/1.1 GET and reports result FTI-7389 Signed-off-by: Walker Zhao <walker.zhao@konghq.com>
|
|
Luacheck Report1 tests 1 ✅ 0s ⏱️ Results for commit d7244d4. ♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
Pull request overview
This PR updates the active HTTP healthcheck probe behavior to default to HTTP/1.1 (with Connection: close) and adds automatic HTTP/1.0/1.1 fallback/upgrade logic, with per-target in-memory caching of the working HTTP version.
Changes:
- Refactors
checker:run_single_check()into helper functions and introduces HTTP version negotiation + caching. - Updates existing request-header tests to expect HTTP/1.1 request format and
Connection: close. - Adds new negotiation test suites (for both
worker-eventsandresty-events) covering 505/426 fallback, caching, and failure cases.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| lib/resty/healthcheck.lua | Implements HTTP/1.1-by-default probing, retry negotiation logic, and caching; refactors probe flow into helpers. |
| t/with_worker-events/18-req-headers.t | Updates expected probe request line/headers to HTTP/1.1 + Connection: close. |
| t/with_resty-events/18-req-headers.t | Same header expectation updates for the resty-events variant. |
| t/with_worker-events/20-http-version-negotiation.t | New test coverage for HTTP version negotiation and caching with worker-events. |
| t/with_resty-events/20-http-version-negotiation.t | New test coverage for HTTP version negotiation and caching with resty-events. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…llback Previously, active health check probes sent HTTP/1.0 requests. Upstream servers that only support HTTP/1.1 would respond with 426 (Upgrade Required), which is not in the default healthy/unhealthy status lists, causing health checks to silently become no-ops for those targets. Switch the default probe request from HTTP/1.0 to HTTP/1.1 with a Connection: close header. Add bidirectional version auto-detection in run_single_check() that automatically negotiates the HTTP version: - On 505 (HTTP Version Not Supported): retry with the other version - On 426 (Upgrade Required) while using HTTP/1.0: retry with HTTP/1.1 - On any non-healthy status with no cached version: retry with the other version to handle non-standard server implementations The working HTTP version is cached per-target in memory to avoid repeated retries. The cache self-heals when servers change their supported HTTP version. Refactored run_single_check() into focused helpers: - build_http_headers(): builds and caches serialized header string - establish_connection(): TCP connect + optional TLS handshake - probe_http(): sends HTTP request and returns status code FTI-7389 Signed-off-by: Walker Zhao <walker.zhao@konghq.com>
50201fa to
e862fab
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
a136018 to
a33e104
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| -- 3. Any non-healthy + no cache -> try the other (first-probe discovery) | ||
| local should_retry = (status == 505) | ||
| or (status == 426 and http_version == "1.0") | ||
| or (not is_healthy and not has_cached_version) |
There was a problem hiding this comment.
The should_retry condition retries once on any non-healthy status when there is no cached version (or (not is_healthy and not has_cached_version)). This means first probes for genuinely unhealthy targets (e.g. 500/502/503) will still incur an extra connection, which contradicts the PR description/overhead table that claims 0 extra connections for genuinely unhealthy responses. Either update the PR description to match the implemented behavior, or tighten the retry condition (e.g., don’t retry when the status is already in the configured unhealthy.http_statuses).
There was a problem hiding this comment.
Why was the behavior changed to retry on any non-healthy status?
In my opinion we should only retry on 426 or 505 statuses if user didn't configure those statuses as healthy statuses.
| -- 3. Any non-healthy + no cache -> try the other (first-probe discovery) | |
| local should_retry = (status == 505) | |
| or (status == 426 and http_version == "1.0") | |
| or (not is_healthy and not has_cached_version) | |
| -- 3. 426 and 505 are considered unhealthy statuses + no cache -> try the other (first-probe discovery) | |
| local should_retry = (not is_healthy and not has_cached_version) and | |
| ((status == 505) or | |
| (status == 426 and http_version == "1.0")) |
There was a problem hiding this comment.
I removed has_cached_version and refactor the condition here. Now it will only retry when it is unhealthy and status code is 505 or 426, please review this PR again, thanks!
The commit is d7244d4.
…y retry logic Extract the HTTP version negotiation logic from run_single_check() into a dedicated negotiate_http_version() local function, reducing run_single_check() from ~85 lines to ~20 lines with flat control flow. Remove the unknown-status retry trigger and http_version_locked flag, keeping only 3 clear retry triggers: 505, 426-on-1.0, and non-healthy-with-no-cache. Unknown status codes are silently ignored by report_http_status(), matching the original behavior. Fix bad status line handling: probe_http() now returns 0 instead of nil for malformed HTTP responses, preserving the original behavior where report_http_status() treats 0 as unhealthy. Previously, the refactor caused these to be silently ignored. FTI-7389 Signed-off-by: Walker Zhao <walker.zhao@konghq.com>
a33e104 to
e71b857
Compare
…426 status codes Remove overly broad trigger that retried on any non-healthy status for uncached targets, which caused unnecessary retry connections for genuinely unhealthy servers (e.g., 500). Now only standard HTTP version negotiation codes (505 HTTP Version Not Supported, 426 Upgrade Required) trigger a retry, gated on `not is_healthy` to respect user-configured healthy status lists while preserving self-healing for cached targets. Signed-off-by: Walker Zhao <walker.zhao@konghq.com>
Summary
Upgrade active health check probes from HTTP/1.0 to HTTP/1.1 with automatic version negotiation for backward compatibility.
Problem
Active health checks previously sent HTTP/1.0 requests. Modern upstream servers that only support HTTP/1.1+ respond with
426 Upgrade Required, which is not in the default healthy or unhealthy status lists — causing health checks to silently become no-ops. Users had to work around this by adding 426 tochecks.active.healthy.http_statuses.Solution
Connection: close(probes are one-shot)Behavior by server type
Refactoring
run_single_checkwas decomposed into focused helpers for maintainability:build_http_headers()— caches serialized user-configured headersestablish_connection()— TCP connect + optional TLS handshakeprobe_http()— sends request, returns status codenegotiate_http_version()— retry logic + version cachingTest plan
prove -I. -r t/with_resty-events/20-http-version-negotiation.t— 9 tests covering: 505 auto-fallback, 426 auto-upgrade, version caching, permanent failure, normal HTTP/1.1, 426 on HTTP/1.1 (no retry), non-standard server (no retry), genuinely unhealthy (no retry), failed retry reports original statusprove -I. -r t/with_worker-events/20-http-version-negotiation.t— mirror tests for worker-events backendprove -I. -r t/with_resty-events/18-req-headers.t— updated to expect HTTP/1.1 formatprove -I. -r t/with_worker-events/18-req-headers.t— mirrorFix FTI-7389