Commit 792aba2

fix(active-checks): use HTTP/1.1 for health check probes with automatic version negotiation (#176)
* fix(active-checks): use HTTP/1.1 for health check probes

Previously, active health check probes sent HTTP/1.0 requests, which caused upstream servers that only support HTTP/1.1 to respond with 426 Upgrade Required, incorrectly marking healthy targets as unhealthy.

Switch the default probe request from HTTP/1.0 to HTTP/1.1 with Connection: close header. HTTP/1.1 is backward-compatible with all servers that accept HTTP/1.0, so this change requires no configuration updates from users.

Refactored run_single_check() into focused helpers:
- build_http_headers(): builds and caches serialized header string
- establish_connection(): TCP connect + optional TLS handshake
- probe_http(): sends HTTP/1.1 GET and reports result

FTI-7389

Signed-off-by: Walker Zhao <walker.zhao@konghq.com>

* fix(active-checks): use HTTP/1.1 for health check probes with auto-fallback

Previously, active health check probes sent HTTP/1.0 requests. Upstream servers that only support HTTP/1.1 would respond with 426 (Upgrade Required), which is not in the default healthy/unhealthy status lists, causing health checks to silently become no-ops for those targets.

Switch the default probe request from HTTP/1.0 to HTTP/1.1 with a Connection: close header.

Add bidirectional version auto-detection in run_single_check() that automatically negotiates the HTTP version:
- On 505 (HTTP Version Not Supported): retry with the other version
- On 426 (Upgrade Required) while using HTTP/1.0: retry with HTTP/1.1
- On any non-healthy status with no cached version: retry with the other version to handle non-standard server implementations

The working HTTP version is cached per-target in memory to avoid repeated retries. The cache self-heals when servers change their supported HTTP version.

Refactored run_single_check() into focused helpers:
- build_http_headers(): builds and caches serialized header string
- establish_connection(): TCP connect + optional TLS handshake
- probe_http(): sends HTTP request and returns status code

FTI-7389

Signed-off-by: Walker Zhao <walker.zhao@konghq.com>

* refactor(active-checks): extract negotiate_http_version() and simplify retry logic

Extract the HTTP version negotiation logic from run_single_check() into a dedicated negotiate_http_version() local function, reducing run_single_check() from ~85 lines to ~20 lines with flat control flow.

Remove the unknown-status retry trigger and http_version_locked flag, keeping only 3 clear retry triggers: 505, 426-on-1.0, and non-healthy-with-no-cache. Unknown status codes are silently ignored by report_http_status(), matching the original behavior.

Fix bad status line handling: probe_http() now returns 0 instead of nil for malformed HTTP responses, preserving the original behavior where report_http_status() treats 0 as unhealthy. Previously, the refactor caused these to be silently ignored.

FTI-7389

Signed-off-by: Walker Zhao <walker.zhao@konghq.com>

* fix(active-checks): narrow version negotiation retry to only 505 and 426 status codes

Remove overly broad trigger that retried on any non-healthy status for uncached targets, which caused unnecessary retry connections for genuinely unhealthy servers (e.g., 500).

Now only standard HTTP version negotiation codes (505 HTTP Version Not Supported, 426 Upgrade Required) trigger a retry, gated on `not is_healthy` to respect user-configured healthy status lists while preserving self-healing for cached targets.

Signed-off-by: Walker Zhao <walker.zhao@konghq.com>

---------

Signed-off-by: Walker Zhao <walker.zhao@konghq.com>
1 parent 1566027 commit 792aba2
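
The retry rule that the last commit in the message above settles on can be read in isolation. The following is a minimal standalone sketch, not the library's code: `should_retry` and `other_version` are hypothetical names, and `healthy_statuses` stands in for the configured `checks.active.healthy.http_statuses` set.

```lua
-- Hypothetical standalone sketch of the retry rule; not part of the library.
-- healthy_statuses is a set such as { [200] = true, [302] = true }.
local function should_retry(status, http_version, healthy_statuses)
  -- A status the user configured as healthy never triggers a retry.
  if healthy_statuses[status] then
    return false
  end
  -- 505: the server rejected the version we used, whichever one it was.
  if status == 505 then
    return true
  end
  -- 426 only suggests a retry when the probe went out as HTTP/1.0.
  return status == 426 and http_version == "1.0"
end

-- Flips between the two supported probe versions.
local function other_version(http_version)
  return http_version == "1.0" and "1.1" or "1.0"
end
```

Under this rule a plain 500 is reported as-is and costs no second connection, which is the behavior the final commit is after.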

5 files changed, +1330 -79 lines changed

lib/resty/healthcheck.lua

Lines changed: 177 additions & 59 deletions
@@ -985,60 +985,18 @@ end
 --============================================================================
 
 
--- Runs a single healthcheck probe
-function checker:run_single_check(ip, port, hostname, hostheader)
-
-  local sock, err = ngx.socket.tcp()
-  if not sock then
-    self:log(ERR, "failed to create stream socket: ", err)
-    return
-  end
-
-  sock:settimeout(self.checks.active.timeout * 1000)
-
-  local ok
-  ok, err = sock:connect(ip, port)
-  if not ok then
-    if err == "timeout" then
-      sock:close() -- timeout errors do not close the socket.
-      return self:report_timeout(ip, port, hostname, "active")
-    end
-    return self:report_tcp_failure(ip, port, hostname, "connect", "active")
-  end
-
-  if self.checks.active.type == "tcp" then
-    sock:close()
-    return self:report_success(ip, port, hostname, "active")
-  end
-
-  if self.checks.active.type == "https" then
-    local https_sni, session, err
-    https_sni = self.checks.active.https_sni or hostheader or hostname
-    if self.ssl_cert and self.ssl_key then
-      ok, err = sock:setclientcert(self.ssl_cert, self.ssl_key)
-
-      if not ok then
-        self:log(ERR, "failed to set client certificate: ", err)
-      end
-    end
-
-    session, err = sock:sslhandshake(nil, https_sni,
-                                     self.checks.active.https_verify_certificate)
-
-    if not session then
-      sock:close()
-      self:log(ERR, "failed SSL handshake with '", hostname or "", " (", ip, ":", port, ")', using server name (sni) '", https_sni, "': ", err)
-      return self:report_tcp_failure(ip, port, hostname, "connect", "active")
-    end
-
+-- Builds and caches the serialized user-configured headers string for HTTP/1.x probes.
+-- Uses ~= nil so that a cached empty string ("") is also a cache hit.
+local function build_http_headers(self)
+  if self.checks.active._headers_str ~= nil then
+    return self.checks.active._headers_str
   end
 
   local req_headers = self.checks.active.headers
   local headers
-  if self.checks.active._headers_str then
-    headers = self.checks.active._headers_str
-  elseif req_headers == nil then
-    headers = ""
+
+  if req_headers == nil then
+    headers = ""
   else
     local headers_length = nkeys(req_headers)
     if headers_length > 0 then
@@ -1065,22 +1023,91 @@ function checker:run_single_check(ip, port, hostname, hostheader)
         headers = headers .. "\r\n"
       end
     end
-    self.checks.active._headers_str = headers or ""
   end
 
+  self.checks.active._headers_str = headers or ""
+  return self.checks.active._headers_str
+end
+
+
+-- Establishes a TCP connection and optionally performs a TLS handshake for
+-- https type. Returns the connected socket, or nil when a failure has
+-- already been reported via report_timeout / report_tcp_failure.
+local function establish_connection(self, ip, port, hostname, hostheader, typ)
+  local sock, err = ngx.socket.tcp()
+  if not sock then
+    self:log(ERR, "failed to create stream socket: ", err)
+    return nil
+  end
+
+  sock:settimeout(self.checks.active.timeout * 1000)
+
+  local ok
+  ok, err = sock:connect(ip, port)
+  if not ok then
+    if err == "timeout" then
+      sock:close() -- timeout errors do not close the socket.
+      self:report_timeout(ip, port, hostname, "active")
+    else
+      self:report_tcp_failure(ip, port, hostname, "connect", "active")
+    end
+    return nil
+  end
+
+  if typ == "https" then
+    local https_sni = self.checks.active.https_sni or hostheader or hostname
+    if self.ssl_cert and self.ssl_key then
+      ok, err = sock:setclientcert(self.ssl_cert, self.ssl_key)
+      if not ok then
+        self:log(ERR, "failed to set client certificate: ", err)
+      end
+    end
+
+    local session
+    session, err = sock:sslhandshake(nil, https_sni,
+                                     self.checks.active.https_verify_certificate)
+    if not session then
+      sock:close()
+      self:log(ERR, "failed SSL handshake with '", hostname or "", " (", ip, ":", port, ")', using server name (sni) '", https_sni, "': ", err)
+      self:report_tcp_failure(ip, port, hostname, "connect", "active")
+      return nil
+    end
+  end
+
+  return sock
+end
+
+
+-- Sends an HTTP GET request over an already-connected socket.
+-- Returns the parsed HTTP status code (number), or nil if a transport-level
+-- error occurred (timeout / TCP failure are reported internally).
+-- @param http_version "1.0" or "1.1" (default "1.1"). For "1.1",
+-- Connection: close is injected so the server closes the connection after
+-- responding (health probes are one-shot).
+local function probe_http(self, sock, ip, port, hostname, hostheader, http_version)
+  local headers = build_http_headers(self)
   local path = self.checks.active.http_path
-  local request = ("GET %s HTTP/1.0\r\n%sHost: %s\r\n\r\n"):format(path, headers, hostheader or hostname or ip)
+  local host = hostheader or hostname or ip
+
+  local request
+  if http_version == "1.0" then
+    request = ("GET %s HTTP/1.0\r\n%sHost: %s\r\n\r\n"):format(path, headers, host)
+  else
+    request = ("GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n%s\r\n"):format(
+      path, host, headers)
+  end
   self:log(DEBUG, "request head: ", request)
 
-  local bytes
-  bytes, err = sock:send(request)
+  local bytes, err = sock:send(request)
   if not bytes then
     self:log(ERR, "failed to send http request to '", hostname, " (", ip, ":", port, ")': ", err)
     if err == "timeout" then
       sock:close() -- timeout errors do not close the socket.
-      return self:report_timeout(ip, port, hostname, "active")
+      self:report_timeout(ip, port, hostname, "active")
+    else
+      self:report_tcp_failure(ip, port, hostname, "send", "active")
     end
-    return self:report_tcp_failure(ip, port, hostname, "send", "active")
+    return nil
   end
 
   local status_line
@@ -1089,9 +1116,11 @@ function checker:run_single_check(ip, port, hostname, hostheader)
     self:log(ERR, "failed to receive status line from '", hostname, " (",ip, ":", port, ")': ", err)
     if err == "timeout" then
       sock:close() -- timeout errors do not close the socket.
-      return self:report_timeout(ip, port, hostname, "active")
+      self:report_timeout(ip, port, hostname, "active")
+    else
+      self:report_tcp_failure(ip, port, hostname, "receive", "active")
     end
-    return self:report_tcp_failure(ip, port, hostname, "receive", "active")
+    return nil
   end
 
   local from, to = re_find(status_line,
@@ -1102,12 +1131,101 @@ function checker:run_single_check(ip, port, hostname, hostheader)
     status = tonumber(status_line:sub(from, to))
   else
     self:log(ERR, "bad status line from '", hostname, " (", ip, ":", port, ")': ", status_line)
-    -- note: 'status' will be reported as 'nil'
+    status = 0 -- report_http_status treats 0 as unhealthy
   end
 
   sock:close()
 
   self:log(DEBUG, "Reporting '", hostname, " (", ip, ":", port, ")' (got HTTP ", status, ")")
+  return status
+end
+
+
+-- Negotiates the HTTP version for a target based on the probe result.
+-- If the status suggests a version mismatch, retries with the other version.
+-- Updates the target's cached version preference.
+-- Returns the final status to report, or nil if a transport error occurred.
+local function negotiate_http_version(self, target, ip, port, hostname,
+                                      hostheader, typ, http_version, status)
+  local is_healthy = self.checks.active.healthy.http_statuses[status]
+
+  -- Version auto-detection (only for standard HTTP version codes):
+  -- 1. 505 (HTTP Version Not Supported) -> try the other version
+  -- 2. 426 (Upgrade Required) on HTTP/1.0 -> try HTTP/1.1
+  -- Both triggers are gated on `not is_healthy` to respect user configuration.
+  -- Both triggers fire regardless of cache state, enabling self-healing when
+  -- a server changes its supported HTTP version.
+  local should_retry = not is_healthy and
+                       ((status == 505) or
+                        (status == 426 and http_version == "1.0"))
+
+  if not should_retry then
+    return status
+  end
+
+  local other_version = (http_version == "1.0") and "1.1" or "1.0"
+  self:log(WARN, "target '", hostname or "", " (", ip, ":", port,
+           ")' returned ", status, " on HTTP/", http_version,
+           ", retrying with HTTP/", other_version)
+
+  local sock = establish_connection(self, ip, port, hostname, hostheader, typ)
+  if not sock then
+    return nil
+  end
+
+  local retry_status = probe_http(self, sock, ip, port, hostname, hostheader, other_version)
+  if not retry_status then
+    return nil
+  end
+
+  -- Decide which status to report and cache the version preference.
+  -- If retry gave a healthy result, the other version works — adopt it.
+  -- Otherwise, stick with the original version and its status so that
+  -- health reporting reflects the version we actually cache.
+  local retry_is_healthy = self.checks.active.healthy.http_statuses[retry_status]
+  local final_status
+
+  if target then
+    if retry_is_healthy then
+      final_status = retry_status
+      target.http_version = other_version
+    else
+      final_status = status
+      target.http_version = http_version
+    end
+  end
+
+  return final_status
+end
+
+
+-- Runs a single healthcheck probe
+function checker:run_single_check(ip, port, hostname, hostheader)
+  local typ = self.checks.active.type
+
+  local sock = establish_connection(self, ip, port, hostname, hostheader, typ)
+  if not sock then
+    return
+  end
+
+  if typ == "tcp" then
+    sock:close()
+    return self:report_success(ip, port, hostname, "active")
+  end
+
+  local target = get_target(self, ip, port, hostname)
+  local http_version = (target and target.http_version) or "1.1"
+
+  local status = probe_http(self, sock, ip, port, hostname, hostheader, http_version)
+  if not status then
+    return
+  end
+
+  status = negotiate_http_version(self, target, ip, port, hostname,
+                                  hostheader, typ, http_version, status)
+  if not status then
+    return
+  end
 
   return self:report_http_status(ip, port, hostname, status, "active")
 end
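
The per-target caching and self-healing behavior of the negotiation step can be illustrated without any networking. This is a hedged simulation only: `check`, `http10_only`, and `http11_only` are hypothetical names, the probe is a stub function from version string to status code, and `HEALTHY` is an assumed healthy-status set rather than the library's defaults.

```lua
-- Hypothetical simulation of the per-target version cache; the probe is a
-- stub (a function from version to status), not a real network call.
local HEALTHY = { [200] = true }

local function check(target, probe)
  local version = target.http_version or "1.1"  -- default to HTTP/1.1
  local status = probe(version)

  -- Same two triggers as in the diff: 505, or 426 while on HTTP/1.0.
  local mismatch = not HEALTHY[status]
    and (status == 505 or (status == 426 and version == "1.0"))
  if not mismatch then
    return status
  end

  local other = version == "1.0" and "1.1" or "1.0"
  local retry_status = probe(other)
  if HEALTHY[retry_status] then
    target.http_version = other    -- the other version works: adopt it
    return retry_status
  end
  target.http_version = version    -- keep the original version and status
  return status
end

-- A server that only speaks HTTP/1.0 answers 505 to HTTP/1.1 probes.
local function http10_only(version)
  return version == "1.0" and 200 or 505
end

-- The same server after an upgrade that drops HTTP/1.0 support.
local function http11_only(version)
  return version == "1.1" and 200 or 426
end

local target = {}
local first = check(target, http10_only)   -- retries once, caches "1.0"
local cached_after_first = target.http_version
local second = check(target, http10_only)  -- served from the cache, no retry
local third = check(target, http11_only)   -- 426 on 1.0 flips the cache back
```

Because both triggers ignore whether a version is already cached, the third call recovers on its own once the server's supported version changes.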

t/with_resty-events/18-req-headers.t

Lines changed: 15 additions & 10 deletions
@@ -79,9 +79,10 @@ true
 --- error_log
 checking healthy targets: nothing to do
 checking healthy targets: #1
-GET /status HTTP/1.0
-User-Agent: curl/7.29.0
+GET /status HTTP/1.1
 Host: 127.0.0.1
+Connection: close
+User-Agent: curl/7.29.0
 
 
 
@@ -128,9 +129,10 @@ true
 --- error_log
 checking healthy targets: nothing to do
 checking healthy targets: #1
-GET /status HTTP/1.0
-User-Agent: curl
+GET /status HTTP/1.1
 Host: 127.0.0.1
+Connection: close
+User-Agent: curl
 
 
 === TEST 3: headers: { ["User-Agent"] = "curl" }
@@ -176,9 +178,10 @@ true
 --- error_log
 checking healthy targets: nothing to do
 checking healthy targets: #1
-GET /status HTTP/1.0
-User-Agent: curl
+GET /status HTTP/1.1
 Host: 127.0.0.1
+Connection: close
+User-Agent: curl
 
 
 
@@ -225,9 +228,10 @@ true
 --- error_log
 checking healthy targets: nothing to do
 checking healthy targets: #1
-GET /status HTTP/1.0
-User-Agent: curl
+GET /status HTTP/1.1
 Host: 127.0.0.1
+Connection: close
+User-Agent: curl
 
 
 
@@ -274,7 +278,8 @@ true
 --- error_log
 checking healthy targets: nothing to do
 checking healthy targets: #1
-GET /status HTTP/1.0
+GET /status HTTP/1.1
+Host: 127.0.0.1
+Connection: close
 User-Agent: curl
 User-Agent: nginx
-Host: 127.0.0.1