Skip to content

caddytest: make TestReverseProxyHealthCheck deterministic with poll instead of sleep#7474

Merged
francislavoie merged 1 commit intocaddyserver:masterfrom
Amirhf1:fix/reverseproxy-healthcheck-test-deterministic
Feb 17, 2026
Merged

caddytest: make TestReverseProxyHealthCheck deterministic with poll instead of sleep#7474
francislavoie merged 1 commit intocaddyserver:masterfrom
Amirhf1:fix/reverseproxy-healthcheck-test-deterministic

Conversation

@Amirhf1
Copy link
Contributor

@Amirhf1 Amirhf1 commented Feb 12, 2026

the test was flaky because health checks can still be warming up when the first request is made. The 100ms sleep was not reliable: sometimes the upstream was still reported unhealthy, and the request returned 503 instead of 200, causing CI noise and test failures.

Solution:

  • Removed the fixed time.Sleep(100ms) and its TODO comment.
  • Added a poll loop that:
    • Sends a GET request to the proxy every 20ms.
    • Succeeds as soon as the response is 200 with body "Hello, World!".
    • Fails with a clear message if no success within a 5s timeout.

This keeps the test deterministic, removes arbitrary wait times, and aligns with the maintainer’s request for a polling mechanism with a timeout.

Fixes #7468

@CLAassistant
Copy link

CLAassistant commented Feb 12, 2026

CLA assistant check
All committers have signed the CLA.

@Amirhf1 Amirhf1 force-pushed the fix/reverseproxy-healthcheck-test-deterministic branch from 281f340 to 7c8ad4c Compare February 12, 2026 22:15
@francislavoie
Copy link
Member

Looks fine I guess, but I'd want to understand why the upstream is considered unhealthy to start in the first place. It's running as part of the same server, so it doesn't make sense to me that it would return 5xx at all if the server is reachable.

@Amirhf1
Copy link
Contributor Author

Amirhf1 commented Feb 12, 2026

with active health checks the upstream is only marked healthy after the first health check passes.
that first check runs in a goroutine right after config load so there’s a race: if the test hits the proxy before that check completes the upstream is still unhealthy and we get 503 (no upstreams available) the poll just waits until the health state has converged (we get 200) making the test deterministic

@francislavoie
Copy link
Member

I think we should probably assume configured/known upstreams are healthy unless discovered otherwise, to at least allow early requests a chance at making their way through 🤔 WDYT @mholt

@francislavoie
Copy link
Member

After thinking about it some more, I think a better fix would be to run a separate server before running the test, so that the active health check is run against ANOTHER server rather than the same Caddy instance that is currently running, since that's what causes the race condition. Caddy starts the active health checks during provisioning, but since we're still in provisioning the server isn't ready yet to accept requests so it can't connect to itself.

This patch I think is more reasonable, if you'd like to apply it

diff --git a/caddytest/integration/reverseproxy_test.go b/caddytest/integration/reverseproxy_test.go
index cbfe8433b..f6e1a9ba7 100644
--- a/caddytest/integration/reverseproxy_test.go
+++ b/caddytest/integration/reverseproxy_test.go
@@ -8,7 +8,6 @@ import (
        "runtime"
        "strings"
        "testing"
-       "time"
 
        "github.com/caddyserver/caddy/v2/caddytest"
 )
@@ -327,6 +326,37 @@ func TestReverseProxyWithPlaceholderTCPDialAddress(t *testing.T) {
 }
 
 func TestReverseProxyHealthCheck(t *testing.T) {
+       // start lightweight backend servers so they're ready before Caddy's
+       // active health checker runs; this avoids a startup race where the
+       // health checker probes backends that haven't yet begun accepting
+       // connections and marks them unhealthy.
+
+       // backend server that responds to proxied requests
+       helloSrv := &http.Server{
+               Handler: http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
+                       w.Write([]byte("Hello, World!"))
+               }),
+       }
+       ln0, err := net.Listen("tcp", "127.0.0.1:2020")
+       if err != nil {
+               t.Fatalf("failed to listen on 127.0.0.1:2020: %v", err)
+       }
+       go helloSrv.Serve(ln0)
+       t.Cleanup(func() { helloSrv.Close(); ln0.Close() })
+
+       // backend server that serves health checks
+       healthSrv := &http.Server{
+               Handler: http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
+                       w.Write([]byte("ok"))
+               }),
+       }
+       ln1, err := net.Listen("tcp", "127.0.0.1:2021")
+       if err != nil {
+               t.Fatalf("failed to listen on 127.0.0.1:2021: %v", err)
+       }
+       go healthSrv.Serve(ln1)
+       t.Cleanup(func() { healthSrv.Close(); ln1.Close() })
+
        tester := caddytest.NewTester(t)
        tester.InitServer(`
        {
@@ -336,12 +366,6 @@ func TestReverseProxyHealthCheck(t *testing.T) {
                https_port    9443
                grace_period 1ns
        }
-       http://localhost:2020 {
-               respond "Hello, World!"
-       }
-       http://localhost:2021 {
-               respond "ok"
-       }
        http://localhost:9080 {
                reverse_proxy {
                        to localhost:2020
@@ -355,8 +379,6 @@ func TestReverseProxyHealthCheck(t *testing.T) {
                }
        }
        `, "caddyfile")
-
-       time.Sleep(100 * time.Millisecond) // TODO: for some reason this test seems particularly flaky, getting 503 when it should be 200, unless we wait
        tester.AssertGetResponse("http://localhost:9080/", 200, "Hello, World!")
 }
 

@mholt
Copy link
Member

mholt commented Feb 16, 2026

Oh, yeah, that's actually a better approach. Good thinking!!

…nstead of sleep

Start lightweight backend servers before starting Caddy so active health checks
probe a ready backend instead of the same Caddy instance during provisioning.
This removes the startup race without fixed sleeps or polling.
@Amirhf1 Amirhf1 force-pushed the fix/reverseproxy-healthcheck-test-deterministic branch from 7c8ad4c to 6c970ca Compare February 17, 2026 11:17
@Amirhf1
Copy link
Contributor Author

Amirhf1 commented Feb 17, 2026

Thanks for the suggestion, which makes a lot of sense

I updated the test to start two lightweight backend servers (Hello + health) before starting Caddy, so the active health checker probes a ready backend instead of the same Caddy instance while it’s still provisioning. This removes the startup race without fixed sleeps or polling.

@francislavoie

Tested locally:

  • go test ./caddytest/integration -run TestReverseProxyHealthCheck -v
=== RUN   TestReverseProxyHealthCheck
{"level":"info","ts":1771326501.569343,"msg":"maxprocs: Leaving GOMAXPROCS=8: CPU quota undefined"}
{"level":"info","ts":1771326501.5693681,"msg":"GOMEMLIMIT is updated","GOMEMLIMIT":7730941132,"previous":9223372036854775807}
{"level":"info","ts":1771326501.56937,"msg":"using config from file","file":"/var/folders/_r/q482p7bd78x4wxv5p_ps9n_r0000gp/T/511271183"}
{"level":"info","ts":1771326501.569371,"msg":"adapted config to JSON","adapter":"caddyfile"}
{"level":"info","ts":1771326501.570507,"logger":"admin","msg":"admin endpoint started","address":"localhost:2999","enforce_origin":false,"origins":["//[::1]:2999","//127.0.0.1:2999","//localhost:2999"]}
{"level":"info","ts":1771326501.573667,"msg":"autosaved config (load with --resume flag)","file":"/Users/amirhossainfalah/Library/Application Support/Caddy/autosave.json"}
{"level":"info","ts":1771326501.5736718,"msg":"serving initial configuration"}
{"level":"info","ts":1771326502.5724201,"logger":"admin.api","msg":"received request","method":"GET","host":"localhost:2999","uri":"/config/","remote_ip":"127.0.0.1","remote_port":"51034","headers":{"Accept-Encoding":["gzip"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":1771326502.574481,"logger":"admin.api","msg":"received request","method":"GET","host":"localhost:2999","uri":"/config/","remote_ip":"127.0.0.1","remote_port":"51036","headers":{"Accept-Encoding":["gzip"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":1771326502.575906,"logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2999","uri":"/load","remote_ip":"127.0.0.1","remote_port":"51038","headers":{"Accept-Encoding":["gzip"],"Content-Length":["313"],"Content-Type":["text/caddyfile"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":1771326502.5809631,"logger":"admin","msg":"admin endpoint started","address":"localhost:2999","enforce_origin":false,"origins":["//localhost:2999","//[::1]:2999","//127.0.0.1:2999"]}
{"level":"warn","ts":1771326502.5814521,"logger":"http.auto_https","msg":"server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server","server_name":"srv0","http_port":9080}
{"level":"info","ts":1771326502.581571,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0x3eb90268680"}
{"level":"warn","ts":1771326502.5820868,"logger":"http","msg":"HTTP/2 skipped because it requires TLS","network":"tcp","addr":":9080"}
{"level":"warn","ts":1771326502.582113,"logger":"http","msg":"HTTP/3 skipped because it requires TLS","network":"tcp","addr":":9080"}
{"level":"info","ts":1771326502.582117,"logger":"http.log","msg":"server running","name":"srv0","protocols":["h1","h2","h3"]}
{"level":"info","ts":1771326502.58241,"logger":"pki.ca.local","msg":"root certificate trust store installation disabled; unconfigured clients may show warnings","path":"storage:pki/authorities/local/root.crt"}
{"level":"info","ts":1771326502.583092,"msg":"autosaved config (load with --resume flag)","file":"/Users/amirhossainfalah/Library/Application Support/Caddy/autosave.json"}
{"level":"info","ts":1771326502.583114,"logger":"admin.api","msg":"load complete"}
{"level":"info","ts":1771326502.583313,"msg":"caddytest: config load time took 8.468667ms"}
{"level":"info","ts":1771326502.584502,"logger":"admin","msg":"stopped previous server","address":"localhost:2999"}
{"level":"info","ts":1771326502.587381,"logger":"admin.api","msg":"received request","method":"GET","host":"localhost:2999","uri":"/config/","remote_ip":"127.0.0.1","remote_port":"51042","headers":{"Accept-Encoding":["gzip"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":1771326502.587944,"msg":"caddytest: redirecting the dialer from localhost:9080 to 127.0.0.1:9080"}
--- PASS: TestReverseProxyHealthCheck (1.03s)
=== RUN   TestReverseProxyHealthCheckUnixSocket
{"level":"info","ts":1771326502.5904799,"logger":"tls","msg":"cleaning storage unit","storage":"FileStorage:/Users/amirhossainfalah/Library/Application Support/Caddy"}
{"level":"info","ts":1771326502.591048,"logger":"admin.api","msg":"received request","method":"GET","host":"localhost:2999","uri":"/config/","remote_ip":"127.0.0.1","remote_port":"51042","headers":{"Accept-Encoding":["gzip"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":1771326502.592395,"logger":"admin.api","msg":"received request","method":"GET","host":"localhost:2999","uri":"/config/","remote_ip":"127.0.0.1","remote_port":"51047","headers":{"Accept-Encoding":["gzip"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":1771326502.592708,"logger":"http.handlers.reverse_proxy.health_checker.active","msg":"HTTP request failed","host":"localhost:2021","error":"Get \"http://localhost:2021/health\": dial tcp [::1]:2020: connect: connection refused"}
{"level":"info","ts":1771326502.5937002,"logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2999","uri":"/load","remote_ip":"127.0.0.1","remote_port":"51051","headers":{"Accept-Encoding":["gzip"],"Content-Length":["326"],"Content-Type":["text/caddyfile"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":1771326502.596218,"logger":"admin","msg":"admin endpoint started","address":"localhost:2999","enforce_origin":false,"origins":["//localhost:2999","//[::1]:2999","//127.0.0.1:2999"]}
{"level":"warn","ts":1771326502.59644,"logger":"http.auto_https","msg":"server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server","server_name":"srv0","http_port":9080}
{"level":"info","ts":1771326502.596583,"logger":"pki.ca.local","msg":"root certificate trust store installation disabled; unconfigured clients may show warnings","path":"storage:pki/authorities/local/root.crt"}
{"level":"error","ts":1771326502.596693,"logger":"http.handlers.reverse_proxy.health_checker.active","msg":"active health check failed","address":"/var/folders/_r/q482p7bd78x4wxv5p_ps9n_r0000gp/T/2918371307.sock:2021","error":"making request: parse \"http://%2Fvar%2Ffolders%2F_r%2Fq482p7bd78x4wxv5p_ps9n_r0000gp%2FT%2F2918371307.sock:2021/health\": invalid URL escape \"%2F\""}
{"level":"warn","ts":1771326502.596725,"logger":"http","msg":"HTTP/2 skipped because it requires TLS","network":"tcp","addr":":9080"}
{"level":"warn","ts":1771326502.596736,"logger":"http","msg":"HTTP/3 skipped because it requires TLS","network":"tcp","addr":":9080"}
{"level":"info","ts":1771326502.596738,"logger":"http.log","msg":"server running","name":"srv0","protocols":["h1","h2","h3"]}
{"level":"info","ts":1771326502.59677,"logger":"http","msg":"servers shutting down; grace period initiated","duration":0.000000001}
{"level":"info","ts":1771326502.598701,"msg":"autosaved config (load with --resume flag)","file":"/Users/amirhossainfalah/Library/Application Support/Caddy/autosave.json"}
{"level":"info","ts":1771326502.598722,"logger":"admin.api","msg":"load complete"}
{"level":"info","ts":1771326502.598929,"msg":"caddytest: config load time took 6.228417ms"}
{"level":"info","ts":1771326502.5996008,"logger":"tls","msg":"finished cleaning storage units"}
{"level":"info","ts":1771326502.5998468,"logger":"admin","msg":"stopped previous server","address":"localhost:2999"}
{"level":"info","ts":1771326502.600841,"logger":"admin.api","msg":"received request","method":"GET","host":"localhost:2999","uri":"/config/","remote_ip":"127.0.0.1","remote_port":"51053","headers":{"Accept-Encoding":["gzip"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":1771326502.601204,"msg":"caddytest: redirecting the dialer from localhost:9080 to 127.0.0.1:9080"}
--- PASS: TestReverseProxyHealthCheckUnixSocket (0.01s)
=== RUN   TestReverseProxyHealthCheckUnixSocketWithoutPort
{"level":"info","ts":1771326502.602475,"logger":"admin.api","msg":"received request","method":"GET","host":"localhost:2999","uri":"/config/","remote_ip":"127.0.0.1","remote_port":"51053","headers":{"Accept-Encoding":["gzip"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":1771326502.603267,"logger":"admin.api","msg":"received request","method":"GET","host":"localhost:2999","uri":"/config/","remote_ip":"127.0.0.1","remote_port":"51056","headers":{"Accept-Encoding":["gzip"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":1771326502.604188,"logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2999","uri":"/load","remote_ip":"127.0.0.1","remote_port":"51058","headers":{"Accept-Encoding":["gzip"],"Content-Length":["306"],"Content-Type":["text/caddyfile"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":1771326502.605812,"logger":"admin","msg":"admin endpoint started","address":"localhost:2999","enforce_origin":false,"origins":["//localhost:2999","//[::1]:2999","//127.0.0.1:2999"]}
{"level":"warn","ts":1771326502.605987,"logger":"http.auto_https","msg":"server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server","server_name":"srv0","http_port":9080}
{"level":"info","ts":1771326502.60625,"logger":"pki.ca.local","msg":"root certificate trust store installation disabled; unconfigured clients may show warnings","path":"storage:pki/authorities/local/root.crt"}
{"level":"warn","ts":1771326502.606339,"logger":"http","msg":"HTTP/2 skipped because it requires TLS","network":"tcp","addr":":9080"}
{"level":"warn","ts":1771326502.606343,"logger":"http","msg":"HTTP/3 skipped because it requires TLS","network":"tcp","addr":":9080"}
{"level":"info","ts":1771326502.606345,"logger":"http.log","msg":"server running","name":"srv0","protocols":["h1","h2","h3"]}
{"level":"info","ts":1771326502.606369,"logger":"http","msg":"servers shutting down; grace period initiated","duration":0.000000001}
{"level":"info","ts":1771326502.606685,"msg":"autosaved config (load with --resume flag)","file":"/Users/amirhossainfalah/Library/Application Support/Caddy/autosave.json"}
{"level":"info","ts":1771326502.606693,"logger":"admin.api","msg":"load complete"}
{"level":"info","ts":1771326502.606953,"msg":"caddytest: config load time took 3.458083ms"}
{"level":"info","ts":1771326502.607082,"logger":"admin","msg":"stopped previous server","address":"localhost:2999"}
{"level":"info","ts":1771326502.6084769,"logger":"admin.api","msg":"received request","method":"GET","host":"localhost:2999","uri":"/config/","remote_ip":"127.0.0.1","remote_port":"51060","headers":{"Accept-Encoding":["gzip"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":1771326502.608932,"msg":"caddytest: redirecting the dialer from localhost:9080 to 127.0.0.1:9080"}
--- PASS: TestReverseProxyHealthCheckUnixSocketWithoutPort (0.01s)
PASS
ok      github.com/caddyserver/caddy/v2/caddytest/integration   (cached)

@francislavoie francislavoie added this to the v2.11.0 milestone Feb 17, 2026
@francislavoie francislavoie added the optimization 📉 Performance or cost improvements label Feb 17, 2026
@francislavoie francislavoie merged commit 091add5 into caddyserver:master Feb 17, 2026
28 checks passed
This was referenced Feb 20, 2026
@francislavoie francislavoie mentioned this pull request Feb 20, 2026
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimization 📉 Performance or cost improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

make reverse proxy health check integration test deterministic

4 participants