Description
If two clients concurrently call the /liveness route on the REST API, one of them will time out. This is easy to reproduce from the command line. Note that I use a & after the first curl command so that it runs asynchronously alongside the second curl command. (localhost:17303 is the REST API for kafka scheduler for me)
% ( time curl -i http://localhost:17303/liveness & ; time curl -i http://localhost:17303/liveness )
HTTP/1.1 200 OK
Date: Thu, 15 Sep 2022 03:02:51 GMT
Content-Length: 0
curl -i http://localhost:17303/liveness 0.00s user 0.01s system 0% cpu 2.351 total
HTTP/1.1 500 Internal Server Error
Date: Thu, 15 Sep 2022 03:02:53 GMT
Content-Length: 0
curl -i http://localhost:17303/liveness 0.00s user 0.01s system 0% cpu 5.024 total
The first one completes successfully, as expected. But the second one times out. The server logs show a line like:
[00] ERRO[2022-09-15T03:02:53Z] timeout for isalive probe from liveness channel
This is a problem when multiple components in a distributed system check health simultaneously. For example, the EC2 target health checks documentation points out that "Health checks for a Network Load Balancer are distributed and use a consensus mechanism to determine target health. Therefore, targets receive more than the configured number of health checks."
As best I can tell, the issue is that the IsAlive function fakes a scheduled message from Kafka at:

    case storeEvents <- isAliveSchedule(epoch):

which has a hard-coded ID of ||is-alive||:

    const isAliveID string = "||is-alive||"

Two concurrent calls to IsAlive will result in two timers with the same ID being added. But the timer code deduplicates those using the ID (kafka-message-scheduler/internal/timers/timers.go, lines 61 to 62 in 9b44c3c):

    // if found, we stop the existing timer
    if t, ok := ts.items[s.ID()]; ok {

The second IsAlive call stomps over the first IsAlive call in timers, and thus only one timer event is ever returned in the livenessChan.
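To make the failure mode concrete, here is a minimal standalone sketch of the behavior described above. It is not the scheduler's actual code: `timerSet`, `probe`, `sharedID`, and `uniqueID` are hypothetical names, and the timer store is reduced to a map keyed by ID that stops any existing timer with the same ID. The `uniqueID` variant is only one possible way to avoid the collision, shown as an assumption rather than a proposed patch.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const isAliveID = "||is-alive||"

// timerSet is a simplified, hypothetical stand-in for the scheduler's timer
// store: it keeps one timer per ID.
type timerSet struct {
	mu    sync.Mutex
	items map[string]*time.Timer
}

func newTimerSet() *timerSet {
	return &timerSet{items: map[string]*time.Timer{}}
}

// add schedules fire to run after d. If a timer with the same ID already
// exists, it is stopped first, so its callback never runs.
func (ts *timerSet) add(id string, d time.Duration, fire func()) {
	ts.mu.Lock()
	defer ts.mu.Unlock()
	if t, ok := ts.items[id]; ok {
		t.Stop()
	}
	ts.items[id] = time.AfterFunc(d, fire)
}

// sharedID gives every probe the same hard-coded ID (the reported behavior).
func sharedID(int) string { return isAliveID }

// uniqueID gives each probe its own ID (an assumed workaround, for contrast).
func uniqueID(i int) string { return fmt.Sprintf("%s-%d", isAliveID, i) }

// probe registers n liveness timers back to back and returns how many of
// them delivered an event before the timeout expired.
func probe(n int, idFor func(i int) string) int {
	ts := newTimerSet()
	liveness := make(chan struct{}, n)
	for i := 0; i < n; i++ {
		ts.add(idFor(i), 20*time.Millisecond, func() { liveness <- struct{}{} })
	}
	answered := 0
	timeout := time.After(500 * time.Millisecond)
	for answered < n {
		select {
		case <-liveness:
			answered++
		case <-timeout:
			return answered
		}
	}
	return answered
}

func main() {
	fmt.Printf("shared ID:  %d of 2 probes answered\n", probe(2, sharedID))
	fmt.Printf("unique IDs: %d of 2 probes answered\n", probe(2, uniqueID))
}
```

With the shared ID, the second `add` stops the first timer, so only one event should reach the channel and the other probe runs into the timeout, which matches the 200/500 pair seen with the two curl calls. With distinct IDs, both probes should be answered.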