Description
Since #92755 the master node reports a `WARN` log if a node leaves the cluster and then rejoins without restarting, including the reason `disconnected` if the node left the cluster because a connection from the master node was closed. This normally indicates an unstable network. However, it can also happen during a messy election if the following (improbable!) sequence of events occurs:
- master node M wins election in term T and publishes cluster state version V in this term
- another node N sends a join request to M
- M's `Coordinator#handleJoinRequest` opens a connection to N for the duration of the join
- M publishes cluster state version (V+1) in term T adding N to the cluster
- M accepts its own publication and writes cluster state version (V+1) to disk
- M moves to term (T+1), failing the publication in term T before it commits
- M wins another election in term (T+1) and publishes cluster state version (V+2) in this term; this state is based on the on-disk state with version (V+1) so it includes N in the cluster
- the failure to commit state version (V+1) completes the waiting join listener which closes the connection opened by `handleJoinRequest`
- the connection closure enqueues a `node-left` task for N which is published as state version (V+3)
- N rejoins the cluster having not restarted, triggering the warning.
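
To make the ordering easier to follow, here is a toy model of the sequence above. It is only a sketch: `MessyElectionSketch` and all of its fields and methods are hypothetical names invented for illustration, not Elasticsearch classes, and it collapses acceptance, commit and listener notification into plain sequential steps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the sequence above; none of these names are Elasticsearch APIs. */
final class MessyElectionSketch {
    long term;
    long version;
    final Set<String> nodes = new TreeSet<>(); // nodes in M's last-accepted (on-disk) state
    boolean connectionToN;                     // M's outbound connection to N
    Runnable pendingJoinListener;              // completes when the join publication finishes
    boolean leftWithoutRestart;                // N was removed but its process kept running
    boolean warned;                            // the "rejoined without restarting" warning fired
    final List<String> log = new ArrayList<>();

    // Record a new cluster state version in the current term.
    void publish(String what) {
        version++;
        log.add("term " + term + " version " + version + " nodes " + nodes + ": " + what);
    }

    static MessyElectionSketch run(long t, long v) {
        MessyElectionSketch m = new MessyElectionSketch();
        m.term = t;
        m.version = v - 1;
        m.nodes.add("M");
        m.publish("M wins election");                            // version V

        // N sends a join request; the connection is held for the duration of the join.
        m.connectionToN = true;
        m.pendingJoinListener = () -> m.connectionToN = false;
        m.nodes.add("N");
        m.publish("node-join N accepted by M, not committed");   // version V+1, on disk

        // M moves to the next term, so the publication in term T fails before commit.
        // In this model the failure notification is delivered late.
        m.term++;
        m.publish("M wins election again, based on on-disk state"); // version V+2, includes N

        // The late failure notification completes the listener, closing the connection.
        m.pendingJoinListener.run();
        if (!m.connectionToN && m.nodes.remove("N")) {
            m.leftWithoutRestart = true;
            m.publish("node-left N, reason: disconnected");      // version V+3
        }

        // N rejoins without having restarted, which triggers the warning.
        m.connectionToN = true;
        m.nodes.add("N");
        m.publish("node-join N");
        m.warned = m.leftWithoutRestart;
        return m;
    }

    public static void main(String[] args) {
        MessyElectionSketch m = run(5, 10);
        m.log.forEach(System.out::println);
        System.out.println("warning triggered: " + m.warned);
    }
}
```

The point the model illustrates is that the listener closing the connection belongs to the failed publication of (V+1), while N's membership comes from the later state (V+2), so the closure removes a node that has in effect joined.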
There are various places we could fix this, maybe just by suppressing the warning, but the fundamental problem is IMO that we drop the outbound connection to a node while it still might be joining the cluster (in the sense that we haven't committed a state which excludes it). The failure of the first `node-join` publication does not imply that the join failed, because it may be retried and eventually succeed: