Master node disconnects from joining node too early during re-election #126192

Since #92755 the master node logs a `WARN` message if a node leaves the cluster and then rejoins without restarting. The message includes the reason `disconnected` if the node left the cluster because a connection from the master node was closed. This normally indicates an unstable network. However, it can also happen during a messy election if the following (improbable!) sequence of events occurs:

- master node M wins election in term T and publishes cluster state version V in this term
- another node N sends a join request to M
- M's `Coordinator#handleJoinRequest` opens a connection to N for the duration of the join
- M publishes cluster state version (V+1) in term T adding N to the cluster
- M accepts its own publication and writes cluster state version (V+1) to disk
- M moves to term (T+1), failing the publication in term T before it commits
- M wins another election in term (T+1) and publishes cluster state version (V+2) in this term; this state is based on the on-disk state with version (V+1) so it includes N in the cluster
- the failure to commit state version (V+1) completes the waiting join listener, which closes the connection opened by `handleJoinRequest`
- the connection closure enqueues a `node-left` task for N which is published as state version (V+3)
- N rejoins the cluster without having restarted, triggering the warning
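The sequence above can be replayed in a toy model. It shows why (V+2) still contains N even though the join connection is closed afterwards. This is a hypothetical sketch: `ReElectionRace`, `State` and `replay` are invented names, and elections, quorums and acknowledgements are omitted entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of the race described in this issue; not Elasticsearch code.
final class ReElectionRace {

    /** Minimal stand-in for a cluster state: term, version and member node names. */
    record State(long term, long version, Set<String> nodes) {
        /** Next state: version + 1, in the given term, with the given members. */
        State with(long newTerm, Set<String> newNodes) {
            return new State(newTerm, version + 1, new TreeSet<>(newNodes));
        }
    }

    /** Replays the steps from the issue and returns every state M publishes, in order. */
    static List<State> replay(long t, long v) {
        List<State> published = new ArrayList<>();

        // M wins the election in term T and publishes version V.
        State onDisk = new State(t, v, new TreeSet<>(Set.of("M")));
        published.add(onDisk);

        // N sends a join request; handleJoinRequest opens a connection to N.
        boolean connectionToNOpen = true;

        // M publishes V+1 in term T adding N, accepts it and writes it to disk.
        Set<String> withN = new TreeSet<>(onDisk.nodes());
        withN.add("N");
        State joinState = onDisk.with(t, withN);
        published.add(joinState);
        onDisk = joinState; // accepted locally, but never committed

        // M moves to term T+1, so the term-T publication of V+1 fails to commit.
        boolean joinPublicationCommitted = false;

        // M wins again in T+1 and publishes V+2 based on its on-disk state, which includes N.
        State reElected = onDisk.with(t + 1, onDisk.nodes());
        published.add(reElected);

        // The failed commit completes the join listener, which closes the connection to N.
        if (!joinPublicationCommitted) {
            connectionToNOpen = false;
        }

        // The closed connection enqueues node-left for N, published as V+3.
        if (!connectionToNOpen && reElected.nodes().contains("N")) {
            Set<String> withoutN = new TreeSet<>(reElected.nodes());
            withoutN.remove("N");
            published.add(reElected.with(t + 1, withoutN));
        }
        return published;
    }

    public static void main(String[] args) {
        replay(5, 100).forEach(System.out::println);
    }
}
```

Running `main` prints four states. The third (term T+1, version V+2) still lists N, and the fourth removes N only because the join connection was closed after N had already been carried into the new term.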

There are various places we could fix this (maybe just by suppressing the warning), but IMO the fundamental problem is that we drop the outbound connection to a node while it might still be joining the cluster, in the sense that we haven't yet committed a state which excludes it. The failure of the first `node-join` publication does not imply that the join failed, because it may be retried and eventually succeed:

https://github.com/elastic/elasticsearch/blob/a59c182f9f7e9d1bf3d6eecbc0e44f24ff91d053/server/src/main/java/org/elasticsearch/cluster/ClusterStateTaskListener.java#L20-L22

```java
 * state publication fails then this method receives a {@link FailedToCommitClusterStateException}. If publication fails then a new
 * master is elected and the update might or might not take effect, depending on whether or not the newly-elected master accepted the
 * published state that failed to be committed.
```
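One possible direction, sketched under stated assumptions: instead of closing the join connection when the `node-join` publication fails with `FailedToCommitClusterStateException`, keep holding it until some committed state with a version at or above the join publication settles the outcome. `JoinConnectionHolder` and all of its methods are hypothetical names, not Elasticsearch APIs. The sketch also assumes cluster state versions keep increasing across terms, as they do in the sequence described above.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of one possible fix direction; none of these names exist in Elasticsearch.
final class JoinConnectionHolder {

    // Joining node -> version of the cluster state publication that tried to add it.
    private final Map<String, Long> heldForJoin = new HashMap<>();

    /** handleJoinRequest opened a connection to the node and published a node-join state. */
    void onJoinPublished(String node, long joinVersion) {
        heldForJoin.put(node, joinVersion);
    }

    /**
     * Failed to commit: the join may still take effect if the next master accepted
     * the state, so the connection is deliberately kept open here.
     */
    void onJoinFailedToCommit(String node) {
        // intentionally a no-op
    }

    boolean isHeld(String node) {
        return heldForJoin.containsKey(node);
    }

    /**
     * A state with the given version and members was committed. Every join whose
     * publication version is at or below it now has a known outcome, so its held
     * connection is released. Returns the nodes the committed state excludes, i.e.
     * the joins that actually failed and whose connections may now be closed.
     */
    Set<String> onStateCommitted(long committedVersion, Set<String> committedNodes) {
        Set<String> joinsLost = new TreeSet<>();
        Iterator<Map.Entry<String, Long>> it = heldForJoin.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= committedVersion) {
                it.remove();
                if (!committedNodes.contains(entry.getKey())) {
                    joinsLost.add(entry.getKey());
                }
            }
        }
        return joinsLost;
    }

    public static void main(String[] args) {
        JoinConnectionHolder holder = new JoinConnectionHolder();
        holder.onJoinPublished("N", 101);   // V+1 in term T
        holder.onJoinFailedToCommit("N");   // M moved to term T+1
        System.out.println("held after failed commit: " + holder.isHeld("N"));
        // V+2 in term T+1 is committed and includes N, so the join succeeded after all.
        System.out.println("joins lost: " + holder.onStateCommitted(102, Set.of("M", "N")));
    }
}
```

In the scenario from this issue, committing (V+2) with N as a member releases the held connection without reporting a lost join, so no `node-left` task would be enqueued for N.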