Master node disconnects from joining node too early during re-election #126192

Open
@DaveCTurner

Since #92755 the master node logs a WARN message if a node leaves the cluster and then rejoins without restarting. The message includes the reason disconnected if the node left the cluster because a connection from the master node was closed, which normally indicates an unstable network. However, this can also happen during a messy election if the following (improbable!) sequence of events occurs:

  • master node M wins election in term T and publishes cluster state version V in this term
  • another node N sends a join request to M
  • M's Coordinator#handleJoinRequest opens a connection to N for the duration of the join
  • M publishes cluster state version (V+1) in term T adding N to the cluster
  • M accepts its own publication and writes cluster state version (V+1) to disk
  • M moves to term (T+1), failing the publication in term T before it commits
  • M wins another election in term (T+1) and publishes cluster state version (V+2) in this term; this state is based on the on-disk state with version (V+1) so it includes N in the cluster
  • the failure to commit state version (V+1) completes the waiting join listener which closes the connection opened by handleJoinRequest
  • the connection closure enqueues a node-left task for N which is published as state version (V+3)
  • N rejoins the cluster having not restarted, triggering the warning.
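
To make the ordering easier to follow, here is a toy replay of the sequence above in Java. It is only a sketch: `JoinRaceReplay`, its fields, and the concrete values chosen for T and V are all hypothetical and do not correspond to any Elasticsearch class. The rejoin publication itself is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy replay of the event sequence. All names are hypothetical;
// nothing here is Elasticsearch code.
final class JoinRaceReplay {
    static final long T = 5;   // arbitrary starting term
    static final long V = 100; // arbitrary starting version

    long term = T;
    long committedVersion = V;
    TreeSet<String> committedNodes = new TreeSet<>(List.of("M"));
    TreeSet<String> onDiskNodes = new TreeSet<>(List.of("M")); // last state M accepted
    boolean connectionToN = false;
    boolean warned = false;
    final List<String> events = new ArrayList<>();

    // Committing a state also means M has accepted it, so both views are updated.
    void commit(long version, TreeSet<String> nodes) {
        committedVersion = version;
        committedNodes = new TreeSet<>(nodes);
        onDiskNodes = new TreeSet<>(nodes);
        events.add("term " + term + ": committed version " + version + " with nodes " + committedNodes);
    }

    static JoinRaceReplay replay() {
        JoinRaceReplay m = new JoinRaceReplay();

        // N sends a join request; M opens a connection for the duration of the join.
        m.connectionToN = true;
        m.events.add("join connection to N opened");

        // M publishes V+1 adding N and accepts its own publication (written to disk),
        // but the state is not committed yet.
        m.onDiskNodes.add("N");

        // M moves to term T+1, so the publication of V+1 fails before committing.
        m.term = T + 1;

        // M wins the election in T+1 and publishes V+2 based on the on-disk state,
        // which already includes N.
        m.commit(V + 2, m.onDiskNodes);

        // Only now does the failure of V+1 reach the join listener, which closes
        // the connection that was opened for the join.
        m.connectionToN = false;
        m.events.add("join listener completed with failure; connection to N closed");

        // The closure happens while N is in the committed state, so a node-left
        // task is enqueued and published as V+3.
        boolean leftWithoutRestart = false;
        if (m.committedNodes.contains("N")) {
            TreeSet<String> remaining = new TreeSet<>(m.committedNodes);
            remaining.remove("N");
            m.commit(V + 3, remaining);
            leftWithoutRestart = true;
        }

        // N rejoins without having restarted, which triggers the warning.
        if (leftWithoutRestart) {
            m.warned = true;
            m.events.add("N rejoins without restarting; warning triggered");
        }
        return m;
    }
}
```

Calling `JoinRaceReplay.replay()` and reading `events` in order shows that the connection closure is processed after version (V+2) has been committed with N as a member, which is what turns a failed publication into a spurious node-left.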

There are various places we could fix this, maybe just by suppressing the warning, but the fundamental problem is IMO that we drop the outbound connection to a node while it still might be joining the cluster (in the sense that we haven't committed a state which excludes it). The failure of the first node-join publication does not imply that the join failed, because it may be retried and eventually succeed:

* state publication fails then this method receives a {@link FailedToCommitClusterStateException}. If publication fails then a new
* master is elected and the update might or might not take effect, depending on whether or not the newly-elected master accepted the
* published state that failed to be committed.
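
One possible shape for "don't drop the connection while the node might still be joining" is sketched below in Java. This is an illustration under stated assumptions, not a proposed patch: `JoinConnectionHold`, `Outcome`, and the callback names are hypothetical, and the sketch assumes the caller only reports states committed after the join attempt began. A publication failure on its own never closes anything; only a later committed state that excludes the node does.

```java
import java.util.Set;

// Hypothetical sketch of deferring the connection release; not Elasticsearch code.
final class JoinConnectionHold {
    enum Outcome { UNRESOLVED, JOINED, NOT_JOINED }

    private final String joiningNode;
    private boolean publicationFailed = false;
    private Outcome outcome = Outcome.UNRESOLVED;

    JoinConnectionHold(String joiningNode) {
        this.joiningNode = joiningNode;
    }

    // The node-join publication failed to commit. The update might or might not
    // take effect under the next master, so nothing is closed here.
    void onPublicationFailed() {
        publicationFailed = true;
    }

    // Assumption: called with the member names of each state committed after
    // the join attempt began.
    void onCommittedState(Set<String> nodes) {
        if (outcome != Outcome.UNRESOLVED) {
            return; // already resolved one way or the other
        }
        if (nodes.contains(joiningNode)) {
            // The join took effect, possibly via a later publication.
            outcome = Outcome.JOINED;
        } else if (publicationFailed) {
            // A committed state excludes the node, so the join really did fail.
            outcome = Outcome.NOT_JOINED;
        }
    }

    // Only a committed state that excludes the node justifies dropping the connection.
    boolean shouldCloseConnection() {
        return outcome == Outcome.NOT_JOINED;
    }

    Outcome outcome() {
        return outcome;
    }
}
```

In the scenario from this issue, `onPublicationFailed()` would be followed by a committed state containing both M and N, so the hold resolves to `JOINED` and the connection is never closed on the join's behalf.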
