Better leave_restart_consul #619
Open
Perform the consul leave on secondaries before the leader.
Ensure the cluster failure tolerance is sufficient before proceeding with the next node.
SUMMARY
The current rolling restart goes through the hosts in inventory order, regardless of which node is the leader. The Consul documentation recommends restarting the leader last, for a good reason: if the leader is restarted first, a new leader is elected among the remaining servers, and that server may itself be restarted later, causing two leader elections in quick succession.
-> I added a check on Consul leadership and reordered the hosts so that the leader is restarted last. This change aims to be non-invasive: if there is no leader, the initial host order is used.
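The reordering can be sketched in Python as follows. This is a minimal illustration, not the PR's actual task: it assumes the leader is found by parsing the table printed by `consul operator raft list-peers` (header row with "Node" and "State" columns), and that Consul node names match inventory hostnames. Both function names are hypothetical.

```python
from typing import List, Optional, Sequence


def leader_from_list_peers(output: str) -> Optional[str]:
    """Return the Node name of the raft leader, or None if no leader is listed.

    Assumes the tabular output of `consul operator raft list-peers`,
    whose header row contains "Node" and "State" columns.
    """
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return None
    header = lines[0].split()
    if "Node" not in header or "State" not in header:
        return None
    node_col, state_col = header.index("Node"), header.index("State")
    for row in lines[1:]:
        fields = row.split()
        if len(fields) > state_col and fields[state_col] == "leader":
            return fields[node_col]
    return None


def order_for_rolling_restart(hosts: Sequence[str], leader: Optional[str]) -> List[str]:
    """Move the leader to the end of the restart order.

    If there is no leader (or it is not in the inventory), the original
    inventory order is returned unchanged: the non-invasive fallback.
    """
    if leader is None or leader not in hosts:
        return list(hosts)
    return [h for h in hosts if h != leader] + [leader]
```

For example, with inventory order consul1, consul2, consul3 and consul2 as leader, the restart order becomes consul1, consul3, consul2; with no leader, the inventory order is kept as-is.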
The current rolling restart waits for consul info to respond. However, a response does not mean that the node has properly rejoined the cluster as an active voter, and if the next consul leave runs in that state, the cluster will be left unbalanced.
-> I replaced the consul info check with a check on the 'Failure Tolerance' status, which must be >= 1 before proceeding.
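The same gate can be sketched in Python against the Consul HTTP API endpoint `/v1/operator/autopilot/health`, whose JSON response includes a `FailureTolerance` field. This is an assumption-laden illustration: the PR may read the value from CLI output instead, the local agent URL, retry count and delay are placeholders, the helper names are hypothetical, and no ACL token is sent (a cluster with ACLs enabled would need one with operator read permission).

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_autopilot_health(base_url: str = "http://127.0.0.1:8500",
                           timeout: float = 5.0) -> Optional[dict]:
    """GET /v1/operator/autopilot/health and return the parsed JSON, or None.

    Any error (connection refused, non-2xx status, bad JSON) is treated as
    "not ready yet" so that the caller simply retries.
    """
    url = f"{base_url}/v1/operator/autopilot/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # URLError/HTTPError subclass OSError; JSONDecodeError subclasses ValueError.
        return None


def wait_for_failure_tolerance(fetch: Callable[[], Optional[dict]],
                               minimum: int = 1,
                               retries: int = 30,
                               delay: float = 2.0,
                               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until FailureTolerance >= minimum; return False if retries run out."""
    for attempt in range(retries):
        health = fetch()
        if health is not None and health.get("FailureTolerance", 0) >= minimum:
            return True
        if attempt < retries - 1:
            sleep(delay)  # no sleep after the final attempt
    return False
```

A caller would run `wait_for_failure_tolerance(fetch_autopilot_health)` after each restart and stop the rollout if it returns False. In a three-server cluster, FailureTolerance is 1 when all three voters are healthy (three healthy voters minus a quorum of two) and 0 while one is away, so requiring >= 1 means the previously restarted server is counted as a healthy voter again before the next consul leave.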