Description
The problem/use-case that the feature addresses
A command that tells a cluster primary to step down and trigger a failover to one of its replicas. The best replica should be selected automatically, and the failover should happen as quickly as possible.
One use case is to do this before taking the node down. @madolson knows more about the use cases (please edit).
Description of the feature
The primary should pause writes, wait for one of its replicas to reach its replication offset, and then trigger a manual failover to that replica, so that no data is lost.
The other primaries should still vote to authorize the promotion, to avoid ending up with two primaries for the same shard in a split-brain scenario. If the replication offsets already match, the replica can use CLUSTER FAILOVER FORCE, but not CLUSTER FAILOVER TAKEOVER.
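The selection step of the procedure above can be modeled as a small routine. This is a minimal Python sketch, not Valkey's implementation: it assumes the primary's replication offset is frozen once writes are paused, represents successive observations of replica offsets as a list of dicts, and breaks ties by node ID. The function names and inputs are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def select_caught_up_replica(
    primary_offset: int, replica_offsets: Dict[str, int]
) -> Optional[str]:
    # With writes paused, primary_offset no longer advances. A replica whose
    # offset has reached it holds every write the primary accepted, so
    # promoting it loses no data.
    caught_up = sorted(n for n, off in replica_offsets.items() if off >= primary_offset)
    # Tie-break on node ID only to make the choice deterministic (an assumption).
    return caught_up[0] if caught_up else None


def step_down(
    primary_offset: int, observations: List[Dict[str, int]]
) -> Optional[Tuple[int, str]]:
    # Each element of `observations` is one snapshot of replica offsets over
    # time (hypothetical input; a real primary would react to replication acks).
    for i, offsets in enumerate(observations):
        target = select_caught_up_replica(primary_offset, offsets)
        if target is not None:
            # Next step (not modeled): the target replica runs
            # CLUSTER FAILOVER FORCE, which still needs votes from the
            # primaries, rather than TAKEOVER, which skips the election.
            return i, target
    return None  # no replica caught up; the write pause would eventually expire
```

For example, `step_down(100, [{"r1": 90, "r2": 95}, {"r1": 100, "r2": 98}])` returns `(1, "r1")`: no replica has caught up in the first snapshot, and `r1` has in the second.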
Ideally, this should use the same mechanism as #1091.
Alternatives you've considered
- Manually selecting a replica and triggering a failover to it. The chosen replica may not be the one that catches up to the replication offset fastest.
- Manually pausing writes, waiting for any one of the replicas to reach the primary's replication offset, then sending CLUSTER FAILOVER to it. Polling INFO repeatedly to detect this adds extra latency to the procedure.
- "Trigger manual failover on SIGTERM / shutdown to cluster primary" (#1091) covers some of these use cases, namely when you want to shut down the primary.
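The client-side check in the second alternative can be sketched as follows. This is a minimal Python sketch under stated assumptions: writes were already paused on the primary (for example with `CLIENT PAUSE <timeout> WRITE`), and `INFO replication` on the primary reports `master_repl_offset` plus Redis-style replica lines of the form `slaveN:ip=...,port=...,state=...,offset=...`. It only parses INFO text; fetching INFO in a loop and sending CLUSTER FAILOVER to the chosen replica are left to the caller. The helper names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Replica lines in INFO replication (Redis-style format, assumed here), e.g.
# slave0:ip=10.0.0.2,port=6379,state=online,offset=1234,lag=0
_REPLICA_LINE = re.compile(r"^slave\d+:(.*)$")


def parse_info_replication(info: str) -> Tuple[int, Dict[str, int]]:
    """Return (primary offset, {"ip:port": offset}) for online replicas."""
    primary_offset: Optional[int] = None
    replicas: Dict[str, int] = {}
    for raw in info.splitlines():
        line = raw.strip()
        if line.startswith("master_repl_offset:"):
            primary_offset = int(line.split(":", 1)[1])
            continue
        m = _REPLICA_LINE.match(line)
        if m:
            fields = dict(kv.split("=", 1) for kv in m.group(1).split(","))
            # Ignore replicas that are still syncing or disconnected.
            if fields.get("state") == "online":
                replicas[f"{fields['ip']}:{fields['port']}"] = int(fields["offset"])
    if primary_offset is None:
        raise ValueError("master_repl_offset not found in INFO output")
    return primary_offset, replicas


def first_caught_up(info: str) -> Optional[str]:
    """Address of a replica that has reached the primary's offset, or None."""
    primary_offset, replicas = parse_info_replication(info)
    caught_up = sorted(addr for addr, off in replicas.items() if off >= primary_offset)
    return caught_up[0] if caught_up else None
```

Each call needs a fresh INFO reply, so every poll costs a full round trip to the primary; this is the extra latency that a built-in step-down command would avoid.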
Additional information
...