Description
Proposal
The FailoverHeartbeatTTL used when the leader node goes down is hard-coded to 5 minutes.
This is a very long time in our use case where we're running clients and servers colocated on 3 nodes.
We need this parameter to be configurable so that it can be lowered to speed up recovery on colocated server+client nodes: tasks from the lost client node are not replaced until the FailoverHeartbeatTTL expires.
In short: make FailoverHeartbeatTTL configurable in the server config file.
Use-cases
Faster recovery when running a cluster with colocated server and client nodes, for instance 3 VMs each running a Nomad server and client, with no other nodes in the cluster.
Attempted Solutions
None available: FailoverHeartbeatTTL is hard-coded and not exposed as a configuration parameter.