Proposal
It would be nice if the server_join stanza could have a timeout field.
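For illustration, a rough sketch of how the stanza might look with such a field. The timeout attribute is the proposed (hypothetical) addition; the addresses and other values are only examples:

```hcl
server_join {
  retry_join     = ["1.1.1.1", "2.2.2.2"]
  retry_max      = 3
  retry_interval = "15s"

  # Proposed addition (does not exist today): give up on a single
  # join attempt after this long instead of waiting for the default
  # TCP dial timeout.
  timeout = "5s"
}
```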
Use-cases
I purposefully gave a wrong config to a Nomad agent running as a server, and from the logs:
Aug 02 12:38:23 nomad-node-0 nomad[50839]: ==> Newer Nomad version available: 1.1.3 (currently running: 1.1.2)
Aug 02 12:38:29 nomad-node-0 nomad[50839]: 2021-08-02T12:38:29.914+0530 [INFO] client: node registration complete
Aug 02 12:39:20 nomad-node-0 nomad[50839]: 2021-08-02T12:39:20.239+0530 [WARN] agent.joiner: join failed: error="2 errors occurred:
Aug 02 12:39:20 nomad-node-0 nomad[50839]: * Failed to join 1.1.1.1: dial tcp 1.1.1.1:4648: i/o timeout
Aug 02 12:39:20 nomad-node-0 nomad[50839]: * Failed to join 2.2.2.2: dial tcp 2.2.2.2:4648: i/o timeout
Aug 02 12:39:20 nomad-node-0 nomad[50839]: " retry=15s
Aug 02 12:40:35 nomad-node-0 nomad[50839]: 2021-08-02T12:40:35.243+0530 [WARN] agent.joiner: join failed: error="2 errors occurred:
Aug 02 12:40:35 nomad-node-0 nomad[50839]: * Failed to join 1.1.1.1: dial tcp 1.1.1.1:4648: i/o timeout
Aug 02 12:40:35 nomad-node-0 nomad[50839]: * Failed to join 2.2.2.2: dial tcp 2.2.2.2:4648: i/o timeout
Aug 02 12:40:35 nomad-node-0 nomad[50839]: " retry=15s
Aug 02 12:41:50 nomad-node-0 nomad[50839]: 2021-08-02T12:41:50.248+0530 [WARN] agent.joiner: join failed: error="2 errors occurred:
Aug 02 12:41:50 nomad-node-0 nomad[50839]: * Failed to join 1.1.1.1: dial tcp 1.1.1.1:4648: i/o timeout
Aug 02 12:41:50 nomad-node-0 nomad[50839]: * Failed to join 2.2.2.2: dial tcp 2.2.2.2:4648: i/o timeout
Aug 02 12:41:50 nomad-node-0 nomad[50839]: " retry=15s
Aug 02 12:43:05 nomad-node-0 nomad[50839]: 2021-08-02T12:43:05.252+0530 [ERROR] agent.joiner: max join retry exhausted, exiting
Aug 02 12:43:05 nomad-node-0 nomad[50839]: 2021-08-02T12:43:05.253+0530 [INFO] agent: requesting shutdown
Aug 02 12:43:05 nomad-node-0 nomad[50839]: 2021-08-02T12:43:05.253+0530 [INFO] client: shutting down
Aug 02 12:43:05 nomad-node-0 nomad[50839]: 2021-08-02T12:43:05.253+0530 [INFO] client.plugin: shutting down plugin manager: plugin-type=device
Aug 02 12:43:05 nomad-node-0 nomad[50839]: 2021-08-02T12:43:05.256+0530 [INFO] client.plugin: plugin manager finished: plugin-type=device
Aug 02 12:43:05 nomad-node-0 nomad[50839]: 2021-08-02T12:43:05.256+0530 [INFO] client.plugin: shutting down plugin manager: plugin-type=driver
Aug 02 12:43:05 nomad-node-0 nomad[50839]: 2021-08-02T12:43:05.259+0530 [INFO] client.plugin: plugin manager finished: plugin-type=driver
Aug 02 12:43:05 nomad-node-0 nomad[50839]: 2021-08-02T12:43:05.259+0530 [INFO] client.plugin: shutting down plugin manager: plugin-type=csi
Aug 02 12:43:05 nomad-node-0 nomad[50839]: 2021-08-02T12:43:05.261+0530 [INFO] client.plugin: plugin manager finished: plugin-type=csi
Aug 02 12:43:05 nomad-node-0 nomad[50839]: 2021-08-02T12:43:05.262+0530 [INFO] nomad: shutting down server
Aug 02 12:43:05 nomad-node-0 nomad[50839]: 2021-08-02T12:43:05.262+0530 [WARN] nomad: serf: Shutdown without a Leave
Aug 02 12:43:05 nomad-node-0 nomad[50839]: 2021-08-02T12:43:05.263+0530 [INFO] nomad: cluster leadership lost
Aug 02 12:43:05 nomad-node-0 nomad[50839]: 2021-08-02T12:43:05.263+0530 [INFO] agent: shutdown complete
You can see that Nomad took almost 5 minutes to realize the server was unable to join, and then the service exited.
Since there's no timeout defined, I am guessing each join attempt waits for a default of 60s or something higher. There's no way to configure that, which also makes retry_interval much less useful, since the next retry only happens once the previous attempt has failed (which takes about 75s according to the logs I shared). So maybe we can add a timeout field and give it a sane default like 5s (it should be less than retry_interval).
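Some rough arithmetic from the logs above, assuming the default per-attempt timeout really is about 60s: each cycle is roughly 60s (dial timeout) + 15s (retry_interval) = 75s, which matches the 75s spacing between the "join failed" lines and explains why three attempts stretch from 12:38 to 12:43, close to 5 minutes. With a 5s timeout, each cycle would be about 5s + 15s = 20s, so the same three attempts would exhaust in roughly a minute.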