Update descriptions for connection timeouts and tcp #6666

Closed
wants to merge 1 commit into from
6 changes: 6 additions & 0 deletions distributed/distributed-schema.yaml
@@ -754,8 +754,14 @@ properties:
properties:
connect:
type: string
description: |
Timeout after which to error when establishing a connection.
For example, when creating a connection between client and worker,
client and scheduler, etc.
Comment on lines +758 to +760
Member


Suggested change
- Timeout after which to error when establishing a connection.
- For example, when creating a connection between client and worker,
- client and scheduler, etc.
+ All connection attempts are retried until this timeout expires before an
+ exception is raised.
+ For example, when creating a connection between client and worker,
+ client and scheduler, etc.
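
The retry-until-deadline behaviour described in the suggestion can be sketched as follows. This is a minimal illustration, not the code in `distributed`: the function name `connect_with_deadline`, the `attempt` callable, and the fixed `retry_delay` are all hypothetical and chosen only for this example.

```python
import time


def connect_with_deadline(attempt, timeout, retry_delay=0.01):
    """Call attempt() until it succeeds or `timeout` seconds have elapsed.

    `attempt` is a hypothetical zero-argument callable that returns a
    connection object or raises OSError (for example ConnectionRefusedError).
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            return attempt()
        except OSError as exc:
            # Remember the failure and retry while time remains.
            last_error = exc
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Only now, after the timeout has expired, surface an error.
            raise TimeoutError(
                f"could not connect within {timeout} s"
            ) from last_error
        time.sleep(min(retry_delay, remaining))
```

The caller sees a single exception only after the whole timeout window has passed, which is the distinction the suggested wording tries to capture: individual failed attempts are not reported on their own.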

tcp:
type: string
description: |
Timeout after which to error when creating a TCP/Socket connection
Member


This is not accurate. This timeout sets a couple of kernel-level timeouts that take effect once a connection is established.

Specifically, it sets TCP_USER_TIMEOUT (see https://man7.org/linux/man-pages/man7/tcp.7.html) and configures keepalive probes with appropriate intervals. I'm not sure it is worth going into that much detail, though.

Together, these two settings ensure that a TCP connection is closed automatically if the remote is dead, or rather, if transmitted data has gone unacknowledged for TCP_USER_TIMEOUT, which very likely means the remote is dead.

We use this mechanism, for instance, to infer whether a worker has died.
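
For readers unfamiliar with these socket options, here is a rough sketch of what setting them on an established socket might look like. It is not the implementation in `distributed`: the helper name `apply_tcp_timeout` and the way the timeout is split into idle time, probe interval, and probe count are assumptions made for illustration. The option constants come from Python's standard `socket` module, and several of them exist only on some platforms, hence the `getattr` guards.

```python
import socket


def apply_tcp_timeout(sock: socket.socket, timeout_s: int) -> dict:
    """Configure keepalive probes and TCP_USER_TIMEOUT on `sock`.

    Returns a dict of the options that were actually applied.
    """
    applied = {}

    # Turn on keepalive probing for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied["SO_KEEPALIVE"] = 1

    # Assumed split of the timeout (illustrative values only).
    nprobes = 10
    keepalive = {
        "TCP_KEEPIDLE": max(timeout_s // 4, 1),         # idle seconds before probing
        "TCP_KEEPINTVL": max(timeout_s // nprobes, 1),  # seconds between probes
        "TCP_KEEPCNT": nprobes,                         # failed probes before giving up
    }
    for name, value in keepalive.items():
        opt = getattr(socket, name, None)  # not defined on every platform
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value

    # TCP_USER_TIMEOUT is Linux-specific and is expressed in milliseconds.
    user_timeout = getattr(socket, "TCP_USER_TIMEOUT", None)
    if user_timeout is not None:
        sock.setsockopt(socket.IPPROTO_TCP, user_timeout, timeout_s * 1000)
        applied["TCP_USER_TIMEOUT"] = timeout_s * 1000

    return applied
```

With options like these in place, the kernel itself tears down a connection whose peer has stopped responding, so the application observes a closed socket rather than hanging indefinitely.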


require-encryption:
type: