Description
Summary
When monitoring remote nodes using Monitor(target), network disconnection detection takes approximately 150 seconds due to default system TCP keep-alive settings. It would be beneficial to expose TCP keep-alive configuration at the node level.
Problem Description
I'm using Monitor(target) to track remote nodes. When a node loses network connectivity (e.g., after systemctl stop network), session termination is detected only after ~150 seconds. This delay appears to come from genDefaultKeepAlivePeriod (15 seconds) combined with the system-level probe count (15 seconds idle, then 9 unanswered probes at 15-second intervals). The system-level keep-alive settings are:
$ sysctl -a | grep keepalive
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
Modifying system-wide settings is not desirable in many production environments.
Proposed Solution
Add KeepAliveConfig to the node configuration options:
type KeepAliveConfig struct {
	// Enable indicates whether keep-alive probes are enabled.
	Enable bool
	// Idle is the time that the connection must be idle before
	// the first keep-alive probe is sent.
	// If zero, a default value of 15 seconds is used.
	Idle time.Duration
	// Interval is the time between keep-alive probes.
	// If zero, a default value of 15 seconds is used.
	Interval time.Duration
	// Count is the maximum number of keep-alive probes that
	// can go unanswered before dropping a connection.
	// If zero, a default value of 9 is used.
	Count int
}
}
This would allow per-connection keep-alive tuning using Go's net.TCPConn.SetKeepAliveConfig() (available since Go 1.23).
Use Case
- Faster detection of network failures in distributed systems
- No need to modify system-level TCP settings
- Fine-grained control over connection health monitoring
Additional Context
Go 1.23 introduced net.KeepAliveConfig, which allows setting these parameters per-connection:
https://pkg.go.dev/net#KeepAliveConfig