-
-
Notifications
You must be signed in to change notification settings - Fork 648
[client] Feature/lazy connection #3379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Fix thread handling
@pappz let me know if you need help testing. I have some IoT devices were I had to disable Netbird again, as it was consuming too much data, so I'm really interesting in helping this PR along :) |
Check the minimum version number of the remote peer, and if the remote peer does not support the lazy connection, then always use a permanent connection. Keep in mind just because the user updated the agent, the management server will not push a new network map with the new version information.
@Lauszus, thank you! |
No problem. I can enable it and test it on my side. I have ~50 devices where I can try it on. |
…ng (#3747) Fix the exclude list handling. When the agent gets a new network map, it manages the changes in the excluded peers. Cases: Extend the exclude list with a new peer: start permanent peer connection Remove a peer from the exclude list: we do not know the actual state of the peer connection, so we must suppose it is not in "idle". Start the inactivity listener for peer. If a peer is newly created in the management server and it will be in exclude list (i.e. it is a router) then the manager can not start the connection immediately because the peer.Conn is not exist yet. It is normal and the engine code will create the peer and start the connection in the next step.
Co-authored-by: hakansa <[email protected]>
fix idle state managmenet Validate the threshold env variable If the remote peer initiates the connection, we just reset the inactivity listener, but do not start it. As a result, after a disconnection, our peer never switches back to the idle state. For every peer, we instantiate an inactivity listener. It is a reusable timer, but it must start on a go routine after it has been stopped. The key challenge is to ensure that we never start the timer multiple times and always clean it up properly when the peer is in the idle state or it has been set to the exclusion list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR description doesn't match code anymore (env vars)
if d.isClosed { | ||
d.peerCfg.Log.Debugf("exit from activity listener") | ||
} else { | ||
d.peerCfg.Log.Errorf("failed to read from activity listener: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not retry (continue) reading if there's an error != closed? Maybe with a ctx check in-between
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With it, we risk a high load loop. If it fails to read from UDP once, what makes us think it will succeed the second time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With it, we risk a high load loop.
True
If it fails to read from UDP once, what makes us think it will succeed the second time?
It could be a temporary error, e.g. OS ran out of file descriptors.
But yeah, in general this should be rare. Maybe we can add a comment here explaining the implication for the time being?
Add an option to the management proto to enable or disable the lazy connection feature on the peer side. Command-line options and environment variables override the management configuration. - When the feature is enabled: Start the lazy connection manager and add all peers to the manager in the "active" st-ate. - When the feature is disabled: Stop the lazy connection manager and establish connections to all peers in the store. Fix deadlock The conn_mgr is protected by a mutex. However, because some internal functions are called via callbacks from the lazy connection manager, a deadlock can occur. The solution is to completely eliminate the callbacks.
|
Describe your changes
With the lazy connection feature, the peer will connect to target peers on-demand. The trigger can be any IP traffic.
This feature can be enabled with the NB_ENABLE_EXPERIMENTAL_LAZY_CONN environment variable.
When the engine receives a network map, it binds a free UDP port for every remote peer, and the system configures WireGuard endpoints for these ports. When traffic appears on a UDP socket, the system removes this listener and starts the peer connection procedure immediately.
Key changes
netbird status -d
commandIssue ticket number and link
Checklist