Description
I've been working on a port of GGPO to Rust: backroll-rs, and built a transport abstraction layer, backroll-transport, that I think generalizes very well and may be better served as a part of Laminar than as its own isolated crate.
Motivation
Provide a reliable UDP implementation that works well in async executors for games (i.e. bevy_tasks) and over any I/O layer implementation: raw UDP, WebRTC, or proprietary sockets like those provided by Steamworks' ISteamNetworkingSockets or Epic Online Services. Laminar already has a fairly well-defined reliability model built atop a raw std::net::UdpSocket, so adding this kind of abstraction can easily leverage the existing types and implementations in Laminar's codebase.
Design
The cornerstone of this design is heavy use of a bidirectional version of async-channel to pass messages around independent tasks on a futures executor. The channel serves three purposes: it represents the connection state (is the channel still open or not), it acts as a buffered conduit for messages, and it provides a point of abstraction. Channels are generic only over the type of message they pass, so there is no need to create nested structs with generic parameters to connect two distinct layers of the network stack (a la ConnectionManager).
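As a rough illustration, such a channel can be assembled from two `async-channel` pairs. This is a minimal sketch, not backroll-transport's actual API; the struct, constructor, and method names here are illustrative only.

```rust
use async_channel::{unbounded, Receiver, RecvError, Sender, TrySendError};

/// Sketch of a bidirectional channel: two unbounded channels glued together
/// so that each endpoint can both send and receive. The handles double as the
/// connection state: once one endpoint is dropped, the other endpoint's sends
/// and receives start failing.
#[derive(Clone)]
pub struct BidirectionalAsyncChannel<T> {
    outgoing: Sender<T>,
    incoming: Receiver<T>,
}

impl<T> BidirectionalAsyncChannel<T> {
    /// Creates a pair of connected endpoints.
    pub fn create_connected_pair() -> (Self, Self) {
        let (a_tx, a_rx) = unbounded();
        let (b_tx, b_rx) = unbounded();
        let left = Self { outgoing: a_tx, incoming: b_rx };
        let right = Self { outgoing: b_tx, incoming: a_rx };
        (left, right)
    }

    /// Queues a message for the other endpoint; fails once the link is closed.
    pub fn try_send(&self, message: T) -> Result<(), TrySendError<T>> {
        self.outgoing.try_send(message)
    }

    /// Waits for the next message; fails once the link is closed.
    pub async fn recv(&self) -> Result<T, RecvError> {
        self.incoming.recv().await
    }

    /// The connection is considered open while both directions remain open.
    pub fn is_connected(&self) -> bool {
        !self.outgoing.is_closed() && !self.incoming.is_closed()
    }
}
```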
A specialization of this channel, known in backroll_transport as a Peer, is a newtype around BidirectionalAsyncChannel that deals solely in Box<[u8]>: raw binary packets. By default, Peers are assumed to pass only unreliable, unordered packets: a direct UDP analogue. This is already implemented in backroll_transport and is used as the main abstraction point for the actual I/O layer (see backroll_transport_udp).
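Under that model, a Peer is a thin specialization of the channel above. Again, this is a hypothetical sketch of the shape rather than the crate's exact API:

```rust
/// Sketch of a Peer: the bidirectional channel above, specialized to raw
/// binary packets. By default these packets are treated as unreliable and
/// unordered, a direct analogue of UDP datagrams.
#[derive(Clone)]
pub struct Peer(BidirectionalAsyncChannel<Box<[u8]>>);

impl Peer {
    /// Queues a packet for the remote side; fails if the connection is closed.
    pub fn send(&self, packet: Box<[u8]>) -> Result<(), async_channel::TrySendError<Box<[u8]>>> {
        self.0.try_send(packet)
    }

    /// Waits for the next inbound packet; fails once the connection is closed.
    pub async fn recv(&self) -> Result<Box<[u8]>, async_channel::RecvError> {
        self.0.recv().await
    }

    /// True while both directions of the link remain open.
    pub fn is_connected(&self) -> bool {
        self.0.is_connected()
    }
}
```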
To help manage multiple ongoing connections, backroll_transport has Peers<T>, a newtype wrapper around DashMap<T, Peer>, which only exposes active, open connections by removing non-connected peers from public access. Dropping the entire Peers<T> struct also closes all associated Peers.
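A sketch of that registry, assuming the hypothetical `Peer` above (which is cheaply cloneable, since it only holds channel handles); the method names are illustrative:

```rust
use dashmap::DashMap;
use std::hash::Hash;

/// Sketch of Peers<T>: a registry of live connections, keyed by whatever the
/// I/O layer uses as a remote ID (a SocketAddr for UDP, a Steam ID, etc.).
pub struct Peers<T: Eq + Hash>(DashMap<T, Peer>);

impl<T: Eq + Hash> Peers<T> {
    /// Returns the peer only if its connection is still open.
    pub fn get(&self, id: &T) -> Option<Peer> {
        let entry = self.0.get(id)?;
        entry.is_connected().then(|| entry.value().clone())
    }

    /// Prunes entries whose channels have already closed.
    pub fn flush_disconnected(&self) {
        self.0.retain(|_, peer| peer.is_connected());
    }
}

// Dropping a Peers<T> drops every stored Peer handle, which closes the
// underlying channels and lets each connection's tasks wind down.
```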
By using infinitely looping tasks around these bidirectional channels, it's possible to create a graph of interconnected tasks that constitutes the connection itself. Below is an example of one such connection (each box is an independent task; all outgoing and incoming arrows are separate multiple-producer, multiple-consumer channels):
The main game logic regularly polls the top-level streams for notifications about updated connection state and uses them to update the game state. All other tasks run independently of the game loop and continue running as more I/O is performed. Any additional logic for reliability, ordering, or sequencing is performed inline in this graph of tasks. A current implementation is not available, but it may not be too much work to refactor a good number of Laminar's connection state machine components to work with this kind of design. It may be possible to provide newtyped ReliablePeer, OrderedPeer, etc. that wrap an existing Peer to provide those connection properties as needed. Alternatively, if an I/O layer implementation supports reliable, ordered, or sequenced connections, it can return its own newtyped peer (see Steam's ISteamNetworkingSockets) rather than relying on Laminar's. This also allows the connection stack to be only as long as needed: unreliable packets do not require the overhead of checking packet reliability headers.
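As a concrete illustration of one node in that graph, a "sequenced" layer could be a single forwarding loop between an I/O-facing peer and an app-facing peer. This sketch assumes the hypothetical `Peer` above and a two-byte little-endian sequence prefix on each packet, which is purely illustrative and not Laminar's actual header format:

```rust
/// One node in the task graph: forwards packets from the I/O-facing peer to
/// the app-facing peer, dropping any packet that is not newer than the latest
/// sequence number seen so far. The loop exits, and teardown cascades, as soon
/// as either side of the connection is closed.
async fn sequenced_forward(io_side: Peer, app_side: Peer) {
    let mut latest: Option<u16> = None;
    while let Ok(packet) = io_side.recv().await {
        if packet.len() < 2 {
            continue; // no sequence header; treat as malformed and drop
        }
        let seq = u16::from_le_bytes([packet[0], packet[1]]);
        // "Newer" in wrapping arithmetic: the forward distance from the last
        // seen sequence number is nonzero and less than half the sequence space.
        let is_newer = latest.map_or(true, |last| {
            let distance = seq.wrapping_sub(last);
            distance != 0 && distance < u16::MAX / 2
        });
        if is_newer {
            latest = Some(seq);
            if app_side.send(packet).is_err() {
                break; // application side closed the connection
            }
        }
    }
}
```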
These individual tasks can be very simple, and encapsulate the running connection state well. For example, the heartbeat/keep-alive task for keeping the connection open in backroll is written simply as the following:
```rust
async fn heartbeat(self, interval: Duration) {
    while let Ok(()) = self.send(MessageData::KeepAlive) {
        debug!("Sent keep alive packet");
        Delay::new(interval).await;
    }
}
```

As these tasks terminate when their associated channels are closed, disconnecting from a remote peer is as simple as closing the associated channels. The tasks then terminate in a cascading fashion as each channel closes, eventually removing the connection from the I/O layer's Peers<T> tracker.
One additional benefit is that the I/O layer can be swapped out without replacing the actual types used for communication. It does not matter whether UDP, WebRTC, or the proprietary sockets for Steam or Epic Online Services are used as the I/O implementation; they all return a Peer. There is one caveat: since a peer is considered valid as long as the connection is open, implementations with an initial handshake to establish a connection may need to return impl Future<Output=Result<Peer, ...>> instead.
Pros
- Bidirectional channels are a very close approximation of how two-way operations on connections behave.
- `async-channel` types implement `Stream` and `Sink`, which makes it easy to use combinators to handle transformative logic.
- As futures only poll when awoken, there is little to zero overhead while connections are not active. This may improve scalability and enable support for higher active connection counts per process (i.e. MMO servers).
- This integrates well with existing Rust game engines like Bevy, which come with their own async executors.
- This removes the need to manually poll each `ConnectionManager` every so often, outside of potentially flushing disconnected peers from `Peers<T>`.
- If the underlying tasks are exposed as part of Laminar's public API, it may be possible to allow users of the library to inject logic at a lower level (i.e. packet-level encryption/compression).
- Having an abstraction using `Peer` allows unit testing with an opaque in-process connection rather than requiring a generic fake socket implementation. `FakeSocket` could be replaced with a single task that connects two distinct `Peer`s and emulates packet loss and latency (see the sketch after this list).
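For example, a lossy in-process link might be a single forwarding task. This is a hedged sketch assuming the hypothetical `Peer` above and the `rand` crate; a real test harness would spawn one of these per direction and could add a `Delay` before forwarding to emulate latency:

```rust
use rand::Rng;

/// Sketch of an in-process lossy link for tests: forwards packets from one
/// peer endpoint to another, randomly dropping a fraction of them.
async fn lossy_forward(from: Peer, to: Peer, drop_probability: f64) {
    while let Ok(packet) = from.recv().await {
        if rand::thread_rng().gen::<f64>() < drop_probability {
            continue; // emulate packet loss
        }
        if to.send(packet).is_err() {
            break; // receiving side closed; let the teardown cascade
        }
    }
}
```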
Cons
- A futures executor is required.
- Setting up a runtime environment that supports this may require additional boilerplate.
- Error handling may not be very clear beyond sending events (instead of raw binary packets) and logging.
- Unit tests, both for Laminar and for consumers of the API, will need to be async.
Alternatives considered
Potentially, this could be put under a separate async module enabled via a feature flag (see how redis-rs handles the sync/async split) instead of a ground-up rewrite.
For generalizing across different socket types, allowing DatagramSocket to be generically implemented over any network remote ID, not just SocketAddr, would be very useful.
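One hypothetical shape for that generalization is sketched below; the associated type and method set are an assumption that loosely follows Laminar's existing `DatagramSocket`, not its current API:

```rust
use std::io;

/// Hypothetical generalization of DatagramSocket over an arbitrary remote ID.
pub trait GenericDatagramSocket {
    /// Whatever the I/O backend uses to address a remote endpoint:
    /// a SocketAddr for UDP, a connection handle for Steam or EOS, etc.
    type RemoteId;

    /// Sends a single packet to the given remote, returning the bytes sent.
    fn send_packet(&mut self, remote: &Self::RemoteId, payload: &[u8]) -> io::Result<usize>;

    /// Receives a single packet, returning the payload and its sender.
    fn receive_packet<'a>(&mut self, buffer: &'a mut [u8]) -> io::Result<(&'a [u8], Self::RemoteId)>;

    /// Whether this socket blocks on receive.
    fn is_blocking_mode(&self) -> bool;
}
```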
