Add TransportKind option to channels and channel builder#820
Add TransportKind option to channels and channel builder#820
Conversation
ericmlujan
left a comment
There was a problem hiding this comment.
Had a question about wrapping this enum in an outer TransportOptions struct to give us some flexibility in the future, wanted to get your thoughts on that
| /// not currently [`WebSocketServer`](crate::WebSocketServer) or [`McapWriter`](crate::McapWriter). | ||
| /// Messages are still delivered to the Foxglove in-order. | ||
| /// | ||
| /// Note: this only applies to messages under 15kb, larger messages will use Reliable transport. |
There was a problem hiding this comment.
Are we planning on dispatching based on message size? Or is this more saying "you must use Reliable transport if message size exceeds 15K"?
There was a problem hiding this comment.
We can do either. I think dynamic dispatch on message size is the best user experience (at least when not limited by network bandwidth), so I lean that way.
| sinks: LogSinkSet, | ||
| closed: AtomicBool, | ||
| #[allow(dead_code)] | ||
| transport_kind: TransportKind, |
There was a problem hiding this comment.
Instead of exposing this at the top-level of a Channel, I wonder if we nest this in a TransportOptions struct, so the Channel API doesn't need to change as we introduce new QoS and transport configuration settings. Thoughts?
There was a problem hiding this comment.
Yes, that's a good idea
| /// Options that control how data for the channel is sent over the network with applicable sinks. | ||
| #[derive(Debug, Clone, Copy, Default)] | ||
| #[repr(u8)] | ||
| pub enum TransportKind { |
There was a problem hiding this comment.
Naming: TransportReliability?
There was a problem hiding this comment.
or TransportDelivery? Naming things...
There was a problem hiding this comment.
I lean towards TransportReliability. TransportKind and TransportDelivery are a bit too generic for my taste; that could encapsulate things like History and Durability, which are orthogonal to this axes of delivery.
| sinks: LogSinkSet, | ||
| closed: AtomicBool, | ||
| #[allow(dead_code)] | ||
| transport_kind: TransportKind, |
There was a problem hiding this comment.
I don't think this belongs on the channel at all. Transport options are particular to a (channel, sink) pair. I'd prefer that we rework this along the lines of sink channel filters, but of course only for those sinks that can offer differentiated QoS.
There was a problem hiding this comment.
I agree with that, it's specific to the (channel, sink) pair. But, it's imperative that it be efficient to access. If it requires a hash table lookup on each log call, I think it's too much. Given that, I think the Channel gives a logical and efficient place to store the setting, and sinks are free to ignore it unless they need it.
Can you think of another way we could implement this efficiently?
There was a problem hiding this comment.
I'm not really worried about a hashmap lookup on the fast path. If this becomes a measurable bottleneck, and we want to optimize, we could potentially store it during channel subscription, as part of the LogSinkSet.
There was a problem hiding this comment.
Keep in mind it's not just a hashmap lookup, but taking a read() lock on a RwLock, doing the lookup, and releasing the lock. About ~400 cycles as a rough estimate, depending on caching and architecture.
At 10k log messages/sec it'd be around 4 million cycles / sec just for checking a flag, which could have been a pointer indirection. If we don't make this part of the public interface, having it on the channel as an implementation detail seems ok to me. Less code to maintain too. We can always refactor it if we want to later.
There was a problem hiding this comment.
OK, so then let's see what it would take to configure this via the sink, and store it in the LogSinkSet. At this point I care more about interfacing than performance.
Changelog
None
Docs
None
Description
Adds a TransportKind option to channels that lets users decide between lossy and reliable transport on a per-channel basis. The default is Lossy, but will only affect the new CloudSink. This is not a full QoS solution by any means, but meant to allow users to override the Lossy default for channels that are critically important. Most users shouldn't need to touch this.
I've only added Rust support, if we agree with the direction I'll add Python and C++ before marking it ready for review.