Skip to content

Conversation

@alexpyattaev
Copy link

@alexpyattaev alexpyattaev commented May 2, 2025

Problem

Gossip Extensions mechanism is effectively dead, TLV could allow us to make it do what it was supposed to

Summary of Changes

  • Add TLV encoding mechanism to extensions
  • Implementation of TLV logic for our needs

@alexpyattaev alexpyattaev force-pushed the tlv branch 3 times, most recently from 27c106d to da79d4a Compare May 5, 2025 20:18
@codecov-commenter
Copy link

codecov-commenter commented May 5, 2025

Codecov Report

Attention: Patch coverage is 82.19178% with 13 lines in your changes missing coverage. Please review.

Project coverage is 82.9%. Comparing base (5494638) to head (f1d2e47).
Report is 125 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6082   +/-   ##
=======================================
  Coverage    82.9%    82.9%           
=======================================
  Files         842      843    +1     
  Lines      377680   377751   +71     
=======================================
+ Hits       313116   313195   +79     
+ Misses      64564    64556    -8     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

@gregcusack gregcusack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks goooooood! just a few thoughts. also i think the walking the packet concern is kind of a moot point since we have to do that anyway for sockets: Vec<SocketEntry>. also thank you for refreshing my macro game.

Comment on lines -136 to +153
extensions: Vec<Extension>,
extensions: Vec<TlvRecord>,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh this is nice. so rollout strategy here is switch to Vec<TlvRecord> and leave empty until everyone adopts. since those not updated will fail to accept/deser ContactInfo with non-empty vecs here. and then we can start populating extensions with TlvRecords.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, pretty much. At that point any client can do whatever they want with Extensions as long as we do not get conflicting tags.

Comment on lines +75 to +83
pub(crate) fn parse<'a, T: TryFrom<&'a TlvRecord>>(entries: &'a [TlvRecord]) -> Vec<T> {
entries.iter().filter_map(|v| T::try_from(v).ok()).collect()
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may not be a huge deal since its a sender issue but if there are multiple TlvRecords with the same typ we just keep the last one.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why that would be the case. It will skip invalid entries, but will happily produce as many identical TLVs as there are in the original entries slice. Granted, the Vec allocation here is grotesque, but its a PoC =)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya turns out i can't read lol. ignore

$(
$typ => Ok(Self::$variant(bincode::deserialize::<$inner>(&value.bytes)?)),
)*
_ => Err(bincode::Error::custom("Invalid type")),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe would be nice to differentiate between invalid type and some other decode failure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how that would work to be honest, can you elaborate?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well the code kind of differentiates already. since if the type is valid we just return some bincode::Error::<error> and if the type is invalid then we get bincode::Error::custom("invalid type").

but from a caller error handling/clarity/debugging perspective it may be nice to provide a way to match on the error rather than having to do match err.kind() { ... } and possibly have to do string matching.
we could wrap the error in some TlvDecodeError like:

pub enum TlvDecodeError {
    #[error("Unknown type: {0}")]
    UnknownType(u8),
    #[error("Malformed payload: {0}")]
    MalformedPayload(#[from] bincode::Error),
}

or change the error string to 'invalid tlv type" so we don't get crossed up with someone in the future defining a bincode error with "invalid type".

won't die on this hill. but would just clean/clear up error messages and handling.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good idea, I'll put that in. Thanks!

Comment on lines 121 to 124
// The Extension enum for optional features in gossip
define_tlv_enum! (pub(crate) enum Extension {
1=>SupportSenderSignatures(u8),
});

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to add a more detailed comment here about how to add a new tlv type:
/// TLV features in ContactInfo
///
/// On the wire each record is: [type: u8][len of vec][bytes vec]
/// Unknown types are ignored by tlv::parse, so new types can be added without breaking legacy code
///
/// Always add new TLV records to the end of this enum
/// Never reorder or reuse a type

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 78a0053

// Make sure legacy recover works correctly
let legacy: Vec<ExtensionLegacy> = crate::tlv::parse(&tlv_vec);
assert!(matches!(legacy[0], ExtensionLegacy::Test(42)));
assert_eq!(legacy.len(), 1, "Legacy parser should only recover ")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legacy parse should only recover ... what?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 78a0053

@ravyu-jump
Copy link

TLVs seem like a good fit for the extensions vec. Have you considered something more standard like protobufs? That way we can define types in a platform-agnostic matter (and share definitions in a centralized repo)

@alexpyattaev
Copy link
Author

@ravyu-jump my main motivation was based on the fact that bincode is used nearly everywhere in solana networking stack, so it is a "devil we know", and Firedancer already has relevant implementation details sorted out (and for Zig there is a libraray that does the job). So the concern with protobuf is that it is a fairly complex format, and bringing that into the network-facing part of the clients would make the attack surface larger.

Transitioning gossip entirely to protobuf instead of bincode could be an interesting proposition though, as protobuf has a far smoother mechanism for protocol evolution over time (aside from the advantages you have mentioned w.r.t. cross-language compatbility). We should definitely discuss this more and see if such change is justified (and whether protobuf is even a good option for that or we should go with something fancier like capnproto).

@alexpyattaev alexpyattaev marked this pull request as ready for review May 28, 2025 15:49
@alexpyattaev alexpyattaev requested a review from gregcusack May 28, 2025 15:49
Copy link

@gregcusack gregcusack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks mostly good. just would add clearer error handling as discussed here: #6082 (comment)

@gregcusack gregcusack self-requested a review May 28, 2025 20:32
Copy link

@gregcusack gregcusack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@alexpyattaev alexpyattaev merged commit 7be24c2 into anza-xyz:master May 29, 2025
47 checks passed
@alexpyattaev alexpyattaev deleted the tlv branch May 29, 2025 13:21
mircea-c pushed a commit to mircea-c/agave that referenced this pull request Jun 12, 2025
* TLV PoC for Gossip extensions

---------

Co-authored-by: Greg Cusack <[email protected]>
mircea-c added a commit to mircea-c/agave that referenced this pull request Jun 12, 2025
* TLV PoC for Gossip extensions

---------

Co-authored-by: Greg Cusack <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants