Skip to content

Add default implementation for tracking metadata #636

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

sourabhmarathe
Copy link
Contributor

Adds NetworkTracker trait that allows users to implement interface
between their metadata and their view of NetworkGraph. Users can
implement their own metadata structs, but at a minimum will have to
implement the UpdateMetadata trait (even if the functions do nothing).

By default, this trait uses a new struct called DefaultMetadata
which just tracks previously failed routes, and can find out if a
route has previously failed.

Adds NetworkTracker trait that allows users to implement interface
between their metadata and their view of NetworkGraph. Users can
implement their own metadata structs, but at a minimum will have to
implement the UpdateMetadata trait (even if the functions do nothing).

By default, this trait uses a new struct called DefaultMetadata
which just tracks previously failed routes, and can find out if a
route has previously failed.
@codecov
Copy link

codecov bot commented Jun 7, 2020

Codecov Report

Merging #636 into master will decrease coverage by 0.03%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #636      +/-   ##
==========================================
- Coverage   91.26%   91.22%   -0.04%     
==========================================
  Files          35       35              
  Lines       20918    20927       +9     
==========================================
+ Hits        19090    19091       +1     
- Misses       1828     1836       +8     
Impacted Files Coverage Δ
lightning/src/routing/network_graph.rs 90.79% <0.00%> (-0.83%) ⬇️
lightning/src/ln/functional_tests.rs 97.13% <0.00%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f08d610...bb5c725. Read the comment docs.

Also add route_succeeded() function which removes the successful route
off the failed_route Vec.
Track the number of successes a failed channel needs to get off the
"failed" list. If it is on the list, it penalty fee is u64::MAX. If
route_success is called and the channel ID is on the route, its score is
decremented until it hits 0, at which point it is removed from the
failed list.
Copy link

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're doing a lot of back-and-forth on this scoring interface without explicitly stating its requirements:

  • We do want an interface flexible to encompass a lot different cases (see Add ChannelScore struct (WIP) #626 (comment)). Ideally this different cases may be combined by client to avoid each of them to reinvent the wheel
  • It's hard to predict whole flexibility desired right now without going a bit further, and actually implementing a least a default one
  • Beyond, a channel scoring algorithm, we want to be sure failed routes aren't submitted again (Channel fee penalty (WIP) #635 (comment))
  • Moving forward, how design here (Channel fee penalty (WIP) #635 (review)) or its refinement is incompatible with the "I literally just tried that route and it failed", you may just pass a high severity score score_payment_success ?

Does anyone see issues not stated ?

@@ -428,6 +428,70 @@ impl Readable for NodeInfo {
}
}

/// Allows for updating user's network metadata.
pub trait UpdateMetadata {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to me this trait is trying to encompass both inputs consuming to compute score (add_failed_route) and scores consumption (get_channel_fee_penalty). Second is likely to be consumed by Router and any-kind of peer selection, but first-one is really tied about the origin of data consumed.

If it's event-based, like receiving a PaymentSuccess or PaymentFailed, client is going to call a add_failed_route method-like. If it's self-running, like a thread probing/tracking peer liveness,, you just call something like spawn_liveness_thread. So I would favor dissociate those traits even if their are implemented on the same default struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can disassociate these traits, I think that makes sense. Maybe the NetworkGraph <---> User Metadata connection should be called UpdatesMetadata, and the User metadata <---> router should be called ScoresChannels. The user would implement both traits, and the network would, by default, send certain data to the traits, and the router would by, default, ask for channel scores when conducting its A* search.

}

/// A default metadata object that is used to implement the default functionality for the
/// NetworkTracker trait. A user could make their own Metadata object and extend the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a question of code sharing here, if people start to work on fancy scoring logic, they may want to combine them, and having each of them implement a custom UpdateMetadata they won't be able to do so.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why? You could write a combiner, but ultimately we have to have a way to linearize any kind of fancy scoring logic users want into one parameter (namely fee paid) so that we can calculate a route based on it. No matter what users do, they have to be able to linearize it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think we agree on the need to linearize it, my point was more on dissociating linearization method (get_channel_fee_penalty) from score-feeding ones (route_succeeded/route_failed). Linearization one should be on such combiner. I don't think we have to do this now, but we may have to introduce such combiner in the future.

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool. Can you add a parameter to the get_route function to take this and use it?

@sourabhmarathe
Copy link
Contributor Author

I added the route_succeeded method to the RoutingFeePenalty trait and added a parameter to get_route that must implement the RoutingFeePenalty trait. I was also curious to hear what you think of the idea to separate the data feeding (eg methods used to update user metadata) and the data consumption methods (eg getting routing_fee from the user).

Copy link

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think API is good enough for now. Just few code improvements suggestions. Reusing tests from previous PRs could be great, but they can be added as a follow-up. It's new code there is no risk of regressions due to scoring methods not yet used.

Also, as a next step, we treat uniformly channels which have never being used and ones who have known successes, we should bookmark good ones to favor them with a decreasing fee penalty.

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, I think. We should have ChannelManager call route_succeeded and route_failed, though.

fn route_failed(&mut self, route: Vec<RouteHop>, failed_hop: RouteHop) {
for route_hop in route {
if route_hop == failed_hop {
*self.failed_channels.entry(failed_hop.short_channel_id).or_insert(5) += 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH this feels a bit weird and arbitrary - you need 5 successes to un-fail a channel but once we have even 1 failure we refuse to route through a channel at all. I think maybe a more interesting approach would be to consider the time of the last failure and de-prioritize exponentially less as a failure ages, removing it entirely if we succeed. Because we cannot call current-time functions in the core library, we can't do that exactly, however something that may approximate would be to fail a channel until the next success - ie if we fail a channel, we wont use it the next time we route, but we'll use it any time after that. A user could, of course, do something better themselves by re-implementing the API. Separately, we could maybe add a feature flag for "we can call gettimeofday()".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that it feels weird/arbitrary. I like the idea of having the exponential backoff too, but as mentioned we do not have access to time. For now, I am implementing "the opposite" where on success routing, all channels are removed from the failed_channels list, and on failure, channels have a "consecutive fails since last successful payment" count incremented.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but because we fully-refuse to route through a channel after its failed once, we should never be able to send through it ever again if it fails once (because we'll never succeed routing through it if we never try...).

On success: all channels on the route are removed from the failed
channels list. On failure: all channels are incremented a count
tracking consecutive failed payments executated.
@TheBlueMatt
Copy link
Collaborator

This needs rebase.

@jkczyz
Copy link
Contributor

jkczyz commented Oct 30, 2021

Closing as this was mostly implemented across #1124, #1133, #1144, and #1146.

@jkczyz jkczyz closed this Oct 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants