Avoid reusing just-failed channels in the router, making the impossibility penalty configurable #1600
Conversation
Force-pushed 149fc6e to 848afc3.
Codecov Report
@@ Coverage Diff @@
## main #1600 +/- ##
==========================================
+ Coverage 90.86% 91.00% +0.14%
==========================================
Files 80 80
Lines 44437 45502 +1065
==========================================
+ Hits 40377 41409 +1032
- Misses 4060 4093 +33
Continue to review full report at Codecov.
Thanks, generally looks good, just some comments.
Force-pushed 848afc3 to e6a114c.
Rebased and addressed feedback.
Force-pushed e6a114c to 846291f.
Force-pushed e3396ef to cc1a045.
LGTM
lightning/src/routing/router.rs (Outdated)
  if contributes_sufficient_value && doesnt_exceed_max_path_length &&
-	doesnt_exceed_cltv_delta_limit && may_overpay_to_meet_path_minimum_msat {
+	doesnt_exceed_cltv_delta_limit && !payment_failed_on_this_channel &&
+	may_overpay_to_meet_path_minimum_msat
+ {
  	hit_minimum_limit = true;
  } else if contributes_sufficient_value && doesnt_exceed_max_path_length &&
-	doesnt_exceed_cltv_delta_limit && over_path_minimum_msat {
+	doesnt_exceed_cltv_delta_limit && over_path_minimum_msat &&
+	!payment_failed_on_this_channel
+ {
  	// Note that low contribution here (limited by available_liquidity_msat)
Instead of checking payment_failed_on_this_channel twice, consider having a leading if expression:

if payment_failed_on_this_channel {
} else if /* ... */ {
Oh, good idea, I moved all the existing if conditions to that!
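The suggested restructuring can be sketched as follows. This is a hypothetical simplification, not the router's actual code: the flag names echo the diff above, but the branch bodies are reduced to labels purely for illustration.

```rust
// Toy version of the candidate-channel checks, illustrating the
// leading-if: test the just-failed flag once, up front, instead of
// repeating `!payment_failed_on_this_channel` in every branch.
fn classify_candidate(
	payment_failed_on_this_channel: bool,
	contributes_sufficient_value: bool,
	may_overpay_to_meet_path_minimum_msat: bool,
	over_path_minimum_msat: bool,
) -> &'static str {
	if payment_failed_on_this_channel {
		// The channel just failed for this payment: skip it outright,
		// with no further checks needed.
		"skip"
	} else if contributes_sufficient_value && may_overpay_to_meet_path_minimum_msat {
		"hit_minimum_limit"
	} else if contributes_sufficient_value && over_path_minimum_msat {
		"consider"
	} else {
		"reject"
	}
}
```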
Force-pushed cc1a045 to a863778.
// Equivalent to hitting the else clause below with the amount equal to the effective
// capacity and without any certainty on the liquidity upper bound, plus the
// impossibility penalty.
This comment needs to be updated now, since it no longer applies only when the amount is equal to the effective capacity. It seems the earlier change that added this now-removed logic is what caused calculates_log10_without_overflowing_u64_max_value to stop exercising the correct code path, as mentioned earlier on that test (i.e., it doesn't hit the negative_log10_times_2048 case below).
I read the comment differently - I read the comment to say "the calculation we're doing here is equivalent to..." which is still true, no?
Ah, yeah, you're right!
Largely looks good but one comment.
let negative_log10_times_2048 = NEGATIVE_LOG10_UPPER_BOUND * 2048;
self.combined_penalty_msat(amount_msat, negative_log10_times_2048, params)
	.saturating_add(params.considered_impossible_penalty_msat)
Would it make sense to do the max of these two rather than adding? Is the idea that we want this to be >= anything given in the else clause?
Yea, that's the idea. I suppose we could do a max. Originally I had it not adding the combined_penalty call at all, but that seemed too brittle to deal with, so I switched to adding. I honestly don't have a strong opinion between adding and max; either way the docs can tell users what's going on. But with max I kinda worry users will accidentally set it too low and get no penalty here, which seems strange?
They would also need to set the other params lower, though, since max would select the combined penalty over considered_impossible_penalty_msat if the latter were too low. So it's all kinda relative. I don't feel too strongly either, though note that max may make debugging a little easier, as otherwise you may need to mentally subtract some combined penalty when it is not obvious.
Right, sure, I just meant it'd be easy for a user to look at the field and think "okay, let me pick something a bit higher than the liquidity offset", forget to multiply by 2 or whatever, and end up with a penalty equal to the liquidity penalty, which seems wrong? I dunno, I'm happy to mentally convert when we're debugging. Unless you feel strongly, I'd suggest we leave it.
> Right, sure, I just meant it'd be easy for a user to look at the field and think "okay, let me pick something a bit higher than the liquidity offset", forget to multiply by 2 or whatever, and end up with a penalty equal to the liquidity penalty, which seems wrong?

Plus base and amount penalty, FWIW.

> I dunno, I'm happy to mentally convert when we're debugging. Unless you feel strongly I'd suggest we leave it.

Sure, we can leave it. Though one thing I just realized is that either way the penalty will now vary across channels depending on the amount, maybe less so when using max. But maybe that's an argument in favor of adding: if left to choose only among channels exceeding their maximum liquidity, we'd prefer the ones that would otherwise be penalized less.
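The add-versus-max trade-off in this thread can be illustrated with a toy penalty combiner. The field names mirror the discussion, but both functions and all numbers here are made up for illustration; this is not the scorer's actual code.

```rust
// Toy model of the two ways to combine a channel's usual (combined)
// penalty with the configurable impossibility penalty. Saturating
// addition preserves the relative ordering between "impossible"
// channels, so among channels that all exceed their known liquidity we
// still prefer the one that would otherwise be penalized less; max
// flattens them to the same value once the configured penalty dominates.
fn penalty_add(combined_penalty_msat: u64, considered_impossible_penalty_msat: u64) -> u64 {
	combined_penalty_msat.saturating_add(considered_impossible_penalty_msat)
}

fn penalty_max(combined_penalty_msat: u64, considered_impossible_penalty_msat: u64) -> u64 {
	combined_penalty_msat.max(considered_impossible_penalty_msat)
}
```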
Force-pushed a863778 to 85b842c.
Squashed without further changes.
Force-pushed 85b842c to eea6311.
Force-pushed eea6311 to 8a1b418.
Squashed yet again. Change since yesterday was:
66ca68a mentions failures on a payment level, but isn't it really on a path level? I.e., if two parts of an MPP fail on different channels, retrying one part could retry over the channel the other part failed on?
Force-pushed 8a1b418 to 5bff5f9.
Oops, right, sorry. Rewrote that commit message without changes to the diff, and added a note that this does have the drawback of potentially retrying different parts along the same path, but hopefully the scorer doesn't let that happen unless the payment is going to fail anyway.
Yeah, same across payments, but as you said hopefully the scorer will learn quickly enough.
When an HTLC fails, we currently rely on the scorer learning the failed channel and assigning an infinite (`u64::max_value()`) penalty to it so as to avoid retrying over the exact same path (if there's only one available path). This is common when trying to pay a mobile client behind an LSP while the mobile client is currently offline. It leads to the scorer being overly conservative in some cases: returning `u64::max_value()` when a given path hasn't been tried for a given payment may not be the best decision, even if that channel failed 50 minutes ago.

By tracking channels which failed on a payment-part level and explicitly refusing to route over them, we can relax the requirements on the scorer, allowing it to make different decisions on how to treat channels that failed relatively recently without causing payments to retry the same path forever. This does have the drawback that it could allow two separate parts of a payment to traverse the same path even though that path just failed; however, this should only occur if the payment is going to fail anyway, at least as long as the scorer is learning properly.

Closes lightningdevkit#1241, superseding lightningdevkit#1252.
When we consider sending an HTLC over a given channel impossible due to our current knowledge of the channel's liquidity, we currently always assign a penalty of `u64::max_value()`. However, because the router itself now refuses to retry a payment along the same path, we can make this value configurable. This allows users to set a relatively long knowledge-decay interval without the side effect of refusing to try the only available path in cases where a channel is intermittently available.
In general we should avoid taking paths that we are confident will not work as much as possible, but we should be willing to try each payment at least once, even if it's over a channel that failed recently. A full-bitcoin penalty for such a channel seems reasonable: lightning fees are unlikely to ever reach that point, so such channels will be scored much worse than any other potential path, while still being below `u64::max_value()`.
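The mechanism the commit message describes, tracking failed channels per payment and refusing to route over them, might be sketched roughly as follows. This is a hypothetical simplification: the type and method names here are invented for illustration, and the actual router tracks considerably more state.

```rust
use std::collections::HashSet;

// Hypothetical sketch: channels that failed for the current payment are
// collected in a set, keyed by short channel ID, and the router refuses
// to route over them again, instead of relying on the scorer returning
// u64::max_value().
struct PaymentAttempt {
	previously_failed_channels: HashSet<u64>,
}

impl PaymentAttempt {
	fn new() -> Self {
		PaymentAttempt { previously_failed_channels: HashSet::new() }
	}

	fn record_failure(&mut self, scid: u64) {
		self.previously_failed_channels.insert(scid);
	}

	// The router consults this before considering a hop at all, so the
	// scorer no longer needs an infinite penalty to prevent retries.
	fn channel_usable(&self, scid: u64) -> bool {
		!self.previously_failed_channels.contains(&scid)
	}
}
```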
Force-pushed 5bff5f9 to a3547e2.
Rebased to address merge conflict.
Some users want to keep a very long scorer-data half-life to maintain knowledge for a longer period of time. It's somewhat unclear whether that's optimal, but it is clear that it can cause the scorer to refuse to build a route when the only available channel most recently failed within the half-life. This is rather unexpected behavior, and the fact that the scorer must behave this way to avoid #1241 is very annoying, as it prevents fixing this.
Here we move the avoidance of just-failed channels into the router itself, allowing us to make the impossibility penalty configurable, which we do as well.
Closes #1241, superseding #1252.
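As background for the half-life mentioned above: a scorer half-life governs exponential decay of learned liquidity knowledge over time. A minimal sketch of such decay, assuming one halving per elapsed half-life, is below; this is illustrative only and not the scorer's actual implementation.

```rust
// Halve a learned liquidity offset once per elapsed half-life, as a
// rough model of time-based knowledge decay. A longer half-life means
// a failure is "remembered" as a large penalty for longer.
fn decayed_offset_msat(offset_msat: u64, elapsed_secs: u64, half_life_secs: u64) -> u64 {
	if half_life_secs == 0 {
		return 0; // degenerate configuration: knowledge decays instantly
	}
	let half_lives = elapsed_secs / half_life_secs;
	if half_lives >= 64 {
		0 // shifting a u64 by >= 64 bits is invalid, and the value is 0 anyway
	} else {
		offset_msat >> half_lives
	}
}
```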