Ensure we don't ever retry a payment along a just-failed path #1252
If we try to pay a mobile client behind an LSP, it's not unusual for the single last-hop hint to fail with a Temporary Channel Failure (indicating the mobile app is not currently open and connected to the LSP). In this case, we will penalize the last-hop channel but try again along the same path anyway, because we have no other path. This changes the retryer to simply refuse to do so, failing the payment instead. Fixes lightningdevkit#1241.
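A minimal sketch of the guard described above. It assumes a path can be reduced to the ordered list of short channel IDs it traverses. `RetryDecision` and `decide_retry` are hypothetical names used only for illustration and are not the InvoicePayer's actual code.

```rust
/// Hypothetical outcome of a retry decision.
#[derive(Debug, PartialEq)]
enum RetryDecision {
    /// Retry along the newly found path (a list of short channel IDs).
    Retry(Vec<u64>),
    /// Give up: the router could only offer the path that just failed, or no path at all.
    Fail,
}

/// Refuse to reuse the exact path that just failed, e.g. the single last-hop
/// hint to an offline mobile client behind an LSP.
fn decide_retry(failed_path: &[u64], new_path: Option<Vec<u64>>) -> RetryDecision {
    match new_path {
        // Only retry if the router found a path that differs from the failed one.
        Some(path) if path.as_slice() != failed_path => RetryDecision::Retry(path),
        // No route found, or it is identical to the path that just failed.
        _ => RetryDecision::Fail,
    }
}
```

With a single route hint the router can only return the failed path again, so the payment fails instead of looping until the retry budget runs out.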
Codecov Report
@@ Coverage Diff @@
## main #1252 +/- ##
==========================================
+ Coverage 90.41% 90.43% +0.01%
==========================================
Files 70 70
Lines 38087 38117 +30
==========================================
+ Hits 34437 34471 +34
+ Misses 3650 3646 -4
-	fn retry_payment(
-		&self, payment_id: PaymentId, payment_hash: PaymentHash, params: &RouteParameters
+	fn retry_payment(&self, payment_id: PaymentId, payment_hash: PaymentHash,
+		params: &RouteParameters, avoid_scid: Option<u64>
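To illustrate what an `avoid_scid: Option<u64>` argument could be used for, here is a sketch that picks the first candidate route not traversing the channel that just failed. The `Hop` type and `pick_route` function are hypothetical simplifications, not the actual router or retryer API.

```rust
/// Hypothetical hop type; only the short channel ID matters here.
#[derive(Clone, Debug, PartialEq)]
struct Hop {
    short_channel_id: u64,
}

/// Returns the first candidate route that does not traverse `avoid_scid`.
/// `None` means every candidate uses the channel that just failed, so the
/// payment should be failed rather than retried.
fn pick_route(candidates: &[Vec<Hop>], avoid_scid: Option<u64>) -> Option<&Vec<Hop>> {
    candidates.iter().find(|route| match avoid_scid {
        // Reject any route containing the channel to avoid.
        Some(scid) => route.iter().all(|hop| hop.short_channel_id != scid),
        // Nothing to avoid: the first candidate is acceptable.
        None => true,
    })
}
```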
I wonder if this would be cleaner as part of the RouteParameters.
And avoided directly in find_route, that is.
You mean passing it through to the router itself and asking it to completely avoid an SCID? That feels like it's better done via the Score implementer, which I guess is ultimately the problem here: the Scorer in use in the sample (i.e. our default one) doesn't strictly refuse to pay over a channel that just failed. That said, I do feel like the InvoicePayer should be robust against a braindead scorer, whether it's our own or a user-provided one, so it feels nice to have it here too?
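The problem with a scorer that only de-prioritizes a failed channel can be shown with a toy model. Here `ToyScorer` and `cheapest` are hypothetical stand-ins (not the real `Score` trait or router). When only one path exists, a finite penalty does not change the router's choice; only a penalty treated as "never use" (`u64::MAX`) excludes it.

```rust
use std::collections::HashMap;

/// Hypothetical toy scorer: a fixed penalty per recently failed channel.
struct ToyScorer {
    penalties: HashMap<u64, u64>,
}

impl ToyScorer {
    fn channel_penalty(&self, scid: u64) -> u64 {
        *self.penalties.get(&scid).unwrap_or(&0)
    }
}

/// Toy router: picks the cheapest candidate path (a list of SCIDs),
/// treating a total cost of `u64::MAX` as "never use".
fn cheapest<'a>(candidates: &'a [Vec<u64>], scorer: &ToyScorer) -> Option<&'a Vec<u64>> {
    candidates
        .iter()
        .filter_map(|route| {
            // Sum penalties, saturating so an infinite penalty stays infinite.
            let cost = route
                .iter()
                .fold(0u64, |acc, &scid| acc.saturating_add(scorer.channel_penalty(scid)));
            if cost == u64::MAX { None } else { Some((cost, route)) }
        })
        .min_by_key(|(cost, _)| *cost)
        .map(|(_, route)| route)
}
```

This is why a guard in the payer itself remains useful: it does not depend on the scorer returning an infinite penalty.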
Yeah, but also don't put the onus on the event handler to set it and pass it to find_route. Simply have the ChannelManager set it when creating the PaymentPathFailed event. Then it is completely transparent to anyone handling the event.
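A sketch of that suggestion, under the assumption that the retry parameters travel inside the failure event. `RetryParams`, `PathFailedEvent`, and `build_path_failed_event` are hypothetical names, not the ChannelManager's actual types. The side that generates the event records the failing channel, so a handler only passes `retry` back to the router unchanged.

```rust
/// Hypothetical retry parameters carried inside the failure event.
#[derive(Clone, Debug, Default, PartialEq)]
struct RetryParams {
    previously_failed_channels: Vec<u64>,
}

/// Hypothetical path-failure event.
#[derive(Debug)]
struct PathFailedEvent {
    short_channel_id: Option<u64>,
    retry: RetryParams,
}

/// Called by the payment-sending side when a path fails: the failing channel
/// is appended to the retry parameters before the event is handed out.
fn build_path_failed_event(mut retry: RetryParams, failing_scid: Option<u64>) -> PathFailedEvent {
    if let Some(scid) = failing_scid {
        // Avoid recording the same channel twice.
        if !retry.previously_failed_channels.contains(&scid) {
            retry.previously_failed_channels.push(scid);
        }
    }
    PathFailedEvent { short_channel_id: failing_scid, retry }
}
```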
Right, I do feel like "avoid this channel" is really more of a Score thing than a router thing; we have a whole interface for it, and it seems annoying to duplicate that interface here. It's not a lot of code change, but still awkward.
Hmm... but this use case is (a) ephemeral, as it only applies to a specific payment (lower payment amounts may succeed for another payment, or even over the failed path if the payment is split further on retry), and (b) handled by the caller, not the scorer, in this PR.
Right, in this instance it's strangely handled by both the caller and the scorer: the scorer de-prioritizes the channel, and the caller handles the "oh, this went wrong, we can't do this, the scorer or router is busted" case. I guess there are two more practical questions on behavior that may inform this:
a) Do we want to track this information across payment attempts? If there are two available last-hop hints, do we want to just go back and forth between them until we run out of attempts?
b) Do we care about avoiding the path in the router, or are we okay with failing if we find the same path again (i.e. if the scorer is broken or doesn't learn, are we okay just failing the payment vs. making sure the router picks another path)?
If we care about either (I'm not sure we do), I think both imply that the data should be in the RouteParameters, though (a) implies it should even be in the Payee (to be renamed) rather than the RouteParameters.
Discussed this more offline; it sounds like we want to (and should) go with moving the logic as described here. Will do.
I'm going to put this on ice until #1227 lands, as I don't really want to touch the router until then; it's all a bit in flux.
Superseded by #1600.
When an HTLC fails, we currently rely on the scorer learning the failed channel and assigning an infinite (`u64::max_value()`) penalty to the channel so as to avoid retrying over the exact same path (if there's only one available path). This is common when trying to pay a mobile client behind an LSP if the mobile client is currently offline. This leads to the scorer being overly conservative in some cases - returning `u64::max_value()` when a given path hasn't been tried for a given payment may not be the best decision, even if that channel failed 50 minutes ago. By tracking channels which failed on a payment level and explicitly refusing to route over them we can relax the requirements on the scorer, allowing it to make different decisions on how to treat channels that failed relatively recently without causing payments to retry the same path forever. Closes lightningdevkit#1241, superseding lightningdevkit#1252.
When an HTLC fails, we currently rely on the scorer learning the failed channel and assigning an infinite (`u64::max_value()`) penalty to the channel so as to avoid retrying over the exact same path (if there's only one available path). This is common when trying to pay a mobile client behind an LSP if the mobile client is currently offline. This leads to the scorer being overly conservative in some cases - returning `u64::max_value()` when a given path hasn't been tried for a given payment may not be the best decision, even if that channel failed 50 minutes ago. By tracking channels which failed on a payment part level and explicitly refusing to route over them we can relax the requirements on the scorer, allowing it to make different decisions on how to treat channels that failed relatively recently without causing payments to retry the same path forever. This does have the drawback that it could allow two separate parts of a payment to traverse the same path even though that path just failed; however, this should only occur if the payment is going to fail anyway, at least as long as the scorer is properly learning. Closes lightningdevkit#1241, superseding lightningdevkit#1252.
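The drawback of tracking failures per payment part can be sketched as follows. This assumes each part carries its own set of failed channels and that paths are lists of SCIDs. `PartState` and `route_for_part` are hypothetical names, not the actual router API. A part that saw a channel fail avoids it on retry, but a sibling part that never saw the failure may still be routed over it.

```rust
use std::collections::HashSet;

/// Hypothetical per-part state: each payment part remembers only the
/// channels that failed for that part.
#[derive(Clone, Debug, Default)]
struct PartState {
    previously_failed_channels: HashSet<u64>,
}

/// Hypothetical routing step: among candidate paths, return the first one that
/// avoids every channel this part has already seen fail.
fn route_for_part<'a>(candidates: &'a [Vec<u64>], part: &PartState) -> Option<&'a Vec<u64>> {
    candidates
        .iter()
        .find(|path| path.iter().all(|scid| !part.previously_failed_channels.contains(scid)))
}
```

Because the scorer still learns from every failure, the second part should normally be steered away from that channel as well, which matches the reasoning above that this case should only arise when the payment is going to fail anyway.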