Ensure we don't ever retry a payment along a just-failed path #1252

TheBlueMatt · 2022-01-18T21:39:05Z

If we try to pay a mobile client behind an LSP, it's not unusual for
the single last-hop hint to fail with a Temporary Channel Failure
(indicating the mobile app is not currently open and connected to
the LSP). In this case, we penalize the last-hop channel but
try again along the same path anyway, because we have no other
path. This changes the retryer to simply refuse to do so, failing
the payment instead.

Fixes #1241.
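
A minimal sketch of the guard described above, not the actual `InvoicePayer` code. `Hop`, `RetryError`, and `check_retry_route` are hypothetical names invented for illustration; only the `avoid_scid: Option<u64>` parameter is taken from the diff in this PR. The idea is that once the router returns a route for the retry, the retryer rejects it if it still contains the channel that just failed.

```rust
/// Hypothetical stand-in for one hop of a route; only the short channel ID matters here.
#[derive(Clone, Debug, PartialEq)]
struct Hop {
    short_channel_id: u64,
}

/// Reasons a retry is refused in this sketch (hypothetical type).
#[derive(Debug, PartialEq)]
enum RetryError {
    /// The route found by the router reuses the channel that just failed.
    WouldReuseFailedChannel(u64),
    /// The router found no route at all.
    NoRoute,
}

/// Decide whether a freshly computed route may be used for a retry.
/// `avoid_scid` is the short channel ID of the hop that just failed, if known.
fn check_retry_route(
    route: Option<Vec<Hop>>,
    avoid_scid: Option<u64>,
) -> Result<Vec<Hop>, RetryError> {
    let route = route.ok_or(RetryError::NoRoute)?;
    if let Some(scid) = avoid_scid {
        // Refuse to retry instead of sending over the just-failed channel again.
        if route.iter().any(|hop| hop.short_channel_id == scid) {
            return Err(RetryError::WouldReuseFailedChannel(scid));
        }
    }
    Ok(route)
}
```

With a single last-hop hint, every route the router can find ends in the failed channel, so the check fails the payment rather than looping over the same path.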

codecov-commenter · 2022-01-18T22:01:10Z

Codecov Report

Merging #1252 (51d9c54) into main (7b6a7bb) will increase coverage by 0.01%.
The diff coverage is 93.18%.

@@            Coverage Diff             @@
##             main    #1252      +/-   ##
==========================================
+ Coverage   90.41%   90.43%   +0.01%     
==========================================
  Files          70       70              
  Lines       38087    38117      +30     
==========================================
+ Hits        34437    34471      +34     
+ Misses       3650     3646       -4

| Impacted Files | Coverage Δ |
|---|---|
| lightning-invoice/src/payment.rs | `92.96% <93.18%> (+0.20%)` ⬆️ |
| lightning/src/ln/functional_tests.rs | `97.36% <0.00%> (+0.06%)` ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jkczyz · 2022-01-18T22:14:09Z

lightning-invoice/src/payment.rs

-	fn retry_payment(
-		&self, payment_id: PaymentId, payment_hash: PaymentHash, params: &RouteParameters
+	fn retry_payment(&self, payment_id: PaymentId, payment_hash: PaymentHash,
+		params: &RouteParameters, avoid_scid: Option<u64>


jkczyz (Jan 18, 2022): I wonder if this would be cleaner as part of the `RouteParameters`.

jkczyz (Jan 18, 2022): And avoided directly in `find_route`, that is.

TheBlueMatt (Jan 18, 2022): You mean passing it through to the router itself and asking it to completely avoid an SCID? That feels like it's better done via the `Score` implementer, which I guess is ultimately the problem here: the scorer used in the sample (i.e. our default one) doesn't strictly refuse to pay over a channel that just failed. That said, I do feel like the `InvoicePayer` should be robust against a braindead scorer, whether it's our own or a user-provided one, so it feels nice to have it here too?

jkczyz (Jan 18, 2022): Yeah, but also don't put the onus on the event handler to set it and pass it to `find_route`. Simply have the `ChannelManager` set it when creating the `PaymentPathFailed` event. Then it is completely transparent to anyone handling the event.

TheBlueMatt (Jan 18, 2022): Right, I do feel like "avoid this channel" is really more of a `Score` thing than a router thing - we have a whole interface for it, and it seems annoying to duplicate that interface here. It's not a lot of code change, but still awkward.

jkczyz (Jan 18, 2022): Hmm... but this use case is (a) ephemeral, as it only applies to a specific payment -- lower payment amounts may be successful for another payment, or even on the failed path if further split on retry -- and (b) being handled by the caller, not the scorer, in this PR.

TheBlueMatt (Jan 19, 2022): Right, in this instance it's strangely handled by both the caller and the scorer - the scorer de-prioritizes and the caller handles the "oh, this went wrong, we can't do this, scorer or router are busted" case. I guess two more practical questions on behavior that may inform this more:

a) Do we want to track this information across payment attempts? If there are two available last-hop hints, do we want to just go back and forth between them until we run out of attempts?
b) Do we care about avoiding the path in the router, or are we okay with failing if we find the same path again (i.e. if the scorer is broken or doesn't learn, are we okay just failing the payment vs. making sure the router picks another path)?

Both imply that the data should be in the `RouteParameters`, I think, if we care about either (I'm not sure we do), but (a) implies it should be in the `Payee` (to be renamed), not even in `RouteParameters`.

TheBlueMatt (Jan 25, 2022): Discussed this more offline; it sounds like we should move the logic as described here. Will do.
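
To make the option discussed in this thread concrete, here is a rough sketch of carrying the failed channels in the route parameters and having the router skip them. It is an illustration under stated assumptions, not LDK's router: `RouteParams`, `previously_failed_scids`, `note_failure`, and this toy `find_route` (which picks from a fixed list of candidate paths, each a list of short channel IDs) are all hypothetical.

```rust
/// Hypothetical, simplified route parameters; the real `RouteParameters` has other fields.
#[derive(Clone, Debug, Default)]
struct RouteParams {
    /// Short channel IDs that already failed for this payment.
    previously_failed_scids: Vec<u64>,
}

impl RouteParams {
    /// Record a failed channel so that later retries of this payment avoid it.
    fn note_failure(&mut self, scid: u64) {
        if !self.previously_failed_scids.contains(&scid) {
            self.previously_failed_scids.push(scid);
        }
    }
}

/// Toy router: return the first candidate path that touches no previously failed channel.
fn find_route<'a>(candidates: &'a [Vec<u64>], params: &RouteParams) -> Option<&'a Vec<u64>> {
    candidates.iter().find(|path| {
        path.iter()
            .all(|scid| !params.previously_failed_scids.contains(scid))
    })
}
```

Because the list lives with the payment's parameters rather than in the scorer, it is ephemeral: it affects only retries of this payment. With two last-hop hints, question (a) resolves itself in this sketch: each hint is tried at most once, and the payment fails when both have been recorded.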

TheBlueMatt · 2022-01-26T18:36:43Z

I'm gonna put this on ice until #1227 lands, as I don't really want to touch the router until then; it's all a bit in flux.

TheBlueMatt · 2022-07-06T21:23:53Z

Superseded by #1600

When an HTLC fails, we currently rely on the scorer learning the failed channel and assigning an infinite (`u64::max_value()`) penalty to the channel so as to avoid retrying over the exact same path (if there's only one available path). This is common when trying to pay a mobile client behind an LSP if the mobile client is currently offline. This leads to the scorer being overly conservative in some cases - returning `u64::max_value()` when a given path hasn't been tried for a given payment may not be the best decision, even if that channel failed 50 minutes ago. By tracking channels which failed on a payment part level and explicitly refusing to route over them, we can relax the requirements on the scorer, allowing it to make different decisions on how to treat channels that failed relatively recently without causing payments to retry the same path forever. This does have the drawback that it could allow two separate parts of a payment to traverse the same path even though that path just failed. However, this should only occur if the payment is going to fail anyway, at least as long as the scorer is properly learning. Closes lightningdevkit#1241, superseding lightningdevkit#1252.
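
As an illustration of what "relaxing the requirements on the scorer" could look like, the sketch below uses a finite, configurable failure penalty that halves over time instead of returning `u64::max_value()`. This is an assumed policy chosen for the example, not LDK's scorer: `DecayingScorer`, its fields, and the halving rule are hypothetical, and timestamps are plain `Duration`s from an arbitrary epoch to keep the example deterministic.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical scorer whose penalty for a failed channel halves every `half_life`.
struct DecayingScorer {
    /// Penalty applied right after a failure; configurable and finite, not `u64::MAX`.
    failure_penalty_msat: u64,
    half_life: Duration,
    /// Short channel ID -> time of the last failure, measured from an arbitrary epoch.
    last_failure: HashMap<u64, Duration>,
}

impl DecayingScorer {
    fn new(failure_penalty_msat: u64, half_life: Duration) -> Self {
        Self { failure_penalty_msat, half_life, last_failure: HashMap::new() }
    }

    /// Record that a payment path failed at channel `scid` at time `now`.
    fn payment_path_failed(&mut self, scid: u64, now: Duration) {
        self.last_failure.insert(scid, now);
    }

    /// Penalty for routing over `scid` at time `now`; zero if it never failed.
    fn channel_penalty_msat(&self, scid: u64, now: Duration) -> u64 {
        let failed_at = match self.last_failure.get(&scid) {
            Some(t) => *t,
            None => return 0,
        };
        let elapsed = now.saturating_sub(failed_at);
        // Whole half-lives elapsed; `max(1)` guards against a zero half-life.
        let halvings = elapsed.as_secs() / self.half_life.as_secs().max(1);
        if halvings >= 64 { 0 } else { self.failure_penalty_msat >> halvings }
    }
}
```

A scorer like this no longer guarantees that a just-failed channel is never chosen, which is acceptable only because the router separately refuses channels that already failed for the payment part being retried.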

TheBlueMatt added this to the 0.1 milestone Jan 18, 2022

jkczyz reviewed Jan 18, 2022

TheBlueMatt added the blocked on dependent pr label Jan 26, 2022

TheBlueMatt self-assigned this Feb 16, 2022

TheBlueMatt mentioned this pull request Jul 6, 2022

Avoid reusing just-failed channels in the router, making the impossibility penalty configurable #1600

Merged

TheBlueMatt closed this Jul 6, 2022
