Rewrite InvoicePayer retry to correctly handle MPP partial failures #1141

TheBlueMatt · 2021-10-26T23:04:12Z

Based on #1059, this fixes some API holes in the return values of ChannelManager::send/retry_payment, and then uses those fixes to fix some MPP retry holes in InvoicePayer.

codecov · 2021-10-26T23:27:17Z

Codecov Report

Merging #1141 (79b9ee8) into main (c53048a) will decrease coverage by 0.01%.
The diff coverage is 85.46%.

❗ Current head 79b9ee8 differs from pull request most recent head 199d258. Consider uploading reports for the commit 199d258 to get more accurate results

@@            Coverage Diff             @@
##             main    #1141      +/-   ##
==========================================
- Coverage   90.21%   90.20%   -0.02%     
==========================================
  Files          70       70              
  Lines       36201    36358     +157     
==========================================
+ Hits        32660    32796     +136     
- Misses       3541     3562      +21

Impacted Files	Coverage Δ
lightning/src/ln/functional_test_utils.rs	`95.14% <ø> (ø)`
lightning/src/ln/channelmanager.rs	`83.72% <66.66%> (-0.10%)`	⬇️
lightning-invoice/src/payment.rs	`91.13% <87.31%> (-1.66%)`	⬇️
lightning/src/ln/chanmon_update_fail_tests.rs	`97.65% <100.00%> (ø)`
lightning/src/ln/functional_tests.rs	`97.27% <0.00%> (-0.04%)`	⬇️
lightning-invoice/src/utils.rs	`83.48% <0.00%> (+8.95%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c53048a...199d258. Read the comment docs.

TheBlueMatt · 2021-10-27T22:23:42Z

Pushed some integration tests that show the fixed issue :)

lightning/src/ln/channelmanager.rs

valentinewallace · 2021-10-28T16:12:25Z

lightning/src/ln/channelmanager.rs

-		for res in results.iter() {
+		let mut pending_amt_unsent = 0;
+		let mut max_unsent_cltv_delta = 0;
+		for (res, path) in results.iter().zip(route.paths.iter()) {
 			if res.is_ok() { has_ok = true; }
 			if res.is_err() { has_err = true; }
 			if let &Err(APIError::MonitorUpdateFailed) = res {


Given we're kind of reconceiving of TemporaryFailure as more like AsyncPersist, it may not make sense to have MonitorUpdateFailed be an Err anymore. But, fine to save this discussion for the PR renaming TemporaryFailure

Yea, I think that'd be a good idea, totally agreed. Also agree lets leave it for renaming TemporaryFailure.

valentinewallace · 2021-10-28T16:18:50Z

lightning/src/ln/channelmanager.rs

@@ -2226,10 +2237,25 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
 				has_err = true;
 				has_ok = true;
 				break;
+			} else if res.is_err() {
+				pending_amt_unsent += path.last().unwrap().fee_msat;
+				max_unsent_cltv_delta = cmp::max(max_unsent_cltv_delta, path.last().unwrap().cltv_expiry_delta);


Wouldn't it be more true to the original invoice parameters to send over the min_unsent_cltv_delta?

Maybe? I mean in principle if its just paying an invoice they should all be the same. I picked max because its the most conservative option from the pov of the recipient - if we include too high a CLTV they shouldn't complain, min maybe they might see as an error? dunno but I'd hope it doesn't really matter in practice.

valentinewallace · 2021-10-28T16:23:25Z

lightning/src/ln/channelmanager.rs

+			Err(PaymentSendFailure::PartialFailure {
+				results,
+				payment_id,
+				failed_paths_retry: if pending_amt_unsent != 0 {


Hmm, not sure how annoying this would be to implement rn, but I think it'd make more sense to consider MonitorUpdateFailed (if TemporaryFailure) a success, and won't result in a PartialFailure (since, iiuc, PartialFailure is mainly for retry purposes?). Then maybe failed_paths_retry could be a non-Option

Hmm, can we leave it for a followup? I don't want to touch any more code than strictly necessary here.

lightning-invoice/src/payment.rs

valentinewallace · 2021-10-28T17:00:45Z

lightning-invoice/src/payment.rs

+					let route = self.router.find_route(
+						&payer,
+						&params,
+						Some(&first_hops.iter().collect::<Vec<_>>()),


Should this be set to None if payer.first_hops() is empty? Guess that way payers could implicitly specify using the network graph first-hop channels

Hmm, I think generally not. Like, if we have all of our peers disconnected and no available channels we should just outright fail all payments instead of adding HTLC updates that won't complete until the peer comes back online (ie creating stuck payments).

lightning-invoice/src/payment.rs

valentinewallace

LGTM mod CI and squash

TheBlueMatt · 2021-10-28T20:16:17Z

Went ahead and squashed since there were no other reviewers yet with only a trivial change to fix build:

$ git diff-tree -U1 4b31eea0d f2ab405ab
diff --git a/lightning-invoice/src/payment.rs b/lightning-invoice/src/payment.rs
index e51ccf296..489d61776 100644
--- a/lightning-invoice/src/payment.rs
+++ b/lightning-invoice/src/payment.rs
@@ -240,3 +240,3 @@ where
 					};
-					if has_expired(params) {
+					if has_expired(&params) {
 						log_trace!(self.logger, "Invoice expired prior to first send for payment {}", log_bytes!(payment_hash.0));
$

TheBlueMatt · 2021-10-29T20:25:18Z

Rebased after merge of #1144.

jkczyz

Largely looks good. Comments are mainly about code organization to make the logic easier to follow.

jkczyz · 2021-10-29T23:00:27Z

lightning-invoice/src/payment.rs

 		let payment_id = Some(invoice_payer.pay_invoice(&invoice).unwrap());
+		std::thread::sleep(Duration::from_secs(2));


You shouldn't need to sleep in the test. Just give an expiry in the PaymentPathFailed event that would cause the condition to be hit.

Hmm? The point of this test is that, unlike fails_paying_invoice_after_expiration, we expire while waiting to retry instead of expiring before the original pay_invoice call.

Yeah, but as far as InvoicePayer is concerned, it doesn't care what the original Invoice said the expiry was when retrying. It just cares what's in the PaymentPathFailed event that the test is later feeding it. So that can be changed to have expired, thus simulating the passage of time.

Ah, that suggestion was too clever by half.

lightning-invoice/src/payment.rs

jkczyz · 2021-10-29T23:29:08Z

lightning-invoice/src/payment.rs

+				} else if let Some(params) = retry {
+					if self.retry_payment(!all_paths_failed, payment_id.unwrap(), *payment_hash, params).is_ok() {
+						// We retried at least somewhat, don't provide the PaymentPathFailed event to the user.
 						return;
 					}
-
-					// Either the payment was rejected, the maximum attempts were exceeded, or an
-					// error occurred when attempting to retry.
-					entry.remove();
 				} else {
-					unreachable!();
+					log_trace!(self.logger, "Payment {} missing retry params; not retrying", log_bytes!(payment_hash.0));
+					if *all_paths_failed { self.payment_cache.lock().unwrap().remove(payment_hash); }
 				}


Could you flip these conditions as it was before so the code is not nested and error cases come first? That way you can retain the fall-through that removes the entry at the end instead of repeating the code three times.

Ah, yes, good point, I'd moved some of the removal logic into retry_payment, but its always 1:1 with returning an Err so it can just be at the callsite.

jkczyz · 2021-10-29T23:34:30Z

lightning-invoice/src/payment.rs

+						Err(PaymentSendFailure::ParameterError(e)) =>
+							return Err(PaymentError::Sending(PaymentSendFailure::ParameterError(e))),
+						Err(PaymentSendFailure::PathParameterError(e)) =>
+							return Err(PaymentError::Sending(PaymentSendFailure::PathParameterError(e))),


Consider using a catch-all Err(e) instead of re-forming the errors.

Hmmmm, I kinda like doing the explicit catches. I'm pretty oldschool but I like having the verbosity because it means if we change the PaymentSendFailure enum to contain a new variant we are compiler-required to update this code vs it silently handling a new case that we don't have currently. If you feel strongly we can change it (grep for the enum always turns it up, of course), but I vaguely prefer it.

lightning-invoice/src/payment.rs

jkczyz · 2021-10-30T00:06:48Z

lightning-invoice/src/payment.rs

+		let htlc_msgs = nodes[0].node.get_and_clear_pending_msg_events();
+		assert_eq!(htlc_msgs.len(), 2);
+		check_added_monitors!(nodes[0], 2);
+		assert!(!*event_handled.borrow());


Not sure how useful this check is if nothing is fed into the event handler / invoice payer's event handler.

Well, that's the point? The InvoicePayer in this test shouldn't generate any user-visible events?

InvoicePayer doesn't generate events. It simply passes them on conditionally. But you aren't giving it any events from what I can see.

Sure, but ChannelManager does? I get ChannelManager doesn't currently generate any such events, but its nice to test the full thing in the integration tests IMO.

I think we want something like nodes[0].node.process_pending_events(&invoice_payer) here then. That way if ChannelManager does generate events, the test would fail if InvoicePayer delegated to the decorated event handler and wasn't supposed to.

I dropped the excess bool and just paniced in the function. We have to have a function, so no reason not to, but no need to test it in detail.

jkczyz · 2021-10-30T00:08:01Z

lightning-invoice/src/payment.rs

@@ -1246,4 +1246,50 @@ mod tests {
 		check_added_monitors!(nodes[0], 2);
 		assert!(!*event_handled.borrow());
 	}
+
+	#[test]
+	fn immediate_retry_on_failure() {


Similar comments for this test.

ariard

Overall SGTM, though still building my understanding of InvoicePayer.

lightning-invoice/src/payment.rs

ariard · 2021-10-30T00:01:49Z

lightning-invoice/src/payment.rs

-					final_cltv_expiry_delta: invoice.min_final_cltv_expiry() as u32,
+		let failed_paths_data = loop {
+			let mut payment_cache = self.payment_cache.lock().unwrap();
+			match payment_cache.entry(payment_hash) {


Should you be able to use a InvoicePayer instance for multiple invoices ? If yes how do we deal with collisioning invoices, i.e with the same payment_hash ? Not clear to me reading InvoicePayer high-level documentation.

We could key by payment_id. I think we added that after I wrote InvoicePayer but prior to merge. Can't recall if there was a reason to not use it.

Right, the idea is definitely that you'd have a single InvoicePayer for all your pending invoices. Still, it doesn't handle duplicate payment hashes explicitly - if you call it twice with the same invoice/payment_hash it will fail if the original one is still in-flight, but will go to pay if the original one is completed. I documented that the user is responsible for de-duplication.

ariard · 2021-10-30T00:17:43Z

lightning-invoice/src/payment.rs

+						Err(PaymentSendFailure::PathParameterError(e)) =>
+							return Err(PaymentError::Sending(PaymentSendFailure::PathParameterError(e))),
+						Err(PaymentSendFailure::AllFailedRetrySafe(e)) => {
+							if retry_count >= self.retry_attempts.0 {


IIUC, retry_attempts doesn't dissociate MPP and simple payment, it's the number of global retries, wherever the failure is located ? If I've a retry_attempts of value 3, and I send a MPP with 5 paths, I'm going to retry at most of 3 local path failure.

If so RetryAttempts documentation could be a bit clearer.

Yeah, it's spelled out a bit more in the module-level docs, but the docs on RetryAttempts could be similarly worded.

Yep, your understanding is correct, updated docs.

This should allow us to fix lightningdevkit/ldk-garbagecollected#52

This will allow `InvoicePayer` to properly retry payments that only partially failed to send.

Users can provide anything they want as `RouteParameters` so we shouldn't assume any fields are set any particular way, including `expiry_time` set at all.

jkczyz

Just a few comments but otherwise I think we can handle pay/retry refactors in a follow-up.

jkczyz · 2021-10-30T14:32:55Z

lightning-invoice/src/payment.rs

+		let htlc_msgs = nodes[0].node.get_and_clear_pending_msg_events();
+		assert_eq!(htlc_msgs.len(), 2);
+		check_added_monitors!(nodes[0], 2);
+		assert!(!*event_handled.borrow());


I think we want something like nodes[0].node.process_pending_events(&invoice_payer) here then. That way if ChannelManager does generate events, the test would fail if InvoicePayer delegated to the decorated event handler and wasn't supposed to.

jkczyz · 2021-10-30T14:33:57Z

lightning-invoice/src/payment.rs

+							break None;
+						},
+						Err(PaymentSendFailure::PartialFailure { results: _, failed_paths_retry, payment_id }) => {
+							if let Some(retry_data) = failed_paths_retry {
+								entry.insert(retry_count);
+								break Some((retry_data, payment_id));


Agreed about being able to reuse retry_handler, so might require a bit of re-work. Doesn't have to be done on this PR.

lightning-invoice/src/payment.rs

jkczyz

ACK 79b9ee8

This rewrites a good chunk of the retry logic in `InvoicePayer` to address two issues: * it was not considering the return value of `send_payment` (and `retry_payment`) may indicate a failure on some paths but not others, * it was not considering that more failures may still come later when removing elements from the retry count map. This could result in us seeing an MPP-partial-failure, failing to retry, removing the retries count entry, and then retrying other parts, potentially forever.

This tests the multi-part-single-failure-immediately fixes in the previous commit.

TheBlueMatt · 2021-10-31T17:45:32Z

Squashed fixups without other changes. Will land after CI.

$ git diff-tree -U1 79b9ee84 199d258b
$

TheBlueMatt added this to the 0.0.103 milestone Oct 26, 2021

TheBlueMatt force-pushed the 2021-10-payment-id-on-partial-failure branch 3 times, most recently from e17c968 to 04e3016 Compare October 27, 2021 17:02

TheBlueMatt mentioned this pull request Oct 27, 2021

Fix a minor memory leak on PermanentFailure mon errs when sending #1143

Merged

TheBlueMatt force-pushed the 2021-10-payment-id-on-partial-failure branch from 04e3016 to 32d40c1 Compare October 27, 2021 22:23

valentinewallace self-requested a review October 28, 2021 16:16

valentinewallace reviewed Oct 28, 2021

View reviewed changes

TheBlueMatt force-pushed the 2021-10-payment-id-on-partial-failure branch from 32d40c1 to ccab5c9 Compare October 28, 2021 17:22

valentinewallace reviewed Oct 28, 2021

View reviewed changes

TheBlueMatt force-pushed the 2021-10-payment-id-on-partial-failure branch 2 times, most recently from 072586e to 4b31eea Compare October 28, 2021 18:48

jkczyz self-requested a review October 28, 2021 20:08

valentinewallace approved these changes Oct 28, 2021

View reviewed changes

TheBlueMatt force-pushed the 2021-10-payment-id-on-partial-failure branch from 4b31eea to f2ab405 Compare October 28, 2021 20:15

TheBlueMatt force-pushed the 2021-10-payment-id-on-partial-failure branch from 47864f1 to c8e24b7 Compare October 29, 2021 20:21

TheBlueMatt force-pushed the 2021-10-payment-id-on-partial-failure branch from c8e24b7 to 54b710f Compare October 29, 2021 20:25

jkczyz reviewed Oct 30, 2021

View reviewed changes

ariard reviewed Oct 30, 2021

View reviewed changes

TheBlueMatt force-pushed the 2021-10-payment-id-on-partial-failure branch from 54b710f to 42b0cd7 Compare October 30, 2021 01:41

TheBlueMatt added 4 commits October 30, 2021 01:53

Move PaymentId to a [u8; 32] in bindings as for other hash objects

16f7428

This should allow us to fix lightningdevkit/ldk-garbagecollected#52

Provide payment retry data when an MPP payment failed partially

d73e434

This will allow `InvoicePayer` to properly retry payments that only partially failed to send.

Expand InvoicePayer documentation somewhat to clarify edge-cases

51dd344

Dont unwrap RouteParameter::expiry_time as users can set it

e9e8a7d

Users can provide anything they want as `RouteParameters` so we shouldn't assume any fields are set any particular way, including `expiry_time` set at all.

TheBlueMatt force-pushed the 2021-10-payment-id-on-partial-failure branch 2 times, most recently from a4e15f3 to c42ee6a Compare October 30, 2021 02:34

jkczyz reviewed Oct 30, 2021

View reviewed changes

TheBlueMatt force-pushed the 2021-10-payment-id-on-partial-failure branch from c42ee6a to 79b9ee8 Compare October 30, 2021 23:30

jkczyz approved these changes Oct 31, 2021

View reviewed changes

valentinewallace approved these changes Oct 31, 2021

View reviewed changes

TheBlueMatt added 4 commits October 31, 2021 17:43

Add an integration test for InvoicePayer paying when one part fails

2b8c287

This tests the multi-part-single-failure-immediately fixes in the previous commit.

Add integration test for InvoicePayerretry on an immediate failure

9c5dccd

Check for invoice expiry in InvoicePayer before we send any HTLCs

199d258

TheBlueMatt force-pushed the 2021-10-payment-id-on-partial-failure branch from 79b9ee8 to 199d258 Compare October 31, 2021 17:43

TheBlueMatt merged commit 61341df into lightningdevkit:main Oct 31, 2021

		let payment_id = Some(invoice_payer.pay_invoice(&invoice).unwrap());
		std::thread::sleep(Duration::from_secs(2));

Rewrite InvoicePayer retry to correctly handle MPP partial failures #1141

Rewrite InvoicePayer retry to correctly handle MPP partial failures #1141

Uh oh!

Conversation

TheBlueMatt commented Oct 26, 2021

Uh oh!

codecov bot commented Oct 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

TheBlueMatt commented Oct 27, 2021

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

valentinewallace left a comment

Choose a reason for hiding this comment

Uh oh!

TheBlueMatt commented Oct 28, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TheBlueMatt commented Oct 29, 2021

Uh oh!

jkczyz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

codecov bot commented Oct 26, 2021 •

edited

Loading

TheBlueMatt commented Oct 28, 2021 •

edited

Loading