Skip to content

Read ChannelManager even if we have no-peer post-update actions #3790

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

TheBlueMatt
Copy link
Collaborator

In 93b4479 we fixed an issue which could cause a ChannelMonitorUpdate to get marked as blocked on itself, leading to an eventual force-closure.

One potential side-effect of that issue, however, is that any further ChannelMonitorUpdates to the same channel while it is blocked will not have any post-update actions processed (as there is a pending blocked ChannelMonitorUpdate sitting in the channel).

This can leave a dangling MonitorUpdateCompletionAction sitting around even after the channel is closed.

In 0.1, because ChannelMonitorUpdates to closed channels were finally fully tracked, we started enforcing that any post-update completion action we had on startup corresponded to a peer entry, while at the same time no longer creating peer entries just because we had serialized one in the data we were loading (only creating them if we had channel(s) or a ChannelMonitor).

This can cause some ChannelManager to no longer deserialize on 0.1 as we might have a left-over dangling
MonitorUpdateCompletionAction and will no longer always have a peer entry just because of it.

Here we fix this issue by specifically checking for dangling MonitorUpdateCompletionAction::PaymentClaim entries and dropping them if there is no corresponding channel or peer state entry. We only check for PaymentClaimed actions rather than allowing for any dangling actions as 93b4479 was only triggerable with MPP claims, so dangling
MonitorUpdateCompletionActions for forwarded payments should be exceedingly rare.

This also adds an upgrade test to test a slightly convoluted version of this scenario.

@ldk-reviews-bot
Copy link

ldk-reviews-bot commented May 22, 2025

I've assigned @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

let logger =
WithContext::from(&args.logger, Some(node_id), None, None);
log_error!(logger, "Got blocked actions {:?} without a per-peer-state for {}", actions, node_id);
return Err(DecodeError::InvalidValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all new code that adds a new error condition.

Here we fix this issue by specifically checking for dangling
MonitorUpdateCompletionAction::PaymentClaim entries and dropping
them if there is no corresponding channel or peer state entry.

How are they dropped now, compared to how it happens in main?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On main in the no-peer-state case we always return an error, now we sometimes do not and ignore the completion actions. For the have-peer-state case, we used to store the completion actions even if we have no channel or monitor the completion actions are based on, but now we do not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see

@ldk-reviews-bot
Copy link

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

log_error!(WithContext::from(&args.logger, Some(node_id), None, None), "Got blocked actions without a per-peer-state for {}", node_id);
return Err(DecodeError::InvalidValue);
for actions in monitor_update_blocked_actions.values() {
for action in actions.iter() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeated code block

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I wasn't quite sure it was worth DRYing it up any given its so short (and the comments are slightly different)

@TheBlueMatt TheBlueMatt force-pushed the 2025-05-0.1-upgrade-fail branch from 7105605 to bb2e29e Compare May 22, 2025 14:35
@TheBlueMatt
Copy link
Collaborator Author

Oops, the first set of checks were too eager, luckily test_reload_mon_update_completion_actions caught it -

$ git diff-tree -U1 710560554 bb2e29e97
diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index 84251ca33b..54e453d56b 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -14179,6 +14179,14 @@ where
 				let mut max_in_flight_update_id = 0;
+				let starting_len =  $chan_in_flight_upds.len();
 				$chan_in_flight_upds.retain(|upd| upd.update_id > $monitor.get_latest_update_id());
+				if $chan_in_flight_upds.len() < starting_len {
+					log_debug!(
+						$logger,
+						"{} ChannelMonitorUpdates completed after ChannelManager was last serialized",
+						starting_len - $chan_in_flight_upds.len()
+					);
+				}
 				let funding_txo = $monitor.get_funding_txo();
 				for update in $chan_in_flight_upds.iter() {
-					log_trace!($logger, "Replaying ChannelMonitorUpdate {} for {}channel {}",
+					log_debug!($logger, "Replaying ChannelMonitorUpdate {} for {}channel {}",
 						update.update_id, $channel_info_log, &$monitor.channel_id());
@@ -14746,24 +14754,7 @@ where
 					}
-					let peer_state = peer_state.lock().unwrap();
-					let channel_id_matches = |updates: &(_, Vec<ChannelMonitorUpdate>)| {
-						updates.1.iter().any(|update| update.channel_id == Some(*channel_id))
-					};
-					if !peer_state.in_flight_monitor_updates.values().any(channel_id_matches) {
-						// If there are no pending `ChannelMonitorUpdate`s for this channel but we
-						// have pending post-update actions, its possible that one was left over
-						// from pre-0.1 payment claims where MPP claims led to a channel blocked on
-						// itself and later `ChannelMonitorUpdate`s didn't get their post-update
-						// actions run.
-						// This should only have happened for `PaymentClaimed` post-update actions,
-						// which we check here.
-						for action in actions.iter() {
-							if let MonitorUpdateCompletionAction::PaymentClaimed { .. } = action {
-							} else {
-								let logger =
-									WithContext::from(&args.logger, Some(node_id), None, None);
-								log_error!(logger, "Got blocked actions {:?} without a per-peer-state for {}", actions, node_id);
-								return Err(DecodeError::InvalidValue);
-							}
-						}
-					}
+					// Note that we may have a post-update action for a channel that has no pending
+					// `ChannelMonitorUpdate`s, but unlike the no-peer-state case, it may simply be
+					// because we had a `ChannelMonitorUpdate` complete after the last time this
+					// `ChannelManager` was serialized. In that case, we'll run the post-update
+					// actions as soon as we get going.
 				}

Copy link

codecov bot commented May 22, 2025

Codecov Report

Attention: Patch coverage is 0% with 14 lines in your changes missing coverage. Please review.

Project coverage is 89.78%. Comparing base (02b5564) to head (bb2e29e).

Files with missing lines Patch % Lines
lightning/src/ln/channelmanager.rs 0.00% 14 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3790      +/-   ##
==========================================
- Coverage   89.80%   89.78%   -0.03%     
==========================================
  Files         159      159              
  Lines      128691   128703      +12     
  Branches   128691   128703      +12     
==========================================
- Hits       115573   115552      -21     
- Misses      10446    10472      +26     
- Partials     2672     2679       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ldk-reviews-bot
Copy link

✅ Added second reviewer: @valentinewallace

@TheBlueMatt TheBlueMatt force-pushed the 2025-05-0.1-upgrade-fail branch from bb2e29e to 5ca1019 Compare May 22, 2025 16:04
@TheBlueMatt
Copy link
Collaborator Author

TheBlueMatt commented May 22, 2025

rustfmt'd (and used match instead of if let)

In 93b4479 we fixed an issue which
could cause a `ChannelMonitorUpdate` to get marked as blocked on
itself, leading to an eventual force-closure.

One potential side-effect of that issue, however, is that any
further `ChannelMonitorUpdate`s to the same channel while it is
blocked will not have any post-update actions processed (as there
is a pending blocked `ChannelMonitorUpdate` sitting in the
channel).

This can leave a dangling `MonitorUpdateCompletionAction` sitting
around even after the channel is closed.

In 0.1, because `ChannelMonitorUpdate`s to closed channels were
finally fully tracked, we started enforcing that any post-update
completion action we had on startup corresponded to a peer entry,
while at the same time no longer creating peer entries just because
we had serialized one in the data we were loading (only creating
them if we had channel(s) or a `ChannelMonitor`).

This can cause some `ChannelManager` to no longer deserialize on
0.1 as we might have a left-over dangling
`MonitorUpdateCompletionAction` and will no longer always have a
peer entry just because of it.

Here we fix this issue by specifically checking for dangling
`MonitorUpdateCompletionAction::PaymentClaim` entries and dropping
them if there is no corresponding channel or peer state entry. We
only check for `PaymentClaimed` actions rather than allowing for
any dangling actions as 93b4479
was only triggerable with MPP claims, so dangling
`MonitorUpdateCompletionAction`s for forwarded payments should be
exceedingly rare.

This also adds an upgrade test to test a slightly convoluted
version of this scenario.
@TheBlueMatt TheBlueMatt force-pushed the 2025-05-0.1-upgrade-fail branch from 5ca1019 to 70b5552 Compare May 22, 2025 16:05
return Err(DecodeError::InvalidValue);
for actions in monitor_update_blocked_actions.values() {
for action in actions.iter() {
if matches!(action, MonitorUpdateCompletionAction::PaymentClaimed { .. }) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I meant !matches! to avoid the else.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I liked the comment in the block that's ignoring the PaymentClaimeds. I mean I can move it outside if you prefer but I often like comments in empty if blocks 🤷‍♂️

Copy link
Contributor

@valentinewallace valentinewallace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM


// Finally, reload the node in the latest LDK. This previously failed.
let config = test_default_channel_config();
reload_node!(nodes[3], config, &node_d_ser, &[&mon_ser], persister, chain_mon, node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test LGTM, just took a slightly closer look. It also fails as expected on main.

@TheBlueMatt TheBlueMatt merged commit 63a5e03 into lightningdevkit:main May 23, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants