Don't bump the next_node_counter when using a removed counter #3367

Merged

Conversation

TheBlueMatt
Collaborator

If we manage to pull a `node_counter` from `removed_node_counters` for reuse, `add_channel_between_nodes` would `unwrap_or` with the `next_node_counter`-incremented value. This visually looks right, except the argument to `unwrap_or` is always evaluated, causing us to always increment `next_node_counter` even if we don't use it.

This will result in the `node_counter`s always growing any time we add a new node to our graph, leading to somewhat larger memory usage when routing and a debug assertion failure in `test_node_counter_consistency`.

The fix is trivial: this is what `unwrap_or_else` is for.
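
As a standalone illustration of the difference (a minimal sketch, not the LDK code itself; the helper names `take_counter_eager` and `take_counter_lazy` are hypothetical), the snippet below shows that the argument to `unwrap_or` runs even when the `Option` is `Some`, while the closure passed to `unwrap_or_else` only runs on `None`:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Buggy shape: the `fetch_add` is an ordinary argument expression, so it is
// evaluated before `unwrap_or` is even called, whether or not `pop()` is `Some`.
fn take_counter_eager(removed: &mut Vec<u32>, next: &AtomicUsize) -> u32 {
    removed.pop().unwrap_or(next.fetch_add(1, Ordering::Relaxed) as u32)
}

// Fixed shape: the closure only runs when `pop()` returned `None`.
fn take_counter_lazy(removed: &mut Vec<u32>, next: &AtomicUsize) -> u32 {
    removed.pop().unwrap_or_else(|| next.fetch_add(1, Ordering::Relaxed) as u32)
}

fn main() {
    // Eager: a removed counter is reused, yet the next counter is bumped anyway.
    let next = AtomicUsize::new(5);
    let mut removed = vec![2u32];
    assert_eq!(take_counter_eager(&mut removed, &next), 2);
    assert_eq!(next.load(Ordering::Relaxed), 6);

    // Lazy: reusing a removed counter leaves the next counter untouched.
    let next = AtomicUsize::new(5);
    let mut removed = vec![2u32];
    assert_eq!(take_counter_lazy(&mut removed, &next), 2);
    assert_eq!(next.load(Ordering::Relaxed), 5);

    // Lazy with nothing to reuse: a fresh counter is allocated.
    assert_eq!(take_counter_lazy(&mut removed, &next), 5);
    assert_eq!(next.load(Ordering::Relaxed), 6);
}
```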

This bug was included in the 0.0.125 release.

@TheBlueMatt TheBlueMatt added this to the 0.1 milestone Oct 14, 2024

codecov bot commented Oct 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.64%. Comparing base (46d8a0d) to head (0c0cb6f).
Report is 73 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3367      +/-   ##
==========================================
+ Coverage   89.61%   89.64%   +0.03%     
==========================================
  Files         127      127              
  Lines      103533   104218     +685     
  Branches   103533   104218     +685     
==========================================
+ Hits        92778    93426     +648     
- Misses       8056     8107      +51     
+ Partials     2699     2685      -14     


arik-so
arik-so previously approved these changes Oct 16, 2024
Contributor

@arik-so arik-so left a comment

Wow, great find!

@TheBlueMatt
Collaborator Author

Grr, rustfmt was mad.

$ git diff-tree -U1 8913192b6 0c0cb6fcc
diff --git a/lightning/src/routing/gossip.rs b/lightning/src/routing/gossip.rs
index df9f9813c..9d85a6c58 100644
--- a/lightning/src/routing/gossip.rs
+++ b/lightning/src/routing/gossip.rs
@@ -2079,7 +2079,5 @@ where
 					let mut removed_node_counters = self.removed_node_counters.lock().unwrap();
-					**chan_info_node_counter = removed_node_counters
-						.pop()
-						.unwrap_or_else(|| {
-							self.next_node_counter.fetch_add(1, Ordering::Relaxed) as u32
-						});
+					**chan_info_node_counter = removed_node_counters.pop().unwrap_or_else(|| {
+						self.next_node_counter.fetch_add(1, Ordering::Relaxed) as u32
+					});
 					node_entry.insert(NodeInfo {

-					**chan_info_node_counter = removed_node_counters
-						.pop()
-						.unwrap_or(self.next_node_counter.fetch_add(1, Ordering::Relaxed) as u32);
+					**chan_info_node_counter = removed_node_counters.pop().unwrap_or_else(|| {
Contributor

Could we add a call to test_node_counter_consistency to the end of this method? From local testing it looks like this would've caught the bug. Not sure if other methods could use it as well.

Collaborator Author

We don't appear to have any test coverage which removes then adds a channel, so even with a new check in this method none of our tests fail. Ultimately we end up panicking immediately when the next message comes in, though, so running a real node with debug assertions on hit this pretty quick :)
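
For illustration, the remove-then-add sequence can be sketched against a toy model. Everything here is hypothetical: `TinyGraph` and its methods are not LDK types, and the invariant in `check_consistency` (every counter below `next_node_counter` is either live or in the removed list, exactly once) is an assumption about what a consistency check would verify, not a copy of `test_node_counter_consistency`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a graph that hands out dense node counters.
struct TinyGraph {
    nodes: HashMap<u64, u32>,
    removed_node_counters: Vec<u32>,
    next_node_counter: u32,
}

impl TinyGraph {
    fn new() -> Self {
        TinyGraph { nodes: HashMap::new(), removed_node_counters: Vec::new(), next_node_counter: 0 }
    }

    fn add_node(&mut self, id: u64) {
        if self.nodes.contains_key(&id) {
            return;
        }
        let next = &mut self.next_node_counter;
        // Lazy fallback: only allocate a fresh counter if none can be reused.
        let counter = self.removed_node_counters.pop().unwrap_or_else(|| {
            let fresh = *next;
            *next += 1;
            fresh
        });
        self.nodes.insert(id, counter);
    }

    fn remove_node(&mut self, id: u64) {
        if let Some(counter) = self.nodes.remove(&id) {
            self.removed_node_counters.push(counter);
        }
    }

    /// Assumed invariant: each counter in 0..next_node_counter appears exactly
    /// once, either on a live node or in the removed list.
    fn check_consistency(&self) {
        let mut seen = vec![false; self.next_node_counter as usize];
        for &c in self.nodes.values().chain(self.removed_node_counters.iter()) {
            let was_seen = std::mem::replace(&mut seen[c as usize], true);
            assert!(!was_seen, "counter {} used twice", c);
        }
        assert!(seen.iter().all(|&s| s), "a counter was allocated but never used");
    }
}

fn main() {
    let mut graph = TinyGraph::new();
    graph.add_node(1); // gets counter 0
    graph.add_node(2); // gets counter 1
    graph.remove_node(1); // counter 0 becomes reusable
    graph.add_node(3); // reuses counter 0
    graph.check_consistency();
    assert_eq!(graph.next_node_counter, 2);
}
```

With an eager `unwrap_or` in `add_node`, the last `add_node` call would bump `next_node_counter` to 3 while still reusing counter 0, and `check_consistency` would fail on the unused counter 2.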

Contributor

@valentinewallace valentinewallace Oct 29, 2024

FWIW, I get two test failures with this diff (i.e. reverting the fix and adding the check):

diff --git a/lightning/src/routing/gossip.rs b/lightning/src/routing/gossip.rs
index 9d85a6c58..f761e6a63 100644
--- a/lightning/src/routing/gossip.rs
+++ b/lightning/src/routing/gossip.rs
@@ -2077,9 +2077,9 @@ where
                                },
                                IndexedMapEntry::Vacant(node_entry) => {
                                        let mut removed_node_counters = self.removed_node_counters.lock().unwrap();
-                                       **chan_info_node_counter = removed_node_counters.pop().unwrap_or_else(|| {
+                                       **chan_info_node_counter = removed_node_counters.pop().unwrap_or(
                                                self.next_node_counter.fetch_add(1, Ordering::Relaxed) as u32
-                                       });
+                                       );
                                        node_entry.insert(NodeInfo {
                                                channels: vec![short_channel_id],
                                                announcement_info: None,
@@ -2088,6 +2088,9 @@ where
                                },
                        };
                }
+               core::mem::drop(channels);
+               core::mem::drop(nodes);
+               self.test_node_counter_consistency();

                Ok(())
        }

Collaborator Author

Lol, I'm an idiot, I ran the test without reverting the diff. I'll add this in a followup, anyway.

Contributor

@valentinewallace valentinewallace left a comment

Left a suggestion but I'm happy to land this

@TheBlueMatt TheBlueMatt merged commit 299b7bd into lightningdevkit:main Oct 29, 2024
20 of 21 checks passed