
bimodal pathfinding probability improvements #8330


Merged

Conversation

@bitromortac bitromortac commented Jan 2, 2024

Change Description

Fixes normalization bugs with respect to large channels and situations where the channel doesn't obey a bimodal distribution (fixes #8263 and #9085).

The probability calculation is avoided for amounts smaller than the last success amount in order to speed things up. This also fixes a case where mission control has outdated values (fixes #7553).
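A minimal sketch of that short-circuit (hypothetical helper name and plain uint64 msat amounts, not the exact lnd implementation): any amount at or below the last recorded success amount is assumed to succeed, so the bimodal evaluation can be skipped.

```go
// canSkipBimodal illustrates the short-circuit described above: an amount at
// or below the last known success amount is assumed to succeed with
// probability 1, so the full bimodal evaluation is unnecessary.
func canSkipBimodal(amtMsat, successAmtMsat uint64) (probability float64, skip bool) {
	if amtMsat <= successAmtMsat {
		return 1.0, true
	}

	// The caller falls through to the full bimodal calculation.
	return 0, false
}
```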

coderabbitai bot commented Nov 7, 2024

Review skipped: automatic reviews are limited to specific labels (llm-review).


@Roasbeef Roasbeef modified the milestones: v0.19.0, v0.19 overflow Feb 5, 2025
@ziggie1984 ziggie1984 self-requested a review February 25, 2025 12:56
@saubyk saubyk modified the milestones: v0.19 overflow, v0.20.0 Mar 27, 2025
@saubyk saubyk added this to lnd v0.20 Mar 27, 2025
@saubyk saubyk moved this to In progress in lnd v0.20 Mar 27, 2025
@saubyk saubyk requested a review from yyforyongyu April 15, 2025 16:53
@yyforyongyu (Member) left a comment:

Looking good. I will do a second round to revisit the bimodal model and understand the implications. Meanwhile I left some questions; I also think this is missing a rebase (checked out locally and ignored files popped up).

@@ -40,6 +40,12 @@
a graceful shutdown of LND during the main chain backend sync check in certain
cases.

* [Bimodal pathfinding probability
Member:

is this targeting 19 or 20?

bitromortac (Collaborator, Author):

I changed it to target 20, but I think it could be included earlier since it's basically a bug fix PR.

largeAmount = lnwire.MilliSatoshi(5_000_000)
capacity = lnwire.MilliSatoshi(10_000_000)
scale = lnwire.MilliSatoshi(400_000)
smallAmount = lnwire.MilliSatoshi(400_000_000)
Member:

Q: is the commit message saying that it doesn't matter which values are set here?

bitromortac (Collaborator, Author) commented Apr 16, 2025:

Not entirely: one should be able to multiply those values by any common factor (here by 1000) and the tests should stay invariant, because that factor cancels out if you look at the primitive. I changed those values because for the default config of the bimodal scale we use 300,000 sat, and 400,000 sat is closer to that.

@@ -309,6 +309,20 @@ func TestIntegral(t *testing.T) {
}
}

// TestSpecialCase tests a value combination found by fuzz tests.
func TestSpecialCase(t *testing.T) {
Member:

This name is too generic and gives very little info about what's being tested here...maybe add a link to #9085?

bitromortac (Collaborator, Author):

Totally, I changed it and added some description.

@bitromortac bitromortac force-pushed the 2401-bimodal-improvements branch from ff273aa to 8d96893 Compare April 16, 2025 15:06
@bitromortac bitromortac requested a review from yyforyongyu April 16, 2025 15:09
@yyforyongyu yyforyongyu requested a review from Copilot April 21, 2025 06:10
@Copilot (Copilot AI) left a comment:

Pull Request Overview

This PR improves the bimodal pathfinding probability calculations by fixing normalization bugs and optimizing performance when handling outdated mission control values. Key changes include:

  • Updated constants and tolerance handling in the test suite.
  • Revised the normalization formula in the primitive function and modified fallback logic in probabilityFormula.
  • Added new tests for small scale values and a fuzz-triggered edge case.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File descriptions:
  • routing/probability_bimodal_test.go: updated amount constants, removed per-test tolerance in favor of a default value, and added new tests.
  • routing/probability_bimodal.go: revised normalization in the primitive function and updated fallback logic when success exceeds fail.
  • docs/release-notes/release-notes-0.20.0.md: added release notes and contributor details related to the bimodal improvements.
Comments suppressed due to low confidence (2)

routing/probability_bimodal.go:516

  • [nitpick] Consider updating the log message to improve grammar (e.g., 'fail amount (%v) is smaller than or equal to the success amount (%v) for capacity (%v)').
log.Tracef("fail amount (%v) is smaller than or equal the " + "success amount (%v) for capacity (%v)", failAmountMsat, successAmountMsat, capacityMsat)

routing/probability_bimodal.go:450

  • The updated primitive function now includes an additional term 'x/(c*s)' in its numerator; please verify that the revised indefinite integral and normalization are mathematically correct and consistent with the intended bimodal distribution behavior.
return (-exs + excs + x/(c*s)) / norm


// We end up with the primitive function of the normalized P(x).
- return (-exs + excs) / norm
+ return (-exs + excs + x/(c*s)) / norm
Member:

This is a very interesting mix of bimodal and uniform distribution. Can c here ever be 1? Just thinking about edge cases here.

Member:

Still curious about the c, what happens if it is 1?

bitromortac (Collaborator, Author) commented Apr 23, 2025:

I think my comment got lost here. In practice that would mean that the combined distribution would be dominated by a uniform one. We could use a small constant here as well, but I wanted to avoid adding that. The 1/c term just represents a number that should be small with respect to the bimodal peaks, and it cancels out in the norm calculation such that we are only left with 1/s.

@bitromortac bitromortac force-pushed the 2401-bimodal-improvements branch from 8d96893 to 6c1a32b Compare April 22, 2025 14:36
@bitromortac bitromortac requested a review from yyforyongyu April 22, 2025 14:39
@yyforyongyu (Member) left a comment:

For the itest failures: the neutrino one is a known issue; the other one is weird and looks like it's related to the listunspent bug. Could you rebase to the latest master to include the fix?

successAmount, capacity)

successAmount = capacity
// Mission control may have some outdated values with regard to the
Member:

nice doc💯

"%s to capacity %s", successAmountMsat,
failAmount, capacityMsat)

successAmount = capacity - 1
Member:

hmm what's the reason for minus 1?

bitromortac (Collaborator, Author):

Agree, it's not that elegant, but it's there to make both the success amount and the fail amount (where the fail amount must be one unit larger than the success amount) fall into the range [0, c]. In principle the fail amount could be allowed to be c+1, but I decided to always use c as the upper bound in the bimodal model, since it makes the formulas and code nicer, and we never expect to be able to send the full capacity anyhow due to the balance reserve.
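A rough sketch of the clamping being discussed, based only on the diff context quoted above (hypothetical helper, not the exact lnd code):

```go
// clampToCapacity keeps the success amount, and a fail amount that must be
// one unit larger, within the range [0, capacity].
func clampToCapacity(successAmt, failAmt, capacity uint64) (uint64, uint64) {
	if successAmt >= capacity {
		// Leave room for a fail amount one unit above the success amount.
		successAmt = capacity - 1
	}
	if failAmt > capacity {
		failAmt = capacity
	}

	return successAmt, failAmt
}
```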

Collaborator:

So the successAmt needs to be smaller by 1 for the reNorm calculation not to be 0, basically?

bitromortac (Collaborator, Author):

> So the successAmt needs to be smaller by 1 for the reNorm calculation not to be 0, basically?

Right, I think that's another way of saying it: there would be a logical contradiction in claiming that you can send the same amount that you fail to send.


successAmount = 0
Member:

In this case, should we keep the failAmount the same and only let it be changed under `if failAmount > capacity {`?

bitromortac (Collaborator, Author):

This expresses that we don't know which of the fail/success amounts has the true info, and overriding both expresses that we are unsure about both, which is why I prefer to have it in a symmetric way. An idea here is that we could only override the more outdated observation with the bounds (but I think we should not encounter that line anyhow because mission control enforces the condition, so that would be over-engineered).



@bitromortac bitromortac force-pushed the 2401-bimodal-improvements branch from 6c1a32b to 8131b8a Compare April 23, 2025 07:43
@bitromortac bitromortac requested a review from yyforyongyu April 23, 2025 07:43
@yyforyongyu (Member) left a comment:

LGTM🌹 Since this is a bug fix, I wonder if it's possible to include it in 0.19? cc @saubyk

@lightninglabs-deploy
@bitromortac, remember to re-request review from reviewers when ready

@ziggie1984 (Collaborator) left a comment:

Very nice documentation of the problem. I had some minor comments; if they are resolved, it's good to go.

// balance lies somewhere in the middle of the channel, a surprising result for
// the bimodal model, which predicts two distinct modes at the edges and
// therefore has numerical issues in the middle.
func TestBimodalFuzz9085(t *testing.T) {
Collaborator:

LND has the possibility to add seeds (non-fuzz test cases) directly into the concrete fuzz function:

// FuzzProbability checks that we don't encounter errors related to NaNs.
func FuzzProbability(f *testing.F) {
	// Predefined seed values resulting from
	// https://github.com/lightningnetwork/lnd/issues/9085.
	f.Add(
		uint64(1000000000),
		uint64(300000000),
		uint64(400000000),
		uint64(300000000),
	)

	estimator := BimodalEstimator{
		BimodalConfig: BimodalConfig{BimodalScaleMsat: scale},
	}

	f.Fuzz(func(t *testing.T, capacity, successAmt, failAmt, amt uint64) {
		if capacity == 0 {
			return
		}

		_, err := estimator.probabilityFormula(
			lnwire.MilliSatoshi(capacity),
			lnwire.MilliSatoshi(successAmt),
			lnwire.MilliSatoshi(failAmt), lnwire.MilliSatoshi(amt),
		)

		require.NoError(t, err, "c: %v s: %v f: %v a: %v", capacity,
			successAmt, failAmt, amt)
	})
}

Or we add a new file in the lnd-fuzz directory. I think we should find a good strategy for how we handle those cases resulting from fuzzing. cc @morehouse

bitromortac (Collaborator, Author):

Nice suggestion, TIL 🎉. I have added the seed. We also don't lose coverage since the other test I introduced tests the same.

@@ -78,7 +81,6 @@ func TestSuccessProbability(t *testing.T) {
failAmount: capacity,
amount: smallAmount,
expectedProbability: 0.684,
Collaborator:

Q: May I ask how you came up with these values? Did you calculate the probability via other means and then just compare?

bitromortac (Collaborator, Author):

Yeah, not too great: I didn't calculate the number independently, it serves more as a way to pin the behavior and to spot changes. The number itself is a sanity check however, since it's larger than the 0.5 of the "no info, large amount" case. This is expected, because for an amount that is similar to the scale you expect to find some leftover liquidity, increasing the success probability.

_, err := estimator.probabilityFormula(
capacity, successAmount, failAmount, amtCloseSuccess,
)
require.ErrorContains(t, err, "normalization factor is zero")
Collaborator:

Interesting that it only fails in the reNorm calculation, not already when calculating the prob:

	prob := p.integral(capacity, amount, failAmount)

bitromortac (Collaborator, Author):

I think both calculations for prob and reNorm don't really "fail", but they return float64(0), not some non-zero small amount, because

	ecs := math.Exp(-c / s)
	exs := math.Exp(-x / s)

both give float64(0), because c/s and x/s are very large. So in that sense both calculations are almost correct, but due to the division by plain zero we fail.
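A standalone sketch of that underflow (illustrative numbers, not the values from the test):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// If the scale s is tiny compared to the capacity c and the amount x,
	// both exponentials underflow to exactly 0.0 in float64, so a
	// normalization built from them is plain zero and the later division
	// fails.
	c := 7_000_000_000.0 // capacity in msat (illustrative)
	x := 3_000_000_000.0 // amount in msat (illustrative)
	s := 1_000.0         // very small scale in msat (illustrative)

	ecs := math.Exp(-c / s)
	exs := math.Exp(-x / s)

	fmt.Println(ecs, exs) // prints: 0 0
}
```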

require.NoError(t, err)
require.InDelta(t, 0.0, p, defaultTolerance)

// In the region where the bimodal model doesn't give good forecasts, we
Collaborator:

Q: Could you describe where the region is in which the bimodal model is not good? I mean, it is very dependent on the constants we use for the capacity and scale, or? And I wonder why it scales linearly; I thought it's more uniform now where the bimodal falls off too sharply?

Collaborator:

Ahh ok, given the integral, the uniform distribution becomes linear, understood. But it's still interesting that at 25% into the channel-balance uncertainty we already have almost only the uniform part of the equation having an effect. Probably needs some time to get an intuition for these values.

bitromortac (Collaborator, Author):

Right, it depends on the parameter combination: if you choose a huge s, every bimodal distribution will be a uniform one, since it doesn't matter how large the capacity is and this smears out the bimodal peaks.

If s is (very) small compared to c, the probability will drop quickly. If we then learn that the balance range is within the success/fail amounts range where the non-normalized probability is ~zero, we would still be able to retain the bimodal distribution within the newly learned range, but now starting at the success/fail amounts, if we had full numerical precision.

Because the numerical values of the exponentials drop to zero if the scale is very small, at amounts >> s we get the renormalization error. I think uncovering this error was nice, because keeping the bimodal distribution for the learned amounts isn't great and falling back to a uniform distribution is better (since it was demonstrated that a bimodal model doesn't hold).
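In formula terms, this is the limit being described (a sketch; a is the amount, s_a/f_a the success/fail amounts, c the capacity, s the scale): once the exponentials underflow to zero, only the regularization term survives in the primitive, and the conditional success probability collapses to the uniform/linear model.

$$F(x) \propto -e^{-x/s} + e^{(x-c)/s} + \frac{x}{cs} \longrightarrow \frac{x}{cs} \quad \text{when } e^{-x/s},\, e^{(x-c)/s} \to 0,$$

$$P(a \mid s_a, f_a) = \frac{F(f_a) - F(a)}{F(f_a) - F(s_a)} \longrightarrow \frac{f_a - a}{f_a - s_a}.$$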

"%s to capacity %s", successAmountMsat,
failAmount, capacityMsat)

successAmount = capacity - 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the successAmt needs to be -1 smaller for the reNorm calculation to not be 0 basically ?

The bimodal model doesn't depend on the unit, which is why updating to
more realistic values doesn't require changes in tests.

We pin the scale in the fuzz test to not invalidate the corpus.
This test demonstrates an error found in a fuzz test by adding a
previously found seed, which will be fixed in an upcoming commit.

The following fuzz test is expected to fail:
go test -v -fuzz=Prob ./routing/
This test demonstrates that the current bimodal model leads to numerical
inaccuracies in a certain regime of successes and failures.
If the success and fail amounts indicate that a channel doesn't obey a
bimodal distribution, we fall back to a uniform/linear success
probability model. This also helps to avoid numerical normalization
issues with the bimodal model.

This is achieved by adding a very small summand, 1/c, to the balance
distribution P(x) ~ exp(-x/s) + exp((x-c)/s), which helps to regularize
the probability distribution. The distribution becomes finite
for intermediate balances where the exponentials would be evaluated to
an exact zero (float) otherwise. This regularization is effective in
edge cases and leads to falling back to a uniform model should the
bimodal model fail.

This changes the normalization to s * (-2 * exp(-c/s) + 2 + 1/s) and adds
an extra term x/(cs) to the primitive function.
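As a consistency check of the quantities quoted in this commit message (a sketch in the same notation, dropping the additive constant of the primitive):

$$P(x) \propto e^{-x/s} + e^{(x-c)/s} + \frac{1}{c},$$

$$\int_0^c P(x)\,\mathrm{d}x = 2s\left(1 - e^{-c/s}\right) + 1 = s\left(-2e^{-c/s} + 2 + \frac{1}{s}\right),$$

$$\int \left(e^{-x/s} + e^{(x-c)/s} + \frac{1}{c}\right)\mathrm{d}x = s\left(-e^{-x/s} + e^{(x-c)/s} + \frac{x}{cs}\right) + \mathrm{const}.$$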

The previously added fuzz seed is expected to be resolved with this.

We skip the evaluation of probabilities when the amount is lower than
the last success amount, as the probability would be evaluated to 1 in
that case.

If we encounter invalid mission control data, we fall back to no
knowledge about the node pair.

Mission control may have outdated success/failure amounts for node pairs
that have channels with differing capacities. In that case we assume that
we will still find the liquidity as before and rescale the amounts to the
corresponding range.
@bitromortac bitromortac force-pushed the 2401-bimodal-improvements branch from 8131b8a to 328b28d Compare May 2, 2025 09:11
@ziggie1984 (Collaborator) left a comment:

LGTM

@bitromortac (Collaborator, Author):
Thanks for the reviews @ziggie1984 and @yyforyongyu, I'm still performing some tests.

@bitromortac (Collaborator, Author) commented May 7, 2025:

I'm done with testing, this is ready. (We should still maybe decide whether it goes into 0.19.x or 0.20.)

@saubyk saubyk removed this from lnd v0.20 May 8, 2025
@saubyk saubyk modified the milestones: v0.20.0, v0.19.0 May 8, 2025
@bitromortac bitromortac force-pushed the 2401-bimodal-improvements branch from 328b28d to 86249fb Compare May 8, 2025 16:32
@bitromortac (Collaborator, Author):
Updated to address the 0.19 release notes.

@guggero guggero merged commit ee25c22 into lightningnetwork:master May 8, 2025
31 of 36 checks passed