bimodal pathfinding probability improvements #8330
Conversation
Force-pushed from 320291c to ff273aa
Looking good. Will do a second round to reload the bimodal model and understand the implications. Meanwhile I left some questions; I also think this is missing a rebase (checked out locally and ignored files popped up).
@@ -40,6 +40,12 @@
 a graceful shutdown of LND during the main chain backend sync check in certain
 cases.

* [Bimodal pathfinding probability
is this targeting 19 or 20?
I changed it to target 20, but I think it could be included earlier since it's basically a bug fix PR.
largeAmount = lnwire.MilliSatoshi(5_000_000)
capacity = lnwire.MilliSatoshi(10_000_000)
scale = lnwire.MilliSatoshi(400_000)
smallAmount = lnwire.MilliSatoshi(400_000_000)
Q: is the commit msg saying it doesn't matter what values to set here?
Not entirely different values, but one should be able to multiply those values by the same factor (here 1000) and the tests should stay invariant, because that factor cancels out if you look at `primitive`. I changed those values because the default config of the bimodal scale uses 300000 sat, and 400000 sat is closer to that.
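To illustrate that invariance, a minimal sketch of such a check, assuming the `probabilityFormula` signature shown in the fuzz test later in this thread; the test name and the concrete amounts here are made up:

```go
// TestScaleInvariance is a hypothetical check that multiplying the
// capacity, the success/fail amounts, the payment amount and the scale
// by the same factor leaves the probability unchanged.
func TestScaleInvariance(t *testing.T) {
	const k = 1000

	base := BimodalEstimator{
		BimodalConfig: BimodalConfig{BimodalScaleMsat: 400_000},
	}
	scaled := BimodalEstimator{
		BimodalConfig: BimodalConfig{BimodalScaleMsat: 400_000 * k},
	}

	p1, err := base.probabilityFormula(
		lnwire.MilliSatoshi(10_000_000),
		lnwire.MilliSatoshi(1_000_000),
		lnwire.MilliSatoshi(8_000_000),
		lnwire.MilliSatoshi(2_000_000),
	)
	require.NoError(t, err)

	p2, err := scaled.probabilityFormula(
		lnwire.MilliSatoshi(10_000_000*k),
		lnwire.MilliSatoshi(1_000_000*k),
		lnwire.MilliSatoshi(8_000_000*k),
		lnwire.MilliSatoshi(2_000_000*k),
	)
	require.NoError(t, err)

	// The common factor k cancels out in primitive, so both
	// probabilities should agree up to float tolerance.
	require.InDelta(t, p1, p2, 1e-9)
}
```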
routing/probability_bimodal_test.go
@@ -309,6 +309,20 @@ func TestIntegral(t *testing.T) {
	}
}

// TestSpecialCase tests a value combination found by fuzz tests.
func TestSpecialCase(t *testing.T) {
This name is too generic and gives very little info about what's being tested here... maybe add a link to #9085?
Totally, I changed it and added some description.
Force-pushed from ff273aa to 8d96893
Pull Request Overview
This PR improves the bimodal pathfinding probability calculations by fixing normalization bugs and optimizing performance when handling outdated mission control values. Key changes include:
- Updated constants and tolerance handling in the test suite.
- Revised the normalization formula in the primitive function and modified fallback logic in probabilityFormula.
- Added new tests for small scale values and a fuzz-triggered edge case.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
File | Description |
---|---|
routing/probability_bimodal_test.go | Updated amount constants, removed per-test tolerance in favor of a default value, and added new tests. |
routing/probability_bimodal.go | Revised normalization in the primitive function and updated fallback logic when success exceeds fail. |
docs/release-notes/release-notes-0.20.0.md | Added release notes and contributor details related to the bimodal improvements. |
Comments suppressed due to low confidence (2)
routing/probability_bimodal.go:516
- [nitpick] Consider updating the log message to improve grammar (e.g., 'fail amount (%v) is smaller than or equal to the success amount (%v) for capacity (%v)').
log.Tracef("fail amount (%v) is smaller than or equal the " + "success amount (%v) for capacity (%v)", failAmountMsat, successAmountMsat, capacityMsat)
routing/probability_bimodal.go:450
- The updated primitive function now includes an additional term 'x/(c*s)' in its numerator; please verify that the revised indefinite integral and normalization are mathematically correct and consistent with the intended bimodal distribution behavior.
return (-exs + excs + x/(c*s)) / norm
	// We end up with the primitive function of the normalized P(x).
-	return (-exs + excs) / norm
+	return (-exs + excs + x/(c*s)) / norm
This is a very interesting mix of bimodal and uniform distribution - can `c` here ever be 1? just thinking about edge cases here.
Still curious about the `c`, what happens if it is 1?
I think my comment got lost here. In practice that would mean that the combined distribution would be dominated by a uniform one. We could use a small constant as well here, but I wanted to avoid adding that. The `1/c` term just represents a number that should be small with respect to the bimodal peaks, and it cancels out in the norm calculation such that we only have `1/s` left.
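For reference, a sketch of the regularized model as it is described in this thread and in the commit message further below (s the scale, c the capacity; this is my reading of the thread, not the verbatim source):

```latex
% Regularized balance distribution on [0, c]:
P(x) \propto e^{-x/s} + e^{(x-c)/s} + \frac{1}{c}

% The 1/c summand integrates to exactly 1 over [0, c], so:
\int_0^c P(x)\,\mathrm{d}x = 2s\left(1 - e^{-c/s}\right) + 1
                           = s\left(-2e^{-c/s} + 2 + \tfrac{1}{s}\right)

% Primitive of the normalized P(x), matching the code line
% return (-exs + excs + x/(c*s)) / norm:
F(x) = \frac{-e^{-x/s} + e^{(x-c)/s} + \frac{x}{cs}}
            {-2e^{-c/s} + 2 + \frac{1}{s}}
```

The `1/c` term only dominates where the exponential peaks have underflowed to zero; if `c` were comparable to `s`, the peaks stay finite everywhere and the distribution shifts toward uniform, as discussed above.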
Force-pushed from 8d96893 to 6c1a32b
For the itest failures, the neutrino one is a known issue; the other one is weird and looks like it's related to the `listunspent` bug. Could you rebase to the latest master to include the fix?
	successAmount, capacity)

-	successAmount = capacity
+	// Mission control may have some outdated values with regard to the
nice doc💯
"%s to capacity %s", successAmountMsat, | ||
failAmount, capacityMsat) | ||
|
||
successAmount = capacity - 1 |
hmm what's the reason for minus 1?
Agree, it's not that elegant, but it's to make both the success amount and the fail amount (where the fail amount must be one unit larger than the success amount) fall into the range [0, c] together. In principle the fail amount could be allowed to be c+1, but I decided to always use c as the upper bound in the bimodal model, since it makes formulas and code nicer and we never expect to be able to send the full capacity due to the balance reserve anyhow.
so the successAmt needs to be -1 smaller for the reNorm calculation to not be 0, basically?
> so the successAmt needs to be -1 smaller for the reNorm calculation to not be 0, basically?
right, I think that's another way of saying that it would be a logical contradiction to claim you can send the same amount that you failed to send
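A sketch of why, using the primitive F from above and the conditional renormalization visible in the code fragments later in this thread (notation assumed):

```latex
% Conditional success probability given learned success/fail amounts
% s_a < f_a ("prob" and "reNorm" in the code):
P(a \mid s_a, f_a) = \frac{F(f_a) - F(a)}{F(f_a) - F(s_a)}

% For f_a = s_a the denominator (reNorm) is zero, so we require
% f_a \ge s_a + 1, and with f_a \le c this forces s_a \le c - 1.
```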
successAmount = 0
In this case should we keep the `failAmount` the same and only let it be changed under `if failAmount > capacity {`?
This expresses that we don't know which of the fail/success amounts has the true info, and overriding both expresses that we are unsure about both, which is why I prefer to have it in a symmetric way. An idea here is that we could only override the more outdated observation with the bounds, but I think we should not encounter that line anyhow because mission control enforces the condition, so that would be over-engineered.
Force-pushed from 6c1a32b to 8131b8a
LGTM🌹 Since this is a bug fix, I wonder if it's possible to include it in 0.19? cc @saubyk
@bitromortac, remember to re-request review from reviewers when ready
Very nice documentation of the problem. Had some minor comments; if they are resolved it's good to go.
routing/probability_bimodal_test.go
// balance lies somewhere in the middle of the channel, a surprising result for
// the bimodal model, which predicts two distinct modes at the edges and
// therefore has numerical issues in the middle.
func TestBimodalFuzz9085(t *testing.T) {
LND has the possibility to add known problematic seeds (rather than fuzz-generated inputs) into the concrete fuzz function:

```go
// FuzzProbability checks that we don't encounter errors related to NaNs.
func FuzzProbability(f *testing.F) {
	// Predefined seed values resulting from
	// https://github.com/lightningnetwork/lnd/issues/9085.
	f.Add(
		uint64(1000000000),
		uint64(300000000),
		uint64(400000000),
		uint64(300000000),
	)

	estimator := BimodalEstimator{
		BimodalConfig: BimodalConfig{BimodalScaleMsat: scale},
	}

	f.Fuzz(func(t *testing.T, capacity, successAmt, failAmt, amt uint64) {
		if capacity == 0 {
			return
		}

		_, err := estimator.probabilityFormula(
			lnwire.MilliSatoshi(capacity),
			lnwire.MilliSatoshi(successAmt),
			lnwire.MilliSatoshi(failAmt), lnwire.MilliSatoshi(amt),
		)

		require.NoError(t, err, "c: %v s: %v f: %v a: %v", capacity,
			successAmt, failAmt, amt)
	})
}
```
or we add a new file in the lnd-fuzz directory. I think we should find a good strategy for how we handle those cases resulting from fuzzing. cc @morehouse
Nice suggestion, TIL 🎉. I have added the seed. We also don't lose coverage since the other test I introduced tests the same.
@@ -78,7 +81,6 @@ func TestSuccessProbability(t *testing.T) {
	failAmount: capacity,
	amount: smallAmount,
	expectedProbability: 0.684,
Q: May I ask how you came up with these values? Did you calculate the probability via other means and then just compare?
Yeah, not too great, I didn't calculate the number independently; it serves more as a way to pin the behavior and to spot changes. The number itself is a sanity check however, since it's larger than the 0.5 of the "no info, large amount" case. This is expected because for an amount that is similar to the scale you expect to find some leftovers, increasing the success probability.
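For what it's worth, the value can be checked by hand with the regularized model under the test constants, assuming c = 10,000,000, s = 400,000 and a = 400,000 (all msat), no prior success and a fail amount equal to the capacity:

```latex
P(a) = \frac{\int_a^c P(x)\,\mathrm{d}x}{\int_0^c P(x)\,\mathrm{d}x}
     = \frac{s\left(e^{-a/s} - e^{-c/s}\right)
           + s\left(1 - e^{(a-c)/s}\right) + \frac{c-a}{c}}
            {2s\left(1 - e^{-c/s}\right) + 1}
     \approx \frac{147{,}152 + 400{,}000 + 0.96}{800{,}001}
     \approx 0.684
```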
routing/probability_bimodal_test.go
_, err := estimator.probabilityFormula(
	capacity, successAmount, failAmount, amtCloseSuccess,
)
require.ErrorContains(t, err, "normalization factor is zero")
Interesting that it only fails in the reNorm calculation, not already when calculating the prob: `prob := p.integral(capacity, amount, failAmount)`
I think both calculations for `prob` and `reNorm` don't really "fail", but they return `float64(0)`, not some non-zero small amount, because `ecs := math.Exp(-c / s)` and `exs := math.Exp(-x / s)` both give `float64(0)`, since `c/s` and `x/s` are very large. So in that sense both calculations are almost correct, but due to the division by plain zero we fail.
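A minimal, self-contained illustration of that underflow (the numbers are arbitrary, just chosen so the exponent is far below float64 range):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// A capacity much larger than the scale pushes the exponent far
	// below what float64 can represent, so Exp returns an exact zero.
	c, s := 1e9, 1.0

	fmt.Println(math.Exp(-c / s))      // 0
	fmt.Println(math.Exp(-c/s) == 0.0) // true: exact float64 zero
}
```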
require.NoError(t, err)
require.InDelta(t, 0.0, p, defaultTolerance)

// In the region where the bimodal model doesn't give good forecasts, we
Q: could you describe where the region is in which the bimodal model is not good? I mean, it is very dependent on the constants we use for the capacity and scale, or? I also wonder why it scales linearly; I thought it's more uniform now where the bimodal falls off too sharply?
ahh ok, given the integral the uniform distribution becomes linear, understood. But still interesting that at 25% into the channel-balance uncertainty we already have almost only the uniform part of the equation having the most effect. Probably needs some time to get an intuition for these values.
right, it depends on the parameter combination: if you choose a huge `s`, every bimodal distribution will be a uniform one, since it doesn't matter how large the capacity is, and this smears out the bimodal peaks.

If `s` is (very) small compared to `c`, the probability will drop quickly. If we then learn that the balance range is within the success/fail amounts range where the non-normalized probability is ~zero, we would still be able to retain the bimodal distribution within the newly learned range, now starting at the success/fail amounts, if we had full numerical precision. Because the numerical values of the exponentials drop to zero if the scale is very small, at amounts >> `s` we get the renormalization error. I think uncovering this error was nice, because keeping the bimodal distribution for the learned amounts isn't great and falling back to a uniform distribution is better (since it was demonstrated that a bimodal model doesn't hold).
"%s to capacity %s", successAmountMsat, | ||
failAmount, capacityMsat) | ||
|
||
successAmount = capacity - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the successAmt needs to be -1 smaller for the reNorm calculation to not be 0 basically ?
The bimodal model doesn't depend on the unit, which is why updating to more realistic values doesn't require changes in tests. We pin the scale in the fuzz test to not invalidate the corpus.
This test demonstrates an error found in a fuzz test by adding a previously found seed, which will be fixed in an upcoming commit. The following fuzz test is expected to fail: go test -v -fuzz=Prob ./routing/
This test demonstrates that the current bimodal model leads to numerical inaccuracies in a certain regime of successes and failures.
If the success and fail amounts indicate that a channel doesn't obey a bimodal distribution, we fall back to a uniform/linear success probability model. This also helps to avoid numerical normalization issues with the bimodal model. This is achieved by adding a very small summand, 1/c, to the balance distribution P(x) ~ exp(-x/s) + exp((x-c)/s), which helps to regularize the probability distribution. The distribution becomes finite for intermediate balances where the exponentials would otherwise be evaluated to an exact (float) zero. This regularization is effective in edge cases and leads to falling back to a uniform model should the bimodal model fail. This changes the normalization to s * (-2 * exp(-c/s) + 2 + 1/s) and adds an extra term x/(c*s) to the primitive function. The previously added fuzz seed is expected to be resolved with this.
We skip the evaluation of probabilities when the amount is lower than the last success amount, as the probability would be evaluated to 1 in that case.
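A minimal sketch of that shortcut, with the variable names assumed from the diff fragments above; the actual guard in the patch may look different:

```go
// If the amount is not larger than an amount that previously
// succeeded, the success probability is 1 and the (more expensive)
// integral evaluation can be skipped.
if amt <= successAmount {
	return 1.0, nil
}
```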
If we encounter invalid mission control data, we fall back to no knowledge about the node pair.
Mission control may have outdated success/failure amounts for node pairs that have channels with differing capacities. In that case we assume to still find the liquidity as before and rescale the amounts to the according range.
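Putting the fragments from this thread together, the fallback could be sketched as follows; the names and the exact ordering are assumptions based on the diff snippets above, not the literal patch:

```go
// Clamp possibly outdated mission control amounts into [0, capacity].
if successAmount >= capacity {
	// Stay one unit below the capacity so that failAmount can remain
	// strictly larger than successAmount within the range.
	successAmount = capacity - 1
}
if failAmount > capacity {
	failAmount = capacity
}

// If the data is contradictory (a failure at or below a success),
// fall back to no knowledge about the node pair.
if failAmount <= successAmount {
	successAmount = 0
	failAmount = capacity
}
```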
Force-pushed from 8131b8a to 328b28d
LGTM
Thanks for the reviews @ziggie1984 and @yyforyongyu, I'm still performing some tests.
I'm done with testing, this is ready. (Still should maybe decide if in 19.x or 20).
Force-pushed from 328b28d to 86249fb
Updated to address the 0.19 release notes.
Change Description
Fixes normalization bugs with respect to large channels and situations where the channel doesn't obey a bimodal distribution (fixes #8263 and #9085).
The probability calculation is avoided for amounts smaller than the last success amount in order to speed things up. This also fixes a case where mission control has outdated values (fixes #7553).