bimodal pathfinding probability improvements #8330

Merged
6 changes: 6 additions & 0 deletions docs/release-notes/release-notes-0.19.0.md
@@ -122,6 +122,12 @@ when running LND with an aux component injected (custom channels).
address is added to LND using the `ImportTapscript` RPC, LND previously failed
to perform a cooperative close to that address.

* [Bimodal pathfinding probability
Member: is this targeting 19 or 20?

Collaborator Author: I changed it to target 20, but I think it could be included earlier since it's basically a bug fix PR.

improvements](https://github.com/lightningnetwork/lnd/pull/8330). A fallback
probability is used if the bimodal model is not applicable. Fixes are added
such that the probability is evaluated more quickly and is more accurate when
mission control data is outdated.

# New Features

* Add support for [archiving channel backup](https://github.com/lightningnetwork/lnd/pull/9232)
95 changes: 58 additions & 37 deletions routing/probability_bimodal.go
@@ -416,34 +416,38 @@ func cannotSend(failAmount, capacity lnwire.MilliSatoshi, now,

// primitive computes the indefinite integral of our assumed (normalized)
// liquidity probability distribution. The distribution of liquidity x here is
// the function P(x) ~ exp(-x/s) + exp((x-c)/s), i.e., two exponentials residing
// at the ends of channels. This means that we expect liquidity to be at either
// side of the channel with capacity c. The s parameter (scale) defines how far
// the liquidity leaks into the channel. A very low scale assumes completely
// unbalanced channels, a very high scale assumes a random distribution. More
// details can be found in
// the function P(x) ~ exp(-x/s) + exp((x-c)/s) + 1/c, i.e., two exponentials
// residing at the ends of channels. This means that we expect liquidity to be
// at either side of the channel with capacity c. The s parameter (scale)
// defines how far the liquidity leaks into the channel. A very low scale
// assumes completely unbalanced channels, a very high scale assumes a random
// distribution. More details can be found in
// https://github.com/lightningnetwork/lnd/issues/5988#issuecomment-1131234858.
// Additionally, we add a constant term 1/c to the distribution to avoid
// normalization issues and to fall back to a uniform distribution should the
// previous success and fail amounts contradict a bimodal distribution.
func (p *BimodalEstimator) primitive(c, x float64) float64 {
s := float64(p.BimodalScaleMsat)

// The indefinite integral of P(x) is given by
// Int P(x) dx = H(x) = s * (-e(-x/s) + e((x-c)/s)),
// Int P(x) dx = H(x) = s * (-e(-x/s) + e((x-c)/s) + x/(c*s)),
// and its norm from 0 to c can be computed from it,
// norm = [H(x)]_0^c = s * (-e(-c/s) + 1 -(1 + e(-c/s))).
// norm = [H(x)]_0^c = s * (-e(-c/s) + 1 + 1/s -(-1 + e(-c/s))) =
// = s * (-2*e(-c/s) + 2 + 1/s).
// The prefactors s are left out, as they cancel out in the end.
// norm can only become zero, if c is zero, which we sorted out before
// calling this method.
ecs := math.Exp(-c / s)
exs := math.Exp(-x / s)
norm := -2*ecs + 2 + 1/s

// It would be possible to split the next term and reuse the factors
// from before, but this can lead to numerical issues with large
// numbers.
excs := math.Exp((x - c) / s)

// norm can only become zero, if c is zero, which we sorted out before
// calling this method.
norm := -2*ecs + 2
exs := math.Exp(-x / s)

// We end up with the primitive function of the normalized P(x).
return (-exs + excs) / norm
return (-exs + excs + x/(c*s)) / norm
Member: This is a very interesting mix of bimodal and uniform distribution - can c here ever be 1? Just thinking about edge cases here.

Member: Still curious about the c, what happens if it is 1?

Collaborator Author (@bitromortac, Apr 23, 2025): I think my comment got lost here. In practice that would mean that the combined distribution would be dominated by a uniform one. We could use a small constant as well here, but I wanted to avoid adding that. The 1/c term just represents a number that should be small with respect to the bimodal peaks and it cancels out in the norm calculation such that we only have 1/s left.

}
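For reference, the distribution and its primitive from the code comment above, written out (this restates the comment, no new assumptions beyond it):

$$P(x) \sim e^{-x/s} + e^{(x-c)/s} + \frac{1}{c}, \qquad H(x) = \int P(x)\,dx = s\left(-e^{-x/s} + e^{(x-c)/s} + \frac{x}{c\,s}\right)$$

$$\mathrm{norm} = H(c) - H(0) = s\left(-2e^{-c/s} + 2 + \frac{1}{s}\right)$$

Because the uniform term always contributes $1/s$ to the norm, the norm stays strictly positive for any $c > 0$, even when both exponential terms underflow to zero; for a very small capacity (down to 1 msat) the $1/c$ term dominates and the distribution degenerates to a uniform one, as discussed in the thread above.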

// integral computes the integral of our liquidity distribution from the lower
@@ -484,43 +488,60 @@ func (p *BimodalEstimator) probabilityFormula(capacityMsat, successAmountMsat,
return 0.0, nil
}

// Mission control may have some outdated values, we correct them here.
// TODO(bitromortac): there may be better decisions to make in these
// cases, e.g., resetting failAmount=cap and successAmount=0.

// failAmount should be capacity at max.
if failAmount > capacity {
log.Debugf("Correcting failAmount %v to capacity %v",
failAmount, capacity)
// The next statement is a safety check against an illogical condition.
// We discard the knowledge for the channel in that case since we have
// inconsistent data.
if failAmount <= successAmount {
log.Warnf("Fail amount (%s) is smaller than or equal to the "+
"success amount (%s) for capacity (%s)",
failAmountMsat, successAmountMsat, capacityMsat)

successAmount = 0
Member: In this case should we keep the failAmount the same and only let it be changed under if failAmount > capacity {?

Collaborator Author: This expresses that we don't know which of the fail/success amounts has the true info, and overriding both expresses that we are unsure about both, which is why I prefer to have it in a symmetric way. An idea here is that we could only override the more outdated observation with the bounds (but I think that we should not encounter that line anyhow because of mission control enforcing the condition, so that would be over-engineered).

failAmount = capacity
}

// successAmount should be capacity at max.
if successAmount > capacity {
log.Debugf("Correcting successAmount %v to capacity %v",
successAmount, capacity)

successAmount = capacity
// Mission control may have some outdated values with regard to the
Member: nice doc💯

// current channel capacity between a node pair. This can happen in case
// a large parallel channel was closed or if a channel was downscaled
// and can lead to success and/or failure amounts to be out of the range
// [0, capacity]. We assume that the liquidity situation of the channel
// is similar as before due to flow bias.

// In case we have a large success we need to correct it to be in the
// valid range. We set the success amount close to the capacity, because
// we assume to still be able to send. Any possible failure (that must
// in this case be larger than the capacity) is corrected as well.
if successAmount >= capacity {
log.Debugf("Correcting success amount %s and failure amount "+
"%s to capacity %s", successAmountMsat,
failAmountMsat, capacityMsat)

// We choose the success amount to be one less than the
// capacity, to both fit success and failure amounts into the
// capacity range in a consistent manner.
successAmount = capacity - 1
Member: hmm, what's the reason for minus 1?

Collaborator Author: Agree, it's not that elegant, but it's to make both the success amount and fail amount together (where the fail amount must be one unit larger than the success amount) fall into the range [0, c]. In principle the fail amount could be allowed to be c+1, but I decided to always use c as the upper bound in the bimodal model, since it makes formulas and code nicer and we never expect to be able to send the full capacity due to the balance reserve anyhow.

Collaborator: so the successAmt needs to be -1 smaller for the reNorm calculation to not be 0 basically?

Collaborator Author: Right, I think that's another way to say that there would be a logical contradiction to say that you can send the same amount as you fail to send.

failAmount = capacity
}
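A compact way to see the renormalization issue discussed above (writing $a_s$ and $a_f$ for the success and fail amounts; the notation is mine): the success probability for an amount $a$ between them is the renormalized integral

$$P(X \ge a \mid a_s, a_f) = \frac{H(a_f) - H(a)}{H(a_f) - H(a_s)},$$

whose denominator vanishes if $a_f = a_s$. Clamping to $a_s = c - 1$ and $a_f = c$ keeps both amounts inside $[0, c]$ while leaving the denominator non-zero.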

// The next statement is a safety check against an illogical condition,
// otherwise the renormalization integral would become zero. This may
// happen if a large channel gets closed and smaller ones remain, but
// it should recover with the time decay.
if failAmount <= successAmount {
log.Tracef("fail amount (%v) is smaller than or equal the "+
"success amount (%v) for capacity (%v)",
failAmountMsat, successAmountMsat, capacityMsat)
// Having no or only a small success, but a large failure only needs
// adjustment of the failure amount.
if failAmount > capacity {
log.Debugf("Correcting failure amount %s to capacity %s",
failAmountMsat, capacityMsat)

return 0.0, nil
failAmount = capacity
}

// We cannot send more than the fail amount.
if amount >= failAmount {
return 0.0, nil
}

// We can send the amount if it is smaller than the success amount.
if amount <= successAmount {
return 1.0, nil
}

// The success probability for payment amount a is the integral over the
// prior distribution P(x), the probability to find liquidity between
// the amount a and channel capacity c (or failAmount a_f):
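For readers who want to trace the correction order discussed in the threads above outside the diff context, here is a minimal, self-contained sketch. It is not lnd code; the helper name and plain uint64 amounts are hypothetical, but the three checks mirror the ones introduced in probabilityFormula.

```go
package main

import "fmt"

// clampAmounts applies the corrections in the order used by this PR:
//  1. contradictory data (fail <= success) resets both values to "no
//     knowledge",
//  2. a success at or above capacity is clamped to capacity-1 (and the fail
//     amount to capacity),
//  3. a fail amount above capacity is clamped to capacity.
func clampAmounts(capacity, success, fail uint64) (uint64, uint64) {
	if fail <= success {
		success, fail = 0, capacity
	}
	if success >= capacity {
		success, fail = capacity-1, capacity
	}
	if fail > capacity {
		fail = capacity
	}
	return success, fail
}

func main() {
	// A channel whose capacity shrank from 20 to 10 units: old success and
	// fail observations above the new capacity are rescaled.
	fmt.Println(clampAmounts(10, 20, 21)) // 9 10

	// Only a large failure: just the fail amount is clamped.
	fmt.Println(clampAmounts(10, 2, 20)) // 2 10

	// Contradictory data: reset to no knowledge.
	fmt.Println(clampAmounts(10, 5, 5)) // 0 10
}
```

The first and second examples correspond to the new "smaller cap, large success/fail" and "smaller cap, large fail" test cases below; the third corresponds to the reset for illogical success/fail pairs.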
123 changes: 106 additions & 17 deletions routing/probability_bimodal_test.go
@@ -11,10 +11,14 @@ import (
)

const (
smallAmount = lnwire.MilliSatoshi(400_000)
largeAmount = lnwire.MilliSatoshi(5_000_000)
capacity = lnwire.MilliSatoshi(10_000_000)
scale = lnwire.MilliSatoshi(400_000)
smallAmount = lnwire.MilliSatoshi(400_000_000)
Member: Q: is the commit msg saying it doesn't matter what values to set here?

Collaborator Author (@bitromortac, Apr 16, 2025): Not entirely arbitrary values; one should be able to multiply those values by the same number (here by 1000) and the tests should stay invariant, because that factor cancels out if you look at primitive. I changed those values because for the default config of the bimodal scale we use 300,000 sat, and 400,000 sat is closer to that.

largeAmount = lnwire.MilliSatoshi(5_000_000_000)
capacity = lnwire.MilliSatoshi(10_000_000_000)
scale = lnwire.MilliSatoshi(400_000_000)

// defaultTolerance is the default absolute tolerance for comparing
// probability calculations to expected values.
defaultTolerance = 0.001
)

// TestSuccessProbability tests that we get correct probability estimates for
@@ -25,7 +29,6 @@ func TestSuccessProbability(t *testing.T) {
tests := []struct {
name string
expectedProbability float64
tolerance float64
successAmount lnwire.MilliSatoshi
failAmount lnwire.MilliSatoshi
amount lnwire.MilliSatoshi
@@ -78,7 +81,6 @@ func TestSuccessProbability(t *testing.T) {
failAmount: capacity,
amount: smallAmount,
expectedProbability: 0.684,
Collaborator: Q: May I ask how you came up with these values? Did you calculate the probability via other means and then compare?

Collaborator Author: Yeah, not too great, I didn't calculate the number independently, but it serves more as a way to pin the behavior and to spot changes. The number itself is a sanity check however, since it's larger than the 0.5 in "no info, large amount". This is expected because for an amount that is similar to the scale you expect to find some leftovers, increasing the success probability.

tolerance: 0.001,
},
// If we had an unsettled success, we are sure we can send a
// lower amount.
@@ -110,7 +112,6 @@ func TestSuccessProbability(t *testing.T) {
failAmount: capacity,
amount: smallAmount,
expectedProbability: 0.851,
tolerance: 0.001,
},
// If we had a large unsettled success before, we know we can
// send even larger payments with high probability.
@@ -122,7 +123,6 @@ func TestSuccessProbability(t *testing.T) {
failAmount: capacity,
amount: largeAmount,
expectedProbability: 0.998,
tolerance: 0.001,
},
// If we had a failure before, we can't send with the fail
// amount.
@@ -151,7 +151,6 @@ func TestSuccessProbability(t *testing.T) {
failAmount: largeAmount,
amount: smallAmount,
expectedProbability: 0.368,
tolerance: 0.001,
},
// From here on we deal with mixed previous successes and
// failures.
@@ -183,7 +182,6 @@ func TestSuccessProbability(t *testing.T) {
successAmount: smallAmount,
amount: smallAmount + largeAmount/10,
expectedProbability: 0.287,
tolerance: 0.001,
},
// We still can't send the fail amount.
{
@@ -194,22 +192,45 @@ func TestSuccessProbability(t *testing.T) {
amount: largeAmount,
expectedProbability: 0.0,
},
// Same success and failure amounts (illogical).
// Same success and failure amounts (illogical), which gets
// reset to no knowledge.
{
name: "previous f/s, same",
capacity: capacity,
failAmount: largeAmount,
successAmount: largeAmount,
amount: largeAmount,
expectedProbability: 0.0,
expectedProbability: 0.5,
},
// Higher success than failure amount (illogical).
// Higher success than failure amount (illogical), which gets
// reset to no knowledge.
{
name: "previous f/s, higher success",
name: "previous f/s, illogical",
capacity: capacity,
failAmount: smallAmount,
successAmount: largeAmount,
expectedProbability: 0.0,
amount: largeAmount,
expectedProbability: 0.5,
},
// Larger success and larger failure than the old capacity are
// rescaled to still give a very high success rate.
{
name: "smaller cap, large success/fail",
capacity: capacity,
failAmount: 2*capacity + 1,
successAmount: 2 * capacity,
amount: largeAmount,
expectedProbability: 1.0,
},
// A lower success amount is not rescaled.
{
name: "smaller cap, large fail",
capacity: capacity,
successAmount: smallAmount / 2,
failAmount: 2 * capacity,
amount: smallAmount,
// See "previous success, larger amount".
expectedProbability: 0.851,
},
}

@@ -228,7 +249,7 @@ func TestSuccessProbability(t *testing.T) {
test.failAmount, test.amount,
)
require.InDelta(t, test.expectedProbability, p,
test.tolerance)
defaultTolerance)
require.NoError(t, err)
})
}
@@ -244,6 +265,59 @@ func TestSuccessProbability(t *testing.T) {
})
}

// TestSmallScale tests that the probability formula works with small scale
// values.
func TestSmallScale(t *testing.T) {
var (
// We use the smallest possible scale value together with a
// large capacity. This is an extreme form of a bimodal
// distribution.
scale lnwire.MilliSatoshi = 1
capacity lnwire.MilliSatoshi = 7e+09

// Success and failure amounts are chosen such that the expected
// balance must be somewhere in the middle of the channel, a
// value not expected when dealing with a bimodal distribution.
// In this case, the bimodal model fails to give good forecasts
// due to the numerics of the exponential functions, which get
// evaluated to exact zero floats.
successAmount lnwire.MilliSatoshi = 1.0e+09
failAmount lnwire.MilliSatoshi = 4.0e+09
)

estimator := BimodalEstimator{
BimodalConfig: BimodalConfig{BimodalScaleMsat: scale},
}

// An amount that's close to the success amount should have a very high
// probability.
amtCloseSuccess := successAmount + 1
p, err := estimator.probabilityFormula(
capacity, successAmount, failAmount, amtCloseSuccess,
)
require.NoError(t, err)
require.InDelta(t, 1.0, p, defaultTolerance)

// An amount that's close to the fail amount should have a very low
// probability.
amtCloseFail := failAmount - 1
p, err = estimator.probabilityFormula(
capacity, successAmount, failAmount, amtCloseFail,
)
require.NoError(t, err)
require.InDelta(t, 0.0, p, defaultTolerance)

// In the region where the bimodal model doesn't give good forecasts, we
Collaborator: Q: Could you describe where the region is here where the bimodal model is not good? I mean, it is very dependent on the constants we use for the capacity and scale, or? I wonder why it scales linearly, I thought it's more uniform now where the bimodal falls off too sharply?

Collaborator: Ah ok, given the integral the uniform distribution becomes linear, understood. But still interesting that at 25% into the channel-balance uncertainty we already have almost only the uniform part of the equation having the most effect. Probably needs some time to get an intuition for these values.

Collaborator Author: Right, it depends on the parameter combination: if you choose a huge s, every bimodal distribution will be a uniform one, since it doesn't matter how large the capacity is and this smears out the bimodal peaks.

If s is (very) small compared to c, the probability will drop quickly. If we then learn that the balance range is within the success/fail amounts range where the non-normalized probability is ~zero, we would still be able to retain the bimodal distribution within the newly learned range, but now starting at the success/fail amounts, if we had full numerical precision.

Because the numerical values of the exponential drop to zero if the scale is very small, at amounts >> s, we get the renormalization error. I think uncovering this error was nice, because keeping the bimodal distribution for the learned amounts isn't great and falling back to a uniform distribution is better (since it demonstrated that a bimodal model doesn't hold).

// fall back to a uniform model, which interpolates probabilities
// linearly.
amtLinear := successAmount + (failAmount-successAmount)*1/4
p, err = estimator.probabilityFormula(
capacity, successAmount, failAmount, amtLinear,
)
require.NoError(t, err)
require.InDelta(t, 0.75, p, defaultTolerance)
}
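To make the numerical point from the thread above concrete, here is a minimal standalone sketch (not part of the test file) using the same constants as TestSmallScale:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// With scale s = 1 msat and amounts in the billions of msat, both
	// exponential terms of the bimodal density underflow to exactly zero:
	// the smallest positive float64 is ~4.9e-324, which exp(-x) already
	// reaches around x = 745.
	s := 1.0
	c := 7e9     // capacity
	x := 1.75e9  // an amount between successAmount and failAmount

	fmt.Println(math.Exp(-x / s))      // 0
	fmt.Println(math.Exp((x - c) / s)) // 0

	// Only the uniform term x/(c*s) of the primitive survives, so the
	// renormalized probability interpolates linearly between the success
	// and fail amounts, which is what the 0.75 expectation in the test
	// checks.
}
```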

// TestIntegral tests certain limits of the probability distribution integral.
func TestIntegral(t *testing.T) {
t.Parallel()
@@ -689,9 +763,24 @@ func TestLocalPairProbability(t *testing.T) {
// FuzzProbability checks that we don't encounter errors related to NaNs.
func FuzzProbability(f *testing.F) {
estimator := BimodalEstimator{
BimodalConfig: BimodalConfig{BimodalScaleMsat: scale},
BimodalConfig: BimodalConfig{BimodalScaleMsat: 400_000},
}

// Predefined seed reported in
// https://github.com/lightningnetwork/lnd/issues/9085. This test found
// a case where we could not compute a normalization factor because we
// learned that the balance lies somewhere in the middle of the channel,
// a surprising result for the bimodal model, which predicts two
// distinct modes at the edges and therefore has numerical issues in the
// middle. Additionally, the scale is small with respect to the values
// used here.
f.Add(
uint64(1_000_000_000),
uint64(300_000_000),
uint64(400_000_000),
uint64(300_000_000),
)

f.Fuzz(func(t *testing.T, capacity, successAmt, failAmt, amt uint64) {
if capacity == 0 {
return