Be less aggressive in outbound HTLC CLTV timeout checks #1119


Merged
5 changes: 0 additions & 5 deletions lightning/src/chain/channelmonitor.rs
@@ -253,18 +253,13 @@ pub const ANTI_REORG_DELAY: u32 = 6;
/// fail this HTLC,
/// 2) if we receive an HTLC within this many blocks of its expiry (plus one to avoid a race
/// condition with the above), we will fail this HTLC without telling the user we received it,
-/// 3) if we are waiting on a connection or a channel state update to send an HTLC to a peer, and
-/// that HTLC expires within this many blocks, we will simply fail the HTLC instead.
Comment on lines -256 to -257
Contributor

Are these comment removals related to the main change? The check that was changed doesn't seem related to awaiting on a peer connection/state update.

Collaborator Author

Ah! Good catch - so these two things are really the same thing, but this comment was actually referring to a check in channel and not in channelmanager, even though they're the same check. I've updated the check in channel as well.

Contributor

Should we mirror this in ChannelManager::do_chain_event as well? It still uses HTLC_FAIL_BACK_BUFFER to check for timed-out HTLCs.

Collaborator Author

No, that one is about our own incoming HTLCs, not forwarded ones - that is case 2.
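
Roughly, that "case 2" rule amounts to something like the following sketch (illustrative only - the constant values are assumptions, and the real check lives inside ChannelManager::do_chain_event rather than a free function):

```rust
// Illustrative constants; the real definitions live in channelmonitor.rs/channelmanager.rs
// (the numeric values here are assumptions, not quoted from the tree).
const LATENCY_GRACE_PERIOD_BLOCKS: u32 = 3;
const CLTV_CLAIM_BUFFER: u32 = 6;
const HTLC_FAIL_BACK_BUFFER: u32 = CLTV_CLAIM_BUFFER + LATENCY_GRACE_PERIOD_BLOCKS;

/// Sketch of "case 2": an HTLC we received for ourselves is failed back (without ever
/// surfacing it to the user) once its expiry is within HTLC_FAIL_BACK_BUFFER blocks of the
/// current height, plus one block of slack against the block-connect vs. preimage-claim race.
fn should_fail_back_inbound_htlc(cltv_expiry: u32, best_block_height: u32) -> bool {
    cltv_expiry <= best_block_height + HTLC_FAIL_BACK_BUFFER + 1
}
```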

///
/// (1) is all about protecting us - we need enough time to update the channel state before we hit
/// CLTV_CLAIM_BUFFER, at which point we'd go on chain to claim the HTLC with the preimage.
///
/// (2) is the same, but with an additional buffer to avoid accepting an HTLC which is immediately
/// in a race condition between the user connecting a block (which would fail it) and the user
/// providing us the preimage (which would claim it).
-///
-/// (3) is about our counterparty - we don't want to relay an HTLC to a counterparty when they may
-/// end up force-closing the channel on us to claim it.
pub(crate) const HTLC_FAIL_BACK_BUFFER: u32 = CLTV_CLAIM_BUFFER + LATENCY_GRACE_PERIOD_BLOCKS;

// TODO(devrandom) replace this with HolderCommitmentTransaction
7 changes: 5 additions & 2 deletions lightning/src/ln/channel.rs
@@ -32,7 +32,7 @@ use ln::chan_utils::{CounterpartyCommitmentSecrets, TxCreationKeys, HTLCOutputIn
use ln::chan_utils;
use chain::BestBlock;
use chain::chaininterface::{FeeEstimator,ConfirmationTarget};
-use chain::channelmonitor::{ChannelMonitor, ChannelMonitorUpdate, ChannelMonitorUpdateStep, HTLC_FAIL_BACK_BUFFER};
+use chain::channelmonitor::{ChannelMonitor, ChannelMonitorUpdate, ChannelMonitorUpdateStep, LATENCY_GRACE_PERIOD_BLOCKS};
use chain::transaction::{OutPoint, TransactionData};
use chain::keysinterface::{Sign, KeysInterface};
use util::ser::{Readable, ReadableArgs, Writeable, Writer, VecWriter};
@@ -4107,7 +4107,10 @@ impl<Signer: Sign> Channel<Signer> {
pub fn best_block_updated<L: Deref>(&mut self, height: u32, highest_header_time: u32, logger: &L)
-> Result<(Option<msgs::FundingLocked>, Vec<(HTLCSource, PaymentHash)>), msgs::ErrorMessage> where L::Target: Logger {
let mut timed_out_htlcs = Vec::new();
-let unforwarded_htlc_cltv_limit = height + HTLC_FAIL_BACK_BUFFER;
+// This mirrors the check in ChannelManager::decode_update_add_htlc_onion, refusing to
+// forward an HTLC when our counterparty should almost certainly just fail it for expiring
+// ~now.
+let unforwarded_htlc_cltv_limit = height + LATENCY_GRACE_PERIOD_BLOCKS;
self.holding_cell_htlc_updates.retain(|htlc_update| {
match htlc_update {
&HTLCUpdateAwaitingACK::AddHTLC { ref payment_hash, ref source, ref cltv_expiry, .. } => {
17 changes: 12 additions & 5 deletions lightning/src/ln/channelmanager.rs
@@ -1850,17 +1850,24 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
break Some(("Forwarding node has tampered with the intended HTLC values or origin node has an obsolete cltv_expiry_delta", 0x1000 | 13, Some(self.get_channel_update_for_unicast(chan).unwrap())));
}
let cur_height = self.best_block.read().unwrap().height() + 1;
-// Theoretically, channel counterparty shouldn't send us a HTLC expiring now, but we want to be robust wrt to counterparty
-// packet sanitization (see HTLC_FAIL_BACK_BUFFER rational)
+// Theoretically, channel counterparty shouldn't send us a HTLC expiring now,
+// but we want to be robust wrt to counterparty packet sanitization (see
+// HTLC_FAIL_BACK_BUFFER rationale).
Contributor

FWIW, HTLC_FAIL_BACK_BUFFER is now only used in channelmanager.rs, so perhaps ideally it would live there.

Collaborator Author

I kinda like keeping it next to the other block count security assumption constants above it? I don't feel super strongly but unless you do I'm not gonna bother moving it.

if msg.cltv_expiry <= cur_height + HTLC_FAIL_BACK_BUFFER as u32 { // expiry_too_soon
break Some(("CLTV expiry is too close", 0x1000 | 14, Some(self.get_channel_update_for_unicast(chan).unwrap())));

I think we don't have functional test coverage for this check - I don't get a failure when I blank it out?

In fact, do we really need it? If we check that the incoming cltv_expiry is always at least outgoing_cltv_value + chan.get_cltv_expiry_delta, and that outgoing_cltv_value is always greater than cur_height + LATENCY_GRACE_PERIOD_BLOCKS, then we implicitly enforce that the incoming cltv_expiry is always greater than cur_height + LATENCY_GRACE_PERIOD_BLOCKS + chan.get_cltv_expiry_delta. And MIN_CLTV_EXPIRY_DELTA is always greater than HTLC_FAIL_BACK_BUFFER.
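
Spelled out with the inequality directions made explicit (a derivation sketch in comments, not code from the tree):

```rust
// Given the two checks that do have coverage (delta = chan.get_cltv_expiry_delta()):
//   (a) msg.cltv_expiry       >= outgoing_cltv_value + delta          (tamper/delta check)
//   (b) outgoing_cltv_value    > cur_height + LATENCY_GRACE_PERIOD_BLOCKS
// it follows that
//   msg.cltv_expiry            > cur_height + LATENCY_GRACE_PERIOD_BLOCKS + delta
// and since HTLC_FAIL_BACK_BUFFER = CLTV_CLAIM_BUFFER + LATENCY_GRACE_PERIOD_BLOCKS,
// any enforced delta >= CLTV_CLAIM_BUFFER (MIN_CLTV_EXPIRY_DELTA is well above that) gives
//   msg.cltv_expiry            > cur_height + HTLC_FAIL_BACK_BUFFER
// i.e. the expiry_too_soon check above is already implied.
```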

Collaborator Author

Yes, I buy that we could drop this check. Doesn't mean we should, of course, but I buy that it's actually unreachable. I appreciate the simplicity/readability of a <= and a > check right after each other with straight constants, and not having to dig into channel/config code to make sure we don't accidentally do something stupid :)


Sure, I think ultimately we should be able to capture the safety invariants of our HTLC forwarding policy with constants à la CHECK_CLTV_EXPIRY_SANITY. Tracking in #680, as that would be a good opportunity to re-think our BOLT4 checks enforcement.

Collaborator Author

Right, I think this is basically doing that? I'm not suggesting we couldn't document more, only that this is a nice simple straightforward check, in addition to our configuration-based one.

}
if msg.cltv_expiry > cur_height + CLTV_FAR_FAR_AWAY as u32 { // expiry_too_far
break Some(("CLTV expiry is too far in the future", 21, None));
}
-// In theory, we would be safe against unintentional channel-closure, if we only required a margin of LATENCY_GRACE_PERIOD_BLOCKS.
-// But, to be safe against policy reception, we use a longer delay.
-if (*outgoing_cltv_value) as u64 <= (cur_height + HTLC_FAIL_BACK_BUFFER) as u64 {
+// If the HTLC expires ~now, don't bother trying to forward it to our
+// counterparty. They should fail it anyway, but we don't want to bother with
+// the round-trips or risk them deciding they definitely want the HTLC and
+// force-closing to ensure they get it if we're offline.
+// We previously had a much more aggressive check here which tried to ensure
+// our counterparty receives an HTLC which has *our* risk threshold met on it,
+// but there is no need to do that, and since we're a bit conservative with our

I think it can be interesting to dissociate the two different types of risk we're exposed to. The first risk is missing the on-chain confirmation of an HTLC on the backward link, and as such taking a loss on our forwarded HTLC balance. The second risk is the automatic force-close of a channel to achieve that on-chain confirmation.

I think what this PR highlights is that we can decrease the latter without affecting the former. On-chain confirmation is a function of when the forward link is going to go on-chain (should_broadcast_holder_commitment_tx) and the CLTV delta between it and the HTLC on the backward link. Neither of those values is affected by this change.

However, I think we increase our exposure to the second type of risk. In should_broadcast_holder_commitment_tx, we decide to go on-chain only once an outbound HTLC is behind the current height by LATENCY_GRACE_PERIOD_BLOCKS. Previously, I think we had a buffer of HTLC_FAIL_BACK_BUFFER + LATENCY_GRACE_PERIOD_BLOCKS in the worst-case scenario of forwarding an HTLC whose expiry sits right at the boundary height; now we only have LATENCY_GRACE_PERIOD_BLOCKS * 2.

I think this is okay, though if we observe an increased rate of channel force-closes we could still bump to an intermediate value between LATENCY_GRACE_PERIOD_BLOCKS and HTLC_FAIL_BACK_BUFFER.
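
For concreteness, taking the current constant values as an assumption (LATENCY_GRACE_PERIOD_BLOCKS = 3, CLTV_CLAIM_BUFFER = 6, so HTLC_FAIL_BACK_BUFFER = 9), the worst-case room before the outbound link would go on-chain compares roughly as:

```rust
// Worst case: we forward an HTLC whose outgoing_cltv_value sits exactly at the limit,
// and should_broadcast_holder_commitment_tx fires once the expiry is
// LATENCY_GRACE_PERIOD_BLOCKS behind the current height.
// (Constant values are assumptions for illustration.)
//   before: HTLC_FAIL_BACK_BUFFER + LATENCY_GRACE_PERIOD_BLOCKS = 9 + 3 = 12 blocks
//   after:  LATENCY_GRACE_PERIOD_BLOCKS * 2                     = 3 + 3 =  6 blocks
```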

Collaborator Author

Right, I think part of the argument in this PR is that we kinda leave correct handling up to our counterparty. If we give them an HTLC which expires soon, they either need to claim it ASAP or fail it ASAP, and probably shouldn't themselves forward it. This is why we use LATENCY_GRACE_PERIOD_BLOCKS - it's the "we may be desync'd from our peer by this many blocks, so give a bit of a grace period before we do anything drastic" constant.

If our counterparty does take a few blocks to fail/claim the HTLC and we have to force-close, I think that speaks more to our counterparty being buggy than us.


Yes, the only cases of non-buggy peers taking more than a few blocks to fail/claim the HTLC are hold-invoice or Lightning Rod styles of setup. Though their inflated CLTVs are going to be encompassed within outgoing_cltv_value and should not interfere with LATENCY_GRACE_PERIOD_BLOCKS.

Collaborator Author

> Yes, the only cases of non-buggy peers taking more than a few blocks to fail/claim the HTLC are hold-invoice or Lightning Rod styles of setup. Though their inflated CLTVs are going to be encompassed within outgoing_cltv_value and should not interfere with LATENCY_GRACE_PERIOD_BLOCKS.

I don't think that matters? By the time an invoice gets close to claimable, our counterparty needs to have failed the forwarding end on-chain so that they can do an off-chain failure with us. If they don't, or if they forward without enough time to do so, then they are buggy, long-CLTV or not.


> I don't think that matters?

Yes, we agree here - this is what I meant by "should not interfere" :)

+// risk threshold it just results in failing to forward payments.
+if (*outgoing_cltv_value) as u64 <= (cur_height + LATENCY_GRACE_PERIOD_BLOCKS) as u64 {
break Some(("Outgoing CLTV value is too soon", 0x1000 | 14, Some(self.get_channel_update_for_unicast(chan).unwrap())));
}

2 changes: 1 addition & 1 deletion lightning/src/ln/functional_tests.rs
@@ -4056,7 +4056,7 @@ fn do_test_holding_cell_htlc_add_timeouts(forwarded_htlc: bool) {
check_added_monitors!(nodes[1], 0);
}

-connect_blocks(&nodes[1], TEST_FINAL_CLTV - CLTV_CLAIM_BUFFER - LATENCY_GRACE_PERIOD_BLOCKS);
+connect_blocks(&nodes[1], TEST_FINAL_CLTV - LATENCY_GRACE_PERIOD_BLOCKS);
assert!(nodes[1].node.get_and_clear_pending_msg_events().is_empty());
assert!(nodes[1].node.get_and_clear_pending_events().is_empty());
connect_blocks(&nodes[1], 1);
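
The updated arithmetic here follows directly from the new holding-cell limit (a sketch of the boundary only - the exact expiry of the test HTLC is treated as an assumption):

```rust
// The holding-cell HTLC is now failed once cltv_expiry <= height + LATENCY_GRACE_PERIOD_BLOCKS
// (the new unforwarded_htlc_cltv_limit in Channel::best_block_updated). With the test HTLC's
// expiry assumed to sit roughly at start_height + TEST_FINAL_CLTV, connecting
// TEST_FINAL_CLTV - LATENCY_GRACE_PERIOD_BLOCKS blocks leaves us just shy of that threshold
// (hence the empty-event assertions above), and the final connect_blocks(&nodes[1], 1)
// crosses it.
```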