Commit 6ad8583

Do not broadcast commitment txn on Permanent mon update failure
See doc updates for more info on the edge case this prevents, and there isn't really a strong reason why we would need to broadcast the latest state immediately. Specifically, in the case of HTLC claims (the most important reason to ensure we have state on chain if it cannot be persisted), we will still force-close if there are HTLCs which need claiming and are going to expire. Surprisingly, there were no tests which failed as a result of this change, but a new one has been added.
1 parent 107c6c7 commit 6ad8583

File tree

4 files changed (+75 / -52 lines)

lightning/src/chain/channelmonitor.rs

+15-9
@@ -1191,14 +1191,18 @@ impl<Signer: Sign> ChannelMonitor<Signer> {
 	}
 
 	/// Used by ChannelManager deserialization to broadcast the latest holder state if its copy of
-	/// the Channel was out-of-date. You may use it to get a broadcastable holder toxic tx in case of
-	/// fallen-behind, i.e when receiving a channel_reestablish with a proof that our counterparty side knows
-	/// a higher revocation secret than the holder commitment number we are aware of. Broadcasting these
-	/// transactions are UNSAFE, as they allow counterparty side to punish you. Nevertheless you may want to
-	/// broadcast them if counterparty don't close channel with his higher commitment transaction after a
-	/// substantial amount of time (a month or even a year) to get back funds. Best may be to contact
-	/// out-of-band the other node operator to coordinate with him if option is available to you.
-	/// In any-case, choice is up to the user.
+	/// the Channel was out-of-date.
+	///
+	/// You may also use this to broadcast the latest local commitment transaction, either because
+	/// a monitor update failed with [`ChannelMonitorUpdateErr::PermanentFailure`] or because we've
+	/// fallen behind (i.e we've received proof that our counterparty side knows a revocation
+	/// secret we gave them that they shouldn't know).
+	///
+	/// Broadcasting these transactions in the second case is UNSAFE, as they allow counterparty
+	/// side to punish you. Nevertheless you may want to broadcast them if counterparty doesn't
+	/// close channel with their commitment transaction after a substantial amount of time. Best
+	/// may be to contact the other node operator out-of-band to coordinate other options available
+	/// to you. In any-case, the choice is up to you.
 	pub fn get_latest_holder_commitment_txn<L: Deref>(&self, logger: &L) -> Vec<Transaction>
 	where L::Target: Logger {
 		self.inner.lock().unwrap().get_latest_holder_commitment_txn(logger)
@@ -1821,7 +1825,9 @@ impl<Signer: Sign> ChannelMonitorImpl<Signer> {
 		if *should_broadcast {
 			self.broadcast_latest_holder_commitment_txn(broadcaster, logger);
 		} else if !self.holder_tx_signed {
-			log_error!(logger, "You have a toxic holder commitment transaction avaible in channel monitor, read comment in ChannelMonitor::get_latest_holder_commitment_txn to be informed of manual action to take");
+			log_error!(logger, "WARNING: You have a potentially-toxic holder commitment transaction available to broadcast");
+			log_error!(logger, " in channel monitor for channel {}!", log_bytes!(self.funding_info.0.to_channel_id()));
+			log_error!(logger, " Read the docs for ChannelMonitor::get_latest_holder_commitment_txn and take manual action!");
 		} else {
 			// If we generated a MonitorEvent::CommitmentTxConfirmed, the ChannelManager
 			// will still give us a ChannelForceClosed event with !should_broadcast, but we
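The branch shown in the hunk above can be summarized in a minimal, self-contained sketch. Everything here (the `CloseAction` enum, `on_force_closed`, the field names) is an illustrative stand-in for the real `ChannelMonitorImpl` logic, not rust-lightning's actual API:

```rust
// Hypothetical sketch: which action a force-close takes, assuming only the two
// flags visible in the hunk above (`should_broadcast`, `holder_tx_signed`).

#[derive(Debug, PartialEq)]
enum CloseAction {
    BroadcastLatest,      // safe to broadcast: we hold the latest state
    WarnPotentiallyToxic, // mirrors the three log_error! lines: manual action
    NoOp,                 // a CommitmentTxConfirmed event already handled it
}

fn on_force_closed(should_broadcast: bool, holder_tx_signed: bool) -> CloseAction {
    if should_broadcast {
        CloseAction::BroadcastLatest
    } else if !holder_tx_signed {
        // The commitment transaction we could broadcast may be revoked, so
        // warn loudly instead of sending it to the chain.
        CloseAction::WarnPotentiallyToxic
    } else {
        CloseAction::NoOp
    }
}

fn main() {
    assert_eq!(on_force_closed(true, false), CloseAction::BroadcastLatest);
    assert_eq!(on_force_closed(false, false), CloseAction::WarnPotentiallyToxic);
    assert_eq!(on_force_closed(false, true), CloseAction::NoOp);
    println!("ok");
}
```

The key behavioral point of the commit is the middle branch: an unsigned holder transaction is never broadcast automatically; the operator is pointed at `get_latest_holder_commitment_txn` instead.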

lightning/src/chain/mod.rs

+48-41
@@ -182,61 +182,68 @@ pub enum ChannelMonitorUpdateErr {
 	/// our state failed, but is expected to succeed at some point in the future).
 	///
 	/// Such a failure will "freeze" a channel, preventing us from revoking old states or
-	/// submitting new commitment transactions to the counterparty. Once the update(s) that failed
-	/// have been successfully applied, a [`MonitorEvent::UpdateCompleted`] event should be returned
-	/// via [`Watch::release_pending_monitor_events`] which will then restore the channel to an
-	/// operational state.
-	///
-	/// Note that a given ChannelManager will *never* re-generate a given ChannelMonitorUpdate. If
-	/// you return a TemporaryFailure you must ensure that it is written to disk safely before
-	/// writing out the latest ChannelManager state.
-	///
-	/// Even when a channel has been "frozen" updates to the ChannelMonitor can continue to occur
-	/// (eg if an inbound HTLC which we forwarded was claimed upstream resulting in us attempting
-	/// to claim it on this channel) and those updates must be applied wherever they can be. At
-	/// least one such updated ChannelMonitor must be persisted otherwise PermanentFailure should
-	/// be returned to get things on-chain ASAP using only the in-memory copy. Obviously updates to
-	/// the channel which would invalidate previous ChannelMonitors are not made when a channel has
-	/// been "frozen".
-	///
-	/// Note that even if updates made after TemporaryFailure succeed you must still provide a
-	/// [`MonitorEvent::UpdateCompleted`] to ensure you have the latest monitor and re-enable
-	/// normal channel operation. Note that this is normally generated through a call to
-	/// [`ChainMonitor::channel_monitor_updated`].
-	///
-	/// Note that the update being processed here will not be replayed for you when you return a
-	/// [`MonitorEvent::UpdateCompleted`] event via [`Watch::release_pending_monitor_events`], so
-	/// you must store the update itself on your own local disk prior to returning a
-	/// TemporaryFailure. You may, of course, employ a journaling approach, storing only the
-	/// ChannelMonitorUpdate on disk without updating the monitor itself, replaying the journal at
-	/// reload-time.
+	/// submitting new commitment transactions to the counterparty. Once the update(s) which failed
+	/// have been successfully applied, [`ChannelManager::channel_monitor_updated`] can be used to
+	/// restore the channel to an operational state.
+	///
+	/// Note that a given [`ChannelManager`] will *never* re-generate a [`ChannelMonitorUpdate`].
+	/// If you return this error you must ensure that it is written to disk safely before writing
+	/// the latest [`ChannelManager`] state, or you should return [`PermanentFailure`] instead.
+	///
+	/// Even when a channel has been "frozen", updates to the [`ChannelMonitor`] can continue to
+	/// occur (e.g. if an inbound HTLC which we forwarded was claimed upstream, resulting in us
+	/// attempting to claim it on this channel) and those updates must still be persisted.
+	///
+	/// No updates to the channel will be made which could invalidate other [`ChannelMonitor`]s
+	/// until [`ChannelManager::channel_monitor_updated`] is called, even if you return no error on
+	/// a later monitor update for the same channel.
 	///
 	/// For deployments where a copy of ChannelMonitors and other local state are backed up in a
 	/// remote location (with local copies persisted immediately), it is anticipated that all
 	/// updates will return TemporaryFailure until the remote copies could be updated.
 	///
-	/// [`ChainMonitor::channel_monitor_updated`]: chainmonitor::ChainMonitor::channel_monitor_updated
+	/// [`PermanentFailure`]: ChannelMonitorUpdateErr::PermanentFailure
+	/// [`ChannelManager`]: crate::ln::channelmanager::ChannelManager
+	/// [`ChannelManager::channel_monitor_updated`]: crate::ln::channelmanager::ChannelManager::channel_monitor_updated
 	TemporaryFailure,
-	/// Used to indicate no further channel monitor updates will be allowed (eg we've moved on to a
-	/// different watchtower and cannot update with all watchtowers that were previously informed
-	/// of this channel).
+	/// Used to indicate no further channel monitor updates will be allowed (likely a disk failure
+	/// or a remote copy of this [`ChannelMonitor`] is no longer reachable and thus not updatable).
 	///
-	/// At reception of this error, ChannelManager will force-close the channel and return at
-	/// least a final ChannelMonitorUpdate::ChannelForceClosed which must be delivered to at
-	/// least one ChannelMonitor copy. Revocation secret MUST NOT be released and offchain channel
-	/// update must be rejected.
+	/// When this is returned, [`ChannelManager`] will force-close the channel but *not* broadcast
+	/// our current commitment transaction. This avoids a dangerous case where a local disk failure
+	/// (e.g. the Linux-default remounting of the disk as read-only) causes [`PermanentFailure`]s
+	/// for all monitor updates. If we were to broadcast our latest commitment transaction and then
+	/// restart, we could end up reading a previous [`ChannelMonitor`] and [`ChannelManager`],
+	/// revoking our now-broadcasted state before seeing it confirm and losing all our funds.
 	///
-	/// This failure may also signal a failure to update the local persisted copy of one of
-	/// the channel monitor instance.
+	/// Note that this is somewhat of a tradeoff - if the disk is really gone and we may have lost
+	/// the data permanently, we really should broadcast immediately. If the data can be recovered
+	/// with manual intervention, we'd rather close the channel, rejecting future updates to it,
+	/// and broadcast the latest state only if we have HTLCs to claim which are timing out (which
+	/// we do as long as blocks are connected).
 	///
-	/// Note that even when you fail a holder commitment transaction update, you must store the
-	/// update to ensure you can claim from it in case of a duplicate copy of this ChannelMonitor
-	/// broadcasts it (e.g distributed channel-monitor deployment)
+	/// In order to broadcast the latest local commitment transaction, you'll need to call
+	/// [`ChannelMonitor::get_latest_holder_commitment_txn`] and broadcast the resulting
+	/// transactions once you've safely ensured no further channel updates can be generated by your
+	/// [`ChannelManager`].
+	///
+	/// Note that at least one final [`ChannelMonitorUpdate`] may still be provided, which must
+	/// still be processed by a running [`ChannelMonitor`]. This final update will mark the
+	/// [`ChannelMonitor`] as finalized, ensuring no further updates (e.g. revocation of the latest
+	/// commitment transaction) are allowed.
+	///
+	/// Note that even if you return a [`PermanentFailure`] due to unavailability of secondary
+	/// [`ChannelMonitor`] copies, you should still make an attempt to store the update where
+	/// possible to ensure you can claim HTLC outputs on the latest commitment transaction
+	/// broadcasted later.
 	///
 	/// In case of distributed watchtowers deployment, the new version must be written to disk, as
 	/// state may have been stored but rejected due to a block forcing a commitment broadcast. This
 	/// storage is used to claim outputs of rejected state confirmed onchain by another watchtower,
 	/// lagging behind on block processing.
+	///
+	/// [`PermanentFailure`]: ChannelMonitorUpdateErr::PermanentFailure
+	/// [`ChannelManager`]: crate::ln::channelmanager::ChannelManager
 	PermanentFailure,
 }
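The two error variants documented above draw the line between "freeze and retry" and "close without broadcasting". The following sketch shows one way a persistence backend might choose between them; the `Store` type, its fields, and `persist_update` are hypothetical stand-ins, not rust-lightning's `Persist` trait, and only the enum mirrors the real `ChannelMonitorUpdateErr`:

```rust
// Hypothetical persistence backend, assuming a local disk plus a remote backup.

#[derive(Debug, PartialEq)]
enum MonitorUpdateErr {
    TemporaryFailure, // channel "freezes"; resume via channel_monitor_updated
    PermanentFailure, // channel force-closes; commitment tx is NOT broadcast
}

struct Store {
    disk_writable: bool,    // e.g. false if the disk was remounted read-only
    remote_reachable: bool, // e.g. false during a backup-server outage
}

impl Store {
    fn persist_update(&self, update_bytes: &[u8]) -> Result<(), MonitorUpdateErr> {
        if !self.disk_writable {
            // Data may be unrecoverable: reject all future updates for this
            // channel, but (per the docs above) do not broadcast immediately.
            return Err(MonitorUpdateErr::PermanentFailure);
        }
        if !self.remote_reachable {
            // The remote copy will catch up later; freeze the channel until
            // the update is applied and the channel is restored.
            return Err(MonitorUpdateErr::TemporaryFailure);
        }
        let _ = update_bytes; // pretend we wrote the update to both copies
        Ok(())
    }
}

fn main() {
    let healthy = Store { disk_writable: true, remote_reachable: true };
    assert_eq!(healthy.persist_update(b"update"), Ok(()));

    let backup_down = Store { disk_writable: true, remote_reachable: false };
    assert_eq!(backup_down.persist_update(b"update"),
               Err(MonitorUpdateErr::TemporaryFailure));

    let disk_dead = Store { disk_writable: false, remote_reachable: true };
    assert_eq!(disk_dead.persist_update(b"update"),
               Err(MonitorUpdateErr::PermanentFailure));
    println!("ok");
}
```

Note the ordering: a dead disk takes precedence, since the docs above treat a read-only local disk as the canonical reason to stop accepting updates entirely rather than retry forever.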

lightning/src/ln/chanmon_update_fail_tests.rs

+10
@@ -67,13 +67,23 @@ fn test_simple_monitor_permanent_update_fail() {
 		_ => panic!("Unexpected event"),
 	};
 
+	assert!(nodes[0].tx_broadcaster.txn_broadcasted.lock().unwrap().is_empty());
+
 	// TODO: Once we hit the chain with the failure transaction we should check that we get a
 	// PaymentPathFailed event
 
 	assert_eq!(nodes[0].node.list_channels().len(), 0);
 	check_closed_event!(nodes[0], 1, ClosureReason::ProcessingError { err: "ChannelMonitor storage failure".to_string() });
 }
 
+#[test]
+fn test_simple_monitor_permanent_update_fail() {
+	do_test_simple_monitor_permanent_update_fail(false);
+
+	// Test behavior when the persister returns a PermanentFailure.
+	do_test_simple_monitor_permanent_update_fail(true);
+}
+
 #[test]
 fn test_monitor_and_persister_update_fail() {
 	// Test that if both updating the `ChannelMonitor` and persisting the updated

lightning/src/ln/channelmanager.rs

+2-2
@@ -1019,7 +1019,7 @@ macro_rules! handle_monitor_err {
 				// given up the preimage yet, so might as well just wait until the payment is
 				// retried, avoiding the on-chain fees.
 				let res: Result<(), _> = Err(MsgHandleErrInternal::from_finish_shutdown("ChannelMonitor storage failure".to_owned(), *$chan_id, $chan.get_user_id(),
-					$chan.force_shutdown(true), $self.get_channel_update_for_broadcast(&$chan).ok() ));
+					$chan.force_shutdown(false), $self.get_channel_update_for_broadcast(&$chan).ok() ));
 				(res, true)
 			},
 			ChannelMonitorUpdateErr::TemporaryFailure => {
@@ -3516,7 +3516,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
 				// We do not do a force-close here as that would generate a monitor update for
 				// a monitor that we didn't manage to store (and that we don't care about - we
 				// don't respond with the funding_signed so the channel can never go on chain).
-				let (_monitor_update, failed_htlcs) = chan.force_shutdown(true);
+				let (_monitor_update, failed_htlcs) = chan.force_shutdown(false);
 				assert!(failed_htlcs.is_empty());
 				return Err(MsgHandleErrInternal::send_err_msg_no_close("ChannelMonitor storage failure".to_owned(), funding_msg.channel_id));
 			},
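Both hunks in this file flip the same boolean: the argument to `force_shutdown` that controls whether the latest holder commitment transaction is handed to the broadcaster. A tiny sketch of that flag's effect, with `ForceClosedUpdate` and `force_shutdown` as hypothetical stand-ins for the real `Channel` method and `ChannelMonitorUpdate`:

```rust
// Hypothetical model of the flag flipped from `true` to `false` in this commit.

#[derive(Debug, PartialEq)]
struct ForceClosedUpdate {
    // Carried to the ChannelMonitor, which broadcasts only when this is true.
    should_broadcast: bool,
}

fn force_shutdown(should_broadcast: bool) -> ForceClosedUpdate {
    ForceClosedUpdate { should_broadcast }
}

fn main() {
    // After this commit, monitor-storage failures request a force-close
    // without broadcasting the latest holder commitment transaction; the
    // monitor will still broadcast later if expiring HTLCs require it.
    let update = force_shutdown(false);
    assert!(!update.should_broadcast);
    println!("ok");
}
```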
