This PR adds strategies to handle two typical disaster scenarios: outdated commitments and unhandled exceptions.
Default strategies may be the best choice for smaller, loosely administered nodes, while alternative strategies may avoid unnecessary mass force-closes (but are reserved for advanced users who closely monitor their node).
Strategies for outdated commitments:
- request the counterparty to close the channel (default)
- if the node was restarted less than 10 min ago, log an error message and stop the node
Strategies for unhandled exceptions:
- local force close of the channel (default)
- log an error message and stop the node
Default settings maintain the same behavior as before.
To minimize channel force-closes (especially on larger nodes), it is possible to customize the way eclair handles certain situations, such as outdated commitments and internal errors.
:warning: There is no magic: non-default strategies are a trade-off that assumes the node is closely monitored. Instead of automatically reacting to some events, eclair will stop and await manual intervention. These strategies are therefore reserved for advanced or professional node operators. Default strategies are best suited for smaller, loosely administered nodes.
### Outdated commitments
The default behavior, when our peer tells us (or proves to us) that our channel commitment is outdated, is to request a remote force-close of the channel (a.k.a. recovery).
It may happen that due to a misconfiguration, the node was accidentally restarted using e.g. an old backup, and the data wasn't really lost. In that case, simply fixing the configuration and restarting eclair would prevent a mass force-close of channels.
This is why an alternative behavior is to simply log an error and stop the node. However, because our peer may be lying when it tells us that our channel commitment data is outdated, there is a 10 min window after restart when this strategy applies. After that, the node reverts to the default strategy.
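The time-bounded behavior described above can be sketched as a small pure function. This is only an illustration under stated assumptions, not eclair's actual code: the type and object names (`OutdatedCommitmentStrategy`, `Action`, `OutdatedCommitmentPolicy`) are hypothetical, and the node's uptime is assumed to be available as a duration.

```scala
import scala.concurrent.duration._

// Hypothetical names, for illustration only (not eclair's actual types).
sealed trait OutdatedCommitmentStrategy
object OutdatedCommitmentStrategy {
  case object RemoteClose extends OutdatedCommitmentStrategy // default strategy
  case object Stop extends OutdatedCommitmentStrategy        // alternative strategy
}

sealed trait Action
object Action {
  case object RequestRemoteForceClose extends Action
  case object LogErrorAndStopNode extends Action
}

object OutdatedCommitmentPolicy {
  // The 10 min window after restart during which the alternative strategy applies.
  val StopWindow: FiniteDuration = 10.minutes

  // Decide what to do when a peer claims (or proves) that our commitment is outdated.
  def decide(strategy: OutdatedCommitmentStrategy, uptime: FiniteDuration): Action =
    strategy match {
      // Only stop if the node restarted recently: later claims may come from a lying peer.
      case OutdatedCommitmentStrategy.Stop if uptime < StopWindow => Action.LogErrorAndStopNode
      // Default strategy, or window elapsed: fall back to requesting a remote force-close.
      case _ => Action.RequestRemoteForceClose
    }
}
```

With this sketch, `decide(OutdatedCommitmentStrategy.Stop, 3.minutes)` selects the stop action, while the same call with an uptime of 10 minutes or more falls back to the default behavior.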
During the 10 min window, the operator should closely monitor the node and assess, if the node stops, whether this is really a case of using outdated data or whether a peer is just lying. If it turns out that the data is really outdated due to a misconfiguration, the operator has an opportunity to fix it and restart the node. If the data is really outdated because it was simply lost, then the operator should change the strategy to the default and restart the node: this will cause the force-close of outdated channels, but there is no way to avoid that.
Here is a decision tree:
```
if (node stops after restart)
  if (false positive)
    configure eclair to use default strategy and restart node (will force close channels to malicious peers)
  else
    if (more up-to-date data available)
      configure eclair to point to proper database and restart node
    else
      configure eclair to use default strategy and restart node (will force close all outdated channels)
```
The alternate strategy can be configured by setting `eclair.outdated-commitment-strategy=stop` (see [`reference.conf`](https://github.com/ACINQ/eclair/blob/master/eclair-core/src/main/resources/reference.conf)).
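As a minimal sketch, assuming the node's settings are overridden in an `eclair.conf` file in its data directory (the file location is an assumption about your setup), the setting above would look like this:

```conf
# Log an error and stop the node (during the 10 min window after restart)
# instead of requesting a remote force-close.
eclair.outdated-commitment-strategy = stop
```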
### Unhandled exceptions
The default behavior, when we encounter an unhandled exception or internal error, is to locally force-close the channel.
Not only is there a delay before the channel balance gets refunded, but if the exception was due to some misconfiguration or bug in eclair that affects all channels, we risk force-closing all channels.
This is why an alternative behavior is to simply log an error and stop the node. Note that if you don't closely monitor your node, there is a risk that your peers take advantage of the downtime to try and cheat by publishing a revoked commitment. Additionally, while there is no known way of triggering an internal error in eclair from the outside, there may very well be a bug that allows just that, which could be used as a way to remotely stop the node (with the default behavior, it would "only" cause a local force-close of the channel).
The alternate strategy can be configured by setting `eclair.unhandled-exception-strategy=stop` (see [`reference.conf`](https://github.com/ACINQ/eclair/blob/master/eclair-core/src/main/resources/reference.conf)).
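As a minimal sketch, again assuming overrides live in an `eclair.conf` file in the node's data directory (an assumption about your setup), the corresponding entry would be:

```conf
# Log an error and stop the node instead of force-closing the channel locally.
eclair.unhandled-exception-strategy = stop
```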
docs/release-notes/eclair-vnext.md (8 additions & 0 deletions)
@@ -4,6 +4,14 @@
## Major changes
### Advanced strategies to avoid mass force-close of channels
To minimize channel force-closes (especially on larger nodes), it is possible to customize the way eclair handles certain situations, such as outdated commitments and internal errors.
:warning: There is no magic: non-default strategies are a trade-off that assumes the node is closely monitored. Instead of automatically reacting to some events, eclair will stop and await manual intervention. These strategies are therefore reserved for advanced or professional node operators. Default strategies are best suited for smaller, loosely administered nodes.
This feature is documented [here](../Advanced.md).
### Separate log for important notifications
Eclair added a new log file (`notifications.log`) for important notifications that require an action from the node operator.
@@ -1669,6 +1701,7 @@ class Channel(val nodeParams: NodeParams, val wallet: OnChainChannelFunder, remo
      case syncSuccess: SyncResult.Success =>
        var sendQueue = Queue.empty[LightningMessage]
        // normal case, our data is up-to-date
        if (channelReestablish.nextLocalCommitmentNumber == 1 && d.commitments.localCommit.index == 0) {
          // If next_local_commitment_number is 1 in both the channel_reestablish it sent and received, then the node MUST retransmit funding_locked, otherwise it MUST NOT
          log.debug("re-sending fundingLocked")
@@ -2288,7 +2321,8 @@ class Channel(val nodeParams: NodeParams, val wallet: OnChainChannelFunder, remo
        log.error("we just restarted and may have an outdated commitment: standard procedure would be to request our peer to force-close, but eclair has been configured to halt instead. Please ensure your database is up-to-date and restart eclair.")
        NotificationsLogger.logFatalError(
          s"""stopping node as configured strategy to outdated commitment for nodeId=$remoteNodeId channelId=${d.channelId}
             |
             |Eclair has been configured to shut down if a sync error is detected at restart, instead of requesting a
             |force-close from the peer. This gives the operator a chance of avoiding an unnecessary mass force-close
             |of channels.
             |
             |You should investigate why Eclair appears to be using outdated data. If it turns out that this is due to a
             |misconfiguration, just fix it and restart the node. If however the data was really lost, then you should
             |change the outdated commitment strategy to the default and restart the node: this will cause a force
             |close of outdated channels, but there is no way to avoid that.
             |""".stripMargin)
        System.exit(1)
        stop(FSM.Shutdown)
      case _ =>
        val exc = PleasePublishYourCommitment(d.channelId)
        val error = Error(d.channelId, exc.getMessage)
        goto(WAIT_FOR_REMOTE_PUBLISH_FUTURE_COMMITMENT) using DATA_WAIT_FOR_REMOTE_PUBLISH_FUTURE_COMMITMENT(d.commitments, channelReestablish) storing() sending error