Attribution data (feature 36/37) #1044


Open
wants to merge 9 commits into master

Conversation

@joostjager (Collaborator) commented Nov 21, 2022

Failure attribution is important to properly penalize nodes after a payment failure occurs. The goal of the penalty is to give the next attempt a better chance at succeeding. In the happy failure flow, the sender is able to determine the origin of the failure and penalizes a single node or pair of nodes.

Unfortunately it is possible for nodes on the route to hide themselves. If they return random data as the failure message, the sender won't know where the failure happened.

This PR proposes a new failure message format that lets each node commit to the failure message. If one of the nodes corrupts the failure message, the sender will be able to identify that node.

For more information, see https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-October/003723.html and https://delvingbitcoin.org/t/latency-and-privacy-in-lightning/1723.

Furthermore, the htlc fail and fulfill flows are extended to convey self-reported htlc hold times to the sender.

Fail flow implementations

LND implementation: lightningnetwork/lnd#9888

LDK implementation: lightningdevkit/rust-lightning#2256

Eclair implementation: ACINQ/eclair#3065

CLN implementation: ElementsProject/lightning#8291

Fulfill flow implementations

LDK implementation: lightningdevkit/rust-lightning#3801

Eclair implementation: ACINQ/eclair#3100

@thomash-acinq

I've started implementing it in eclair, do you have some test vectors so we can check that we are compatible?
The design seems good to me, but as I've said previously, I think keeping hop payloads and hmacs for 8 nodes only (instead of 27) is enough for almost all use cases and would give us huge size savings.

@joostjager (Collaborator Author) commented Dec 6, 2022

I don't have test vectors yet, but I can produce them. Will add them to this PR when ready.

Capping the max hops at a lower number is fine with me, but do you have a scenario in mind where this would really make the difference? Or is it more generally that everything above 8 is wasteful?

@joostjager joostjager force-pushed the fat-errors branch 2 times, most recently from 4b48481 to 24b10d5 Compare December 6, 2022 16:52
@joostjager (Collaborator Author)

@thomash-acinq added a happy fat error test vector.

09-features.md Outdated
@@ -41,6 +41,7 @@ The Context column decodes as follows:
| 20/21 | `option_anchor_outputs` | Anchor outputs | IN | `option_static_remotekey` | [BOLT #3](03-transactions.md) |
| 22/23 | `option_anchors_zero_fee_htlc_tx` | Anchor commitment type with zero fee HTLC transactions | IN | `option_static_remotekey` | [BOLT #3][bolt03-htlc-tx], [lightning-dev][ml-sighash-single-harmful]|
| 26/27 | `option_shutdown_anysegwit` | Future segwit versions allowed in `shutdown` | IN | | [BOLT #2][bolt02-shutdown] |
| 28/29 | `option_fat_error` | Can generate/relay fat errors in `update_fail_htlc` | IN | | [BOLT #4][bolt04-fat-errors] |
@joostjager (Collaborator Author)
I think this big gap in the bits has emerged here because of tentative spec changes that may or may not make it. Not sure why that is necessary. I thought for unofficial extensions, the custom range is supposed to be used?

I can see that with unofficial features deployed in the wild, it is easier to keep the same bit when something becomes official. But not sure if that is worth creating the gap here? An alternative is to deploy unofficial features in the custom range first, and then later recognize both the official and unofficial bit. Slightly more code, but this feature list remains clean.

@joostjager (Collaborator Author)

Added fat error signaling to the PR.

@thomash-acinq

I've spent a lot of time trying to make the test vector pass and I've finally found what was wrong:
In the spec you write that the hmac covers

  • failure_len, failuremsg, pad_len and pad.

  • The first y+1 payloads in payloads. For example, hmac_0_2 would cover all three payloads.

  • y downstream hmacs that correspond to downstream node positions relative to x. For example, hmac_0_2 would cover hmac_1_1 and hmac_2_0.

implying that we need to concatenate them in that order. But in your code you follow a different order:

// Include payloads including our own.
_, _ = hash.Write(payloads[:(NumMaxHops-position)*payloadLen])

// Include downstream hmacs.
var hmacsIdx = position + NumMaxHops
for j := 0; j < NumMaxHops-position-1; j++ {
	_, _ = hash.Write(
		hmacs[hmacsIdx*sha256.Size : (hmacsIdx+1)*sha256.Size],
	)

	hmacsIdx += NumMaxHops - j - 1
}

// Include message.
_, _ = hash.Write(message)

I think the order message + hop payloads + hmacs is more intuitive as it matches the order of the fields in the packet.
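The index walk in the quoted loop traverses a flattened triangular hmac array. A small sketch, mirroring the quoted code's layout (27 max hops; helper name is illustrative), shows which downstream hmac indices a hop at a given position covers:

```go
package main

import "fmt"

const numMaxHops = 27

// downstreamHmacIndexes mirrors the index walk in the quoted loop:
// starting at position+numMaxHops, each step skips past one shrinking
// block of the flattened triangular hmac array to reach the next
// downstream node's hmac for the same coverage level.
func downstreamHmacIndexes(position int) []int {
	var idxs []int
	hmacsIdx := position + numMaxHops
	for j := 0; j < numMaxHops-position-1; j++ {
		idxs = append(idxs, hmacsIdx)
		hmacsIdx += numMaxHops - j - 1
	}
	return idxs
}

func main() {
	// The second-to-last possible position covers a single downstream hmac.
	fmt.Println(downstreamHmacIndexes(25))
}
```

Walking the indices like this makes it easy to check, per position, exactly which downstream hmacs a hop's own hmac must cover, whatever concatenation order is finally chosen.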

@joostjager (Collaborator Author)

Oh great catch! Will produce a new vector.

@joostjager (Collaborator Author)

@thomash-acinq updated vector

@joostjager (Collaborator Author)

Updated LND implementation with sender-picked fat error structure parameters: lightningnetwork/lnd#7139

@joostjager joostjager mentioned this pull request May 27, 2025
@joostjager (Collaborator Author)

I did a rough implementation in LDK of hold times for the success case. It turned out to be straightforward, mostly reusing code from the failure case. Test vectors in #1261

@carlaKC (Contributor) left a comment

Logic all looks good, but I would like to restructure the Returning Errors section to follow the requirements / rationale structure that we have elsewhere in the spec - as is, the two are interspersed.

Also needs to refer to option_attributable_failures rather than "attributable failures" in a few places.

09-features.md Outdated
@@ -46,6 +46,7 @@ The Context column decodes as follows:
| 26/27 | `option_shutdown_anysegwit` | Future segwit versions allowed in `shutdown` | IN | | [BOLT #2][bolt02-shutdown] |
| 28/29 | `option_dual_fund` | Use v2 of channel open, enables dual funding | IN | | [BOLT #2](02-peer-protocol.md) |
| 34/35 | `option_quiesce` | Support for `stfu` message | IN | | [BOLT #2][bolt02-quiescence] |
| 36/37 | `option_attributable_failure` | Can generate/relay attributable failures in `update_fail_htlc` | IN9 | | [BOLT #4][bolt04-attributable-failures] |
@carlaKC (Contributor)
Doesn't need to be in invoices (9)?

@joostjager (Collaborator Author) commented Jun 3, 2025

The question is indeed whether there are scenarios where senders would refuse a payment to a node that doesn't support attribution data. If the recipient returned random data and their predecessor added attribution data, blame would still land on the final node pair.

Maybe attribution data is unnecessary for the final hop? 🤔 Of course they still want to pass back something to not stand out as a recipient.

For hold times, they will probably always report zero anyway.

@carlaKC (Contributor)
We don't have any instructions in this PR for how to react to the presence / absence of this feature bit in an invoice, so I think we can take it out?

Seems easy enough to come back and add signaling in invoices (+ handling instructions) if we want it in the future?

@@ -579,6 +579,9 @@ should add a random delay before forwarding the error. Failures are likely to
be probing attempts and message timing may help the attacker infer its distance
to the final recipient.

Note that nodes on the blinded route return failures through `update_fail_malformed_htlc` and therefore do not and can
@carlaKC (Contributor)
nit: note that nodes on -> note that nodes in?

@joostjager (Collaborator Author)

Fixed. Language skills insufficient for me to flag 'on the route' as incorrect.

Upon receiving a return packet, each hop generates its `ammag`, generates the
pseudo-random byte stream, and applies the result to the return packet before
return-forwarding it.
When supported, the erring node should also populate the `attribution_data` field in `update_fail_htlc` consisting of the following data:
@carlaKC (Contributor)
I think that we can more formally specify this:

  • if option_attributable_failure is advertised:
    • if path_key is not set in the incoming update_add_htlc:
      • MUST include htlc_hold_times in payload.

@joostjager (Collaborator Author)

Added

Comment on lines 1063 to 1065
1. data:
* [`20*u32`:`htlc_hold_times`]
* [`210*sha256[..4]`:`truncated_hmacs`]
@carlaKC (Contributor)

Duplicated here and in BOLT02 - perhaps just link to the BOLT02 description?
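For context on the size of these fields: with 20 hops, the hmac count of 210 is the triangular number 20+19+...+1, since each successive hop's hmac covers one fewer downstream position. The arithmetic below is just a sanity check of those counts, not spec text:

```go
package main

import "fmt"

func main() {
	const maxHops = 20
	holdTimesLen := maxHops * 4             // 20 * u32 = 80 bytes
	numHmacs := maxHops * (maxHops + 1) / 2 // 20+19+...+1 = 210
	truncatedHmacsLen := numHmacs * 4       // sha256 truncated to 4 bytes each = 840 bytes
	fmt.Println(holdTimesLen, numHmacs, truncatedHmacsLen, holdTimesLen+truncatedHmacsLen)
	// prints 80 210 840 920
}
```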

In addition, each node locally stores data regarding its own sending peer in the
route, so it knows where to return-forward any eventual return packets.
The node generating the error message (_erring node_) builds a return packet

## Erring node
@carlaKC (Contributor)
General comment on this section: I'm missing the structure of requirements/rationale that we have elsewhere in the specification.

For example, the section for the erring node notes "The sender can use this information to score nodes on latency". To me this should be in the origin node section or all contained in a single rationale at the end. Ditto with the game theory of guessing HMACs.

@joostjager (Collaborator Author)

I see what you mean. I've added a commit trying to address this and improve it a little, but I found it remains difficult to capture this in requirements while still keeping the explanation.

@joostjager joostjager changed the title Attributable failures (feature 36/37) Attributable data (feature 36/37) Jun 5, 2025
@joostjager joostjager changed the title Attributable data (feature 36/37) Attribution data (feature 36/37) Jun 5, 2025
@brh28 left a comment

I didn't see the start time and end time of the htlc_hold_time defined. I assume these are upon receiving the update_add_htlc and before sending the update_fulfill_htlc/update_fail_htlc, respectively?

Overall, I like the idea of htlc_hold_time but suspect it'll be gamed by routing nodes.

@Roasbeef (Collaborator) commented Jun 6, 2025

Overall, I like the idea of htlc_hold_time but suspect it'll be gamed by routing nodes.

A motivated sender can always pinpoint the actual forwarding delays by bisecting the route. They also always know how long the entire route took. You are correct, though, that it's intended mainly on a best-effort basis; the overall assumption is that this can be used to allow a "fast" node to distinguish itself and be rewarded for that.

thomash-acinq added a commit to ACINQ/eclair that referenced this pull request Jun 11, 2025
Attribution data is added to both failed and fulfilled HTLCs
lightning/bolts#1044
t-bast added a commit to t-bast/bolts that referenced this pull request Jun 17, 2025
In lightning#1044, we introduced a 12-blocks delay before considering a channel
closed when we see a spending confirmation on-chain. This ensures that
if the channel was spliced instead of closed, the channel participants
are able to broadcast a new `channel_announcement` to the rest of the
network. If this new `channel_announcement` is received by remote nodes
before the 12-blocks delay, the channel can keep its history in path
finding scoring/reputation, which is important.

We then realized that 12 blocks probably wasn't enough to allow this to
happen: some implementations default to 8 confirmations before sending
`splice_locked`, and allow node operators to increase this value. We
thus bump this delay to 72 blocks to allow more time before channels
are removed from the local graph.
@joostjager (Collaborator Author)

Overall, I like the idea of htlc_hold_time but suspect it'll be gamed by routing nodes.

As @Roasbeef says, in any case the sender knows how long the entire route took. The total penalty to apply is known. Without additional information, the sender might apply that penalty equally across all nodes.

The only thing the reported hold time does is shift the penalty to better match reality. If a node reports zero, that doesn't mean it escapes the penalty. It just means that the sender concludes that, if the zero is true, the outgoing side of that node is fast, and all of the to-be-distributed penalty for that node should be directed to its incoming side.

There is some gaming potential in there, but I think it is limited and also in a routing node's interest to report realistically.
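As an illustration of that shifting, here is a hypothetical scoring function (implementations are free to weigh this differently): the sender distributes a fixed total penalty across hops in proportion to their reported hold times, falling back to an equal split when no information is available.

```go
package main

import "fmt"

// distributePenalty splits a total penalty across hops in proportion to
// their self-reported hold times. Hypothetical illustration of the
// scoring idea discussed above, not a specified algorithm.
func distributePenalty(totalPenalty float64, holdTimes []uint32) []float64 {
	var sum uint32
	for _, t := range holdTimes {
		sum += t
	}
	shares := make([]float64, len(holdTimes))
	if sum == 0 {
		// No usable reports: spread the penalty equally.
		for i := range shares {
			shares[i] = totalPenalty / float64(len(holdTimes))
		}
		return shares
	}
	for i, t := range holdTimes {
		shares[i] = totalPenalty * float64(t) / float64(sum)
	}
	return shares
}

func main() {
	// A node reporting zero shifts its share of the penalty onto the
	// slower hops, but the total penalty handed out stays the same.
	fmt.Println(distributePenalty(1.0, []uint32{100, 300, 0}))
	// prints [0.25 0.75 0]
}
```

Under this sketch, under-reporting only redirects blame toward a node's neighbors, and the sender's total observed route time still bounds what can be hidden.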

@Roasbeef (Collaborator)

Ok, looks like this is ready for final review now.
