@rustyrussell
Contributor

All the issues I found while trying to make Core Lightning runs deterministic.

  1. A crash when we splice a confirmed-but-unannounced channel.
  2. listtransactions not noticing when a sendpsbt transaction gets mined, if there are no outputs to us.
  3. Improve robustness of notleak() macro so we can use it very early.
  4. Make --dev-save-plugin-io more robust against restarts, by including timestamp in filename.
  5. Remove unused functions and fields.
  6. Fix benchmark routines to use timemono not timeabs.
  7. Fix invalid output in scb files (to do with last seen address).

@ddustin
Collaborator

ddustin commented Sep 19, 2025

Do you have more of the log handy for this crash? 9d76481

@rustyrussell
Contributor Author

Do you have more of the log handy for this crash? 9d76481

You can reproduce it yourself on master using this test. However, it was pretty clear, and the next commit fixes it...

@rustyrussell rustyrussell force-pushed the misc-fixes-for-examples branch 2 times, most recently from 7954cda to c217e3a Compare October 2, 2025 02:10
@ddustin
Collaborator

ddustin commented Oct 10, 2025

LGTM except this weird CI failure -> https://github.com/ElementsProject/lightning/actions/runs/18181239624/job/51760146749?pr=8555

FAILED tests/test_splicing.py::test_splice_unannounced - pyln.client.lightning.RpcError: RPC call failed: method:
splice_update,
payload:{
 'channel_id': '1070a78b714290108ed590cd5bc5792d9c33096a9e56c66b6879e280ebe135d9',
 'psbt': 'cHNldP8BAgQCAAAAAQMEZwAAAAEEAQIBBQEDAQYBAwH7BAIAAAAAAQFCAVznuWPTf48tUcr7upKKqp4iC4u8ZgVxSZwDYoo4UbjOAQAAAAAADyKkABYAFMLMqxccKlvp2rUuxBuCWGMCTFRmAQ4gEHCni3FCkBCO1ZDNW8V5LZwzCWqeVsZraHnigOvhNdkBDwQBAAAAARAE/f///wz8CWxpZ2h0bmluZwEIFGRQGwtPqrIAAQDwAgAAAAABmtOmhRkUqWhL0Jz4z7SGBQzmCPbotH64ClYAVlgvOd8BAAAAAP3///8DAVznuWPTf48tUcr7upKKqp4iC4u8ZgVxSZwDYoo4UbjOAQAAAAAAD0JAACIAIFuM07kUz2fN2Ppic8kwNT3TZHZzT72WIQLC31O5CIDNAVznuWPTf48tUcr7upKKqp4iC4u8ZgVxSZwDYoo4UbjOAQAAAAAADyKkABYAFMLMqxccKlvp2rUuxBuCWGMCTFRmAVznuWPTf48tUcr7upKKqp4iC4u8ZgVxSZwDYoo4UbjOAQAAAAAAAB+cAABmAAAAAQFOAVznuWPTf48tUcr7upKKqp4iC4u8ZgVxSZwDYoo4UbjOAQAAAAAAD0JAACIAIFuM07kUz2fN2Ppic8kwNT3TZHZzT72WIQLC31O5CIDNAQVHUiECMkJm3oQDs6sVegnx94TVh69hgxyZjBUbzCG7dMKyMUshAuO9OACYZsnajsSqmcxOqcbA3UbfFcYe8M4fJxKRcU5XUq4BDiAQcKeLcUKQEI7VkM1bxXktnDMJap5WxmtoeeKA6+E12QEPBAAAAAABEAQAAAAADPwJbGlnaHRuaW5nAQinEkBq/HGQCAABAwiBbg0AAAAAAAEEFgAU1rlp023/2tmGkZP2Y+5IDf3HNIgH/ARwc2V0AiBc57lj03+PLVHK+7qSiqqeIguLvGYFcUmcA2KKOFG4zgz8CWxpZ2h0bmluZwEIXxADWpMm9qYAAQMIWwoAAAAAAAABBAAH/ARwc2V0AiBc57lj03+PLVHK+7qSiqqeIguLvGYFcUmcA2KKOFG4zgz8CWxpZ2h0bmluZwEIFS3gMVW4k/AAAQMI4MgQAAAAAAABBCIAIFuM07kUz2fN2Ppic8kwNT3TZHZzT72WIQLC31O5CIDNB/wEcHNldAIgXOe5Y9N/jy1Ryvu6koqqniILi7xmBXFJnANiijhRuM4M/AlsaWdodG5pbmcBCE4A/b4fKO3yAA=='},
error: {
 'code': 363,
 'message':
 'Splice command failed: Splice update error: interactivetx ADD_INPUT PSBT needs the previous transaction set.'}

This happens because interactivetx sees an input without the prevtx loaded. Most likely this is for the input spending the funding outpoint.

This is loaded from channeld via bitcoin_tx_from_txid, which asks lightningd with towire_channeld_splice_lookup_tx, which in turn calls wallet_transaction_get with the txid.

This is querying the database SELECT rawtx FROM transactions WHERE id=[txid].

Is it possible there is some flow where the funding tx doesn't make it into the transactions table in time? 🤔
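
The lookup described above can be sketched in Python with `sqlite3`. This is only an illustration, not the lightningd code: the two-column schema is a simplification, the txid and raw bytes are made up, and `lookup_rawtx` is a hypothetical helper standing in for the `wallet_transaction_get` step. It shows that the query returns nothing when the row has not been written yet, which is the case the question is asking about.

```python
import sqlite3


def make_db():
    # Simplified stand-in schema (assumption): only the columns named in the query.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE transactions (id BLOB PRIMARY KEY, rawtx BLOB)")
    return db


def lookup_rawtx(db, txid):
    """Hypothetical helper: return rawtx for txid, or None if no row exists."""
    row = db.execute(
        "SELECT rawtx FROM transactions WHERE id = ?", (txid,)
    ).fetchone()
    return row[0] if row else None


db = make_db()
funding_txid = bytes.fromhex("aa" * 32)  # made-up txid for illustration

# Before the funding tx is recorded, the lookup finds nothing.
missing = lookup_rawtx(db, funding_txid)

# Once the row is written, the same query returns the raw transaction bytes.
db.execute(
    "INSERT INTO transactions (id, rawtx) VALUES (?, ?)",
    (funding_txid, b"\x02\x00\x00\x00"),
)
found = lookup_rawtx(db, funding_txid)
```

A caller that receives `None` here would have no previous transaction to attach to the PSBT input.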

In the same run we're seeing these failures, perhaps they provide a clue:

test_sendpsbt_confirm[False]
Error broadcasting transaction: error code: -26\\nerror message:\\nbad-txns-in-ne-out, value in != value out. Unsent tx discarded 
test_sendpsbt_confirm[True]
Error broadcasting transaction: error code: -26\\nerror message:\\nbad-txns-in-ne-out, value in != value out. Unsent tx discarded 

@rustyrussell
Contributor Author

Is it possible there is some flow where the funding tx doesn't make it into the transactions table in time? 🤔

Can't be, because then we'd hit this:

	channel_internal_error(channel,
			       "channel control unable to find txid %s",
			       fmt_bitcoin_txid(tmpctx, &txid));

@rustyrussell rustyrussell force-pushed the misc-fixes-for-examples branch 3 times, most recently from 30f4a3d to 98f3043 Compare October 22, 2025 20:42
```
DEBUG   lightningd: Got depth change 2->3 for e9e31956f77c3844ee2e6e4607dbfebdee95a9aa549668a7a429b8246a6a29de
**BROKEN** lightningd: FATAL SIGNAL 6 (version v25.09-20-g003ba4a)
**BROKEN** lightningd: backtrace: common/daemon.c:41 (send_backtrace) 0x619bef20e274
**BROKEN** lightningd: backtrace: common/daemon.c:78 (crashdump) 0x619bef20e408
**BROKEN** lightningd: backtrace: ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 ((null)) 0x7a1ccf24532f
**BROKEN** lightningd: backtrace: ./nptl/pthread_kill.c:44 (__pthread_kill_implementation) 0x7a1ccf29eb2c
**BROKEN** lightningd: backtrace: ./nptl/pthread_kill.c:78 (__pthread_kill_internal) 0x7a1ccf29eb2c
**BROKEN** lightningd: backtrace: ./nptl/pthread_kill.c:89 (__GI___pthread_kill) 0x7a1ccf29eb2c
**BROKEN** lightningd: backtrace: ../sysdeps/posix/raise.c:26 (__GI_raise) 0x7a1ccf24527d
**BROKEN** lightningd: backtrace: ./stdlib/abort.c:79 (__GI_abort) 0x7a1ccf2288fe
**BROKEN** lightningd: backtrace: ./assert/assert.c:96 (__assert_fail_base) 0x7a1ccf22881a
**BROKEN** lightningd: backtrace: ./assert/assert.c:105 (__assert_fail) 0x7a1ccf23b516
**BROKEN** lightningd: backtrace: lightningd/peer_control.c:2202 (funding_depth_cb) 0x619bef1ac497
**BROKEN** lightningd: backtrace: lightningd/watch.c:223 (txw_fire) 0x619bef1cfcbf
**BROKEN** lightningd: backtrace: lightningd/watch.c:292 (watch_topology_changed) 0x619bef1cffa4
**BROKEN** lightningd: backtrace: lightningd/chaintopology.c:829 (updates_complete) 0x619bef144a8c
**BROKEN** lightningd: backtrace: lightningd/chaintopology.c:1047 (get_new_block) 0x619bef14561e
```

Signed-off-by: Rusty Russell <[email protected]>
…w one via splice.

This happens if the channel is *not* announceable yet.  Then we hit the assertion
in funding_depth_cb that the txid is the same as the current funding.txid.

Signed-off-by: Rusty Russell <[email protected]>
Changelog-EXPERIMENTAL: fixed crash when we splice a channel which hasn't been announced yet.
I got a NULL deref on `infcopy->remote_funding = *inflight->funding->splice_remote_funding`
at one point in testing, so this should prevent that from happening,
yet still allow us to catch it in CI if it happens again.

Signed-off-by: Rusty Russell <[email protected]>
We watch transactions if they are to do with a channel, or have outputs going to us,
but otherwise we didn't, so we never updated the blockheight in the db.

Signed-off-by: Rusty Russell <[email protected]>
Changelog-Fixed: JSON-RPC: `listtransactions` now correctly updates `blockheight` for txs created by `sendpsbt` which have no change outputs.
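
A toy Python model of the behaviour described in this commit message. Every name here (`Tx`, `ToyWallet`, `watch_own_broadcasts`) is hypothetical, and treating the fix as "also watch transactions we broadcast ourselves" is an assumption about the shape of the change, not a description of the actual lightningd code. It only illustrates why an unwatched transaction never gets a blockheight.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tx:
    txid: str
    pays_us: bool = False          # has an output to our wallet
    channel_related: bool = False  # to do with one of our channels
    we_broadcast: bool = False     # e.g. sent via sendpsbt
    blockheight: Optional[int] = None


@dataclass
class ToyWallet:
    watch_own_broadcasts: bool
    txs: dict = field(default_factory=dict)
    watched: set = field(default_factory=set)

    def add_tx(self, tx):
        self.txs[tx.txid] = tx
        if (tx.channel_related or tx.pays_us
                or (self.watch_own_broadcasts and tx.we_broadcast)):
            self.watched.add(tx.txid)

    def block_connected(self, height, txids_in_block):
        # Only watched transactions get their blockheight recorded.
        for txid in txids_in_block:
            if txid in self.watched:
                self.txs[txid].blockheight = height


def mined_height(watch_own_broadcasts):
    wallet = ToyWallet(watch_own_broadcasts)
    wallet.add_tx(Tx("t1", we_broadcast=True))  # no outputs to us, not a channel tx
    wallet.block_connected(103, ["t1"])
    return wallet.txs["t1"].blockheight


old_behaviour = mined_height(False)  # stays None: the tx was never watched
new_behaviour = mined_height(True)   # height is recorded once the tx is watched
```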
It now simply renames tal names, so it's harmless to do even if we're
not going to do memleak detection.

Signed-off-by: Rusty Russell <[email protected]>
…eak coverage for any typed hash table.

You can now simply add per-tal-object helpers for memleak, but our older pattern required
calling memleak functions explicitly during memleak handling.  Hash tables in particular need
to be dynamically allocated (we override the allocators using htable_set_allocator and assume
this), so it makes sense to have a helper macro that does all three.

This eliminates a huge amount of code.

Signed-off-by: Rusty Russell <[email protected]>
…ames.

Incorporate a time: this covers the restart case as well.  And make it time_mono(),
which doesn't get overridden when we override normal wall time.

Signed-off-by: Rusty Russell <[email protected]>
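
A minimal Python sketch of the idea in this commit message, assuming a made-up filename layout: the real format used by `--dev-save-plugin-io` is not shown here, and `plugin_io_filename` is a hypothetical helper. Including a monotonic timestamp means that a restarted process whose counter starts again at zero still produces a different name, and a monotonic clock is not affected when the wall clock is overridden.

```python
import time


def plugin_io_filename(plugin, direction, counter, now_ns=None):
    """Hypothetical naming scheme: <plugin>-<monotonic ns>-<counter>-<direction>."""
    if now_ns is None:
        # Monotonic clock: unaffected by changes to wall-clock time.
        now_ns = time.monotonic_ns()
    return f"{plugin}-{now_ns:020d}-{counter}-{direction}"


# Two runs that both start their counter at 0 still get distinct names,
# because the timestamp component differs (fixed values used for illustration).
first_run = plugin_io_filename("pay", "in", 0, now_ns=1_000)
second_run = plugin_io_filename("pay", "in", 0, now_ns=2_000)

# Without an explicit timestamp the helper reads the monotonic clock itself.
live = plugin_io_filename("pay", "out", 1)
```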
I noticed, because it pulled in randomness routines.

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell rustyrussell force-pushed the misc-fixes-for-examples branch from 98f3043 to c2b6992 Compare October 22, 2025 23:39
@rustyrussell rustyrussell merged commit f801054 into ElementsProject:master Oct 24, 2025
71 of 76 checks passed