Skip to content

Fix reinit synchronization#13

Open
randomlogin wants to merge 49 commits intofebyeji:add-cbf-chain-sourcefrom
randomlogin:cbf-fix-reinit-sync
Open

Fix reinit synchronization#13
randomlogin wants to merge 49 commits intofebyeji:add-cbf-chain-sourcefrom
randomlogin:cbf-fix-reinit-sync

Conversation

@randomlogin
Copy link
Copy Markdown

Fixes a hang in start/stop/reinit after a checkpoint-based resume where wait_for_cbf_sync timed out because neither
sync timestamp advanced.

  • sync_lightning_wallet: when no scripts are registered, still call update_node_metrics_timestamp so callers waiting
    on a strict advance of latest_lightning_wallet_sync_timestamp make progress. The empty-scripts and normal paths now share a single timestamp-update site at the end of the closure.
  • fee_rate_cache_from_cbf: treat FetchBlockError::UnknownHash as a skip-this-cycle condition instead of an error. After a checkpoint resume, requester.chain_tip() can return the checkpoint hash which has no BlockNode in kyoto's tree yet, previously surfacing as FeerateEstimationUpdateFailed from node.start().

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

tnull and others added 30 commits April 3, 2026 10:38
Resolve Node.js 20 deprecation warnings by updating all GitHub Actions
to their latest major versions supporting Node.js 24.

Co-Authored-By: HAL 9000
Add a `ChannelShutdownState` enum mirroring LDK's own type, and expose
it as an `Option<ChannelShutdownState>` field on `ChannelDetails`.
…_shutdown_state

Expose `ChannelDetails::channel_shutdown_state`
LSPs can now allow their clients to spend their entire channel balance.
Note that LSPs will still be required to maintain a reserve on their
side of the channel.
Bump CI action dependencies to latest major versions
…erve

Enable 0conf and 0reserve on channels with trusted peers
The upstream LDK API moved the min/max feerate parameters from
`ChannelManager::splice_channel` to `FundingTemplate::splice_in` and
`FundingTemplate::splice_out`. This associates the feerate range with
the funding contribution rather than the splice initiation, allowing
different feerates per contribution (e.g., for RBF bumps). Update
the splice_in_inner and splice_out call sites accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…bf-refactor

Bump LDK dependency for splice feerate API refactor
We replace all `unwrap`s with `expect`s.

Co-Authored-By: HAL 9000
Log and skip runtime listener failures instead of panicking when accepting inbound connections or converting accepted sockets. These errors can happen in normal operation, so keeping the node running is safer than treating them as unreachable.

Co-Authored-By: HAL 9000
Replace the success-path unwrap on payment amounts with an expect that explains why outbound payments must already have a recorded amount by the time LDK reports them as sent.

Co-Authored-By: HAL 9000
Replace the pending-channel unwrap with an expect that records why supported LDK Node state should always include the former temporary channel id. Older rust-lightning state could omit it, but LDK Node never shipped before that field existed.

Co-Authored-By: HAL 9000
Signed-off-by: Elias Rohrer <dev@tnull.de>
Replace the user_version query unwrap with normal io::Error propagation so database initialization failures are reported cleanly instead of panicking.

Co-Authored-By: HAL 9000
Replace the Tokio runtime builder unwrap with io::Error propagation so VSS startup failures surface through the constructor instead of panicking.

Co-Authored-By: HAL 9000
Use a zero-millisecond fallback for elapsed-time logging so clock adjustments do not panic the chain polling loop.

Co-Authored-By: HAL 9000
Return Esplora client construction failures through build-time error handling instead of panicking so invalid headers or reqwest setup errors fail node construction cleanly.

Co-Authored-By: HAL 9000
Fail library clippy runs when new unwrap calls are introduced so the
unwrap policy stays enforced without pulling tests, benches, or docs
into the restriction.

Co-Authored-By: HAL 9000
Signed-off-by: Elias Rohrer <dev@tnull.de>
…wrap

Linting: Disallow `unwrap`s, replace with `expect` or error propagation
Mirror LDK's sentinel logic for confirmed_balance_candidate_index in
ClaimableOnChannelClose: when the index is 0 (no specific alternative
funding confirmed), use the last balance candidate (most current
splice/RBF attempt) instead of the first. This aligns with LDK's
claimable_amount_satoshis() behavior and fixes a mismatch where
total_lightning_balance_sats could differ from the sum of individual
LightningBalance amounts during pending splices.

AI tools were used in preparing this commit.
…ate-index

Fix balance candidate selection during pending splices
This makes it consistent with the argument used in ldk-server's open
channel API.

Also add a few more variable renames to make things consistent.
…erve

Add tests and improve documentation for zero reserve channels
Extract the repeated "acquire write lock on `node_metrics`, mutate a
field or two, then write the encoded struct to the kv-store" idiom into
a single helper in `io::utils`. As a side effect, `write_node_metrics`
is inlined into the helper.

Co-Authored-By: HAL 9000
Previously, after each successful Rapid Gossip Sync update the
background task wrote `latest_rgs_snapshot_timestamp` to the persisted
`NodeMetrics` immediately, while the network graph itself is only
flushed to disk later by LDK's background processor. A crash in that
window left the on-disk metric ahead of the on-disk graph — on restart
we'd resume RGS from the newer timestamp and permanently skip the
updates that were never persisted together with the graph.

Instead, seed the RGS start timestamp from
`NetworkGraph::get_last_rapid_gossip_sync_timestamp`, which is part of
the graph's own serialized state and therefore lands on disk atomically
with the channel updates it describes. The same source now backs the
RGS timestamp reported via `NodeStatus::latest_rgs_snapshot_timestamp`,
so the reported value always matches what's reflected in the graph.
Worst case after a crash is that we refetch the snapshots since the
last persisted graph — an idempotent operation — rather than silently
losing them.

The `latest_rgs_snapshot_timestamp` field is retired from `NodeMetrics`,
and TLV slot 6 is kept readable for backwards compatibility via LDK's
`legacy` TLV grammar. Old persisted records still deserialize; new
records no longer carry slot 6. The dead "reset RGS timestamp on
gossip-source switch" block in the P2P builder branch also goes away,
since the graph's timestamp remains the correct resume point across a
P2P→RGS switch.

Co-Authored-By: HAL 9000
Introduce new configuration parameters to manage Human-Readable Name
(HRN) resolution and DNSSEC validation behavior.

These settings allow users to define custom resolution preferences
for BOLT12 offer lookups. Moving these parameters
into the central configuration struct ensures that node behavior is
customizable at runtime and consistent across different network
environments. This abstraction is necessary to support diverse DNSSEC
requirements without hard-coding resolution logic.
Inject specialized resolution capabilities into OnionMessenger to
support outbound payments and third-party resolution services.

This change refines the previous resolution logic by allowing the node
to act as a robust BIP 353 participant. If configured as a service
provider, the node utilizes a Domain Resolver to handle requests for
other participants. Otherwise, it uses an HRN Resolver specifically for
initiating its own outbound payments. Providing these as optional
parameters in the Node constructor ensures the logic matches the
node's designated role in the ecosystem.
Introduce a comprehensive test case to verify the full lifecycle of a
payment initiated via a Human Readable Name (HRN).

This test ensures that the integration between HRN parsing, BIP 353
resolution, and BOLT12 offer execution is functioning correctly within
the node. By asserting that an encoded URI can be successfully resolved
to a valid offer and subsequently paid, we validate the reliability of
the resolution pipeline and ensure that recent architectural changes
to the OnionMessenger and Node configuration work in unison.
Update the GitHub Actions workflow to include coverage for the new
hrn_tests feature across multiple build configurations.

This ensures that the DNSSEC override logic is validated in both
standard Rust and UniFFI-enabled environments. Including these flags in
CI prevents regressions where testing-specific code might break the
primary build or fail to compile due to type mismatches between the
LDK and FFI wrappers.

Testing both feature combinations (with and without UniFFI) guarantees
that the abstraction for HumanReadableName remains consistent across
all supported platforms and integration layers.
@febyeji
Copy link
Copy Markdown
Owner

febyeji commented Apr 21, 2026

Thanks for the work! Could you split this into two commits? Fix 1 (timestamp progression) and Fix 2 (UnknownHash) are logically independent, so it would be helpful to bisect them.

…esolving-hrns

Add support for resolving BIP 353 Human-Readable Names
@randomlogin randomlogin force-pushed the cbf-fix-reinit-sync branch from 0785f6a to 063a7aa Compare April 21, 2026 09:43
tnull and others added 18 commits April 21, 2026 18:14
Read the RGS sync timestamp from the network graph
Fixes the problem when the funding tx is not registered for CBF node, for
example during splicing.

Added macros to skip incompatible with CBF backend tests (which require
mempool).
Decrease timeouts to not falsely conclude some tests as passing
Previously we tracked synced height and also had dirty flag for rescan
which was made from the genesis block. This was removed in favor of
block heights from channel manager and onchain wallet.

Safety reorg interval of 7 blocks (look-back) is added.

Force tests use CBF config with 1 required peers, as otherwise it would
die if one of two required peers went offline (which made impossible
testing of scenarios with 2 nodes on which goes offline).
  add auto-restart with exponential backoff (lightningdevkit#10)

* refactor(cbf): extract build_cbf_node helper and add wallet reference
* feat(cbf): auto-restart bip157 node with exponential backoff
* feat(cbf): add liveness check before returning requester
* fix(cbf): clean up scan state on filter scan failure
* fix(cbf): add per-block-fetch timeout to wallet sync
The old wait_for_cbf_sync only checked that the onchain sync timestamp
advanced, which could false-pass when the timestamp updated but wallet
state was still stale. The new version verifies both onchain and
lightning wallet syncs completed, and accepts a check closure so callers
can assert concrete wallet state (e.g. balance, payment existence).

Also simplifies CbfSyncConfig in test setup to use Default (which
already sets required_peers: 1 and reasonable timeouts).
`push` only appends above the tip, so when `recent_history` contained
blocks at or below the wallet's current checkpoint height after a reorg,
the stale hashes on the wallet checkpoint were never replaced. Switch to
`CheckPoint::insert`, which detects conflicting hashes and purges stale
blocks, matching bdk-kyoto's `UpdateBuilder::apply_chain_event`.

Also clear `latest_tip` on `BlockHeaderChanges::Reorganized` so cached
tip state does not point at an abandoned chain.

Update the `checkpoint_building_handles_reorg` unit test (added in
c1844b3) to exercise the fixed behaviour: a reorg where the new tip
is at the same height as the wallet's checkpoint must still result
in the reorged hashes winning.

Disclosure: drafted with assistance from Claude Code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…istered

`sync_lightning_wallet` returned early without updating
`latest_lightning_wallet_sync_timestamp` when no scripts were registered,
causing `wait_for_cbf_sync` to hang on start/stop/reinit cycles. Fold the
empty-scripts branch into the main path so both share a single
`update_node_metrics_timestamp` site at the end of the closure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…_cbf

After a checkpoint-based resume, `requester.chain_tip()` can return the
checkpoint hash which has no `BlockNode` in kyoto's tree yet, causing
`get_block` to fail with `FetchBlockError::UnknownHash` and surfacing as
`FeerateEstimationUpdateFailed` from `node.start()`. Skip the cycle and
retry later instead of erroring.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…s after upstream API change

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@randomlogin randomlogin force-pushed the cbf-fix-reinit-sync branch from 063a7aa to 6a5b69e Compare April 22, 2026 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants