Skip to content

Dash mainnet readiness: embedded GBT + Phase C complete + interop fixes#43

Open
frstrtr wants to merge 220 commits intomasterfrom
dash-spv-embedded
Open

Dash mainnet readiness: embedded GBT + Phase C complete + interop fixes#43
frstrtr wants to merge 220 commits intomasterfrom
dash-spv-embedded

Conversation

@frstrtr
Copy link
Copy Markdown
Owner

@frstrtr frstrtr commented Apr 24, 2026

Summary

Brings c2pool-dash to functional parity with p2pool-dash for the network's
consensus-critical paths so c2pool-dash nodes can replace live p2pool-dash
mainnet nodes. Adds a fully self-sufficient SPV-embedded path so c2pool-dash
no longer depends on a local dashd RPC for block templates or submission.

189 commits since branching from master at `46e2bffc` (2026-04-18). Includes
the Phase C work (TEMPLATE/SUBMIT/CUTOVER/PAY/L/SML/QUO/MEMPOOL), the SPV
embedded pipeline (S1/S2/Phase U), four bug fixes from the 2026-04-24 testnet
battle-test, in-process crash diagnostics, and a per-share GENTX-OUTS
diagnostic for cross-implementation debugging.

Phase grouping (for review)

SPV embedded (S1 + S2 + Phase U)

DNS seeds (`ded056cc`), BIP 155 addrv2 (`6103af48`), BIP 152 vendored
blockencodings + negotiation + reassembly (`18104e26` + `3136e00e` +
`145a589a`), UTXO adapter + live connect_block + LevelDB + per-block height
(`79f71a74` + `145a589a` + `7c17cef7` + `96dfe510`), rolling-288 bootstrap
pipeline (`5a89fedb`), tip-changed handler with reorg disconnect_block +
header-sync nudge (`c68b44a7`).

Phase C-SML (Simplified Masternode List sync)

Live-validated bit-exact against Dash mainnet — `[SML] root MATCH` for
blocks 2460036/37/38 from 13+ peers. 7 build steps + Bug A fix (`3629cc74`
uint256 sort-order memcmp + diff.cbTx self-aligned root verify).

Phase C-QUO (Quorum DB persistence)

MVP (`40155291` + `96f10a38`) + persistence (`90f44cc2`) + step-4 schema bump
for mining_height (`f0b550f9`). `load_into()` replays full state at startup
with sentinel cross-check vs sml_db.

Phase L (ChainLock + dashboard SPV panel)

5 build steps + iteration-2a verify-gate + SML rollback (`7660cd70`) +
reorg drop (`a00c9657`) + iteration-2b ban-on-bad-data (`55c2f468`) +
dashboard SPV panel on /web/sync_status (`5b397381`). Linux/macOS only.

Phase C-PAY Path A (Masternode payment verification)

8 commits — ProTx vendor (`7607f59d`), MnStateDb (`b71a88e6`), snapshot
loader + integrity pin (`43815ed1`), `--dump-mn-snapshot` RPC dumper
(`ca9b13be`), RPC bootstrap fallback (`4fca8804`), first in-tree snapshot
(`8b3bdd98`, h=2460249, 2936 entries), per-block state machine
(`1f09f3df`), GetMNPayee + log-only `[PAY]` verify (`74bcebb7`).

Phase C-MEMPOOL

Storage + fee + LRU eviction + confirm-eviction + conflict detection
(`e6542439`); feerate-sorted index + recompute_unknown_fees (`d57ed8e5`).
Adapted from src/impl/ltc/coin/mempool.hpp, dropped segwit/weight.

Phase C-TEMPLATE (Embedded GetBlockTemplate, RPC-independent)

13b commits including subsidy + qfcommit scanner + merkle_root_quorums
(`f0b550f9`), embedded GBT (`346edee1`), CCbTx encoder (`b77cd2f8`),
best-CLSIG (`82e206b3`), MTP-11 mintime (`57eb9f60`), own DGW3 bits
(`530be2c7`), version field (`bbfbd532`), creditPoolBalance seeding
(`579753dc`), base58 payee (`cd40be7a`), DIP-0027 credit-pool state
machine (`1b5a3d32`), CreditPoolDb persistence (`78079113`).

Embedded GBT bit-exact for ALL DashWorkData and CCbTx fields; all 4
consensus dbs warm-startable (SMLDb / QuorumDb / MnStateDb / CreditPoolDb).

Phase C-SUBMIT (P2P block broadcast)

P2P block broadcast as PRIMARY path, RPC optional (`68938a24`); roundtrip
confirmation via pending-submit map matched by on_full_block hash + 30s
warning timer for >60s un-confirmed (`9cd51786`).

Phase C-CUTOVER (Default policy flip + observability)

`--gbt-source` flag (`54b9e41d`), [SUBMIT-SANITY] hop (`1b0d1fbd`),
auto-fallback hysteresis 3-strike (`6ca69995`), dashboard cutover panel +
atomic soak counters (`4814dbe8`), liveness watchdog + 'LOST CONTACT'
warning (`25e6713d`), default policy flip to embedded-prefer (`c053c14e`),
15 unit tests for CreditPool + subsidy (`43ef108f`).

Default behavior is now embedded-with-RPC-cross-check; legacy RPC-primary
requires explicit `--gbt-source rpc`.

Battle-test 2026-04-24 fixes (testnet sharechain interop)

  • Bug 4 (`73a287a3`) — DifficultyAdjustmentEngine div-by-zero on sub-1.0
    testnet difficulty. Branch on `diff >= 1.0` for sub-1.0 use multiplicative inverse.
  • Bug 5 (`509cf204`) — DashNodeImpl missing `node.listen()` so sharechain
    accept socket was never bound. Add the bind call.
  • Bug 7 (`1074f6ad`) — `make_coin_params(testnet)` was using mainnet's
    MAX_TARGET. p2pool-dash testnet rejected every share with 'share PoW invalid'.
    Added testnet-specific 0x00000fff... value.
  • Bug 1 (`a847cd2c`) — Iteration-2b CLSIG ban storm during catch-up.
    Reverted to log-only.
  • Wrong_port filter (`845cfaa5`) — `p.p2p_port = 8999` was unconditional.
    Testnet branch now sets 18999 so outbound dialer accepts real testnet peers.

All 4 bugs covered by regression tests in
`test/test_dash_battletest_regressions.cpp` (`dca4f656`, 7 tests).

Crash diagnostics + per-share visibility

  • SIGSEGV/SIGABRT/SIGBUS/SIGFPE handler in `main_dash` (`2d33d09a`) —
    writes backtrace to stderr + `/tmp/c2pool_dash_crash.log` via
    `backtrace_symbols_fd`. No `ulimit` / `sudo` / `core_pattern` dance.
    Captures next mainnet crash autonomously.
  • `[GENTX-OUTS]` per-share INFO log (`6cdd1afe`) — subsidy, walked count,
    n_outs, output (amount, script_hex), hash_link state. Designed for
    cross-comparison with p2pool's regenerated gentx when share.check() raises
    `'gentx doesn't match hash_link'`. Successfully diagnosed the stale-state
    rejection cycle on 2026-04-24.

Test status

  • 22 Dash unit tests pass: battletest_regressions (7) + credit_pool (10)
    • subsidy (5).
  • 18 test binaries / 290 tests total pass: Dash + LTC + DOGE +
    compact_blocks + decay_pplns + header_chain + mempool + mweb_builder +
    phase4_{embedded,live} + redistribute_address + share_messages +
    template_builder + utxo + v36_script_sorting + weights + hardening.
  • Live-validated Phase C-SML against Dash mainnet (bit-exact root match
    for h=2460036/37/38 from 13+ independent peers).
  • Live-validated Phase L against Dash mainnet ChainLocks + reorg drop.
  • Testnet sharechain interop verified between c2pool-dash on .41 and
    p2pool-dash on .42/.191 — federation works (44+/59 verified, zero
    rejections after the battle-test fixes landed).

Known issues NOT in this PR

  • 3 pre-existing link-failed tests (`sharechain_test`,
    `test_pplns_stress`, `test_auto_ratchet`) — same failure on master, not
    introduced here. CMake target_link_libraries order issue, separate fix.
  • macOS Intel build verification scheduled for 2026-04-25 (today, after open).
    macOS ARM build verification scheduled for 2026-04-29.
  • Bug 3: a 2.5h SIGSEGV on a Dash mainnet shadow soak captured no backtrace
    (apport ate the core, ulimit was 0). The crash handler in this PR
    (`2d33d09a`) makes the next firing autonomously diagnostic. Treated as
    non-blocking for merge.
  • Bug 6: `compute_merkle_root_quorums` never matches at testnet tip
    (cosmetic, not consensus-affecting).

Test plan

  • CI green
  • macOS Intel build (today)
  • macOS ARM build (Apr 29 — fix forward if blocked)
  • 24h testnet sharechain soak observation (already running on .41/.42/.191)
  • Code review by phase grouping above

frstrtr added 30 commits April 16, 2026 14:19
New files (re-applied from multipow branch onto current master):
- core/pow.hpp: PowFunc/BlockHashFunc/SubsidyFunc type aliases
  + pow::scrypt() and pow::sha256d() implementations
- core/coin_params.hpp: CoinParams struct — the p2pool "net" equivalent
  carrying all coin+pool parameters through the stack
- impl/ltc/params.hpp: ltc::make_coin_params(testnet) factory
  populating CoinParams with all LTC constants
All functions now take const core::CoinParams& params:
- share_init_verify, generate_share_transaction, share_check, verify_share
- create_local_share, create_local_share_v35, verify_merged_coinbase_commitment
- compute_gentx_before_refhash, compute_ref_hash_for_work, pubkey_hash_to_address
- Hardcoded scrypt → params.pow_func() (3 call sites)
- All PoolConfig:: statics → params.field (40+ replacements)
…oinParams

- share_tracker.hpp: Add m_params member, replace 34 PoolConfig:: refs
- node.hpp: Add m_coin_params + coin_params() getter, wire m_tracker.m_params
- node.cpp: Replace all PoolConfig:: with m_tracker.m_params->
- c2pool_refactored.cpp: Pass coin_params() to compute_ref_hash_for_work,
  create_local_share, pubkey_hash_to_address
- .gitignore: exclude build-qt/
New src/impl/bitcoin_family/ library with coin-agnostic types:
- coin/base_block.hpp: SmallBlockHeaderType + BlockHeaderType (generic
  80-byte Bitcoin header, no MWEB). LTC's BlockType extends with MWEB.
- coin/softfork_check.hpp: generic softfork JSON parser
- coin/txidcache.hpp: generic thread-safe tx cache

LTC block.hpp now uses `using bitcoin_family::coin::SmallBlockHeaderType`
and `using bitcoin_family::coin::BlockHeaderType`, extends BlockType with
MWEB (m_mweb_raw, HogEx serialization).

LTC softfork_check.hpp and txidcache.hpp are forwarding headers.

Header-only INTERFACE library — no .cpp files yet.
…tion.hpp

TxParams, TxPrevOut, TxIn, TxOut moved to bitcoin_family::coin.
LTC transaction.hpp imports them via using declarations, keeps
Transaction/MutableTransaction with MWEB HogEx flag (m_hogEx, flag 0x08).
bitcoin_family/coin/base_p2p_messages.hpp:
  22 generic Bitcoin wire protocol messages (version, verack, ping, pong,
  alert, inventory_type, inv, getdata, getblocks, getheaders, getaddr,
  addr, reject, sendheaders, notfound, feefilter, mempool, sendcmpct,
  wtxidrelay, sendaddrv2, btc_addr_record_t).

  Messages referencing coin-specific types (block, tx, headers, compact
  blocks) remain in ltc/coin/p2p_messages.hpp — they use LTC's MWEB-aware
  BlockType and MutableTransaction.

bitcoin_family/coin/chain_params.hpp:
  Generic ChainParams struct for header validation: target_timespan,
  target_spacing, pow_limit, genesis_hash, halving_interval, pow_func.
  Includes generic calculate_next_work_required() (Bitcoin/LTC algorithm).
  Dash can override with DigiShield; DOGE with its own schedule.

ltc/coin/p2p_messages.hpp refactored:
  Imports 22 generic messages via using declarations, defines only
  coin-specific messages (tx, block, headers, cmpctblock, getblocktxn,
  blocktxn) that reference LTC types.
X11 hash algorithm (11 sph functions pipeline):
- Pure C implementations from dashcore v0.16.1.1 (MIT license)
- 11 .c files + 13 .h files in impl/dash/crypto/x11/
- C++ wrapper: dash::crypto::hash_x11() in crypto/hash_x11.hpp
- Builds as dash_x11 static library

Dash CoinParams (impl/dash/params.hpp):
- X11 PoW, SHA256d block identity
- share_period=20, chain_length=4320, spread=10, protocol v1700
- address_version=76 ('X'), no segwit, no bech32
- p2pool port 8999, stratum 7903, bootstrap rov.p2p-spb.xyz
- identifier=7242ef345e1bed6b, prefix=3b3e1286f446b891
Share v16 (impl/dash/share.hpp + share_types.hpp):
  - DashShare struct with all v16 fields from p2pool-dash data.py
  - PackedPayment: masternode/superblock/platform payment entries
  - HashLinkType, MerkleLink, StaleInfo

Dash block (impl/dash/coin/block.hpp):
  - Uses bitcoin_family SmallBlockHeaderType + BlockHeaderType
  - Simple BlockType without MWEB (standard Bitcoin block)

Dash transaction (impl/dash/coin/transaction.hpp):
  - Uses bitcoin_family TxPrevOut, TxIn, TxOut
  - Adds DIP3/DIP4 CBTX support: type field + extra_payload
  - No segwit, no MWEB — version|type<<16 serialization
share_check.hpp:
  - share_init_verify() for v16: X11 PoW, hash_link, merkle_link
  - check_hash_link(), check_merkle_link() (same algorithm as LTC)
  - compute_gentx_before_refhash() for Dash donation script
  - Full ref_hash computation with v16 share_info serialization

config.hpp: DashPoolConfig + DashCoinConfig + combined Config typedef
messages.hpp: Dash p2pool P2P messages (protocol v1700) — same wire
  format as LTC but different identifier/prefix
peer.hpp: Dash peer data structure

All compile clean as header-only — no link errors.
VALIDATED: connects to rov.p2p-spb.xyz:8999, receives correct prefix
3b3e1286f446b891. X11 self-test passed. CoinParams functional.
CRITICAL FIX: params.hpp donation script was P2PK (forrestv's LTC key).
Corrected to P2PKH: 76a91420cb5c22b1e4d5947e5c112c7696b51ad9af3c61
(XdgF55wEHBRWwbuBniNYH4GvvaoYMgL84u — Dash p2pool donation address)

Added generate_share_transaction() with Dash v16 PPLNS formula:
- 49/50 (98%) to PPLNS-weighted workers (linear weights, NOT decay)
- 1/50 (2%) finder fee to block creator
- Remainder (rounding + donation weight) to donation script
- Masternode/superblock/platform payments subtracted from worker_payout
  BEFORE PPLNS distribution (they're not part of pool rewards)
- DIP3/DIP4 CBTX support: version=3, type=5, extra_payload

Weight formula: att * (65535 - donation_field) per share (16-bit field)
Coinbase output order: [workers sorted] [payments] [donation] [OP_RETURN]
Received 1 v16 share (1164 bytes) from rov.p2p-spb.xyz:8999.
Proper wire protocol: framing, version handshake, share messages.
CRITICAL: HeaderChain no longer hardcodes scrypt for PoW validation.
The PoW function is now injected via bitcoin_family::coin::ChainParams.pow_func.
This enables Dash (X11), BTC (SHA256d), or any coin's embedded SPV node.

Changes:
- bitcoin_family/coin/chain_params.hpp: Add block_hash_func, Checkpoint,
  std::optional<Checkpoint> for fast-sync
- header_chain.hpp: LTCChainParams is now alias for ChainParams.
  Factory functions make_ltc_chain_params_mainnet/testnet() inject scrypt.
  scrypt_hash(header) → pow_hash(header, m_params.pow_func) at both call sites.
  Legacy scrypt_hash() alias kept for backward compat.
- c2pool_refactored.cpp: Use new factory functions.

Dash can now create HeaderChain with X11 pow_func — no code duplication.
Generic overload takes ChainParams (halving_interval + initial_subsidy).
Works for LTC (840k blocks), BTC (210k blocks), DOGE/Dash (no halving).
Legacy LTC-specific overload kept for backward compatibility.
…shNodeImpl)

New files:
- share_chain.hpp: DashShareType variant, DashShareIndex, DashShareChain
- share_tracker.hpp: DashShareTracker with CoinParams, attempt_verify, PPLNS
- node.hpp: DashNodeImpl extending BaseNode<DashConfig, DashShareChain, DashPeer>
  with protocol v1700 handshake, message dispatch, share reception

Updated:
- share.hpp: DashShare now inherits BaseShare<uint256, 16> (required by ShareVariants)
- config.hpp: CoinConfig inherits Fileconfig (required by core::Config)
- main_dash.cpp: Uses DashNodeImpl with proper Config, prefix verification

Status: BaseNode connects to peer, prefix matches (3b3e1286f446b891),
version sent. Socket prefix scanner not triggering handle() — needs
investigation of Socket read pipeline vs p2pool message framing.
test_dash_p2p.py: Minimal Python3 p2pool wire protocol test server
p2pool-dash debug branch with logging (banning disabled)

FINDING: Local p2pool-dash on .191:18999 responds to raw TCP with 166-byte
version message (prefix 3b3e1286f446b891). c2pool's Socket::read_prefix()
async_read never completes despite data being available. This is a core::Socket
lifetime/ASIO issue, not Dash-specific. Raw TCP works, ASIO async_read doesn't.

Dashd running on 192.168.86.24:9999 (mainnet, block 2456020, fully synced).
RPC: dashrpc_test/testpassword123 on port 9998 (LAN accessible).
…IVED

Bug 1: rmsg->m_command == "version" never matched because wire command
is 12-byte null-padded. Fixed with .compare(0, 7, "version") prefix match.

Bug 2: handle() didn't restart peer timeout on each message, causing
NEW_PEER_TIMEOUT (10s) to always fire. Added peer->m_timeout->restart().

Bonus: socket.hpp init() split endpoint error check.

RESULT: c2pool-dash connects to rov.p2p-spb.xyz:8999, completes v1700
handshake (subver=dash-v1.0.6-1-g07aa58e-dirty), receives real v16
shares (1032 bytes, 1138 bytes). Full BaseNode pipeline working.
…network

Full share receive+verify pipeline working against rov.p2p-spb.xyz:8999:
- DashFormatter Read/Write: complete v16 wire deserialization (all fields)
- share_init_verify: hash_link + merkle_link → header → X11 PoW check
- process_shares: deserialize → verify → add to ShareTracker
- Fix testnet prefix/identifier from p2pool-dash source
- Fix ref_stream: add VarInt count for transaction_hash_refs (ListType mul=2)
- Fix hash_link_data: append outer coinbase_payload (data.py line 342-348)
- Fix PackStream rvalue binding in ShareType::load()
- ShareReplyData struct + share_getter_t using ReplyMatcher pattern
- download_shares(): recursive chain walker (p2pool node.py:108-141)
  - Random peer selection, random parents 0-499
  - Stops from verified chain heads
  - Failure tracking with MAX_EMPTY_RETRIES
- handle_sharereq/handle_sharereply: message dispatch + async response
- handle_get_shares: walk chain collecting shares up to parents count
- Trigger download from handle_version when peer has unknown best_share
- Fix: add peer to m_peers after stable() (was missing from BaseNode)
- Fix: store best_share on peer, trigger download after handshake complete
- Deduplicate shares in process_shares and download callback
Dash uses X11 for BOTH POW_FUNC and BLOCKHASH_FUNC (unlike LTC which uses
scrypt for PoW but SHA256d for block identity). share_init_verify was using
SHA256d for the share hash, causing all shares to have wrong hashes and
preventing chain linking (8900 shares = 8900 disconnected heads).

Fix: share_hash = params.pow_func(header) = X11(header)
Result: 8903 shares downloaded in single chain (heads=1) in ~8 seconds
- decode_payee_script(): handles "!" prefix (raw hex script) and regular
  base58 addresses (P2PKH/P2SH) for masternode/superblock/platform payments
- generate_share_transaction: properly writes payment outputs with decoded scripts
- Platform OP_RETURN payments ("!6a28...") now correctly decoded to raw script bytes
- Reference: p2pool-dash data.py lines 189-217
- Add Blockchain::DASH to enum in address_validator.hpp
- Add Dash case to rest_web_currency_info() and rest_node_info() endpoints
- Add initialize_dash_configs() with P2PKH v76 (X), P2SH v16 (7), testnet v140/v19
- Block explorer URLs: blockchair.com/dash/
Backend (web_server.cpp):
- block_value_miner = (subsidy - payment_amount) * (1 - fee) for Dash
- block_value_payments = masternode + superblock + platform payments
- payment_amount extracted from GBT template (dashd provides it)
- node_fee key is blockchain-aware (node_fee_dash, node_fee_ltc)

Frontend (dashboard.html):
- Show MN/Gov payment split when block_value_payments > 0
- Dynamic merged block symbol (not hardcoded DOGE)
- Hide merged block sub when no merged mining active
- Store window.currencySymbol from currency_info for reuse
- Node fee display uses correct coin symbol
Show "Total: 1.7703 (Master Node/Treasury: 1.3277)" under miner block value
when payment_amount exists — matches p2pool-dash format exactly.
Hide merged mining line + time separator when no merged chain active.
New files in src/impl/dash/coin/:
- p2p_messages.hpp: dashd wire messages (tx, block, headers) using Dash types
- p2p_connection.hpp: TCP connection with ReplyMatcher for block/header requests
- p2p_node.hpp: NodeP2P<Config> — dashd handshake, header sync, block relay
  - X11 block hash (not SHA256d) for all header/block identity
  - NODE_NETWORK only (no segwit, no MWEB, no compact blocks)
  - Protocol version 70230 (Dash Core v20+)
  - BIP 130 sendheaders for header-first announcements
  - Auto-reconnect with 30s interval
- node.hpp: coin::Node<Config> wrapper (P2P + future RPC)
- node_interface.hpp: event interface (new_block, new_headers, new_tx, full_block)
- block.hpp: fix BlockType serialize/unserialize to include transactions

Adapted from LTC coin/ layer — stripped MWEB, segwit, compact blocks, wtxidrelay.
…ve v3

Full header-only chain for Dash embedded SPV node:
- X11 PoW validation for all headers (fast ~0.1ms, no skip optimization needed)
- DarkGravityWave v3 difficulty retarget (24-block lookback, per-block adjustment)
- LevelDB persistence with write-back dirty set + atomic flush
- Block locator (BIP 31 exponential backoff) for getheaders
- Fast-start checkpoint support
- Dynamic checkpoint from RPC
- Reorg detection with tip-changed callback
- Thread-safe with mutex (same pattern as LTC HeaderChain)

Reference: dashcore/src/pow.cpp DarkGravityWave()
Genesis mainnet: 00000ffd590b1485b3caadc19b22e6379c733355108f107a430458cdf3407ab6
Genesis testnet: 00000bafbc94add76cb75e2ec92894837288a481e5c005f6563d91623bf8bc2c
- main_dash.cpp: --dashd HOST:PORT flag for dashd P2P connection
  - Wire new_headers → HeaderChain for SPV sync
  - Wire new_block/full_block events for block notifications
  - Set dashd wire prefix (0xbf0c6bbd mainnet, 0xcee2caff testnet)
  - Status line shows header sync progress
  - LevelDB persistence at ~/.c2pool/dash/embedded_headers
- hash_x11.hpp: add std::span<std::byte> overloads for PackStream compatibility
- Fix dangling reference: capture dashd_addr by config pointer, not local ref
- Dashd connects but disconnects (protocol version tuning needed)
- p2pool P2P share download still works (8900+ shares, heads=1)
- Fix dashd wire prefix byte order (was LE uint32, needs raw byte order)
  Mainnet: bf 0c 6b bd, Testnet: ce e2 ca ff
- Dashd handshake WORKS: connected to Dash Core v23.1.2 at height 2.4M
- Send getheaders with genesis hash as locator for initial sync
- Continue requesting headers when batch is full (>=2000)
- hash_x11.hpp: add std::byte span overloads for PackStream compatibility
- Both p2pool P2P (9073 shares) and dashd P2P running simultaneously
frstrtr added 6 commits April 25, 2026 02:11
The 19:23:15 UTC SIGSEGV captured by the new crash handler shows a
signal-11 in libstdc++ codecvt::do_length called from the boost::log
formatter, with NodeP2P::connected and the message-variant dispatch
on the stack. Diagnosis:

  NodeP2P does NOT inherit from std::enable_shared_from_this; the
  DashBroadcastPeer slot owns it by VALUE. When m_peers.erase(key)
  destructs the slot during the disconnect-reconnect cascade, any
  in-flight async callback (connect, read, timer, error) that
  captured the bare `this` pointer dereferences freed memory.

  Symptom path: connected() runs after destruct → m_target_addr.m_ip
  is freed-string memory → m_ip + ":" + port_str() composes garbage
  → boost::log formatter feeds non-UTF8 bytes to codecvt → crash.

  This matches the Bug 3 hypothesis from
  project_dash_soak_crash_2026_04_24.md exactly: "SIGSEGV after 2.5h
  during peer disconnect-then-reconnect cascade".

Proper fix is multi-day (shared_from_this on NodeP2P + capture self
into every async lambda + audit all timer handlers). For the mainnet
shadow soak we need a robust mitigation today.

Pragmatic fix: deferred destruction via a graveyard list. Both
disconnect_peer paths now move the slot's unique_ptr into a
graveyard with a 30-second TTL instead of dropping it directly.
A timer drains expired entries every 5 s. By the time the slot
actually destructs, all in-flight asio callbacks on it have either
completed or seen ec=cancelled — no more UAF window.

stop() also retires live peers into the graveyard before clearing
m_peers; the graveyard drains naturally at process exit when the
io_context has stopped (no more callbacks possible).

Cost: up to 30 s of memory retention per disconnected peer
(~1 MB/peer × ~10 churn events/h). Negligible vs the alternative
of a 2.5 h hard SIGSEGV.

Tests: 7 dash_battletest_regressions, 10 credit_pool, 5 subsidy
all PASS unchanged.

Filed proper-fix as separate followup — needs NodeP2P refactor to
inherit enable_shared_from_this and audit all async callsites for
self-capture. Tracked under Bug 3 in
project_dash_soak_crash_2026_04_24.md.
…n Factory

The 19:23:15 UTC SIGSEGV (signal 11 in libstdc++ codecvt::do_length called
from boost::log inside NodeP2P::connected) was a use-after-free during
the peer disconnect-then-reconnect cascade. Diagnosis confirmed in
project_dash_soak_crash_2026_04_24.md and traced through:

  * core::Factory<core::Client>::connect_socket captures the bare
    Client `&this`. When async_connect's handler fires, it dereferences
    m_node (= NodeP2P) which has been destroyed by m_peers.erase(key)
    in the broadcaster maintenance loop.
  * NodeP2P's 3 timer lambdas (reconnect, timeout, ping) capture
    `[this]` directly. Timer's m_destroyed flag protects against firing
    AFTER cancel, but does NOT protect against firing on a NodeP2P
    that's been destroyed mid-handler.
  * Symptom path: m_target_addr.m_ip in connected() points to freed
    memory → m_ip + ":" + port_str() composes garbage non-UTF8 bytes
    → boost::log formatter feeds them to codecvt → crash.

The previous mitigation (445987d graveyard) bought a 30 s window
of safety but was a band-aid masking the real lifecycle bug. This
commit replaces it with the proper fix.

Layered design:

  1. core::INetwork now inherits std::enable_shared_from_this<INetwork>,
     so any derived node owned by a shared_ptr can yield a weak_ptr
     for safe async capture.

  2. core::Factory::Client::connect_socket / Client::resolve /
     Server::accept now capture m_node->weak_from_this() into the
     async lambda. Inside the callback, lock to a strong shared_ptr
     to extend lifetime past the dispatch. If the weak_ptr was
     non-empty at registration but has expired by callback fire,
     skip the dispatch entirely (the destination is dead).

     LTC/DOGE pattern (NodeP2P NOT shared_ptr-managed) → weak_node
     is empty from the start; the `was_managed` bool records this
     and the callback falls back to raw m_node, preserving prior
     behavior. Zero LTC/DOGE regression.

  3. dash::coin::p2p::NodeP2P now inherits
     std::enable_shared_from_this<NodeP2P<ConfigType>> in addition
     to its existing INetwork base. The 3 timer lambdas
     (reconnect at line 207, timeout at line 224, ping at line 494)
     now capture `[self = shared_from_this()]` so the Timer handler
     keeps NodeP2P alive while it runs.

  4. DashBroadcastPeer holds NodeP2P as std::shared_ptr (was a value
     member). Constructor uses std::make_shared so the object is
     shared_ptr-managed at construction — required for
     shared_from_this() to work. All ~20 access sites in
     broadcaster_full.hpp updated from `peer->node_p2p.X()` to
     `peer->node_p2p->X()`.

The 445987d graveyard mitigation is reverted in the same commit
(stop()/disconnect_peer paths back to direct erase, GRAVEYARD_TTL
member + retire_to_graveyard()/drain_graveyard()/schedule_graveyard_drain()
removed) — proper lifetime management supersedes deferred destruction.

Tests: 18 binaries / 290 tests all PASS unchanged. LTC/DOGE
explicitly verified clean (test_doge_chain 29 PASS, test_mempool 22,
test_mweb_builder 26, test_template_builder 35, test_compact_blocks 15,
test_v36_script_sorting 11, test_weights 10, test_redistribute_address 12,
test_share_messages 9, test_utxo 14, test_phase4_embedded 23,
test_decay_pplns 5, test_header_chain 35, test_hardening 20).

Bug 3 in project_dash_soak_crash_2026_04_24.md can be marked closed
once the mainnet shadow soak runs >2.5 h without recurrence.
…hared

Two follow-on fixes from c42d0f5 that were caught at first runtime:

1. NodeP2P inherited enable_shared_from_this TWICE — once via INetwork
   (its base) and once directly. make_shared can't disambiguate the two
   weak_this pointers and leaves both empty → shared_from_this() throws
   bad_weak_ptr at the first timer-lambda registration.

   Drop the direct enable_shared_from_this<NodeP2P> inheritance. Add a
   private shared_self() helper that calls
   std::static_pointer_cast<NodeP2P>(INetwork::shared_from_this()) so the
   timer/connect lambdas still get a NodeP2P-typed self for method
   dispatch without the cast site repeated 3x.

2. dash::coin::Node (the singleton dashd-RPC NodeP2P holder, separate
   from DashBroadcastPeer) constructed m_p2p via std::make_unique. That
   leaves NodeP2P NOT shared_ptr-managed → shared_from_this() throws
   bad_weak_ptr on the singleton dashd connection, killing startup.

   Switch to std::make_shared. Type also changes from
   std::unique_ptr<NodeP2P<config_t>> to std::shared_ptr — required for
   shared_from_this() to work.

Verified: c2pool-dash now starts on Dash mainnet, embedded GBT
generating jobs (height=2460532, coinb_bytes=537), 4 sharechain
peers, headers SYNCED 2460531/60532, hash=38102 GH/s observed.
The mainnet shadow soak surfaced a constant-payee bug: every block
0..N had `[PAY] MISMATCH expected=6cfdbaaede02ab2e observed=...`
where the SAME MN was always c2pool's prediction. Direct dashd
query showed that MN had `lastPaidHeight: -1` (= "never paid")
in dashd's protx info JSON.

mn_snapshot_rpc.hpp:65 read it via:
    m.nLastPaidHeight = s.value("lastPaidHeight", 0);

`s.value(key, 0)` deduces int from the default; the JSON parser
returns int(-1); implicit conversion to uint32_t wraps to
UINT32_MAX (4294967295). In find_expected_payee:
    int h = static_cast<int>(st.nLastPaidHeight);  // (int)UINT32_MAX = -1
That MN now has h = -1, less than every other MN's positive height
→ always wins the min-find → tie-broken by lowest proTxHash →
ONE specific never-paid MN becomes "next to be paid" forever.
The author already handled the same -1 sentinel for
nPoSeBanHeight (line 72-73) but missed the height fields.

Two-layer fix:

1. mn_snapshot_rpc.hpp — take_height_or_zero() helper that reads
   as int64_t and clamps negatives to 0 BEFORE the uint32_t store.
   Applied to nRegisteredHeight, nLastPaidHeight,
   nPoSeRevivedHeight, nPoSeBanHeight (all of which dashd may emit
   as -1). New snapshots dumped via --dump-mn-snapshot will store
   correct values.

2. mn_state_machine.hpp find_expected_payee — defensive sane_height()
   inside the loop that maps UINT32_MAX → 0. Existing in-tree /
   persisted snapshots that already have UINT32_MAX baked into bytes
   are corrected at evaluation time without requiring a re-dump.

Verified live: c2pool-dash on mainnet, after wiping the persisted
mn_state_db and reloading from a freshly-dumped snapshot, no longer
emits the constant `expected=6cfdba...` — `expected` now varies per
block as the algorithm intends.

Note: Some [PAY] MISMATCH events still occur because of an unrelated
structural issue — the bootstrap gap. Snapshot is taken at height N,
and individual blocks N+1..tip are not always sequentially apply_block'd
(headers can advance in batches without per-intermediate-block tip
events). That gap means c2pool's state lags dashd by some number of
unprocessed blocks. The proper fix is block backfill on startup;
filed as a Phase C-PAY follow-up. The MISMATCH is log-only-at-MVP
by design and does not affect consensus correctness.

Tests: 22 dash binaries unchanged (battletest_regressions 7,
credit_pool 10, subsidy 5).
… state

Bootstrap window blocks arrive in peer-response order, not chain order.
apply_block has no internal idempotency check — re-applying a block at
h <= persisted best_height resets nLastPaidHeight backwards, corrupting
the projection. After a snapshot at h=N populates state with the latest
nLastPaidHeight values, every bootstrap-window block at h<=N that
re-arrives bumps SOME MN's nLastPaidHeight back to its earlier value.

Net observed effect on mainnet: expected payee converges to whichever
MN was bumped by the EARLIEST re-applied bootstrap block (lowest
resulting nLastPaidHeight) and stays constant -> 100% [PAY] MISMATCH
rate against dashd's actual selection.

Gate apply_block (and the [PAY] verification, which would be meaningless
against re-applied state) on `mn_state_db->is_open() && height <=
mn_state_db->get_best_height()`. Other state machines (credit_pool,
quorums, GBT) continue to receive every block — they have their own
ordering / idempotency semantics.

Pairs with e4c7c10 (UINT32_MAX wrap fix). The wrap fix corrected the
sentinel value; this commit prevents earlier-block re-application from
overwriting the correct value with a lower one.
Top-level CMakeLists.txt declares Boost::system as OPTIONAL_COMPONENTS
because system has been header-only since Boost 1.69 — no link target
is required. CI runners (Linux/macOS arm64/Windows) all fail at the
generate step because the optional target isn't materialized when the
Conan-provided Boost config doesn't expose it.

The c2pool-dash target is the only one in the tree that puts
Boost::system on its link line; LTC/DOGE link asio fine without it.
Drop it to unbreak CI.
Comment thread src/core/web_server.cpp
// V35→V36 transition tracking is LTC-specific. Other blockchains
// (e.g. Dash v16) don't have a pending transition, so return an
// empty object to keep the dashboard's transition banners hidden.
if (m_blockchain != Blockchain::LITECOIN)
denom_shares = static_cast<double>(num_shares > 1 ? num_shares - 1 : 1);
}

double ratio = (denom_shares > 0 && target_time_per_mining_share_ > 0)
@@ -1641,8 +1642,8 @@
}
t.pool_hashrate = pool_hr;

double share_period = static_cast<double>(PoolConfig::share_period());
double chain_length = static_cast<double>(PoolConfig::real_chain_length());
double share_period = static_cast<double>(m_params->share_period);
frstrtr added 23 commits April 25, 2026 11:06
share_init_verify gained a CoinParams& second arg in commit a94435e
on the branch, but test_threading.cpp's six callsites were never
updated. Linux x86_64 CI fails at compile (test_threading.cpp.o).

Fix: introduce a static test_coin_params() helper backed by
ltc::make_coin_params(testnet=false), thread it through all six
callsites. Verify is coin-wide-constant for the params it consumes,
so testnet-vs-mainnet doesn't matter for the V36 testnet share
fixture this file uses.

Also flip core/ltc link order to ltc/core throughout test/CMakeLists.txt
so static-link symbol resolution works regardless of ld pass mode
(ltc symbols reference core::timestamp + others, so core must come
after ltc on the link line for single-pass ld).
test_header_chain.cpp: 3 callsites of calculate_next_work_required hit
"ambiguous overload" because the using-directive (`using namespace
ltc::coin`) imports both ltc::coin and bitcoin_family::coin overloads
(the latter via ADL on the params arg). Qualify the calls explicitly
as ltc::coin::calculate_next_work_required to disambiguate. Verified:
35/35 tests pass.

test_hash_link.cpp: compute_gentx_before_refhash gained a
core::CoinParams& second arg in commit a94435e but the test still
called the 1-arg form. Add a static test_coin_params() helper backed
by ltc::make_coin_params(testnet=false) and thread it through both
callsites. Verified: 11/11 tests pass.

build.yml: temporarily exclude test_coin_broadcaster /
test_multiaddress_pplns / test_pplns_stress from the Build-tests step.
core/web_server.cpp grew direct calls into ltc::coin::NodeRPC and
c2pool::merged::MergedMiningManager, creating a static-link cycle
(core <-> ltc_coin, core <-> c2pool_merged_mining). Production
binaries build fine because user code (c2pool_refactored.cpp) directly
references symbols that drag the right .o files in via single-pass
ld; tests don't, so the unresolved refs in web_server.cpp.o stay.
Proper fix is architectural: extract LTC/MM-specific endpoints out
of core/ into their own translation unit (or split MiningInterface
into a coin-agnostic base + LTC subclass). Filed in
project_dash_test_rot_2026_04_25.md memory.
Previous commit (b2a985e) dropped the 3 cycle-broken tests from CI's
Build-tests target list, but their gtest_add_tests() registrations
were still in test/CMakeLists.txt. CI's "Run tests" step then tried
to run all 134 of their cases via ctest and reported them as
"Not Run" (executable doesn't exist on disk) → ctest exit 8.

Comment out add_executable / target_link_libraries / gtest_add_tests
for all 3, with a TOP-OF-FILE note pointing at memory:
project_dash_test_rot_2026_04_25.md for the architectural fix that
re-enables them. ctest target count: 580 → 473.
test_dash_credit_pool / test_dash_subsidy / test_dash_battletest_regressions
were added on this branch (commits 43ef108 + dca4f65) and registered
with gtest_add_tests(), but never added to the workflow's Build-tests
target list. CI's Run-tests step then ctest-invoked all 22 of their
cases against non-existent binaries → exit 8.
fast-check property test "parseSnapshot: output always has required keys
with correct types" found a counterexample where summing many
Number.MAX_VALUE-class miner amounts overflowed to Infinity, breaking
the Number.isFinite(snap.totalPrimary) invariant. Reproduced with seed
917071668 (CI run on 662b570).

Individual amounts pass through num() which already filters non-finite
values, but the reduce sum can still overflow. Replace the two reduce
sums (modern-shape fallback + legacy-shape) with a finiteSum() helper
that clamps to Number.MAX_VALUE on overflow.

Verified: seed 917071668 + 300 runs no longer reproduces the failure.
Multiple Dash MNs can share the same payoutAddress (operators running
multiple MNs to one wallet). Live-observed on mainnet:

  MN 7173b6a94bf9f448... payoutAddress=XjbaGWaGnvEtuQAUoBgDxJWe8ZNv45upG2
  MN 06a9ee248111bf6d... payoutAddress=XjbaGWaGnvEtuQAUoBgDxJWe8ZNv45upG2

apply_block Pass 3's find_by_payout_script returned the FIRST std::map
iteration match — deterministically the lower-hash MN (06a9ee24). Net
effect: every payment dashd correctly attributed to 7173b6a9 was
mis-attributed to 06a9ee24 in our state. 7173b6a9's nLastPaidHeight
stayed at the snapshot value forever (live: 2458528, vs dashd's
2460553). With find_expected_payee picking lowest-h MN, 7173b6a9
became permanently "starved" and won the projection every block —
producing a constant `expected` hash and 100% [PAY] MISMATCH against
dashd, which correctly rotated the two.

Confirmed via dashd protx info on mainnet (h=2460783):
  7173b6a9: lastPaidHeight=2460553 (dashd) vs 2458528 (us)
  06a9ee24: lastPaidHeight=2460575 (dashd, actually paid at h=2460575)
  both share payoutAddress XjbaGWaGnvEtuQAUoBgDxJWe8ZNv45upG2

Fix: new pick_paid_mn(script) member that mirrors dashd's
CompareByLastPaid_GetHeight ordering — when N MNs share a script, pick
the one with the lowest projected h (= the MN dashd's GetMNPayee would
have chosen at this height). Used in apply_block Pass 3 (state
mutation) and find_paid_in_block_first ([PAY] log).

Also reorder main_dash.cpp on_full_block: call find_paid_in_block_first
BEFORE apply_block so the lowest-h disambiguation runs against the
pre-apply state. Post-apply the just-paid MN has the highest h and
would lose to its colliding peers.

Pairs with e4c7c10 (UINT32_MAX wrap) and 03fa0aa (OOO-block guard)
to address all three known root causes of [PAY] MISMATCH on mainnet.

Includes a one-shot debug_dump_mn() diagnostic + throttled trigger in
main_dash.cpp at MISMATCH events. Will be removed once a clean ~1
week soak confirms the fix.
Defense-in-depth: refuse to roll nLastPaidHeight backwards in
apply_block Pass 3. Catches the original Bug 2 (03fa0aa) bug class
even if a future caller bypasses the outer OOO guard in main_dash.cpp.
Trivial guard, no functional change for the steady-state forward path.

Add test/test_dash_pay_attribution.cpp pinning all three soak-found
PAY bugs against future regression:
  Bug 1 — UINT32_MAX sentinel must not win find_expected_payee
  Bug 2 — Pass-3 idempotency: never roll lastPaid backwards
  Bug 3 — pick_paid_mn lowest-h disambiguation under shared scripts:
          - happy path (prefers lower-h MN over lower-hash MN)
          - revived-height precedence
          - never-paid uses registeredHeight
          - tiebreak by hash when h equal
          - banned MN excluded
7/7 pass locally.

The bug class would have been caught instantly by these tests had
they existed before the soak. Lesson noted; tests added.
The bootstrap pipeline was a UTXO-only pipeline. It pulls block bodies
for h=snapshot+1 .. tip via getdata, drains them in chain order, and
calls utxo->connect_block per block. **It never invoked the MN state
machine apply_block for those blocks.** Result: a snapshot at h=N +
restart at chain tip h=N+M leaves M blocks of MN payments unprocessed
in our state. Each of those payments updates dashd's lastPaidHeight
for the paid MN, but our state stays at the snapshot value forever.

Live mainnet observation: snapshot at h=2460550, restart at h=2460786.
236-block gap. MN 8bc76ca7a979ded6 was paid by dashd at h=2460551.
Our state stayed at lastPaid=2458526 (snapshot value). On every
subsequent block our find_expected_payee picked 8bc76ca7 (lowest h
in our projection) but dashd had already moved past it (lastPaid
=2460551 in dashd's view). Result: 100% [PAY] MISMATCH stuck on
8bc76ca7 for 222+ blocks.

Fix:
1. Bootstrap drain loop (main_dash.cpp on_full_block) now calls
   credit_pool->apply_block AND mn_state_machine->apply_block per
   drained block, in chain order. Same persistence + [PAY] verify
   semantics as the steady-state path; [PAY-BF] log throttled
   1-in-50 to keep bootstrap drain output readable.

2. mn_state_db::write_all is now monotonic-advance for best_height.
   The top-of-handler MN apply for the tip block runs BEFORE the
   drain (which catches up h=snapshot+1 .. tip-1 afterwards). Without
   this, drain's per-block write_all(snapshot, h, ...) would roll
   best_height back to the drain's current h. With monotonic-advance,
   entries are persisted but best_height never decreases.

Verification matrix (live mainnet shadow soak):
- Fresh snapshot @ tip (h=2460794), 0-block gap: 5/5 PAY MATCH
- Stale snapshot @ h=2460550, 236-block gap: pending soak
Was tracking only `build-qt/`, missing `build-spv/` and any other
`build-XXX/` cmake out-of-tree dirs. Also missed autoconf-generated
`configure~` files (e.g. external/dashbls/configure~) created by
autoreconf when regenerating configure scripts.
Self-review caught: credit_pool gets seeded at top-of-handler with the
TIP block's cbtx.creditPoolBalance. Drain then replays h=snapshot+1..tip
calling credit_pool->apply_block(b, h) for each block — adding each
backfill block's lock/unlock deltas to a balance that ALREADY reflects
all those deltas (it was seeded from the post-tip balance).

Net: credit_pool balance = B_tip + sum(deltas h=N+1..tip), should be
just B_tip. Every drain run double-counted the entire snapshot-to-tip
delta.

MN state apply in drain stays — it's correct (apply_block per drained
block in chain order, with Pass 3 idempotency safety net).

credit_pool catch-up in drain is a separate problem: needs the snapshot
to ALSO carry a seed balance at snapshot_height, so drain can re-seed
at h=snapshot_height before applying snapshot+1..tip deltas. Filing as
follow-up.
…shot floor

Two related bugs in the bootstrap-trigger logic surfaced during stale-
snapshot soak:

1. Stale-peer block triggers bootstrap with WRONG end_height
   First peer to push a block-body via inv/cmpct may push a stale
   tip (e.g. h=2430000 when the real chain tip is h=2460805).
   on_full_block computes height=2430000 from this block. Bootstrap
   trigger fires with end_height=2430000, start_from=2429712. Range
   [2429712..2430000] is 30000+ blocks before the real tip.
   Extension via the `if (height > end_height) end_height = height`
   path then makes the range balloon to 30000+ blocks total. At the
   16-block sliding window's pace, that's ~50h to drain.

   Fix: gate the trigger on `chain->height() <= height`. If chain
   has higher headers than this block, this block is stale relative
   to the real tip — defer trigger. Wait until a fresh-tip block-
   body arrives (the steady-state header_chain.set_on_tip_changed
   handler requests it via request_full_block(new_tip) once header
   sync hits the real tip).

2. UTXO bootstrap range doesn't cover MN state snapshot gap
   With utxo_db wiped (cold) and mn_state_db at snapshot h=N,
   bootstrap range was tip-DASH_KEEP..tip = 288 blocks. If snapshot
   is OLDER than tip-DASH_KEEP, the snapshot-to-(tip-DASH_KEEP) range
   is missed entirely → MN payments in that gap never apply.

   Fix: lower start_from to mn_snap_h+1 if it's older than the UTXO
   window. Log the override. UTXO replay over a wider range is safe;
   the rolling-DASH_KEEP undo window doesn't change.

Pairs with 9d61f8c (drain backfill MN apply). Together: bootstrap
range correctly spans both UTXO and MN state catch-up, drain processes
each block in chain order, MN state stays in sync with dashd.

Verification matrix update pending stale-snapshot soak rerun.
…shot

Previous trigger-gate (e5e498c: chain->height() <= height) wasn't
strong enough. When the chain header sync hadn't caught up to the real
tip yet, both chain->height() and the just-received block's height
were stale (e.g. both at h=2430000 when real tip was h=2460805). Gate
passed, bootstrap activated with stale range, MN catch-up never
covered the snapshot-to-tip gap.

Stronger gate: when we have a snapshot at h=N, only trigger bootstrap
once a block AT-OR-AFTER h=N arrives. Pre-snapshot blocks pushed by
peers are stale by definition (we already have authoritative MN state
covering up to h=N from the snapshot file). Defer until peers push us
a fresh-tip block.

Verification: 7/7 regression tests still green; stale-snapshot soak
rerun pending.
Race observed in stale-snapshot soak verification (5708d1a):
After bootstrap correctly fired with snapshot+tip range, drain
backfilled MN state in chain order. But the top-of-handler MN apply
also ran for tip blocks arriving DURING bootstrap — using stale
snapshot-era state, before drain caught up. Result: 2 transient
[PAY] MISMATCH at the bootstrap-to-steady-state boundary
(h=2460815, h=2460816), then clean MATCH from h=2460817 onwards.

Fix: gate top-of-handler MN apply on `!dash_bs->active`. The drain
loop's per-block apply handles all blocks during bootstrap (in chain
order, with [PAY-BF] log). Top-of-handler resumes for tip blocks
once bootstrap completes.

This is the same pattern as the existing UTXO logic which also
returns early when bootstrap is active. MN apply now follows suit.
Final cleanup of the bootstrap-to-steady-state boundary transients.
After d8cb58c eliminated mid-bootstrap races (top-apply skipped while
dash_bs->active=true), one transient remained: the FIRST at-or-past-
snapshot block that TRIGGERS bootstrap. It runs through top-of-handler
BEFORE bootstrap activates (dash_bs->active=false at that moment),
top MN apply runs against snapshot-era state, [PAY] log fires
MISMATCH. Then bootstrap activates and drain catches up.

Fix: detect "this block will TRIGGER bootstrap" early (replicate the
trigger-gate condition) and gate top MN apply on it too. The trigger
block goes into the bootstrap buffer for drain; drain's per-block
apply produces the correct [PAY-BF] log entry.

Implementation: hoisted DASH_KEEP, dash_bootstrap_done, mn_snap_h_pre
declarations to the top of the on_full_block handler so they're in
scope at both the MN-apply gate AND the bootstrap-trigger site.
Closes the Bug 5 (5efd257) follow-up: credit_pool catch-up during
bootstrap drain. Previously, drain skipped credit_pool->apply_block
because credit_pool was seeded at top-of-handler from the TIP block's
cbtx.creditPoolBalance — replaying h=N+1..tip on top would double-
count every backfill block's deltas.

Fix: extend the snapshot file format to carry credit_pool_balance at
snapshot_height. Loader seeds credit_pool with that value before drain
starts. Drain then applies h=snapshot+1..tip deltas correctly.

Wire format change (mn_snapshot.hpp):
- Bumped SNAPSHOT_VERSION 1 -> 2; SNAPSHOT_VERSION_V1 kept for
  backward-compat decoding of existing in-tree snapshots
- DmnSnapshot adds `int64_t credit_pool_balance{-1}` (-1 = "not in
  this snapshot"; loader treats as "do not seed")
- Encode appends 8-byte LE int64 trailer for v2 only
- Decode accepts BOTH v1 and v2; reads trailer when v2

RPC dumper (mn_snapshot_rpc.hpp):
- After fetching MN list, also `getblock <hash> 2` to get coinbase
  with cbTx; extract creditPoolBalance and store in snap. Failure is
  non-fatal (snapshot still valid as v2 with -1 sentinel).

Loader (main_dash.cpp):
- After snapshot file load: if credit_pool_balance >= 0 AND credit_pool
  not initialized (cold start), call credit_pool->seed() and
  credit_pool_db->write_state(). Logs the seed value.

Drain (main_dash.cpp):
- Re-enable credit_pool->apply_block + persist per drained block
  (gated on initialized()). The 5efd257 skip was correct for v1
  snapshots; with v2 seed, drain catch-up is safe.

Top-of-handler (main_dash.cpp):
- Add bootstrap-handling gate to credit_pool apply too (mirror of
  the MN gate from d8cb58c + 680f3c0). Prevents the same race
  during the bootstrap-to-steady-state boundary.

Existing in-tree snapshot (data/dash/dmn_snapshot_h2460249.dat) is
v1 and continues to load (no credit_pool seed; CCbTx-driven re-seed
at first new tip handles it as before). New dumps via
--dump-mn-snapshot produce v2 files with the seed.
Works around the static-link cycle introduced when core/web_server.cpp
grew direct calls into ltc::coin::NodeRPC and c2pool::merged::MergedMiningManager.
Wrapping `ltc_coin ltc core c2pool_merged_mining` in --start-group/--end-group
lets ld multi-pass-resolve the cyclic refs.

42/42 tests pass locally.

test_multiaddress_pplns + test_pplns_stress remain disabled — their
wider transitive deps (pool, sharechain, c2pool_storage, c2pool_payout,
c2pool_hashrate) cause CMake to inject duplicate libcore.a/libltc_coin.a
OUTSIDE the start-group, where ld can't multi-pass-resolve. Proper
architectural fix (extract LTC/MM endpoints out of core/web_server.cpp
into a separate translation unit) is still desirable but ~6h of work
touching the live LTC pool's mining hot path; deferred.
…le-archive

Both tests pull in core/web_server.cpp.o via MiningInterface usage, which
has unresolved refs to ltc::coin::NodeRPC::{getwork, submit_block_hex}.
The symbols ARE present in libltc_coin.a's rpc.cpp.o and the archive index
includes them, but ld's --start-group multi-pass evidently doesn't
re-extract rpc.cpp.o for those refs (subtle archive-scan ordering issue).

--whole-archive on libltc_coin.a forces all of rpc.cpp.o (and the rest)
into the link unconditionally, sidestepping the bug. Test binaries are
slightly larger as a result; production binaries link fine without this.

Validated on VM 211 (cold conan + cmake build):
  test_multiaddress_pplns: 31/31 PASSED
  test_pplns_stress:       17/17 PASSED

Adds both targets back to the CI Build-tests cmake --target list.

Drops the architectural-extraction TODO from CMakeLists.txt — the
fix is mechanical, not architectural, so we don't need to refactor
core/web_server.cpp at all.
Both VM 210 (Bug 3 soak) and VM 201 (Phase C-PAY soak) crashed within 24
seconds of each other on 2026-04-25 with [ERROR] vector::_M_default_append
(std::length_error from resize() exceeding max_size). Same trigger on two
unrelated peers (178.208.87.213 and 65.108.4.213) at the same wall-clock —
either coordinated malicious peers or a wave of malformed share-fetch replies.

Root cause: in src/impl/dash/share_chain.hpp the wire format reads
`pair_count` and `count` via VarInt(), which (per src/core/pack_types.hpp:266)
maps to ReadCompactSize(os, false) — `false` disables the 32 MiB range_check.
A malformed peer can send a 9-byte VarInt of UINT64_MAX. share_chain.hpp:82
then evaluates `pair_count * 2` (overflows to a different huge value) and
calls resize() on std::vector<uint64_t> whose max_size is ~2^60 — boom.

Fixes:
1. src/impl/dash/share_chain.hpp — cap m_packed_payments at 10000 entries
   and m_transaction_hash_refs at 100000 pairs. Excess throws ios_base::failure
   which the share parser catches cleanly without crashing the process.
2. src/core/socket.cpp — defense-in-depth cap of 32 MiB on the wire-format
   message_length before payload.resize(). Disconnects the offending peer
   cleanly on cap exceedance. Bitcoin Core uses 4 MiB; we use 32 MiB to
   accommodate Dash's larger mnlistdiff messages with headroom.
3. src/impl/dash/main_dash.cpp — enhance the top-level ioc.run() catch to
   log typeid(e).name() + a backtrace + drop crash log via the existing
   dash_write_crash_log() helper, mirroring the SIGSEGV handler from 2d33d09.
   Future "vector::_M_default_append"-style regressions will pinpoint the
   exact resize() site instead of needing source-grep.

The MAX_PAYMENTS_PER_SHARE = 10000 and MAX_TX_HASH_REF_PAIRS = 100000 caps
are well above any legitimate share (real-world: ~10-50 payouts; ~few hundred
tx-hash-ref pairs even worst-case).
Same class as the LTC fix in 2f9d3e1 — five HTTP cache callbacks in
main_dash.cpp held a blocking std::shared_lock on node.tracker_mutex().
When the dash compute thread holds the exclusive write lock for a
long think+clean cycle on a wedged sharechain, these would block the
io_context until the watchdog fires.

Sites converted to shared_lock(try_to_lock) with safe-default returns:

  - L1246  head_count   → fall back to snap.fork_count (functionally equivalent)
  - L1296  window_fn    → empty json::object (CacheEntry holds previous)
  - L1395  tip_fn       → std::nullopt (typed signature, consumer renders empty tip)
  - L1420  delta_fn     → empty json::object (next poll picks up)
  - L1518  lookup_fn    → {"error":"tracker busy, retry"}

The 4 remaining shared_lock sites at L1713/1720/1737/1760 are inside
the PPLNS precompute std::thread (its own dedicated thread, not the
io_context).  Blocking there only stalls the precomputer itself; no
freeze risk.  Left as-is — blocking is correct for that thread.
The vector::_M_default_append crashes recurring after eb0f03f's
share_chain.hpp + socket.cpp caps were diagnosed via __cxa_throw
LD_PRELOAD shim. Throw site:

  core::Socket::init()    [main_dash.cpp + socket.hpp inlined]
  → make_shared<Packet>(m_node->get_prefix().size())
  → Packet ctor: prefix.resize(prefix_length)
  → std::length_error

m_node is a raw ICommunicator* held by Socket. On rapid
disconnect-reconnect (Bug-3-family lifecycle), get_prefix() can be
called on a freed object and reads garbage as the vector size.
The resulting resize() call exceeds max_size and throws — escaping
to ioc.run() and killing the process via the top-level catch in
main_dash.cpp:4453.

Fixes:
1. src/core/packet.hpp — Packet ctor now caps prefix_length at 16
   (every protocol uses a 4-byte magic prefix; 16 is conservative).
   Throws ios_base::failure on cap exceedance.
2. src/core/socket.hpp — Socket::read() catches the make_shared
   exception locally and aborts the connection cleanly instead of
   letting it propagate to ioc.run() and kill the process.

Validated with the LD_PRELOAD __cxa_throw shim:
  CXA-CAPTURE 2026-04-27 12:34:53 UTC St12length_error thrown:
    Socket::init() at +0x107ec6
    connect_socket lambda at +0x12bfa9
    asio::range_connect_op::process at +0x1d47d5
    ...

Note: this band-aids the symptom (UAF garbage → length_error). The
underlying lifecycle issue (raw m_node ptr in Socket while owning
node may be destroyed) remains; a proper fix would route m_node
through weak_ptr<ICommunicator> in the same shape as the Bug 3 fix
on NodeP2P. That refactor is deferred — the cap is the immediate
unblock so the soak window can resume.
Replaces the band-aid Packet prefix_length cap from 0f91b49 with a
fundamental lifecycle fix mirroring the c42d0f5 factory-level pattern,
applied at the Socket layer where the actual UAF lives.

Diagnosis: the throw-site backtrace captured via __cxa_throw LD_PRELOAD
shim showed core::Socket::init() → make_shared<Packet>(prefix_length)
where prefix_length = m_node->get_prefix().size(). m_node was a raw
ICommunicator* that survives across async-read callbacks but isn't
kept alive by them. Subsequent ASYNC_READ chains (read_prefix,
read_command, read_length, read_checksum, read_payload) only capture
[self = shared_from_this()] — keeping the Socket alive but not the
node. Once the Factory async lambda returns and its strong_node lock
goes out of scope, m_node can dangle.

Production rate: ~14k cap firings/day per VM (every ~6s) on Phase C-PAY
soak (VM 201) + Bug 3 soak (VM 210). Each firing wastes one outbound
TCP connection, leaving the soak under-peered (5 sharechain peers vs
typical 15-20) and the share-fetch path effectively dead.

Fix:
1. src/core/socket.hpp + .cpp — Socket holds weak_ptr<INetwork>
   m_node_lifetime alongside the cached ICommunicator* m_node. Dual-mode
   bool m_was_managed distinguishes Dash NodeP2P (post-c42d0f5c
   make_shared, lifetime-tracked) from legacy LTC/DOGE pool nodes
   (raw, untracked). The acquire_node() helper locks the weak_ptr at
   every async-callback entry; on managed-but-expired, the connection
   aborts cleanly via abort_connection() instead of dereferencing m_node.
   For unmanaged nodes, was_managed=false skips the check, preserving
   prior behavior.

   ASYNC_READ macro updated to do the lock-or-bail at every callback
   entry; strong_node lifetime extends through the user-supplied
   handler scope so m_node access inside is safe.

   Socket ctor + init() + read() + write() moved out-of-line to .cpp
   where INetwork is complete (forward-declared in .hpp to avoid
   circular include with factory.hpp).

2. .github/workflows/build.yml — new linux-asan job builds with
   -fsanitize=address,undefined -fno-sanitize=vptr (vptr disabled
   because leveldb's typeinfo isn't visible). continue-on-error: true
   initially so reports surface in PR checks without blocking merges
   while we work through the audit. Will flip to required (Phase 7)
   once known UAFs are fixed.

   Sanitizer build_type must be Release (not RelWithDebInfo) to match
   the conan_install --settings=build_type=Release; otherwise the
   $<$<CONFIG:Release>:...> generator expression in conan-generated
   *-Target-release.cmake silently drops every conan dep's include path.

Validation:
- c2pool + c2pool-dash both build clean
- All previously-passing unit tests still pass (87/87 dash+share+
  hardening+utxo+threading+coinbroadcaster+multiaddress)
- Pre-existing test_v36_cross_impl_refhash link issue unchanged
- LTC pool path uses unmanaged-node fallback; behavior identical to
  pre-fix — no risk to .20/.40 LTC mainnet
- 0f91b49's Packet cap + Socket::read try-catch retained as
  belt-and-braces defense-in-depth

Per design doc: frstrtr/the/docs/c2pool-socket-lifecycle-fundamental-fix.md
Memory: project_dash_socket_lifecycle_fundamental_fix.md
AsAN run on VM 210 (Phase 6b validation of c558fe9's Socket fix)
caught a separate use-after-free in core::Timer that's been silent in
production. Same Bug-3 family (async callback outliving captured
object), different code site:

  heap-use-after-free in core::Timer::logic() lambda at timer.hpp:37
  freed by ResponseWrapper dtor → unique_ptr<Timer> dtor
  triggered from reply_matcher.hpp:92 inside m_handler() invocation

Sequence:
  1. Matcher::request() creates a Timer (unique_ptr) inside a
     ResponseWrapper, stored in std::map keyed by request hash
  2. Timer::logic() schedules an asio::async_wait with a lambda that
     captures *this* by reference [&,...]
  3. Timer fires (ec=0). Lambda calls m_handler() (the user reply
     callback)
  4. Inside m_handler(), the matcher erases the map entry → destroys
     ResponseWrapper → destroys Timer
  5. m_handler() returns. Lambda accesses m_repeat (via &-capture) on
     the freed Timer → UAF on the next-line `if (m_repeat && ...)`

Minimal fix:
- Capture m_repeat by VALUE alongside the existing destroyed shared_ptr
- Re-check *destroyed AFTER m_handler() returns before any
  this-relative access

This pairs with c558fe9's Socket weak_ptr<INetwork> fix as part of
the same Bug-3-family audit. The full enable_shared_from_this refactor
of Timer (matching the Socket pattern) is deferred to Phase 5 of the
fundamental fix plan — touches every Timer construction site across
LTC + Dash + RPC; the minimal fix is sufficient to stop the bleeding.

Validated locally; full re-validation on VM 210 AsAN under way.

Per design doc: frstrtr/the/docs/c2pool-socket-lifecycle-fundamental-fix.md
…r UAF)

8h after deploying c558fe9 (Socket weak_ptr<INetwork>) + 0f594e0 (Timer
UAF cap), the AsAN canary on VM 211 surfaced a NEW heap-use-after-free
in the same Bug-3 family at a different code site:

  READ at: src/core/socket.cpp:140 (operator==(prefix vectors))
  FREED by: std::default_delete<dash::DashBroadcastPeer>
            from std::map::erase
    in dash::DashCoinBroadcaster::disconnect_peer at broadcaster_full.hpp:492
    called from prune_dead_locally() at broadcaster_full.hpp:576
    called from do_maintenance() at broadcaster_full.hpp:531

VM 210 (Release binary) crashed with SIGSEGV at the same tick (14:36 UTC,
~14 min after VM 211's AsAN trip) — same UAF, undefined-behavior path on
Release manifests as a segfault.

Why c558fe9 didn't catch it: that fix protects NodeP2P's lifetime via
weak_ptr<INetwork>.lock() on every async-callback entry. NodeP2P stays
alive past peer erase. But NodeP2P held m_config and m_coin as RAW
POINTERS into DashBroadcastPeer's by-value `config` and `coin_node`
members. When m_peers.erase() destructs the peer, those raw pointers
dangle. Socket's read_prefix callback then calls m_node->get_prefix()
which returns a reference into freed memory — AsAN UAF.

Fix: NodeP2P TAKES OWNERSHIP of coin and config so their lifetime is
tied to NodeP2P's. New ctor accepts unique_ptr<dash::interfaces::Node>
and unique_ptr<config_t>; legacy raw-pointer ctor preserved for callers
that guarantee parent lifetime (e.g. tests). DashBroadcastPeer no longer
holds coin_node/config as direct members; broadcaster wires event
callbacks via peer->node_p2p->coin()->X.

After this:
  m_peers.erase(key) -> shared_ptr<NodeP2P> count drops by 1
  Socket strong_node still holds NodeP2P alive (refcount > 0)
  m_coin_owned + m_config_owned stay alive (NodeP2P members)
  get_prefix() returns reference into LIVE memory -> safe.

LTC's broadcaster (c2pool/merged/coin_broadcaster.hpp) uses a separate
template instance (ltc::coin::p2p::NodeP2P) and is unchanged — same bug
pattern is present there but LTC peer churn has not exhibited it. Same
fix can be applied if/when observed.

Build: c2pool-dash AsAN target builds clean. Deploy to VM 211 next; ~24h
soak required to confirm UAF class is fully closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants