[DO NOT MERGE] Partial sync mode #654

Draft
sublimator wants to merge 18 commits into dev from partial-sync-mode

@sublimator
Collaborator

No description provided.

Allows RPC handlers to query ledgers that are still being acquired,
enabling faster node startup for read queries.

Key changes:
- Add coroutine detection via getCurrentCoroPtr() in LocalValue.h
- Add postAndYield() to Coro for safe poll-wait synchronization
- Modify SHAMap::finishFetch() to poll-wait for missing nodes when
  in coroutine context (30s timeout, re-requests missing nodes)
- Add getPartialLedger() to InboundLedgers for accessing incomplete
  ledgers that have headers
- Add getLastValidatedLedger() to LedgerMaster to get validated
  ledger hash even when not fully synced
- Update RPCHelpers getLedger() to fall back to partial ledgers
- Fix Manifest seq_++ bug for new manifest entries
- Add zero hash guard in NetworkOPs::checkLastClosedLedger()

Note: This is a proof of concept. Production use would require
fetch prioritization to make queries fast enough to be practical.

Partial sync mode improvements for faster RPC queries during sync:

- Track network-observed ledger from any validation (not just trusted)
  to allow queries before trusted validators are configured
- Add priority node fetching: queries can request specific nodes be
  fetched immediately via addPriorityNode/addPriorityHash
- Store state/transaction nodes directly to node store (not fetch pack)
  so partial sync queries find them immediately
- Add poll-wait loops in RPCHelpers for ledger header acquisition
- Replace postAndYield with sleep_for in SHAMap finishFetch
- Implement linear backoff for re-requests (50ms increments, max 2s)
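The linear backoff from the last bullet could look roughly like the sketch below. Only the 50ms step and the 2s cap come from the commit message; the helper name `reRequestDelay` and the 1-based attempt counter are hypothetical and not taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical helper (name and signature are assumptions, not from the PR).
// Returns the wait before the Nth re-request: 50ms, 100ms, 150ms, ...
// capped at 2000ms.
std::chrono::milliseconds
reRequestDelay(unsigned attempt)
{
    assert(attempt >= 1);  // attempts are counted from 1 in this sketch
    constexpr std::chrono::milliseconds step{50};
    constexpr std::chrono::milliseconds cap{2000};
    auto const delay =
        step * static_cast<std::chrono::milliseconds::rep>(attempt);
    return std::min<std::chrono::milliseconds>(delay, cap);
}
```

Under these assumptions the cap is reached on the 40th re-request, after which every wait is 2s.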

Call updateTrusted() immediately when all publisher lists become
available in applyListsAndBroadcast(), rather than waiting for
beginConsensus(). This allows validations to be trusted within
milliseconds of VL fetch instead of waiting 14+ seconds for
consensus to start.

Also adds debugging logs:
- PartialSync journal: untrusted validations during startup
- PartialSync journal: checkAccept quorum details
- ValidatorSite journal: VL fetch timing

Coro::sleepFor() yields the coroutine and schedules its resumption after
a delay, freeing the job queue thread instead of blocking it with
sleep_for().

Updated SHAMap::finishFetch() and RPCHelpers getLedger() to use
coro->sleepFor() for partial sync poll-wait loops.

Adds a new RPC handler, submit_and_wait, that submits transactions and
waits for validated results, designed for nodes still syncing:

- Broadcasts raw tx to network without local state validation
- Indexes tx hashes from incoming txMap leaf nodes for fast lookup
- Polls for tx in partial ledgers, then waits for validation quorum
- Only returns when numTrustedForLedger >= quorum (truly validated)

Supporting changes:
- Add hasTx()/knownTxHashes_ to InboundLedger for tx tracking
- Add findTxLedger() to InboundLedgers to search across ledgers
- Add broadcastRawTransaction() to NetworkOPs for blind relay
- Add coroutine-local fetchTimeout to LocalValue.h
- SHAMap::finishFetch() now uses configurable timeout

Simplify TX priority mechanism using RangeSet instead of per-TX hash
tracking. When submit_and_wait is called, it registers a ledger range
where TX nodes should be fetched before state nodes.

Key changes:
- Add prioritizeTxForLedgers(start, end) and isTxPrioritized(seq)
  to InboundLedgers using RangeSet<uint32_t>
- InboundLedger::trigger() checks range to decide TX-before-state order
- Remove complex per-TX hash tracking that couldn't help due to
  Merkle tree structure (need parent hashes to request children)
- Format CMake and source files
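A minimal sketch of the range-based priority check described above. The commit uses RangeSet<uint32_t>; the std::map-backed class here is a stand-in so the example is self-contained, and the class name `TxPriorityRanges`, its internals, and `rangeCount()` are assumptions. Only the method names prioritizeTxForLedgers(start, end) and isTxPrioritized(seq) come from the commit message.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <utility>

// Stand-in for a RangeSet<uint32_t>: disjoint closed intervals, first -> last.
class TxPriorityRanges
{
    std::map<std::uint32_t, std::uint32_t> ranges_;

public:
    // Register [first, last] as a range whose TX nodes should be fetched
    // before state nodes. Overlapping ranges are merged.
    void
    prioritizeTxForLedgers(std::uint32_t first, std::uint32_t last)
    {
        if (first > last)
            std::swap(first, last);
        auto it = ranges_.upper_bound(first);
        // The interval starting at or before `first` may still reach into it.
        if (it != ranges_.begin() && std::prev(it)->second >= first)
            it = std::prev(it);
        while (it != ranges_.end() && it->first <= last)
        {
            first = std::min(first, it->first);
            last = std::max(last, it->second);
            it = ranges_.erase(it);
        }
        ranges_[first] = last;
    }

    // True if `seq` falls inside any registered range.
    bool
    isTxPrioritized(std::uint32_t seq) const
    {
        auto it = ranges_.upper_bound(seq);
        return it != ranges_.begin() && std::prev(it)->second >= seq;
    }

    std::size_t
    rangeCount() const
    {
        return ranges_.size();
    }
};
```

The per-ledger check is a single ordered lookup, which is what makes a range cheaper to maintain than a set of individual TX hashes.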

When the node is synced and receiving transactions via gossip,
ledgers are built locally and won't be in InboundLedgers. Now
checks both:
- InboundLedgers (partial sync mode - ledgers from peers)
- LedgerMaster (synced mode - ledgers built from gossip)
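The two-source lookup can be sketched as below, assuming the check in question is the transaction lookup done while waiting for a result. The maps are hypothetical stand-ins for InboundLedgers and LedgerMaster, and `findTxLedgerAnywhere` is an illustrative name, not a function from the PR.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-ins: each index maps a TX hash to the sequence of the
// ledger that contains it.
using TxHash = std::string;
using LedgerSeq = std::uint32_t;
using TxIndex = std::map<TxHash, LedgerSeq>;

// Look in ledgers acquired from peers first (partial sync mode), then in
// ledgers built locally from gossip (synced mode).
std::optional<LedgerSeq>
findTxLedgerAnywhere(
    TxIndex const& inboundLedgers,
    TxIndex const& locallyBuilt,
    TxHash const& tx)
{
    if (auto it = inboundLedgers.find(tx); it != inboundLedgers.end())
        return it->second;
    if (auto it = locallyBuilt.find(tx); it != locallyBuilt.end())
        return it->second;
    return std::nullopt;
}
```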

# Conflicts:
#	Builds/CMake/RippledCore.cmake
#	src/ripple/app/misc/impl/Manifest.cpp

Previously finishFetch() entered the poll-wait loop for any coroutine
context, causing unit tests to spin for 30s on missing nodes with no
network to deliver them. Now requires explicit setPartialSyncWait(true)
from partial sync code paths (RPCHelpers, SubmitAndWait).
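The opt-in gate can be illustrated with the sketch below. It uses a thread_local flag as a simplified stand-in; the real flag would more plausibly live in coroutine-local storage, since a coroutine may resume on a different thread. `isPartialSyncWait` and `shouldPollForMissingNode` are hypothetical names; only setPartialSyncWait(true) appears in the commit message.

```cpp
// Simplified stand-in for a coroutine-local flag (assumption: thread_local
// is used here only to keep the sketch self-contained).
inline bool&
partialSyncWaitFlag()
{
    thread_local bool flag = false;  // off by default, so tests never spin
    return flag;
}

inline void
setPartialSyncWait(bool on)
{
    partialSyncWaitFlag() = on;
}

inline bool
isPartialSyncWait()
{
    return partialSyncWaitFlag();
}

// Hypothetical gate for the poll-wait loop: being inside a coroutine is no
// longer enough, the caller must also have opted in.
inline bool
shouldPollForMissingNode(bool inCoroutine)
{
    return inCoroutine && isPartialSyncWait();
}
```

With the default left at false, code paths that never opt in (such as unit tests with no network) treat a missing node as missing straight away instead of waiting out the timeout.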

Merges 299 commits from origin/dev including the major rippled 2.4.0
repository restructure (src/ripple/ -> src/xrpld/ + include/xrpl/,
Builds/CMake/ -> cmake/, shards removal).

Resolved conflicts in 10 files, preserving all partial sync additions:
- Coro.ipp: kept our #include <thread> with new paths
- SHAMap.cpp: kept LocalValue.h, JobQueue.h, chrono includes
- NodeFamily.cpp: kept enhanced logging and addPriorityNode call
- InboundLedgers.cpp: kept RangeSet.h include with new paths
- NetworkOPs.cpp: updated beginConsensus to include clog param
- RPCHelpers.cpp: kept InboundLedgers.h, LocalValue.h includes
  and commented-out sync validation (updated for removed reporting())
- Handler.cpp: added submit_and_wait to new alphabetical handler list
- Handler.h: kept @@Markers, updated to NO_CONDITION check
- SHAMapInnerNode.h: kept @@Markers, added getBranchCount()
- ordering.txt: accepted upstream restructured dependency graph

… rescan

- Replace std::thread::detach() in Coro::sleepFor() with
  boost::asio::steady_timer on the existing io_service thread pool.
  Previously each 50ms poll spawned a new detached thread (~600 per
  missing node over 30s timeout).

- Clear expired TX-priority ranges in InboundLedgers::sweep() for
  sequences at or below the current validated ledger, preventing
  unbounded growth of txPriorityRange_.

- Track lastCheckedSeq in submit_and_wait to only scan new validated
  ledgers each poll iteration, eliminating O(n*polls) rescan overhead.
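The lastCheckedSeq idea from the last bullet can be sketched as follows. The function name `scanNewLedgers` and the callback-based interface are assumptions made for illustration; the PR only states that each poll scans ledgers newer than lastCheckedSeq.

```cpp
#include <cstdint>
#include <functional>
#include <optional>

// Scan only ledgers in (lastCheckedSeq, validatedSeq]. `lastCheckedSeq` is
// advanced as ledgers are examined, so the next poll resumes where this one
// stopped instead of rescanning from the start.
// `ledgerHasTx` is a hypothetical callback answering "does ledger seq
// contain the transaction we are waiting for?".
std::optional<std::uint32_t>
scanNewLedgers(
    std::uint32_t& lastCheckedSeq,
    std::uint32_t validatedSeq,
    std::function<bool(std::uint32_t)> const& ledgerHasTx)
{
    while (lastCheckedSeq < validatedSeq)
    {
        ++lastCheckedSeq;
        if (ledgerHasTx(lastCheckedSeq))
            return lastCheckedSeq;
    }
    return std::nullopt;
}
```

Each ledger is examined at most once across all polls, so the total work is proportional to the number of ledgers validated while waiting rather than to that number times the poll count.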