Skip to content

stdcoroutine-2: Migrate production entry points from Boost.Coroutine to C++20 coroutines#6423

Draft
pratikmankawde wants to merge 22 commits intopratik/Swtich-to-std-coroutinesfrom
pratik/std-coro/migrate-entry-points
Draft

stdcoroutine-2: Migrate production entry points from Boost.Coroutine to C++20 coroutines#6423
pratikmankawde wants to merge 22 commits intopratik/Swtich-to-std-coroutinesfrom
pratik/std-coro/migrate-entry-points

Conversation

@pratikmankawde
Copy link
Collaborator

@pratikmankawde pratikmankawde commented Feb 25, 2026

High Level Overview of Change

Migrate all production postCoro() call sites to postCoroTask() using C++20 coroutine lambdas, as part of the Boost.Coroutine → std coroutines migration. This eliminates the Context::coro field and replaces the RipplePathFind yield/post/resume pattern with a simpler std::condition_variable approach.

Context of Change

This is Milestone 2 of the Boost.Coroutine → C++20 coroutines migration. Milestone 1 (pratik/std-coro/add-coroutine-primitives) introduced the new coroutine primitives (CoroTask<T>, CoroTaskRunner, JobQueueAwaiter, postCoroTask()). This milestone migrates all production code to use them.

  • Milestone 1 (pratik/std-coro/add-coroutine-primitives): Added CoroTask<T>, CoroTaskRunner, JobQueueAwaiter, postCoroTask()
  • Milestone 2 (pratik/std-coro/migrate-entry-points): Migrated all production entry points
  • Milestone 3 (pratik/std-coro/migrate-test-code): Migrated all test code
  • Milestone 4 (this PR): Delete old code and build dependencies

Key design decision: Rather than "coloring" the entire processSession → processRequest → doCommand → callMethod → doRipplePathFind call chain as coroutines (which C++20 stackless coroutines would require for co_await to work at any depth), we replace RipplePathFind's yield()/post()/resume() pattern with a local std::condition_variable that blocks the thread until path-finding completes. This is correct, simple, and bounded by existing path-finding timeouts.

Since context.coro has no remaining readers after this change, the field is removed entirely from RPC::Context.

Type of Change

  • Refactor (non-breaking change that only restructures code)

API Impact

  • libxrpl change (any change that may affect libxrpl or dependents of libxrpl)

Before / After

Entry points (ServerHandler::onRequest, onWSMessage, GRPCServer::CallData::process):

// Before:
m_jobQueue.postCoro(jtCLIENT_RPC, "RPC-Client",
    [this, session](std::shared_ptr<JobQueue::Coro> coro) {
        processSession(session, coro);
    });

// After:
m_jobQueue.postCoroTask(jtCLIENT_RPC, "RPC-Client",
    [this, session](auto) -> CoroTask<void> {
        processSession(session);
        co_return;
    });

RipplePathFind handler:

// Before: yield/post/resume via Boost.Coroutine
jvResult = context.app.getPathRequests().makeLegacyPathRequest(
    request,
    [&context]() {
        std::shared_ptr<JobQueue::Coro> coroCopy{context.coro};
        if (!coroCopy->post())
            coroCopy->resume();
    },
    context.consumer, lpLedger, context.params);
if (request) {
    context.coro->yield();
    jvResult = request->doStatus(context.params);
}

// After: condition_variable blocking wait
std::mutex mtx;
std::condition_variable cv;
bool pathDone = false;
jvResult = context.app.getPathRequests().makeLegacyPathRequest(
    request,
    [&]() { { std::lock_guard lk(mtx); pathDone = true; } cv.notify_one(); },
    context.consumer, lpLedger, context.params);
if (request) {
    std::unique_lock lk(mtx);
    cv.wait(lk, [&] { return pathDone; });
    jvResult = request->doStatus(context.params);
}

Future Tasks

  • Milestone 3: Migrate old coroutine tests (Coroutine_test.cpp, JobQueue_test.cpp) to postCoroTask
  • Milestone 4: Delete Coro.ipp, remove old Coro class and postCoro() from JobQueue.h, remove Boost.Coroutine from CMake/Conan dependencies

…iter

Introduce the core building blocks for migrating from Boost.Coroutine to
C++20 stackless coroutines (Milestone 1):

- CoroTask<T>: RAII coroutine return type with promise_type, symmetric
  transfer via FinalAwaiter, and lazy start (suspend_always)
- CoroTaskRunner: Lifecycle manager (nested in JobQueue) mirroring the
  existing Coro class — handles LocalValues swap, nSuspend_ accounting,
  mutex-guarded resume, and join/post semantics
- JobQueueAwaiter: Convenience awaiter combining suspend + auto-repost,
  with graceful fallback when JobQueue is stopping
- postCoroTask(): JobQueue entry point for launching C++20 coroutines
- CoroTask_test.cpp: 8 unit tests covering completion, suspend/resume
  ordering, LocalValue isolation, exception propagation, and shutdown

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pratikmankawde pratikmankawde added the DraftRunCI Normally CI does not run on draft PRs. This opts in. label Feb 25, 2026
This change replaces `void const*` by `uint256 const&` for database fetches.

Object hashes are expressed using the `uint256` data type, and are converted to `void *` when calling the `fetch` or `fetchBatch` functions. However, in these fetch functions they are converted back to `uint256`, making the conversion process unnecessary. In a few cases the underlying pointer is needed, but that can then be easy obtained via `[hash variable].data()`.
@pratikmankawde pratikmankawde added the StdCoroutineSwitch Boost to Std Coroutine Switch label Feb 26, 2026
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pratikmankawde pratikmankawde force-pushed the pratik/std-coro/migrate-entry-points branch from 3ee0341 to 9d79af8 Compare February 26, 2026 12:38
@pratikmankawde pratikmankawde changed the title Migrate production entry points from Boost.Coroutine to C++20 coroutines stdcoroutine-2: Migrate production entry points from Boost.Coroutine to C++20 coroutines Feb 26, 2026
clang-format: collapse single-line initializer lists and function
arguments. prettier: add blank lines in markdown lists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pratikmankawde pratikmankawde force-pushed the pratik/std-coro/migrate-entry-points branch from 9d79af8 to f451347 Compare February 26, 2026 13:18
Store the coroutine callable on the heap in CoroTaskRunner::init()
via a type-erased FuncStore wrapper. Coroutine frames store a
reference to the callable's implicit object parameter (the lambda);
if the callable is a temporary, that reference dangles after the
caller returns. This caused stack-use-after-scope (ASAN), assertion
failures, and hangs across multiple compilers.

Also fix expectEarlyExit() to destroy the coroutine frame when
postCoroTask() fails, breaking a potential shared_ptr cycle.

Switch all coroutine test lambda captures from [&] to explicit
pointer-by-value as defense-in-depth against GCC 14 coroutine
frame corruption. Add value-returning CoroTask<T> tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pratikmankawde pratikmankawde force-pushed the pratik/std-coro/migrate-entry-points branch from f451347 to ecc6093 Compare February 26, 2026 14:13
godexsoft and others added 4 commits February 26, 2026 18:26
This change enables all clang-tidy checks that are already passing. It also modifies the clang-tidy CI job, so it runs against all files if .clang-tidy changed.
This change adjusts the CI tests to make it easier to spot errors, without needing to sift through the thousands of lines of output.
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
@pratikmankawde pratikmankawde force-pushed the pratik/std-coro/migrate-entry-points branch from ecc6093 to 7f99b08 Compare February 27, 2026 17:10
After a coroutine completes, the frame remains alive holding a captured
shared_ptr<CoroTaskRunner> back to its owner. This creates an unreachable
cycle: runner -> task_ -> frame -> shared_ptr<runner>.

Break the cycle in resume() by destroying the coroutine frame (task_ = {})
and the stored callable when the coroutine is done. Also fix runnable() to
handle the null-handle state after cleanup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pratikmankawde pratikmankawde force-pushed the pratik/std-coro/migrate-entry-points branch from 7f99b08 to f7fd476 Compare February 27, 2026 17:21
@codecov
Copy link

codecov bot commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 78.46890% with 45 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.8%. Comparing base (65e63eb) to head (666cafc).
⚠️ Report is 3 commits behind head on pratik/Swtich-to-std-coroutines.

Files with missing lines Patch % Lines
include/xrpl/core/CoroTaskRunner.ipp 80.3% 14 Missing ⚠️
src/xrpld/rpc/handlers/RipplePathFind.cpp 0.0% 9 Missing ⚠️
include/xrpl/core/CoroTask.h 93.1% 5 Missing ⚠️
src/xrpld/app/main/GRPCServer.cpp 0.0% 4 Missing ⚠️
include/xrpl/core/JobQueue.h 87.5% 2 Missing ⚠️
include/xrpl/core/JobQueueAwaiter.h 80.0% 2 Missing ⚠️
src/libxrpl/nodestore/backend/MemoryFactory.cpp 33.3% 2 Missing ⚠️
src/libxrpl/nodestore/backend/NuDBFactory.cpp 66.7% 2 Missing ⚠️
src/libxrpl/nodestore/backend/NullFactory.cpp 0.0% 2 Missing ⚠️
src/libxrpl/nodestore/backend/RocksDBFactory.cpp 71.4% 2 Missing ⚠️
... and 1 more
Additional details and impacted files

Impacted file tree graph

@@                       Coverage Diff                        @@
##           pratik/Swtich-to-std-coroutines   #6423    +/-   ##
================================================================
  Coverage                             79.8%   79.8%            
================================================================
  Files                                  848     851     +3     
  Lines                                67763   67921   +158     
  Branches                              7578    7555    -23     
================================================================
+ Hits                                 54077   54218   +141     
- Misses                               13686   13703    +17     
Files with missing lines Coverage Δ
include/xrpl/nodestore/Backend.h 14.3% <ø> (ø)
src/libxrpl/nodestore/DatabaseRotatingImp.cpp 63.9% <100.0%> (ø)
src/xrpld/app/main/Application.cpp 70.0% <ø> (ø)
src/xrpld/app/main/GRPCServer.h 100.0% <ø> (ø)
src/xrpld/rpc/ServerHandler.h 100.0% <ø> (ø)
src/xrpld/rpc/detail/ServerHandler.cpp 87.6% <100.0%> (-0.2%) ⬇️
src/libxrpl/nodestore/DatabaseNodeImp.cpp 39.5% <50.0%> (+4.6%) ⬆️
include/xrpl/core/JobQueue.h 92.9% <87.5%> (-7.1%) ⬇️
include/xrpl/core/JobQueueAwaiter.h 80.0% <80.0%> (ø)
src/libxrpl/nodestore/backend/MemoryFactory.cpp 67.6% <33.3%> (-0.5%) ⬇️
... and 7 more

... and 5 files with indirect coverage changes

Impacted file tree graph

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Replace `task_ = {}` with `std::move(task_)` in resume() and
expectEarlyExit(). The move assignment operator calls
handle_.destroy() while task_.handle_ still holds the old (now
dangling) handle value. If frame destruction triggers re-entrant
runner cleanup on GCC-12, the destructor sees a non-null handle_
and destroys the same frame again — a double-free.

std::move(task_) immediately nulls task_.handle_ via the move
constructor, then the frame is destroyed when the local goes out
of scope. This eliminates the re-entrancy window.

Also remove storedFunc_.reset() from resume() — the callable does
not participate in the shared_ptr cycle and will be cleaned up by
the runner's destructor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pratikmankawde pratikmankawde force-pushed the pratik/std-coro/migrate-entry-points branch from f7fd476 to 75b8ba0 Compare February 27, 2026 18:39
Change postCoroTask from async post() to synchronous resume() for the
initial coroutine dispatch. The async approach created a timing-dependent
race during Env destruction where the coroutine frame's shared_ptr
reference cycle could be broken in an indeterminate order, causing the
debug-only finished_ assertion to fire non-deterministically on GCC-12
and GCC-15 debug builds.

The synchronous resume runs the coroutine body to its first suspension
point (co_await) or completion (co_return) on the caller's thread,
ensuring the coroutine state is determinate before postCoroTask returns.
Subsequent resumes still happen on worker threads via post().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pratikmankawde pratikmankawde force-pushed the pratik/std-coro/migrate-entry-points branch from 75b8ba0 to d9d72d5 Compare February 27, 2026 20:14
@pratikmankawde pratikmankawde force-pushed the pratik/std-coro/migrate-entry-points branch from d9d72d5 to 7c510de Compare February 27, 2026 23:19
Revert the initial coroutine dispatch from synchronous resume() to async
post(). The synchronous approach ran the coroutine body on the caller's
thread, swapping in the coroutine's LocalValues. When coroutines mutated
LocalValues (e.g. thread_specific_storage), those mutations bled back
into the caller's thread-local state after the swap-out, corrupting
unrelated tests (Book, Subscribe) sharing the same thread pool.

Async post() dispatches the coroutine to a JobQueue worker thread whose
LocalValues are managed by the thread pool, not by the caller. The
original assertion issue that motivated sync resume was a symptom of the
shared_ptr cycle and double-free bugs that have since been fixed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pratikmankawde pratikmankawde force-pushed the pratik/std-coro/migrate-entry-points branch from 7c510de to 677c982 Compare February 28, 2026 10:47
Add join() before the finished_ assertion in the destructor. With async
dispatch, the coroutine runs on a worker thread. A gate signal inside
the coroutine body can wake the test thread before resume() sets
finished_. The test thread then triggers Env destruction, and the
runner's destructor fires while finished_ is still false.

join() establishes a happens-before edge via mutex_run_:
  finished_ = true → unlock(mutex_run_) in resume()
  → lock(mutex_run_) in join() → read finished_

This guarantees finished_ is visible when the assertion checks it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pratikmankawde pratikmankawde force-pushed the pratik/std-coro/migrate-entry-points branch from 677c982 to 28fe438 Compare February 28, 2026 11:38
@pratikmankawde pratikmankawde force-pushed the pratik/std-coro/migrate-entry-points branch from 28fe438 to bdd3659 Compare February 28, 2026 12:50
pratikmankawde and others added 6 commits February 28, 2026 13:47
When post() is called from within the coroutine body (via JobQueueAwaiter),
two resume operations can overlap: R1 is still running while R2 is queued.
With a boolean running_ flag, R1's cleanup (running_=false) clobbers R2's
pending state, causing join() to return prematurely on ARM64.

Replace bool running_ with int runCount_: post() increments before enqueue,
resume() decrements after completion. join() waits for runCount_==0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace all postCoro() call sites with postCoroTask() using C++20
coroutine lambdas. The key changes are:

- Remove Context::coro field (shared_ptr<JobQueue::Coro>) from
  RPC::Context, eliminating it from all aggregate initializations
- Replace RipplePathFind's yield/post/resume pattern with a local
  std::condition_variable that blocks until path-finding completes,
  avoiding colored-function infection across the RPC call chain
- Switch ServerHandler entry points (onRequest, onWSMessage) from
  postCoro to postCoroTask with co_return lambdas
- Switch GRPCServer::CallData::process() to use postCoroTask,
  rename private handler to processRequest()
- Update Path_test and AMMTest to use postCoroTask (they set
  context.coro which no longer exists)

The old postCoro() API remains available for Coroutine_test and
JobQueue_test, which will be migrated in a subsequent commit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the extra {} that was for the now-deleted Context::coro field
in the RPC::JsonContext construction in Application::startGeometry().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
clang-format and prettier auto-formatting adjustments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add cppcoro, fcontext, gantt, pratik, repost, stackful to
cspell.config.yaml to fix cspell check failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pratikmankawde pratikmankawde force-pushed the pratik/std-coro/migrate-entry-points branch from bdd3659 to 666cafc Compare February 28, 2026 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

DraftRunCI Normally CI does not run on draft PRs. This opts in. StdCoroutineSwitch Boost to Std Coroutine Switch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants