Add host dispatcher#4041

Draft
wsttiger wants to merge 8 commits into NVIDIA:features/cudaq.realtime from wsttiger:cudaq_realtime_host_dispatcher_sandbox

Summary

This PR refines the host-side dispatcher backend and adds end-to-end test coverage for the GRAPH_LAUNCH dispatch path.

Restrict host dispatcher to GRAPH_LAUNCH only — The host loop now only dispatches GRAPH_LAUNCH entries; HOST_CALL and DEVICE_CALL slots are dropped (cleared and advanced). Removes the unused dispatch_host_call path and updates comments/headers to reflect the GRAPH_LAUNCH-only design.
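
The GRAPH_LAUNCH-only rule above can be sketched on the host without any CUDA dependency. This is a minimal illustration, not the PR's implementation: `DispatchKind`, `FunctionEntry`, `Slot`, `StepResult`, and `process_slot` are hypothetical names, and the actual graph launch is reduced to a return value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical dispatch kinds, named after the ones mentioned in this PR.
enum class DispatchKind { DeviceCall, HostCall, GraphLaunch };

// Hypothetical function-table entry: maps a function_id to a dispatch kind.
struct FunctionEntry {
  std::uint32_t function_id;
  DispatchKind kind;
};

// Hypothetical ring-buffer slot: a ready flag plus the requested function_id.
struct Slot {
  bool ready = false;
  std::uint32_t function_id = 0;
};

enum class StepResult { Idle, Dropped, Launched };

// Process the slot at `cursor`. Only a GRAPH_LAUNCH match counts as a
// dispatch; HOST_CALL, DEVICE_CALL, and unknown ids share one drop path.
// In this sketch every consumed slot is cleared and the cursor advances.
StepResult process_slot(std::vector<Slot>& ring, std::size_t& cursor,
                        const std::vector<FunctionEntry>& table) {
  assert(!ring.empty());
  Slot& slot = ring[cursor];
  if (!slot.ready)
    return StepResult::Idle;  // nothing to do, cursor stays put

  StepResult result = StepResult::Dropped;
  for (const FunctionEntry& entry : table) {
    if (entry.function_id == slot.function_id &&
        entry.kind == DispatchKind::GraphLaunch) {
      result = StepResult::Launched;  // a real loop would launch the graph here
      break;
    }
  }

  slot = Slot{};                        // clear the slot
  cursor = (cursor + 1) % ring.size();  // advance to the next slot
  return result;
}
```

Calling `process_slot` in a loop over a small ring shows that a slot whose `function_id` maps to `HostCall`, or to nothing at all, is consumed exactly like an unknown request, which is the behaviour the smoke test checks for.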

Add host dispatcher tests + external mailbox support — New test file test_host_dispatcher.cu with two tests:

  • Smoke test: starts the host loop via the C API, sends an RPC with an unknown function_id, and verifies the slot is silently dropped.
  • GRAPH_LAUNCH round-trip: full end-to-end test through the C API — allocates a pinned mailbox, captures an increment graph, wires the dispatcher, sends an RPC {0,1,2,3}, and asserts the graph produces {1,2,3,4} in-place.

New C API: cudaq_dispatcher_set_mailbox — Lets callers provide a caller-managed pinned (cudaHostAllocMapped) mailbox. This is required for GRAPH_LAUNCH because the graph must be captured with the device-side mailbox pointer before the dispatcher starts, and the internal allocation (plain new) is not device-visible. When no external mailbox is provided, the C API falls back to internal allocation (backward compatible).
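
The ownership rule behind the external mailbox (use the caller's buffer if one was set, otherwise allocate internally and free on stop) can be sketched as follows. `MailboxBank`, `make_mailbox`, and `release_mailbox` are hypothetical names for illustration only and do not mirror the fields in host_dispatcher_capi.cu. In real use the external buffer would be pinned memory obtained with `cudaHostAlloc` and the `cudaHostAllocMapped` flag; this sketch uses plain host memory so it runs without CUDA.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mailbox bank: one pointer-sized entry per worker, plus a flag
// recording whether the dispatcher allocated the storage itself.
struct MailboxBank {
  void** slots = nullptr;
  std::size_t count = 0;
  bool owns = false;
};

// Use the caller-provided bank when present; otherwise fall back to an
// internal allocation that the dispatcher must free later.
MailboxBank make_mailbox(void** external, std::size_t count) {
  assert(count > 0);
  MailboxBank bank;
  bank.count = count;
  if (external != nullptr) {
    bank.slots = external;  // caller-managed: never freed by the dispatcher
    bank.owns = false;
  } else {
    bank.slots = new void*[count]();  // zero-initialised internal fallback
    bank.owns = true;
  }
  return bank;
}

// Free only what the dispatcher allocated, then reset the handle.
void release_mailbox(MailboxBank& bank) {
  if (bank.owns)
    delete[] bank.slots;
  bank = MailboxBank{};
}
```

Tracking ownership in a single flag keeps the stop path free of double frees: a caller-managed buffer outlives the dispatcher, so a captured graph that references it stays valid across restarts.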

Files changed

| File | Change |
| --- | --- |
| cudaq_realtime.h | Add cudaq_dispatcher_set_mailbox decl; external_mailbox param on start thread |
| host_dispatcher.h | Comments: GRAPH_LAUNCH only |
| host_dispatcher.cu | Remove HOST_CALL/DEVICE_CALL branches; single drop path for non-GRAPH |
| host_dispatcher_capi.cu | owns_mailbox flag; accept/skip external mailbox |
| cudaq_realtime_api.cpp | Store + pass h_mailbox_bank through to start thread |
| unittests/CMakeLists.txt | test_host_dispatcher target |
| unittests/test_host_dispatcher.cu | New: smoke test + GRAPH_LAUNCH round-trip test |

Test plan

  • HostDispatcherSmokeTest.DropsSlotWithUnknownFunctionId — passes
  • HostDispatcherGraphLaunchTest.FullRpcRoundTripViaPinnedMailbox — passes
  • test_dispatch_kernel still builds and is unaffected

Description

…me API

- Extend C API (cudaq_realtime.h): add cudaq_backend_t (DEVICE_KERNEL /
  HOST_LOOP), optional ringbuffer host pointers, CUDAQ_DISPATCH_HOST_CALL
  and cudaq_host_rpc_fn_t in function entries, and host-dispatcher C API
  (cudaq_host_dispatcher_start_thread / cudaq_host_dispatcher_stop).
- Add host_dispatcher.h with HostDispatcherConfig, HostDispatchWorker
  (graph_exec, stream, function_id), and host_dispatcher_loop declaration.
- Implement host_dispatcher_loop in host_dispatcher.cu: parse RPC by
  function_id, dispatch HOST_CALL (inline callback) or GRAPH_LAUNCH (worker
  pool + mailbox), with helpers for parse, acquire worker, and launch graph.
- Implement host_dispatcher_capi.cu: allocate state and worker pool from
  function table, build HostDispatcherConfig, spawn thread running
  host_dispatcher_loop; stop joins thread and frees resources.
- In cudaq_realtime_api.cpp: backend-aware validation (host backend
  requires ringbuffer host view, no launch_fn); start() spawns host thread
  when backend is HOST_LOOP; stop()/destroy and get_processed() handle host
  backend; link cudaq-realtime to cudaq-realtime-host-dispatch.
- Build: add libcudaq-realtime-host-dispatch (host_dispatcher.cu +
  host_dispatcher_capi.cu) and link it from libcudaq-realtime.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>

Drop HOST_CALL and DEVICE_CALL in the host loop (clear slot and advance).
Remove dispatch_host_call and document that only GRAPH_LAUNCH is
dispatched for the host backend.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>

Add test_host_dispatcher.cu with two tests for the HOST_LOOP backend:
- Smoke test: verifies the host loop drops slots with unknown function_id
- GRAPH_LAUNCH round-trip: exercises the full C API dispatch path with a
  pinned mailbox, verifying that an RPC request is matched to a graph,
  launched via the worker pool, and produces the correct in-place response

To support this, add cudaq_dispatcher_set_mailbox() so callers can provide
a pinned (cudaHostAllocMapped) mailbox before start(). This lets graphs be
captured with the device-side mailbox pointer ahead of time—the existing
internal allocation (plain host memory) is not device-visible. The C API
falls back to internal allocation when no external mailbox is set.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>

bmhowe23 changed the base branch from main to features/cudaq.realtime on February 25, 2026 00:20
…t tests

Add cudaq_host_ringbuffer_* helper functions to the public C API that
encapsulate the RPC wire format, slot signalling, and tx_flag polling so
callers no longer need to manipulate magic constants or raw flag arrays
directly. Also add cudaq_host_release_worker for returning a graph
worker to the idle pool after completion.

New helpers:
  cudaq_host_ringbuffer_write_rpc_request
  cudaq_host_ringbuffer_signal_slot
  cudaq_host_ringbuffer_poll_tx_flag
  cudaq_host_ringbuffer_slot_available
  cudaq_host_ringbuffer_clear_slot
  cudaq_host_release_worker

Expose cudaq_tx_status_t enum and CUDAQ_RPC_MAGIC_* / CUDAQ_RPC_HEADER_SIZE
constants so producers can work purely through the C API.
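
To illustrate what "encapsulating the RPC wire format" buys a producer, here is a host-only sketch of a request writer and header reader. Everything in it is an assumption: the header layout, the magic value, and the names `SketchRpcHeader`, `kSketchMagicRequest`, `write_rpc_request`, and `read_rpc_header` are invented for this example and are not the CUDAQ_RPC_MAGIC_* / CUDAQ_RPC_HEADER_SIZE definitions or the cudaq_host_ringbuffer_* signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Made-up magic value and header layout, for illustration only.
constexpr std::uint32_t kSketchMagicRequest = 0x51525153u;

struct SketchRpcHeader {
  std::uint32_t magic;
  std::uint32_t function_id;
  std::uint32_t payload_len;
};

// Serialise header + payload into one slot. Returns false if it does not fit.
bool write_rpc_request(std::uint8_t* slot, std::size_t slot_size,
                       std::uint32_t function_id, const void* payload,
                       std::uint32_t payload_len) {
  assert(slot != nullptr);
  if (slot_size < sizeof(SketchRpcHeader) ||
      payload_len > slot_size - sizeof(SketchRpcHeader))
    return false;
  SketchRpcHeader header{kSketchMagicRequest, function_id, payload_len};
  std::memcpy(slot, &header, sizeof header);
  if (payload_len > 0)
    std::memcpy(slot + sizeof header, payload, payload_len);
  return true;
}

// Parse and validate the header at the start of a slot.
bool read_rpc_header(const std::uint8_t* slot, std::size_t slot_size,
                     SketchRpcHeader& out) {
  assert(slot != nullptr);
  if (slot_size < sizeof(SketchRpcHeader))
    return false;
  std::memcpy(&out, slot, sizeof out);
  return out.magic == kSketchMagicRequest &&
         out.payload_len <= slot_size - sizeof(SketchRpcHeader);
}
```

With helpers of this shape, a test can write the payload {0,1,2,3} for a given `function_id` and read the header back without touching magic constants or byte offsets directly.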

Add 5 new HostDispatcherLoopTest cases (multi-worker function_id routing,
worker recycling, backpressure when all workers busy, stats counter
accuracy, multi-slot round-robin) and migrate existing smoke and
round-trip tests to use the new C API helpers instead of raw pointer
manipulation.
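
The routing, recycling, and backpressure cases listed above all follow from how a worker is acquired and released. A minimal host-only sketch, assuming a linear scan over a small pool; `SketchWorker`, `acquire_worker`, and `release_worker` are hypothetical stand-ins, and the real HostDispatchWorker also carries a graph_exec and a stream:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical worker record: which function it serves and whether it is busy.
struct SketchWorker {
  std::uint32_t function_id = 0;
  bool busy = false;
};

// Return the index of an idle worker bound to `function_id` and mark it busy,
// or -1 when none is available (the caller must retry: backpressure).
int acquire_worker(std::vector<SketchWorker>& pool, std::uint32_t function_id) {
  for (std::size_t i = 0; i < pool.size(); ++i) {
    if (!pool[i].busy && pool[i].function_id == function_id) {
      pool[i].busy = true;
      return static_cast<int>(i);
    }
  }
  return -1;
}

// Return a worker to the idle pool once its graph has completed.
void release_worker(std::vector<SketchWorker>& pool, int index) {
  assert(index >= 0 && static_cast<std::size_t>(index) < pool.size());
  pool[static_cast<std::size_t>(index)].busy = false;
}
```

When every worker for a `function_id` is busy, `acquire_worker` returns -1 and the request stays in its slot until a release frees a worker, which is the situation the backpressure test exercises.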

Signed-off-by: Scott Thornton <wsttiger@gmail.com>
wsttiger added 2 commits March 4, 2026 19:26
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
wsttiger added 2 commits March 7, 2026 01:39
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
Signed-off-by: Scott Thornton <wsttiger@gmail.com>