Add host dispatcher #4041
Draft
wsttiger wants to merge 8 commits into NVIDIA:features/cudaq.realtime from
…me API

- Extend C API (cudaq_realtime.h): add cudaq_backend_t (DEVICE_KERNEL / HOST_LOOP), optional ringbuffer host pointers, CUDAQ_DISPATCH_HOST_CALL and cudaq_host_rpc_fn_t in function entries, and the host-dispatcher C API (cudaq_host_dispatcher_start_thread / cudaq_host_dispatcher_stop).
- Add host_dispatcher.h with HostDispatcherConfig, HostDispatchWorker (graph_exec, stream, function_id), and the host_dispatcher_loop declaration.
- Implement host_dispatcher_loop in host_dispatcher.cu: parse the RPC by function_id and dispatch HOST_CALL (inline callback) or GRAPH_LAUNCH (worker pool + mailbox), with helpers to parse, acquire a worker, and launch a graph.
- Implement host_dispatcher_capi.cu: allocate state and the worker pool from the function table, build a HostDispatcherConfig, and spawn a thread running host_dispatcher_loop; stop joins the thread and frees resources.
- In cudaq_realtime_api.cpp: backend-aware validation (the host backend requires a ringbuffer host view and no launch_fn); start() spawns the host thread when the backend is HOST_LOOP; stop()/destroy and get_processed() handle the host backend; link cudaq-realtime to cudaq-realtime-host-dispatch.
- Build: add libcudaq-realtime-host-dispatch (host_dispatcher.cu + host_dispatcher_capi.cu) and link it from libcudaq-realtime.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>
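The commit mentions a helper that acquires a worker before a GRAPH_LAUNCH entry is launched. A minimal host-only sketch of that step follows, assuming each worker is keyed by function_id and guarded by a busy flag. `Worker`, `WorkerPool`, `acquire`, and `release` are hypothetical names, not the PR's API; the real HostDispatchWorker also carries a graph_exec and a stream, which are omitted here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for HostDispatchWorker: only the routing key and a
// busy flag are modelled (graph_exec and stream are omitted).
struct Worker {
  explicit Worker(std::uint32_t id) : function_id(id) {}
  std::uint32_t function_id;
  std::atomic<bool> busy{false};
};

class WorkerPool {
public:
  explicit WorkerPool(const std::vector<std::uint32_t> &function_ids) {
    for (std::uint32_t id : function_ids)
      workers_.push_back(std::make_unique<Worker>(id));
  }

  // Claim an idle worker registered for function_id and return its index.
  // Returns -1 when every matching worker is busy or none is registered;
  // a caller could then leave the slot pending and retry on a later pass.
  int acquire(std::uint32_t function_id) {
    for (std::size_t i = 0; i < workers_.size(); ++i) {
      Worker &w = *workers_[i];
      if (w.function_id != function_id)
        continue;
      bool expected = false;
      if (w.busy.compare_exchange_strong(expected, true))
        return static_cast<int>(i);
    }
    return -1;
  }

  // Return a worker to the idle pool after its graph has completed.
  void release(int index) {
    workers_.at(static_cast<std::size_t>(index))->busy.store(false);
  }

private:
  std::vector<std::unique_ptr<Worker>> workers_;
};
```

For brevity the sketch folds "no worker registered" and "all matching workers busy" into the same -1; a loop that drops unknown function_ids but retries busy ones would need to tell those two cases apart.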
Drop HOST_CALL and DEVICE_CALL entries in the host loop (clear the slot and advance). Remove dispatch_host_call and document that only GRAPH_LAUNCH is dispatched for the host backend.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>
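The resulting per-slot rule can be sketched as a pure function, assuming the function table maps each function_id to its dispatch type. `DispatchType`, `SlotAction`, `FunctionTable`, and `classify_slot` are illustrative names, not identifiers from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical mirror of the dispatch types named in the commit message.
enum class DispatchType { GraphLaunch, HostCall, DeviceCall };

// What the host loop does with the slot it is currently examining.
enum class SlotAction { LaunchGraph, DropAndAdvance };

using FunctionTable = std::unordered_map<std::uint32_t, DispatchType>;

// Only GRAPH_LAUNCH entries are dispatched. HOST_CALL and DEVICE_CALL entries,
// and function_ids missing from the table, are dropped: the caller clears the
// slot and moves on to the next one.
SlotAction classify_slot(std::uint32_t function_id, const FunctionTable &table) {
  auto it = table.find(function_id);
  if (it == table.end())
    return SlotAction::DropAndAdvance; // unknown function_id
  if (it->second != DispatchType::GraphLaunch)
    return SlotAction::DropAndAdvance; // HOST_CALL / DEVICE_CALL
  return SlotAction::LaunchGraph;
}
```

Keeping the decision separate from its side effects (clearing the slot, advancing the index) lets the drop rule be unit-tested without a GPU.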
Add test_host_dispatcher.cu with two tests for the HOST_LOOP backend:

- Smoke test: verifies that the host loop drops slots with an unknown function_id.
- GRAPH_LAUNCH round trip: exercises the full C API dispatch path with a pinned mailbox, verifying that an RPC request is matched to a graph, launched via the worker pool, and produces the correct in-place response.

To support this, add cudaq_dispatcher_set_mailbox() so callers can provide a pinned (cudaHostAllocMapped) mailbox before start(). This lets graphs be captured with the device-side mailbox pointer ahead of time; the existing internal allocation (plain host memory) is not device-visible. The C API falls back to internal allocation when no external mailbox is set.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>
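Why the mailbox address has to exist before start() can be illustrated on the host alone. A minimal sketch, assuming (this is not confirmed by the commit) that each worker's mailbox entry holds a pointer to the request payload and that the captured graph reads that entry when it runs. `Mailbox`, `Graph`, `capture_increment_graph`, and `dispatch` are hypothetical names, and a `std::function` stands in for an instantiated CUDA graph. The increment mirrors the round-trip test's expectation that `{0,1,2,3}` becomes `{1,2,3,4}`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical host-only model: one mailbox entry per worker, each holding a
// pointer to the request payload that worker's graph should process.
using Mailbox = std::array<std::int32_t *, 4>;
using Graph = std::function<void()>;

// Stand-in for graph capture: the callable records the *address* of the
// mailbox entry (not its contents) and the element count, both fixed at
// capture time. When launched, it reads the entry and increments in place.
Graph capture_increment_graph(std::int32_t **mailbox_entry, std::size_t n) {
  return [mailbox_entry, n] {
    std::int32_t *payload = *mailbox_entry;
    for (std::size_t i = 0; i < n; ++i)
      payload[i] += 1;
  };
}

// Stand-in for the dispatcher: publish the slot's payload pointer into the
// worker's mailbox entry, then launch the pre-captured graph.
void dispatch(Mailbox &mailbox, std::size_t worker, const Graph &graph,
              std::int32_t *payload) {
  mailbox[worker] = payload;
  graph();
}
```

The same captured graph can serve every request, because only the mailbox contents change between launches. On a GPU the captured address must also be one the kernel can dereference, which is why the commit asks for a `cudaHostAllocMapped` allocation (whose device-side address can be obtained with `cudaHostGetDevicePointer`) rather than plain host memory.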
…t tests

Add cudaq_host_ringbuffer_* helper functions to the public C API that encapsulate the RPC wire format, slot signalling, and tx_flag polling, so callers no longer need to manipulate magic constants or raw flag arrays directly. Also add cudaq_host_release_worker for returning a graph worker to the idle pool after completion.

New helpers:

- cudaq_host_ringbuffer_write_rpc_request
- cudaq_host_ringbuffer_signal_slot
- cudaq_host_ringbuffer_poll_tx_flag
- cudaq_host_ringbuffer_slot_available
- cudaq_host_ringbuffer_clear_slot
- cudaq_host_release_worker

Expose the cudaq_tx_status_t enum and the CUDAQ_RPC_MAGIC_* / CUDAQ_RPC_HEADER_SIZE constants so producers can work purely through the C API.

Add 5 new HostDispatcherLoopTest cases (multi-worker function_id routing, worker recycling, backpressure when all workers are busy, stats counter accuracy, multi-slot round-robin) and migrate the existing smoke and round-trip tests to use the new C API helpers instead of raw pointer manipulation.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>
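The helpers hide three producer-side steps: serialize a header and payload into a slot, publish the slot, and poll for the response. A minimal host-thread sketch of those steps follows. The header layout, the magic value, the flag encoding (0 meaning idle), and every function name here are assumptions made for illustration; they are not the signatures of the cudaq_host_ringbuffer_* helpers or the values of CUDAQ_RPC_MAGIC_* / CUDAQ_RPC_HEADER_SIZE.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// HYPOTHETICAL wire format. The real CUDAQ_RPC_MAGIC_* values and
// CUDAQ_RPC_HEADER_SIZE are defined in cudaq_realtime.h and may differ.
constexpr std::uint32_t kRequestMagic = 0x51525043u; // made-up value
// Assumed header layout: magic, function_id, payload length (u32 each).
constexpr std::size_t kHeaderSize = 3 * sizeof(std::uint32_t);

// Hypothetical slot: a byte buffer plus the two flags the helpers hide.
struct Slot {
  std::vector<std::uint8_t> bytes;
  std::atomic<std::uint32_t> rx_flag{0}; // producer -> dispatcher: request ready
  std::atomic<std::uint32_t> tx_flag{0}; // dispatcher -> producer: response status
};

// Serialize header + payload into the slot; returns false if it does not fit.
bool write_rpc_request(Slot &slot, std::uint32_t function_id,
                       const void *payload, std::uint32_t len) {
  if (kHeaderSize + len > slot.bytes.size())
    return false;
  const std::uint32_t header[3] = {kRequestMagic, function_id, len};
  std::memcpy(slot.bytes.data(), header, kHeaderSize);
  if (len > 0)
    std::memcpy(slot.bytes.data() + kHeaderSize, payload, len);
  return true;
}

// Publish the slot. The release store orders the byte writes above before the
// flag, so a thread that acquire-loads rx_flag == 1 sees the whole request.
void signal_slot(Slot &slot) { slot.rx_flag.store(1, std::memory_order_release); }

// Non-blocking read of the response status written by the dispatcher.
std::uint32_t poll_tx_flag(const Slot &slot) {
  return slot.tx_flag.load(std::memory_order_acquire);
}

// Assumed rule: a slot is reusable once no request or response is outstanding.
bool slot_available(const Slot &slot) {
  return slot.rx_flag.load(std::memory_order_acquire) == 0 &&
         slot.tx_flag.load(std::memory_order_acquire) == 0;
}

// Reset both flags so the slot can be written again.
void clear_slot(Slot &slot) {
  slot.tx_flag.store(0, std::memory_order_relaxed);
  slot.rx_flag.store(0, std::memory_order_release);
}
```

With the layout behind one write function, a header change touches only that function; this appears to be the motivation the commit gives for hiding magic constants and raw flag arrays. The sketch models host threads only and says nothing about visibility rules for a GPU reading mapped memory.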
Summary
This PR refines the host-side dispatcher backend and adds end-to-end test coverage for the GRAPH_LAUNCH dispatch path.
Restrict host dispatcher to GRAPH_LAUNCH only: the host loop now dispatches only `GRAPH_LAUNCH` entries; `HOST_CALL` and `DEVICE_CALL` slots are dropped (cleared and advanced). This removes the unused `dispatch_host_call` path and updates comments/headers to reflect the GRAPH_LAUNCH-only design.

Add host dispatcher tests + external mailbox support: a new test file, `test_host_dispatcher.cu`, with two tests:

- Smoke test: submits an RPC request with an unknown `function_id` and verifies the slot is silently dropped.
- GRAPH_LAUNCH round trip: sends an RPC request with the payload `{0,1,2,3}` through the full C API dispatch path (pinned mailbox, worker pool), and asserts the graph produces `{1,2,3,4}` in place.

New C API: `cudaq_dispatcher_set_mailbox` lets callers provide a caller-managed pinned (`cudaHostAllocMapped`) mailbox. This is required for GRAPH_LAUNCH because the graph must be captured with the device-side mailbox pointer before the dispatcher starts, and the internal allocation (plain `new`) is not device-visible. When no external mailbox is provided, the C API falls back to internal allocation (backward compatible).

Files changed
| File | Change |
|---|---|
| `cudaq_realtime.h` | `cudaq_dispatcher_set_mailbox` declaration; `external_mailbox` parameter on start thread |
| `host_dispatcher.h` | |
| `host_dispatcher.cu` | |
| `host_dispatcher_capi.cu` | `owns_mailbox` flag; accept/skip external mailbox |
| `cudaq_realtime_api.cpp` | `h_mailbox_bank` through to start thread |
| `unittests/CMakeLists.txt` | `test_host_dispatcher` target |
| `unittests/test_host_dispatcher.cu` | |

Test plan
- `HostDispatcherSmokeTest.DropsSlotWithUnknownFunctionId`: passes
- `HostDispatcherGraphLaunchTest.FullRpcRoundTripViaPinnedMailbox`: passes
- `test_dispatch_kernel` still builds and is unaffected