DEVICE/CUDA_IPC: Fix proto selection for cuda_ipc #10943
THIS PULL REQUEST IS A TEMPORARY WORKAROUND FOR INTEGRATION, WE ARE NOT PLANNING TO MERGE IT
What?
Fix the `cuda_ipc` overhead for device operations to enable proto selection on EOS.

Why?
When running on EOS, `cuda_ipc` was not being selected during proto selection for device lanes. Although it has better bandwidth than `rc_gda`, `cuda_ipc` had a hardcoded overhead of 7.0µs. That value is appropriate for host operations but incorrect for device operations, and it caused `cuda_ipc` to score poorly.

How?
Changed the hardcoded overhead in `uct_cuda_ipc_iface_query()` from 7.0µs to 0.05µs. This increases the `cuda_ipc` score, so it is selected automatically for intra-node device lanes.