Conversation

michal-shalev (Contributor) commented Oct 9, 2025

THIS PULL REQUEST IS A TEMPORARY WORKAROUND FOR INTEGRATION, WE ARE NOT PLANNING TO MERGE IT

What?

Fix cuda_ipc overhead for device operations to enable proto selection on EOS.

Why?

When running on EOS, cuda_ipc failed to be selected during proto selection for device lanes. Despite offering better bandwidth than rc_gda, cuda_ipc carried a hardcoded overhead of 7.0µs, which is appropriate for host operations but incorrect for device operations, so it scored poorly.

Before

```
cuda_ipc: bandwidth=400800.00MB/s xfer=0.62us latency=1.00us overhead=7.00us reg=0.05us total=8.67us score=115290.35
rc_gda:   bandwidth=46016.67MB/s  xfer=5.43us latency=0.70us overhead=0.04us reg=0.05us total=6.22us score=160698.99
```

After

```
cuda_ipc: bandwidth=400800.00MB/s xfer=0.62us latency=1.00us overhead=0.05us reg=0.05us total=1.72us score=580129.69
rc_gda:   bandwidth=46016.67MB/s  xfer=5.43us latency=0.70us overhead=0.04us reg=0.05us total=6.22us score=160698.99
```

How?

Changed the hardcoded overhead from 7.0µs to 0.05µs in uct_cuda_ipc_iface_query(). This raises cuda_ipc's score so it is automatically selected for intra-node device lanes.

```diff
 iface_attr->bandwidth.dedicated = 0;
 iface_attr->bandwidth.shared    = iface->config.bandwidth;
-iface_attr->overhead            = 7.0e-6;
+iface_attr->overhead            = 0.05e-6;
```
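As a rough sanity check on the logs above, the totals and relative scores can be reproduced by summing the printed cost components and taking the reciprocal. This is a hypothetical sketch: the formula (score ≈ 1 / total time) and helper names are inferred from the printed numbers, not taken from the UCX source, and the printed scores differ slightly because the log rounds each component.

```python
# Hypothetical reconstruction of the proto-selection numbers in the log.
# All inputs are the per-component costs printed above, in microseconds.

def total_time_us(xfer, latency, overhead, reg):
    """Sum the per-operation cost components (all in microseconds)."""
    return xfer + latency + overhead + reg

def score(total_us):
    """Assumed scoring rule: reciprocal of total time in seconds."""
    return 1.0 / (total_us * 1e-6)

# cuda_ipc with the old 7.0us overhead vs. the patched 0.05us overhead
before = total_time_us(xfer=0.62, latency=1.00, overhead=7.00, reg=0.05)
after  = total_time_us(xfer=0.62, latency=1.00, overhead=0.05, reg=0.05)
rc_gda = total_time_us(xfer=5.43, latency=0.70, overhead=0.04, reg=0.05)

print(f"cuda_ipc before: total={before:.2f}us score={score(before):.0f}")
print(f"cuda_ipc after:  total={after:.2f}us score={score(after):.0f}")
print(f"rc_gda:          total={rc_gda:.2f}us score={score(rc_gda):.0f}")
```

With the old overhead, rc_gda's smaller total wins despite its far lower bandwidth; with the patched overhead, cuda_ipc's score exceeds rc_gda's and it is selected.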
Contributor

1. How do we know the exact overhead of cuda_ipc on device?
2. So the previous overhead of 7µs is still accurate for host operations? Then this change might break protocol selection for host operations. Maybe we need a separate iface, cuda_ipc_device_iface, that would override this latency estimate?
