Problem
On many GPU training nodes, dual-port NICs (e.g., ConnectX-7 with 2 physical ports) are bonded together using LACP (802.3ad) at the infrastructure level. This is not a vendor-specific setup — CX7 is a single PCIe device with two ports, and in many network environments (e.g., rail-optimized fabrics), the two ports are configured as an LACP bond rather than exposed as two independent interfaces. The bonding happens at the OS/network level, not at the hardware level.
A typical H20 GPU node topology:
PCIe 0000:7f:00.0 (CX7, 2 ports) → bond0 (LACP 802.3ad)
PCIe 0000:c7:00.0 (CX7, 2 ports) → bond1 (LACP 802.3ad)
PCIe 0001:08:00.0 (CX7, 2 ports) → bond2 (LACP 802.3ad)
PCIe 0001:a2:00.0 (CX7, 2 ports) → bond3 (LACP 802.3ad)
dranet discovers these bond interfaces and publishes them in the ResourceSlice. However, when a pod claims one, Prepare fails because bond devices with LACP cannot be moved into a pod network namespace:
- Moving the bond breaks LACP negotiation with the switch
- Moving individual slave ports out of the bond is not supported by the kernel
- The bond must remain in the host namespace to maintain link aggregation
Relationship to #63
#63 proposes using IPvlan to share a single NIC across multiple pods, using allowMultipleAllocations and consumable capacity to model the sharing.
This issue is different. We are not trying to share — the user still wants exclusive use of the NIC. The problem is purely that the bond cannot be moved. IPvlan here is a transport mechanism to work around the LACP constraint, not a sharing mechanism.
From the user's perspective, claiming a bonded NIC should feel the same as claiming a regular NIC — they should not need to know whether the underlying interface is a bond or not.
Proposal
When dranet detects that a network interface is a bond in LACP mode, it should automatically create an IPvlan child interface and move the child into the pod's namespace, instead of attempting to move the bond itself.
Host namespace:                  Pod namespace:
bond0 (LACP, stays on host)      ipvlan0 ← child of bond0
bond1 (LACP, stays on host)      ipvlan1 ← child of bond1
bond2 (LACP, stays on host)      ipvlan2 ← child of bond2
bond3 (LACP, stays on host)      ipvlan3 ← child of bond3
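For illustration, the host-side steps for one bond can be written as two iproute2 invocations. This is only a sketch: dranet would more likely perform the equivalent netlink calls directly rather than shelling out, the helper name ipvlanChildCommands is hypothetical, and the l2 IPvlan mode is an assumption since this proposal does not pick a mode.

```go
package main

import (
	"fmt"
	"strings"
)

// ipvlanChildCommands returns the iproute2 invocations that create an IPvlan
// child on top of parent and then move that child into a network namespace.
// netns may be a named namespace or a PID, as accepted by "ip link set".
// Assumption: mode "l2" is used only as an example; the proposal leaves the
// IPvlan mode open.
func ipvlanChildCommands(parent, child, netns string) [][]string {
	return [][]string{
		// The parent (the bond) stays in the host namespace.
		{"ip", "link", "add", "link", parent, "name", child, "type", "ipvlan", "mode", "l2"},
		// Only the child is moved into the pod namespace.
		{"ip", "link", "set", "dev", child, "netns", netns},
	}
}

func main() {
	// Dry run: print the commands instead of executing them.
	for _, argv := range ipvlanChildCommands("bond0", "ipvlan0", "pod-netns") {
		fmt.Println(strings.Join(argv, " "))
	}
}
```

The bond itself is never touched, so LACP negotiation with the switch is unaffected.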
This should be transparent to the scheduler and to the user:
- The bond is published in the ResourceSlice as a normal device (no allowMultipleAllocations, no capacity)
- The user requests it with a normal allocationMode: All or ExactCount
- The scheduler allocates it exclusively as usual
- During Prepare / RunPodSandbox, dranet detects the bond + LACP and creates an IPvlan child instead of calling ip link set netns
The detection logic could be:
- Check if the interface is a bond (/sys/class/net/<ifname>/bonding/mode)
- If the bond mode is 802.3ad (LACP) or other modes where moving is unsafe, create an IPvlan child
- Move the IPvlan child into the pod namespace and configure it (IP, routes, etc.)
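A minimal sketch of the detection step, using only the Go standard library. It relies on the sysfs bonding/mode file containing the mode name followed by its numeric value (for example "802.3ad 4"). The function names and the sysClassNet variable are hypothetical and not part of dranet; only 802.3ad is treated as unsafe here, and which other modes to include is left open.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// sysClassNet is the sysfs root for network devices. It is a variable so
// that it can be pointed at a fake tree in tests.
var sysClassNet = "/sys/class/net"

// parseBondMode extracts the mode name from the contents of a bonding/mode
// file, e.g. "802.3ad 4\n" yields "802.3ad".
func parseBondMode(raw string) string {
	fields := strings.Fields(raw)
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

// bondMode returns the bonding mode of ifname, or "" if the interface has
// no bonding/mode file and is therefore not a bond.
func bondMode(ifname string) (string, error) {
	data, err := os.ReadFile(filepath.Join(sysClassNet, ifname, "bonding", "mode"))
	if errors.Is(err, fs.ErrNotExist) {
		return "", nil
	}
	if err != nil {
		return "", fmt.Errorf("reading bonding mode of %s: %w", ifname, err)
	}
	return parseBondMode(string(data)), nil
}

// needsIPvlanChild reports whether ifname should get an IPvlan child rather
// than being moved into the pod namespace itself.
func needsIPvlanChild(ifname string) (bool, error) {
	mode, err := bondMode(ifname)
	if err != nil {
		return false, err
	}
	return mode == "802.3ad", nil
}

func main() {
	// Usage: pass interface names as arguments, e.g. "bond0 eth0".
	for _, ifname := range os.Args[1:] {
		child, err := needsIPvlanChild(ifname)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s: needs IPvlan child: %v\n", ifname, child)
	}
}
```

Treating a missing bonding/mode file as "not a bond" keeps the existing move-into-namespace path unchanged for regular NICs.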
RDMA char devices (/dev/infiniband/uverbs*, rdma_cm) would be injected alongside as in the existing IB-only path.