Support LACP bonded network interfaces

## Problem

On many GPU training nodes, dual-port NICs (e.g., ConnectX-7 with 2 physical ports) are bonded together using LACP (802.3ad) at the infrastructure level. This is not a vendor-specific setup: a CX7 is a single PCIe device with two ports, and in many network environments (e.g., rail-optimized fabrics) the two ports are configured as an LACP bond rather than exposed as two independent interfaces. The bonding happens at the OS/network level, not at the hardware level.

A typical H20 GPU node topology:

```
PCIe 0000:7f:00.0 (CX7, 2 ports) → bond0 (LACP 802.3ad)
PCIe 0000:c7:00.0 (CX7, 2 ports) → bond1 (LACP 802.3ad)
PCIe 0001:08:00.0 (CX7, 2 ports) → bond2 (LACP 802.3ad)
PCIe 0001:a2:00.0 (CX7, 2 ports) → bond3 (LACP 802.3ad)
```

dranet discovers these bond interfaces and publishes them in the ResourceSlice. However, when a pod claims one, `Prepare` fails because **bond devices with LACP cannot be moved into a pod network namespace**:

- Moving the bond breaks LACP negotiation with the switch
- Moving individual slave ports out of the bond is not supported by the kernel
- The bond must remain in the host namespace to maintain link aggregation

## Relationship to #63

#63 proposes using IPvlan to **share** a single NIC across multiple pods, using `allowMultipleAllocations` and consumable capacity to model the sharing.

This issue is different. We are not trying to share: the user still wants **exclusive** use of the NIC. The problem is purely that the bond **cannot be moved**. Here IPvlan is a **transport mechanism** to work around the LACP constraint, not a sharing mechanism.

From the user's perspective, claiming a bonded NIC should feel the same as claiming a regular NIC — they should not need to know whether the underlying interface is a bond or not.

## Proposal

When dranet detects that a network interface is a bond in LACP mode, it should automatically create an IPvlan child interface and move the child into the pod's namespace, instead of attempting to move the bond itself.

```
Host namespace:                    Pod namespace:
  bond0 (LACP, stays on host)       ipvlan0 ← child of bond0
  bond1 (LACP, stays on host)       ipvlan1 ← child of bond1
  bond2 (LACP, stays on host)       ipvlan2 ← child of bond2
  bond3 (LACP, stays on host)       ipvlan3 ← child of bond3
```
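As a rough illustration of the layout above, the sketch below builds the two iproute2 invocations that would create an IPvlan child on a bond and move it into a target namespace. The helper name `ipvlanPlan`, the child naming scheme, the `pod-netns` namespace name, and the choice of IPvlan `l2` mode are all assumptions made for this example. A real implementation in dranet would more likely issue the equivalent netlink requests directly rather than shell out to `ip`.

```go
package main

import (
	"fmt"
	"strings"
)

// ipvlanPlan returns the iproute2 argument lists that would create an IPvlan
// child on top of parent and then move that child into the given namespace.
// netns is a namespace name or PID, as accepted by `ip link set ... netns`.
// The "l2" mode is an assumption for this sketch; the issue does not pick one.
func ipvlanPlan(parent, child, netns string) [][]string {
	return [][]string{
		// The parent (bond) stays in the host namespace; only the child is created here.
		{"ip", "link", "add", "link", parent, "name", child, "type", "ipvlan", "mode", "l2"},
		// Move the child, not the bond, into the pod namespace.
		{"ip", "link", "set", "dev", child, "netns", netns},
	}
}

func main() {
	// Print the commands for the four bonds in the example topology.
	for i := 0; i < 4; i++ {
		parent := fmt.Sprintf("bond%d", i)
		child := fmt.Sprintf("ipvlan%d", i)
		for _, argv := range ipvlanPlan(parent, child, "pod-netns") {
			fmt.Println(strings.Join(argv, " "))
		}
	}
}
```

The program only prints the commands; nothing is executed, so it can be run without root privileges to inspect the plan.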

This should be **transparent to the scheduler and to the user**:

- The bond is published in the ResourceSlice as a normal device (no `allowMultipleAllocations`, no capacity)
- The user requests it with a normal `allocationMode: All` or `ExactCount`
- The scheduler allocates it exclusively as usual
- During `Prepare` / `RunPodSandbox`, dranet detects the bond + LACP and creates an IPvlan child instead of calling `ip link set netns`

The detection logic could be:
1. Check if the interface is a bond (`/sys/class/net/<ifname>/bonding/mode`)
2. If the bond mode is `802.3ad` (LACP) or other modes where moving is unsafe, create an IPvlan child
3. Move the IPvlan child into the pod namespace and configure it (IP, routes, etc.)
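
Steps 1 and 2 could be sketched roughly as follows. The function names (`parseBondMode`, `bondMode`, `needsIPvlanChild`) are hypothetical and not part of dranet. The sketch relies on the bonding driver's sysfs file reporting the mode name followed by its numeric value (for example `802.3ad 4`), and it only treats `802.3ad` as requiring an IPvlan child; which other modes belong in that set is left open, as in the list above.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// sysClassNet is the default sysfs location; it is passed as a parameter
// below so the lookup can be pointed at a different root when testing.
const sysClassNet = "/sys/class/net"

// parseBondMode extracts the mode name from the contents of bonding/mode,
// which hold the name followed by a numeric value, e.g. "802.3ad 4".
func parseBondMode(content string) string {
	fields := strings.Fields(content)
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

// bondMode returns the bonding mode of ifname, or "" if the interface has no
// bonding/mode file (that is, it is not a bond or does not exist).
func bondMode(root, ifname string) (string, error) {
	data, err := os.ReadFile(filepath.Join(root, ifname, "bonding", "mode"))
	if errors.Is(err, fs.ErrNotExist) {
		return "", nil
	}
	if err != nil {
		return "", err
	}
	return parseBondMode(string(data)), nil
}

// needsIPvlanChild reports whether an interface in this bonding mode should
// get an IPvlan child instead of being moved. Only LACP is listed here.
func needsIPvlanChild(mode string) bool {
	return mode == "802.3ad"
}

func main() {
	mode, err := bondMode(sysClassNet, "bond0")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Printf("bond0 mode=%q needsIPvlanChild=%t\n", mode, needsIPvlanChild(mode))
}
```

Returning an empty mode for non-bond interfaces lets the caller fall through to the existing move-into-namespace path without special-casing errors.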

RDMA char devices (`/dev/infiniband/uverbs*`, `rdma_cm`) would be injected alongside as in the existing IB-only path.
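
For illustration only, the device paths named above could be collected like this. The helper `rdmaCharDevices` is hypothetical, and the sketch lists every `uverbs*` node it finds; it does not attempt to map a bond to the specific RDMA device backing it, which the existing IB-only path presumably already handles.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// rdmaCharDevices lists the RDMA character devices under devRoot (normally
// "/dev"): every infiniband/uverbs* node plus infiniband/rdma_cm if present.
func rdmaCharDevices(devRoot string) ([]string, error) {
	paths, err := filepath.Glob(filepath.Join(devRoot, "infiniband", "uverbs*"))
	if err != nil {
		return nil, err
	}
	cm := filepath.Join(devRoot, "infiniband", "rdma_cm")
	if _, err := os.Stat(cm); err == nil {
		paths = append(paths, cm)
	}
	sort.Strings(paths)
	return paths, nil
}

func main() {
	devs, err := rdmaCharDevices("/dev")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	for _, d := range devs {
		fmt.Println(d)
	}
}
```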

