
Add netdev_max_backlog, tcp_mtu_probing, and ring buffer settings for high-speed NICs #1012

@rbrunk


IncusOS Feature Request: High-Speed Network Tunables

Target repo: https://github.com/lxc/incus-os/issues
Related: Issue #730 (TCP tuning), PR #754 (system/kernel API), Issue #630 (system/kernel design)



Description

The /1.0/system/kernel API (added in PR #754) covers TCP buffer sizing and congestion control, which is great. However, deploying IncusOS on servers with 100Gb NICs and jumbo frames (MTU 9000) reveals three gaps where stock kernel defaults cause measurable performance loss:

1. net.core.netdev_max_backlog (kernel API addition)

Current: Stock default is 1000.
Want: 30000+ for 100Gb NICs.
Why: At 100Gb line rate, the NIC can deliver roughly 1.4 million packets/second at MTU 9000, ~8.3 Mpps at MTU 1500, and ~148 Mpps at 64 bytes. The per-CPU backlog queue fills in microseconds at stock depth. Under simultaneous bursts from multiple VMs sharing the network, this causes silent packet drops before the kernel even processes them. This is the single most impactful tunable for high-speed NIC hypervisors.
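For reference, the arithmetic behind those packet rates (ignoring Ethernet framing overhead for the MTU cases, which makes them slight overestimates):

```go
package main

import "fmt"

// pps returns the approximate packet rate for a link carrying rateBits
// bits/second in frames of frameBytes bytes. Framing overhead (preamble,
// inter-frame gap, FCS) is ignored for the MTU-sized cases.
func pps(rateBits, frameBytes float64) float64 {
    return rateBits / (frameBytes * 8)
}

func main() {
    const rate = 100e9 // 100 Gb/s
    fmt.Printf("MTU 9000: %.1f Mpps\n", pps(rate, 9000)/1e6) // ~1.4 Mpps
    fmt.Printf("MTU 1500: %.1f Mpps\n", pps(rate, 1500)/1e6) // ~8.3 Mpps
    // For minimum-size frames the per-frame overhead dominates:
    // 64B frame + 20B preamble/inter-frame gap = 84B on the wire.
    fmt.Printf("64B:      %.1f Mpps\n", pps(rate, 84)/1e6) // ~148.8 Mpps
}
```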

Proposed API field:

```go
type SystemKernelConfigNetwork struct {
    // ... existing fields ...
    NetdevMaxBacklog int `json:"netdev_max_backlog,omitempty" yaml:"netdev_max_backlog,omitempty"`
}
```

2. net.ipv4.tcp_mtu_probing (kernel API addition)

Current: Stock default is 0 (disabled).
Want: 1 (enabled, probe on ICMP blackhole detection).
Why: Required for Path MTU Discovery to work reliably with jumbo frames (MTU 9000). Without it, PMTUD blackholes silently kill throughput on any path that doesn't support the full MTU. This is a boolean-level setting with zero downside — the kernel only activates probing when it detects a blackhole.

Proposed API field:

```go
type SystemKernelConfigNetwork struct {
    // ... existing fields ...
    // Pointer so that "unset" (nil) stays distinct from an explicit false.
    TCPMTUProbing *bool `json:"tcp_mtu_probing,omitempty" yaml:"tcp_mtu_probing,omitempty"`
}
```

3. Ring buffer sizing (network API addition)

Current: The SystemNetworkEthernet struct exposes offload toggles (GRO, GSO, TSO) but not ring buffer sizes.
Want: Fields to set Rx, Tx, RxJumbo, and RxMini ring buffer sizes per interface/bond.
Why: Larger ring buffers absorb burst traffic from multiple VMs sharing the physical NIC. The NIC driver reports maximum supported sizes via ethtool; the ideal setting is "set all to max." For reference, Broadcom 100Gb NICs support Rx=2047, Tx=2047, RxJumbo=8191. Stock defaults are typically much lower.

systemd-networkd already supports this via .link files (RxBufferSize=, TxBufferSize=, RxJumboBufferSize=), and IncusOS's generateEthernet() function already maps struct fields to .link directives — so the plumbing is in place.

Proposed API fields:

```go
type SystemNetworkEthernet struct {
    // ... existing fields ...
    RxBufferSize      *int `json:"rx_buffer_size,omitempty" yaml:"rx_buffer_size,omitempty"`
    TxBufferSize      *int `json:"tx_buffer_size,omitempty" yaml:"tx_buffer_size,omitempty"`
    RxJumboBufferSize *int `json:"rx_jumbo_buffer_size,omitempty" yaml:"rx_jumbo_buffer_size,omitempty"`
    RxMiniBufferSize  *int `json:"rx_mini_buffer_size,omitempty" yaml:"rx_mini_buffer_size,omitempty"`
}
```
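The struct-to-.link mapping described above could look roughly like the following sketch. The rendering function is illustrative (generateEthernet()'s actual signature isn't shown in this issue); nil means "leave the driver default":

```go
package main

import (
    "fmt"
    "strings"
)

// Proposed fields, pointer-typed so nil means "leave the driver default".
type SystemNetworkEthernet struct {
    RxBufferSize      *int
    TxBufferSize      *int
    RxJumboBufferSize *int
    RxMiniBufferSize  *int
}

// linkDirectives renders the [Link] section lines of a systemd .link file
// for the ring buffer fields. Illustrative only, not IncusOS's real code.
func linkDirectives(e SystemNetworkEthernet) string {
    var b strings.Builder
    add := func(key string, v *int) {
        if v != nil {
            fmt.Fprintf(&b, "%s=%d\n", key, *v)
        }
    }
    add("RxBufferSize", e.RxBufferSize)
    add("TxBufferSize", e.TxBufferSize)
    add("RxJumboBufferSize", e.RxJumboBufferSize)
    add("RxMiniBufferSize", e.RxMiniBufferSize)
    return b.String()
}

func main() {
    rx, tx, jumbo := 2047, 2047, 8191
    fmt.Print(linkDirectives(SystemNetworkEthernet{
        RxBufferSize:      &rx,
        TxBufferSize:      &tx,
        RxJumboBufferSize: &jumbo,
    }))
    // RxBufferSize=2047
    // TxBufferSize=2047
    // RxJumboBufferSize=8191
}
```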

Alternatively, a single ring_buffers: max option could auto-discover and set all ring buffers to their hardware maximum, which is what most 100Gb deployments want.
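The "max" variant would need to discover the hardware maxima. One approach is parsing the "Pre-set maximums" block of `ethtool -g` output (the exact text layout varies by driver and ethtool version, so this parser is a sketch against the common format; the ETHTOOL_GRINGPARAM ioctl would be the robust route):

```go
package main

import (
    "fmt"
    "strconv"
    "strings"
)

// parseMaxRings extracts the "Pre-set maximums" values from `ethtool -g`
// output, mapping ring names ("RX", "RX Jumbo", ...) to their maxima.
func parseMaxRings(out string) map[string]int {
    max := make(map[string]int)
    inMax := false
    for _, line := range strings.Split(out, "\n") {
        switch {
        case strings.HasPrefix(line, "Pre-set maximums"):
            inMax = true
        case strings.HasPrefix(line, "Current hardware settings"):
            inMax = false
        case inMax:
            if k, v, ok := strings.Cut(line, ":"); ok {
                if n, err := strconv.Atoi(strings.TrimSpace(v)); err == nil {
                    max[strings.TrimSpace(k)] = n
                }
            }
        }
    }
    return max
}

func main() {
    // Sample output shaped like the Broadcom 100Gb NICs mentioned above.
    sample := `Ring parameters for ens1f0np0:
Pre-set maximums:
RX:		2047
RX Mini:	0
RX Jumbo:	8191
TX:		2047
Current hardware settings:
RX:		511
RX Jumbo:	1024
TX:		511`
    m := parseMaxRings(sample)
    fmt.Println(m["RX"], m["RX Jumbo"], m["TX"]) // 2047 8191 2047
}
```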

Environment

  • Hardware: HP ProLiant Gen10 with Broadcom 100Gb NICs (ens1f0np0/ens1f1np1 in LACP bond)
  • MTU: 9000 (jumbo frames)
  • Workload: VM hypervisor running network-intensive workloads (NFS, Ceph replication, VM traffic on multiple VLANs)
  • IncusOS deployed on 3 nodes initially, scaling to 17

Impact

These three settings are universally beneficial for any IncusOS deployment on 10Gb+ hardware. They're non-destructive (no downside to higher backlog or enabled PMTUD), align with the "do the right thing out of the box" philosophy, and address the most common performance gap when moving from a tuned Linux hypervisor to IncusOS.
