
Optimize IP version detection during address handle creation #1740

Open
swjz wants to merge 2 commits into linux-rdma:master from swjz:master

swjz commented May 14, 2026

Overview

Currently in user space, ibv_init_ah_from_wc() relies on parsing the Global Routing Header (GRH) payload to determine whether an incoming Unreliable Datagram (UD) packet is IPv4 or IPv6. This mechanism relies on get_grh_header_version(), which introduces two major issues:

  1. Correctness: Because some hardware zeroes out the IPv4 header checksum in the receive buffer, the checksum recalculated in software does not match the buffer's zeroed field. This causes valid IPv4 packets to be nondeterministically misclassified as IPv6.
  2. Performance: It involves an extra 20-byte stack memcpy and a checksum calculation (ipv4_calc_hdr_csum()) on the fast path.
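To make the correctness issue concrete, here is a minimal, self-contained C sketch of a GRH-based version heuristic of the kind described above. It is a simplified illustration, not the rdma-core source: `guess_ip_version()`, `ipv4_hdr_csum()` and `build_ipv4_hdr()` are hypothetical names, and it assumes the 20-byte IPv4 header sits at offset 20 of the 40-byte GRH area, as the `grh + 20` cast in verbs.c suggests. For an IPv4 packet the first 20 bytes are not an IPv6 header, so if they happen to carry a 6 in the version nibble, the heuristic falls back to verifying the IPv4 checksum, and a checksum field zeroed by hardware flips the result to IPv6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 1071 Internet checksum over a header stored in network byte order. */
static uint16_t ipv4_hdr_csum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical helper: fill a sample IPv4/UDP header and store a valid checksum. */
static void build_ipv4_hdr(uint8_t *hdr)
{
	static const uint8_t tmpl[20] = {
		0x45, 0x00, 0x00, 0x3c, 0x1c, 0x46, 0x40, 0x00, /* ver/ihl, tos, len, id, frag */
		0x40, 0x11, 0x00, 0x00,                         /* ttl, proto=UDP, checksum=0 */
		0xc0, 0xa8, 0x00, 0x01, 0xc0, 0xa8, 0x00, 0x02, /* 192.168.0.1 -> 192.168.0.2 */
	};
	uint16_t csum;

	memcpy(hdr, tmpl, sizeof(tmpl));
	csum = ipv4_hdr_csum(hdr, sizeof(tmpl));
	hdr[10] = (uint8_t)(csum >> 8);
	hdr[11] = (uint8_t)(csum & 0xff);
}

/* Simplified GRH heuristic (hypothetical): returns 4, 6, or -1. */
static int guess_ip_version(const uint8_t *grh)
{
	const uint8_t *ip4 = grh + 20; /* IPv4 header lives in the last 20 bytes */
	uint8_t copy[20];
	uint16_t stored;

	if ((grh[0] >> 4) != 6) /* leading nibble is not an IPv6 version */
		return (ip4[0] >> 4) == 4 ? 4 : -1;
	if ((ip4[0] & 0x0f) != 5) /* IHL must be 5 for RoCEv2 */
		return 6;

	/* Ambiguous: recompute the IPv4 checksum on a stack copy and compare. */
	memcpy(copy, ip4, sizeof(copy));
	copy[10] = copy[11] = 0;
	stored = (uint16_t)((ip4[10] << 8) | ip4[11]);
	return ipv4_hdr_csum(copy, sizeof(copy)) == stored ? 4 : 6;
}
```

With the checksum intact, a leading byte of 0x60 is harmless and the header is reported as IPv4. Once the checksum field is zeroed, the same header is reported as IPv6. With a leading byte of 0x00, the checksum is never consulted at all, which is why the failure is intermittent rather than constant.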

Many modern hardware providers already parse the IP version directly in the hardware Completion Queue Entry (CQE). This PR introduces a mechanism to pass this hardware-extracted network header type up to the core library, bypassing the software fallback.

Changes Introduced

  • libibverbs:

    • Introduces IBV_WC_WITH_NETWORK_HDR_TYPE and IBV_WC_NETWORK_HDR_IPV6 flags to enum ibv_wc_flags.
    • Updates ibv_init_ah_from_wc() to evaluate this bitmask first, immediately determining the IP version without touching payload memory or executing the software checksum fallback.
  • irdma (Provider):

    • Implements the new flags in irdma_process_cqe() for UD queue pairs.
    • Translates the IP version that the hardware reports natively in cur_cqe->ipv4 directly into the generic wc_flags bitmask, ensuring accurate IP version reporting regardless of the buffer's checksum state.
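A rough sketch of the core-side check: the provider-supplied bitmask is consulted first, and the GRH heuristic runs only when IBV_WC_WITH_NETWORK_HDR_TYPE is not set. The flag names come from this PR and the value 1 << 7 comes from the verbs.h diff; the bit position of IBV_WC_NETWORK_HDR_IPV6 is not shown in this thread, so 1 << 8 is an assumption. `struct wc_sketch`, `resolve_ip_version()` and the counting `stub_fallback()` are hypothetical stand-ins, not libibverbs code.

```c
#include <assert.h>
#include <stddef.h>

/* Flag names from this PR; only the 1 << 7 value appears in the verbs.h diff. */
enum {
	IBV_WC_WITH_NETWORK_HDR_TYPE = 1 << 7,
	IBV_WC_NETWORK_HDR_IPV6      = 1 << 8, /* ASSUMED bit position */
};

/* Hypothetical stand-in for the wc_flags field of struct ibv_wc. */
struct wc_sketch {
	unsigned int wc_flags;
};

/* Signature of the software fallback (the GRH heuristic). */
typedef int (*grh_version_fn)(const void *grh);

/* Demo stub: counts how often the fallback runs and always reports IPv6. */
static int fallback_calls;
static int stub_fallback(const void *grh)
{
	(void)grh;
	fallback_calls++;
	return 6;
}

/* Returns 4 or 6; calls the fallback only if the provider gave no hint. */
static int resolve_ip_version(const struct wc_sketch *wc, const void *grh,
			      grh_version_fn fallback)
{
	if (wc->wc_flags & IBV_WC_WITH_NETWORK_HDR_TYPE)
		return (wc->wc_flags & IBV_WC_NETWORK_HDR_IPV6) ? 6 : 4;
	return fallback(grh);
}
```

Only the version heuristic is skipped here; the IP header is still read afterwards to fill in the address handle attributes. Providers that never set the flag keep the existing behavior because the fallback path is unchanged.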

Impact

  • Correctness: Fixes a bug where hardware-zeroed checksums cause IPv4 packets to be misclassified as IPv6. Non-supporting providers safely fall back to the existing GRH parsing logic.
  • Performance: Accelerates address handle creation for incoming UD packets on supporting hardware by skipping the memcpy and checksum calculation performed in get_grh_header_version().

swjz added 2 commits May 14, 2026 19:34
Add IBV_WC_WITH_NETWORK_HDR_TYPE and IBV_WC_NETWORK_HDR_IPV6 flags to
enum ibv_wc_flags. This provides a user-space equivalent to the kernel's
network header type reporting, optimizing address handle creation from
work completions.

Currently, ibv_init_ah_from_wc() relies on parsing the GRH and
potentially performing expensive checksum calculations (via
get_grh_header_version) to determine if an incoming packet is IPv4
or IPv6.

This patch allows providers that extract IP version information directly
from the hardware CQE to set these flags in wc->wc_flags. The core
library can then cheaply evaluate the bitmask to bypass the fallback
GRH payload check entirely.

Signed-off-by: Ted Shaowang <shaowang@google.com>

Leverage the newly introduced IBV_WC_WITH_NETWORK_HDR_TYPE and
IBV_WC_NETWORK_HDR_IPV6 flags when processing Completion Queue Entries
(CQEs) for Unreliable Datagram (UD) QPs.

The irdma hardware natively parses the IP version and reports it via
the 'ipv4' field in the CQE. Translating this state into wc_flags
enables the core libibverbs layer to skip expensive software GRH
parsing and checksum verifications during address handle creation.

Beyond the performance benefits of bypassing payload memory reads,
this ensures correct IP version identification. Since some hardware
zeroes out the IPv4 header checksum in the buffer, the non-zero
checksum recalculated in get_grh_header_version() can fail the
comparison against the buffer's zeroed field and cause valid IPv4
packets to be misclassified as IPv6. Relying on the explicit
CQE state resolves this correctness issue.

Signed-off-by: Ted Shaowang <shaowang@google.com>
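On the provider side, the translation amounts to mapping one CQE bit onto two wc_flags bits. The sketch below assumes a simplified CQE view with a boolean `ipv4` field (mirroring the `cur_cqe->ipv4` state named in the PR), treats every non-IPv4 UD completion as IPv6 (an assumption that fits RoCEv2, where the network header is either IPv4 or IPv6), and reuses the assumed 1 << 8 value for IBV_WC_NETWORK_HDR_IPV6. `struct cqe_sketch` and `set_network_hdr_flags()` are hypothetical, not irdma code.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	IBV_WC_WITH_NETWORK_HDR_TYPE = 1 << 7, /* value from the verbs.h diff */
	IBV_WC_NETWORK_HDR_IPV6      = 1 << 8, /* ASSUMED bit position */
};

/* Hypothetical, simplified view of a polled CQE. */
struct cqe_sketch {
	bool ipv4; /* IP version as parsed by the hardware */
};

/*
 * OR the network header type into *wc_flags for UD completions only;
 * any flags already set in *wc_flags are preserved.
 */
static void set_network_hdr_flags(unsigned int *wc_flags, bool is_ud_qp,
				  const struct cqe_sketch *cqe)
{
	if (!is_ud_qp)
		return;
	*wc_flags |= IBV_WC_WITH_NETWORK_HDR_TYPE;
	if (!cqe->ipv4)
		*wc_flags |= IBV_WC_NETWORK_HDR_IPV6; /* ASSUMED: non-IPv4 means IPv6 */
}
```

Setting IBV_WC_WITH_NETWORK_HDR_TYPE separately from the version bit lets the core library distinguish "provider reported IPv4" from "provider reported nothing", so non-supporting providers fall back to the GRH heuristic.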
Comment thread libibverbs/verbs.c

if (version == 4)
	ret = set_ah_attr_by_ipv4(context, ah_attr,
			(struct iphdr *)((void *)grh + 20),
Member

And here again you are accessing the packet header. It looks like the "performance claim" in the PR description is not accurate.

Author

Thanks for the catch! I've updated the PR description to reflect the change more accurately.

That said, the issue of a zeroed IPv4 checksum leading to IP version misidentification still stands. It is much safer to rely on explicit information rather than a heuristic. Bypassing get_grh_header_version() also still saves us the memcpy and the checksum computation cycles on the fast path.

Comment thread libibverbs/verbs.h
IBV_WC_TM_SYNC_REQ = 1 << 4,
IBV_WC_TM_MATCH = 1 << 5,
IBV_WC_TM_DATA_VALID = 1 << 6,
IBV_WC_WITH_NETWORK_HDR_TYPE = 1 << 7,
Member

This is a user API enum. You should find a way to solve the Intel HW bug without exposing it to users.

Author

The challenge is that we cannot safely fix the GRH payload inside the provider:

  • At poll time (irdma_process_cqe): Even if the provider retrieves the buffer address, it cannot assume the memory is actually CPU-accessible.
  • At AH creation (ibv_init_ah_from_wc): The application provides a valid, CPU-accessible grh pointer, but by then we are in the core library. The hardware has already zeroed the checksum during Rx offload, causing the software heuristic to fail.

Since modifying the payload in poll_cq isn't viable and extending wc_flags is undesirable, could you advise on how best to resolve this issue, assuming the zeroed checksum cannot be fixed in HW?
