Optimize IP version detection during address handle creation #1740
Add IBV_WC_WITH_NETWORK_HDR_TYPE and IBV_WC_NETWORK_HDR_IPV6 flags to enum ibv_wc_flags. This provides a user-space equivalent to the kernel's network header type reporting, optimizing address handle creation from work completions.

Currently, ibv_init_ah_from_wc() relies on parsing the GRH and potentially performing expensive checksum calculations (via get_grh_header_version) to determine if an incoming packet is IPv4 or IPv6.

This patch allows providers that extract IP version information directly from the hardware CQE to set these flags in wc->wc_flags. The core library can then cheaply evaluate the bitmask to bypass the fallback GRH payload check entirely.

Signed-off-by: Ted Shaowang <shaowang@google.com>
Leverage the newly introduced IBV_WC_WITH_NETWORK_HDR_TYPE and IBV_WC_NETWORK_HDR_IPV6 flags when processing Completion Queue Entries (CQEs) for Unreliable Datagram (UD) QPs.

The irdma hardware natively parses the IP version and reports it via the 'ipv4' field in the CQE. Translating this state into wc_flags enables the core libibverbs layer to skip expensive software GRH parsing and checksum verification during address handle creation.

Beyond the performance benefits of bypassing payload memory reads, this ensures correct IP version identification. Some hardware zeroes out the IPv4 header checksum in the buffer, so the non-zero checksum calculated in get_grh_header_version() can fail verification against the buffer's zeroed field, and valid IPv4 packets are then misclassified as IPv6. Relying on the explicit CQE state resolves this correctness issue.

Signed-off-by: Ted Shaowang <shaowang@google.com>
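The flag handling described in the two commit messages can be sketched roughly as follows. This is a minimal, self-contained illustration and not the patch itself: the `sketch_` names are hypothetical stand-ins for the real libibverbs and irdma code, the value `1 << 8` for the IPv6 flag is an assumption (only `1 << 7` appears in the diff), and the convention "IPv6 bit clear means IPv4" is inferred from the flag names.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the proposed enum ibv_wc_flags additions. */
enum sketch_wc_flags {
	SKETCH_WC_WITH_NETWORK_HDR_TYPE = 1 << 7, /* value taken from the diff */
	SKETCH_WC_NETWORK_HDR_IPV6      = 1 << 8, /* assumed value */
};

/* Provider side: translate the CQE's 'ipv4' bit into generic wc_flags. */
static unsigned int sketch_flags_from_cqe(bool cqe_ipv4)
{
	unsigned int flags = SKETCH_WC_WITH_NETWORK_HDR_TYPE;

	if (!cqe_ipv4)
		flags |= SKETCH_WC_NETWORK_HDR_IPV6;
	return flags;
}

/*
 * Core side: return 4 or 6 when the provider reported a header type,
 * or 0 to tell the caller to fall back to the GRH-based heuristic.
 */
static int sketch_ip_version_from_flags(unsigned int wc_flags)
{
	if (!(wc_flags & SKETCH_WC_WITH_NETWORK_HDR_TYPE))
		return 0;
	return (wc_flags & SKETCH_WC_NETWORK_HDR_IPV6) ? 6 : 4;
}
```

Using a separate "valid" bit means providers that never set it keep the existing behavior, since a return value of 0 routes them to the current software check.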
	if (version == 4)
		ret = set_ah_attr_by_ipv4(context, ah_attr,
					  (struct iphdr *)((void *)grh + 20),
And here again you are accessing the packet header. It looks like the "performance claim" in the PR description is not accurate.
Thanks for the catch! I've updated the PR description to reflect the change more accurately.
That said, the issue of a zeroed IPv4 checksum leading to IP version misidentification still stands. It is much safer to rely on explicit information rather than a heuristic. Bypassing get_grh_header_version() also still saves us the memcpy and the checksum computation cycles on the fast path.
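To make the zeroed-checksum failure concrete, here is a simplified model of the comparison, assuming the heuristic accepts a packet as IPv4 only when the checksum stored in the header matches a freshly computed one. The `sketch_` functions are hypothetical and are not the library's get_grh_header_version() or ipv4_calc_hdr_csum(); they only reproduce the standard IPv4 header checksum over a 20-byte header.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Standard IPv4 header checksum: one's-complement sum of the 16-bit
 * words of the header, with the checksum field (bytes 10-11) treated
 * as zero, then inverted.
 */
static uint16_t sketch_ipv4_hdr_csum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2) {
		if (i == 10)
			continue; /* skip the checksum field itself */
		sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
	}
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Simplified heuristic: treat the header as IPv4 only if version is 4,
 * the header length is 5 words, and the stored checksum matches the
 * recomputed one.
 */
static int sketch_looks_like_ipv4(const uint8_t *hdr)
{
	uint16_t stored = (uint16_t)((hdr[10] << 8) | hdr[11]);

	return (hdr[0] >> 4) == 4 && (hdr[0] & 0x0f) == 5 &&
	       stored == sketch_ipv4_hdr_csum(hdr, 20);
}
```

With a well-formed UDP/IPv4 header the stored and computed checksums agree. Once bytes 10-11 are zeroed, as an Rx offload might do, the computed value is still non-zero, the comparison fails, and a heuristic of this shape would report the packet as not IPv4.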
	IBV_WC_TM_SYNC_REQ = 1 << 4,
	IBV_WC_TM_MATCH = 1 << 5,
	IBV_WC_TM_DATA_VALID = 1 << 6,
	IBV_WC_WITH_NETWORK_HDR_TYPE = 1 << 7,
This is a user API enum. You should find a way to solve the Intel HW bug without exposing it to the users.
The challenge is that we cannot safely fix the GRH payload inside the provider:
- At poll time (irdma_process_cqe): Even if the provider retrieves the buffer address, it cannot assume the memory is actually CPU-accessible.
- At AH creation (ibv_init_ah_from_wc): The application provides a valid, CPU-accessible grh pointer, but by then we are in the core library. The hardware has already zeroed the checksum during Rx offload, causing the software heuristic to fail.
Since modifying the payload in poll_cq isn't viable, and extending wc_flags is undesirable, may I ask for your advice on how to best resolve this issue, assuming we cannot fix the zero checksum in HW?
Overview
Currently in user space, ibv_init_ah_from_wc() relies on parsing the Global Routing Header (GRH) payload to determine whether an incoming Unreliable Datagram (UD) packet is IPv4 or IPv6. This mechanism relies on get_grh_header_version(), which introduces two major issues:
- Performance: it adds a memcpy and a checksum calculation (ipv4_calc_hdr_csum()) on the fast path.
- Correctness: some hardware zeroes the IPv4 header checksum in the receive buffer, so the checksum verification fails and valid IPv4 packets are misclassified as IPv6.

Many modern hardware providers already parse the IP version directly in the hardware Completion Queue Entry (CQE). This PR introduces a mechanism to pass this hardware-extracted network header type up to the core library, bypassing the software fallback.
Changes Introduced
libibverbs:
- Add IBV_WC_WITH_NETWORK_HDR_TYPE and IBV_WC_NETWORK_HDR_IPV6 flags to enum ibv_wc_flags.
- Update ibv_init_ah_from_wc() to evaluate this bitmask first, immediately determining the IP version without touching payload memory or executing the software checksum fallback.

irdma (Provider):
- Update irdma_process_cqe() for UD queue pairs.
- Map the cur_cqe->ipv4 state directly into the generic wc_flags bitmask, ensuring accurate IP version reporting regardless of buffer checksum state.

Impact