mlx5: Fix inline-scatter source address on 128B CQE REQ completions #1743

Open
yishaih wants to merge 1 commit into linux-rdma:master from yishaih:mlx5_misc

@yishaih yishaih commented May 25, 2026

Fix inline-scatter source address on 128B CQE REQ completions.

See the commit message below for details.

In mlx5_parse_cqe() the inline-scatter completion path used the
function's void* buffer pointer to address the inline data:

	if (cqe64->op_own & MLX5_INLINE_SCATTER_32)
		err = mlx5_copy_to_send_wqe(mqp, wqe_ctr, cqe, ...);
	else if (cqe64->op_own & MLX5_INLINE_SCATTER_64)
		err = mlx5_copy_to_send_wqe(mqp, wqe_ctr, cqe - 1, ...);

`cqe` is the base of the CQE buffer entry, while `cqe64` points at the
64B descriptor inside that entry (offset 0 for a 64B entry, offset 64
for a 128B entry). Both call sites used the wrong base:

1. SCATTER_64: the payload occupies the first 64B of a 128B
   entry, so the correct base is `cqe64 - 1`. The existing
   `cqe - 1` relied on void* pointer arithmetic (a GNU extension
   that subtracts one byte, not one descriptor) and so pointed
   one byte before the entry instead of at its start.

2. SCATTER_32: the inline_32 payload starts at offset 0 of the
   descriptor, so the correct base is `cqe64`. Passing `cqe`
   instead reads from offset 0 of the buffer entry, which on a
   128B CQE is 64 bytes before the payload (the first half of
   the same entry). For 64B CQEs cqe == cqe64, so the bug was
   masked.
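
The pointer arithmetic behind both cases can be checked with a small standalone sketch. It is not provider code: `demo_cqe64`, `demo_desc()`, `demo_off()` and `demo_bases()` are hypothetical names, and the only assumptions carried over are the ones stated above (a 64B descriptor at offset 0 of a 64B entry and at offset 64 of a 128B entry). The GNU `void*` subtraction is modelled with an explicit `char*` cast so the sketch stays standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 64B descriptor; only its size matters. */
struct demo_cqe64 {
	uint8_t raw[64];
};
static_assert(sizeof(struct demo_cqe64) == 64, "descriptor must be 64 bytes");

/* Locate the descriptor inside a CQE buffer entry of size cqe_sz. */
static struct demo_cqe64 *demo_desc(void *cqe, int cqe_sz)
{
	return (struct demo_cqe64 *)((char *)cqe + (cqe_sz == 64 ? 0 : 64));
}

/* Byte offset of p relative to the start of the entry (may be negative). */
static ptrdiff_t demo_off(const void *cqe, const void *p)
{
	return (const char *)p - (const char *)cqe;
}

/* Offsets of the old and new source addresses for both scatter modes. */
struct demo_offsets {
	ptrdiff_t old64, new64;	/* SCATTER_64: `cqe - 1` vs `cqe64 - 1` */
	ptrdiff_t old32, new32;	/* SCATTER_32: `cqe` vs `cqe64` */
};

static struct demo_offsets demo_bases(int cqe_sz)
{
	/* Room on both sides of the entry keeps every pointer in bounds. */
	static uint8_t ring[3 * 128];
	void *cqe = ring + 128;
	struct demo_cqe64 *cqe64 = demo_desc(cqe, cqe_sz);
	struct demo_offsets o;

	/* GNU void* arithmetic steps by one byte; modelled via char*. */
	o.old64 = demo_off(cqe, (char *)cqe - 1);
	/* Typed arithmetic steps back by one whole 64B descriptor. */
	o.new64 = demo_off(cqe, cqe64 - 1);
	o.old32 = demo_off(cqe, cqe);
	o.new32 = demo_off(cqe, cqe64);
	return o;
}
```

Calling `demo_bases(128)` yields offsets -1 and 0 for the old and new SCATTER_64 bases, and 0 and 64 for the old and new SCATTER_32 bases. With `demo_bases(64)` the two SCATTER_32 offsets coincide, which is why 64B CQEs hid the problem.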

Both bugs affect inline RDMA_READ / ATOMIC completions on the legacy
ibv_poll_cq path and the extended-CQ ibv_start_poll / ibv_next_poll
path. The matching responder helpers (handle_responder,
handle_responder_lazy, handle_tag_matching) already pass the typed cqe64
pointer and so were not affected.

Use cqe64 / cqe64 - 1 at the REQ-path call sites. The void* cqe
parameter was only used here, so drop it from mlx5_get_next_cqe(),
mlx5_parse_cqe(), mlx5_parse_lazy_cqe() and the locals in
mlx5_poll_one(), mlx5_start_poll() and mlx5_next_poll().
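
As a rough illustration of the corrected selection, the sketch below picks the inline source purely from the typed descriptor pointer. It is a simplified model under stated assumptions, not the patch itself: `sketch_cqe64`, `sketch_inline_src()` and the `SKETCH_*` flag values are hypothetical stand-ins for the provider's definitions, and the placement of `op_own` in the last byte is assumed for the demo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the real ones live in the mlx5 provider. */
enum {
	SKETCH_INLINE_SCATTER_32 = 0x4,
	SKETCH_INLINE_SCATTER_64 = 0x8,
};

/* Hypothetical 64B descriptor with the flags byte placed last. */
struct sketch_cqe64 {
	uint8_t raw[63];
	uint8_t op_own;
};
static_assert(sizeof(struct sketch_cqe64) == 64, "descriptor must be 64 bytes");

/*
 * Return the address of the inline payload for a requester completion,
 * or NULL when the completion carries no inline data.
 */
static const void *sketch_inline_src(const struct sketch_cqe64 *cqe64)
{
	if (cqe64->op_own & SKETCH_INLINE_SCATTER_32)
		return cqe64;		/* payload starts at the descriptor */
	if (cqe64->op_own & SKETCH_INLINE_SCATTER_64)
		return cqe64 - 1;	/* payload is the preceding 64 bytes */
	return NULL;
}
```

Because the helper takes only the typed pointer, `cqe64 - 1` always steps back one full descriptor, and no separate untyped base pointer has to be threaded through the callers.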

Fixes: 8c4791a ("libmlx5: First version of libmlx5")
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>