rxe: Fix dma.length computation in wr_set_sge_list #1744

Open
jeholza wants to merge 1 commit into linux-rdma:master from jeholza:rxe-fix-wr-set-sge-list

@jeholza jeholza commented May 25, 2026

wr_set_sge_list() summed the SGE lengths with a loop that never advanced sg_list:

	while (num_sge--)
		tot_length += sg_list->length;

so tot_length ended up as num_sge * sg_list[0].length instead of the true sum, and wqe->dma.length / wqe->dma.resid were written with that wrong value. The per-SGE entries themselves were unaffected because they are populated by the preceding memcpy().
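The arithmetic can be reproduced outside the provider with a small standalone sketch. `struct demo_sge` and `demo_buggy_total()` are hypothetical stand-ins invented for illustration; they are not the rxe provider's types or functions, and only the loop body mirrors the code quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a scatter/gather entry; only length matters here. */
struct demo_sge {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

/* Mirrors the loop quoted above: sg_list is never advanced, so every
 * iteration adds the length of the first entry again. */
uint32_t demo_buggy_total(const struct demo_sge *sg_list, size_t num_sge)
{
	uint32_t tot_length = 0;

	assert(sg_list != NULL || num_sge == 0);

	while (num_sge--)
		tot_length += sg_list->length;

	return tot_length;
}
```

For three entries with lengths 100, 20 and 8, this returns 300 (3 * 100), while the true sum is 128. With a single SGE, or when all entries have the same length, the two values coincide, which is why the problem only shows up with multi-SGE WRs of unequal lengths.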

The kernel rxe driver requires dma.length == sum(sge[i].length) and enforces it in rxe_mr.c:copy_data(). When the computed total overstates the true sum, a multi-SGE WR posted through the ibv_qp_ex builder API (ibv_wr_set_sge_list) on rxe completes with IB_WC_LOC_PROT_ERR once finish_packet()/copy_data() runs off the end of the SGE list.
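To illustrate why an overstated dma.length leads to running off the end of the SGE list, here is a simplified userspace model of a copy routine that walks the list. It is only loosely modeled on the behaviour described above and is not the kernel's copy_data(); `struct demo_dma`, `demo_consume()` and the choice of -ENOSPC as the error value are assumptions made for this sketch, and the mapping to IB_WC_LOC_PROT_ERR is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced SGE: only the length is modeled. */
struct demo_sge {
	uint32_t length;
};

/* Hypothetical, reduced DMA state for one WQE. */
struct demo_dma {
	uint32_t length;      /* total length the WQE claims */
	uint32_t resid;       /* bytes not yet consumed */
	size_t num_sge;
	size_t cur_sge;       /* index of the SGE being consumed */
	uint32_t sge_offset;  /* bytes already consumed from cur_sge */
	const struct demo_sge *sge;
};

/* Consume `length` bytes from the SGE list. Returns 0 on success, or
 * -ENOSPC if the request exceeds resid or the list runs out first. */
int demo_consume(struct demo_dma *dma, uint32_t length)
{
	assert(dma != NULL);

	if (length > dma->resid)
		return -ENOSPC;

	while (length > 0) {
		uint32_t avail, bytes;

		if (dma->cur_sge >= dma->num_sge)
			return -ENOSPC; /* claimed more than the SGEs hold */

		avail = dma->sge[dma->cur_sge].length - dma->sge_offset;
		if (avail == 0) {
			/* Current SGE exhausted: move to the next one. */
			dma->cur_sge++;
			dma->sge_offset = 0;
			continue;
		}

		bytes = length < avail ? length : avail;
		dma->sge_offset += bytes;
		dma->resid -= bytes;
		length -= bytes;
	}

	return 0;
}
```

With SGE lengths 100, 20 and 8, a state initialised with length and resid of 128 can be consumed fully, whereas one initialised with the inflated value 300 fails once the third entry is exhausted.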

The legacy ibv_post_send path (init_send_wqe) is unaffected; it sums the lengths with an indexed for loop.

Fix by computing the total with an indexed loop, matching the style already used in rxe_post_one_recv() and init_send_wqe() in this file.
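The shape of the indexed loop can be sketched as follows. This is an illustrative standalone version, not the actual patch: `struct demo_sge` and `demo_sum_sge_lengths()` are hypothetical names, and the real function writes the result into wqe->dma.length and wqe->dma.resid rather than returning it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a scatter/gather entry. */
struct demo_sge {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

/* Sum the SGE lengths with an indexed loop so that every entry is
 * visited exactly once. */
uint32_t demo_sum_sge_lengths(const struct demo_sge *sg_list, size_t num_sge)
{
	uint32_t tot_length = 0;
	size_t i;

	assert(sg_list != NULL || num_sge == 0);

	for (i = 0; i < num_sge; i++)
		tot_length += sg_list[i].length;

	return tot_length;
}
```

For entries with lengths 100, 20 and 8 this yields 128. Indexing leaves num_sge and sg_list untouched, so both remain usable after the loop.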

Fixes: 1a894ca ("Providers/rxe: Implement ibv_create_qp_ex verb")

Signed-off-by: Jared Holzman <jholzman@nvidia.com>