Skip to content

Comments

prov/efa: Fix asserts for efa-proto#11840

Merged
a-szegel merged 1 commit intoofiwg:mainfrom
a-szegel:add-missing-assert-to-efa-direct-rdma-write
Feb 12, 2026
Merged

prov/efa: Fix asserts for efa-proto#11840
a-szegel merged 1 commit intoofiwg:mainfrom
a-szegel:add-missing-assert-to-efa-direct-rdma-write

Conversation

@a-szegel
Copy link
Contributor

@a-szegel a-szegel commented Jan 26, 2026

Allow user to pass in null buff/null descriptor for efa-protocol for read/write without hitting an assert.

@a-szegel a-szegel requested a review from a team January 26, 2026 20:58
@a-szegel a-szegel changed the title prov/efa: Add missing assert to efa_rma_post_write prov/efa: Modify asserts to efa/efa-direct rdma read/write Jan 26, 2026
@a-szegel a-szegel changed the title prov/efa: Modify asserts to efa/efa-direct rdma read/write prov/efa: Add/Remove asserts to efa/efa-direct rdma read/write Jan 26, 2026
* Allow local iov count to be equal to 0 b/c bounce buffer's pre-registered buff/desc
* will be passed to rdma-core
*/
assert(ope->iov_count <= efa_rdm_ep_domain(ep)->info->tx_attr->iov_limit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(ope->iov_count <= efa_rdm_ep_domain(ep)->info->tx_attr->iov_limit);
assert(ope->iov_count <= efa_rdm_ep_domain(ep)->info->tx_attr->iov_limit || (ope->iov_count == 0 && ope->bytes_read_total_len == 0));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second part of the assert is not needed because if iov_count == 0, the left side will always be true, making the right side never trigger.

I think you were going for:

#if ENABLE_DEBUG
if (ope->iov_count == 0)
     assert(ope->bytes_read_total_len == 0)
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right. Or you can flip the order so that the iov_count == 0 gets checked first

I wanted to keep the assert that if iov_count is 0, the total length is also 0

* Allow local iov count to be equal to 0 b/c bounce buffer's pre-registered buff/desc
* will be passed to rdma-core
*/
assert(ope->iov_count <= efa_rdm_ep_domain(ep)->info->tx_attr->iov_limit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(ope->iov_count <= efa_rdm_ep_domain(ep)->info->tx_attr->iov_limit);
assert(ope->iov_count <= efa_rdm_ep_domain(ep)->info->tx_attr->iov_limit || (ope->iov_count == 0 && ope->bytes_write_total_len == 0));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second part of the assert is not needed because if iov_count == 0, the left side will always be true, making the right side never trigger.

I think you were going for:

#if ENABLE_DEBUG
if (ope->iov_count == 0)
     assert(ope->bytes_write_total_len == 0)
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we use ofi_total_iov_len to calculate bytes_write_total_len, iov_count == 0 already means bytes_write_total_len=0... I do not think it is more useful to add that additional asserts though

@a-szegel a-szegel force-pushed the add-missing-assert-to-efa-direct-rdma-write branch from 1d31d6a to e3f6709 Compare February 11, 2026 20:54
msg->context, msg->addr, flags, FI_RMA | FI_WRITE);

/* Prepare SGE list */
assert(msg->iov_count > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still doesn't address my initial concern.... that assert is for checking something you are sure to happen inside the code, but here iov_count is something parsed by application. I think we either need to make efa-direct support 0 iov_count, or return error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I didn't see where this initial concern was commented. Will fix.

Allow user to pass in null buff/null descriptor for efa-protocol for
read/write without hitting an assert.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
@a-szegel a-szegel force-pushed the add-missing-assert-to-efa-direct-rdma-write branch from e3f6709 to db52e68 Compare February 11, 2026 23:29
@a-szegel
Copy link
Contributor Author

@shijin-aws, I simplified this PR to only be for efa-proto, and I will post a follow up PR for efa-direct.

@a-szegel a-szegel changed the title prov/efa: Add/Remove asserts to efa/efa-direct rdma read/write prov/efa: Fix asserts for efa-proto Feb 11, 2026
@a-szegel a-szegel requested a review from shijin-aws February 11, 2026 23:31
@a-szegel a-szegel merged commit 765272e into ofiwg:main Feb 12, 2026
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants