UCP/DEVICE: Make memh and local_addr optional for counter elements #10945
test/gtest/ucp/test_ucp_device.cc
UCP_DEVICE_MEM_LIST_ELEM_FIELD_REMOTE_ADDR |
UCP_DEVICE_MEM_LIST_ELEM_FIELD_LENGTH;
elem.memh = NULL;
elem.local_addr = NULL;
These can be left at their default values to simplify the if.
initialized elem to 0
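A minimal sketch of why zero-initializing the element simplifies the code: with `= {0}`, every member the caller does not set is zero, so memh and local_addr are NULL for a counter element without any explicit assignment or extra branch. The struct, the `ELEM_FIELD_*` names and `make_counter_elem` are hypothetical stand-ins for `ucp_device_mem_list_elem_t` and its field bits; the RKEY, LOCAL_ADDR, REMOTE_ADDR and LENGTH bit positions follow the ucp_host.h diff in this PR, while MEMH = bit 0 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field bits; MEMH = bit 0 is assumed, the rest match the diff. */
enum {
    ELEM_FIELD_MEMH        = 1u << 0,
    ELEM_FIELD_RKEY        = 1u << 1,
    ELEM_FIELD_LOCAL_ADDR  = 1u << 2,
    ELEM_FIELD_REMOTE_ADDR = 1u << 3,
    ELEM_FIELD_LENGTH      = 1u << 4
};

/* Hypothetical simplified element; real UCX uses opaque handle types. */
typedef struct {
    uint64_t field_mask;
    void     *memh;
    void     *rkey;
    void     *local_addr;
    uint64_t remote_addr;
    size_t   length;
} mem_list_elem_t;

/* Build a counter element with only remote-side fields. The "= {0}"
 * initializer zeroes all other members, so memh and local_addr are NULL. */
static mem_list_elem_t make_counter_elem(void *rkey, uint64_t remote_addr,
                                         size_t length)
{
    mem_list_elem_t elem = {0};

    elem.field_mask  = ELEM_FIELD_RKEY | ELEM_FIELD_REMOTE_ADDR |
                       ELEM_FIELD_LENGTH;
    elem.rkey        = rkey;
    elem.remote_addr = remote_addr;
    elem.length      = length;
    return elem;
}
```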
src/ucp/core/ucp_device.c
    return status;
}

if (local_sys_dev == UCS_SYS_DEVICE_ID_UNKNOWN) {
Maybe move this check to ucp_device_mem_list_params_check?
done
src/ucp/core/ucp_device.c
    *local_md_map = memh->md_map;
    *mem_type     = memh->mem_type;
} else {
    *mem_type = rkey->mem_type;
This should be CUDA for now.
done
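A minimal sketch of the fallback discussed in this thread, assuming the element's memh is the only source of local registration info: with a memh, md_map and mem_type are taken from it; without one, the memory type is fixed to CUDA rather than copied from the rkey. `memh_t`, `mem_type_t` and `elem_local_mem_info` are hypothetical stand-ins for `ucp_mem_h` and `ucs_memory_type_t`, and leaving md_map empty when there is no memh is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ucs_memory_type_t. */
typedef enum {
    MEM_TYPE_HOST,
    MEM_TYPE_CUDA
} mem_type_t;

/* Hypothetical stand-in for the parts of ucp_mem_h used here. */
typedef struct {
    uint64_t   md_map;   /* memory domains the region is registered with */
    mem_type_t mem_type;
} memh_t;

/* Resolve local memory info for one element: use the memh when present,
 * otherwise (e.g. a counter element) default to CUDA memory. Clearing
 * md_map in the no-memh case is an assumption. */
static void elem_local_mem_info(const memh_t *memh, uint64_t *local_md_map,
                                mem_type_t *mem_type)
{
    if (memh != NULL) {
        *local_md_map = memh->md_map;
        *mem_type     = memh->mem_type;
    } else {
        *local_md_map = 0;
        *mem_type     = MEM_TYPE_CUDA;
    }
}
```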
src/ucp/api/device/ucp_host.h
UCP_DEVICE_MEM_LIST_ELEM_FIELD_LENGTH = UCS_BIT(4) /**< Length of the local buffer in bytes */
UCP_DEVICE_MEM_LIST_ELEM_FIELD_LOCAL_ADDR = UCS_BIT(2), /**< Local address (optional for counter elements) */
UCP_DEVICE_MEM_LIST_ELEM_FIELD_REMOTE_ADDR = UCS_BIT(3), /**< Remote address */
UCP_DEVICE_MEM_LIST_ELEM_FIELD_LENGTH = UCS_BIT(4) /**< Length of the local buffer in bytes (optional for counter elements) */
memh, laddr and length should be optional.
length is only optional for partial
src/ucp/api/device/ucp_host.h
UCP_DEVICE_MEM_LIST_ELEM_FIELD_RKEY = UCS_BIT(1), /**< Unpacked remote memory key */
UCP_DEVICE_MEM_LIST_ELEM_FIELD_LOCAL_ADDR = UCS_BIT(2), /**< Local address */
UCP_DEVICE_MEM_LIST_ELEM_FIELD_REMOTE_ADDR = UCS_BIT(3), /**< Remote address */
UCP_DEVICE_MEM_LIST_ELEM_FIELD_LOCAL_ADDR = UCS_BIT(2), /**< Local address (optional for counter elements) */
Actually, it can also be optional for data elements: only rkey is required (we only check for rkey in ucp_device_mem_list_params_check), and the rest are optional. We do need the local address for ucp_device_put_multi, but the mem list is not bound to a specific API function.
updated
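A minimal sketch of the validation rule described in this thread, assuming the check only needs each element's field mask: rkey is the single mandatory field, and memh, local_addr and length stay optional because the mem list is not tied to one device API. `mem_list_params_check`, `status_t` and the `ELEM_FIELD_*` bits are hypothetical simplifications (the real `ucp_device_mem_list_params_check` takes a params struct), and MEMH = bit 0 is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field bits; MEMH = bit 0 is assumed, the rest match the diff. */
enum {
    ELEM_FIELD_MEMH        = 1u << 0,
    ELEM_FIELD_RKEY        = 1u << 1,
    ELEM_FIELD_LOCAL_ADDR  = 1u << 2,
    ELEM_FIELD_REMOTE_ADDR = 1u << 3,
    ELEM_FIELD_LENGTH      = 1u << 4
};

typedef enum {
    STATUS_OK                = 0,
    STATUS_ERR_INVALID_PARAM = -1
} status_t;

/* Reject any element without an rkey; every other field is optional.
 * A call that needs a local address (e.g. put_multi) would have to
 * validate that separately. */
static status_t mem_list_params_check(const uint64_t *field_masks,
                                      size_t num_elems)
{
    size_t i;

    for (i = 0; i < num_elems; ++i) {
        if (!(field_masks[i] & ELEM_FIELD_RKEY)) {
            return STATUS_ERR_INVALID_PARAM;
        }
    }
    return STATUS_OK;
}
```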
elems[i].memh = perf.ucp.send_memh;
elems[i].rkey = perf.ucp.rkey;
elems[i].local_addr = UCS_PTR_BYTE_OFFSET(perf.send_buffer, offset);
bool is_counter = (i == count - 1);
for (size_t i = 0; i < count - 1; ++i) {
    elems[i].field_mask  = UCP_DEVICE_MEM_LIST_ELEM_FIELD_MEMH |
                           UCP_DEVICE_MEM_LIST_ELEM_FIELD_RKEY |
                           UCP_DEVICE_MEM_LIST_ELEM_FIELD_LOCAL_ADDR |
                           UCP_DEVICE_MEM_LIST_ELEM_FIELD_REMOTE_ADDR |
                           UCP_DEVICE_MEM_LIST_ELEM_FIELD_LENGTH;
    elems[i].memh        = perf.ucp.send_memh;
    elems[i].rkey        = perf.ucp.rkey;
    elems[i].local_addr  = UCS_PTR_BYTE_OFFSET(perf.send_buffer, offset);
    elems[i].remote_addr = perf.ucp.remote_addr + offset;
    elems[i].length      = perf.params.msg_size_list[i];
    offset              += elems[i].length;
}
elems[count - 1].field_mask  = UCP_DEVICE_MEM_LIST_ELEM_FIELD_RKEY |
                               UCP_DEVICE_MEM_LIST_ELEM_FIELD_REMOTE_ADDR |
                               UCP_DEVICE_MEM_LIST_ELEM_FIELD_LENGTH;
elems[count - 1].rkey        = perf.ucp.rkey;
elems[count - 1].remote_addr = perf.ucp.remote_addr + offset;
elems[count - 1].length      = ONESIDED_SIGNAL_SIZE;
We will not always have a counter; check UCX_PERF_CMD_PUT_SINGLE.
size_t data_count = perf.params.msg_size_cnt;
for (size_t i = 0; i < data_count; ++i) {
    elems[i].field_mask  = UCP_DEVICE_MEM_LIST_ELEM_FIELD_MEMH |
                           UCP_DEVICE_MEM_LIST_ELEM_FIELD_RKEY |
                           UCP_DEVICE_MEM_LIST_ELEM_FIELD_LOCAL_ADDR |
                           UCP_DEVICE_MEM_LIST_ELEM_FIELD_REMOTE_ADDR |
                           UCP_DEVICE_MEM_LIST_ELEM_FIELD_LENGTH;
    elems[i].memh        = perf.ucp.send_memh;
    elems[i].rkey        = perf.ucp.rkey;
    elems[i].local_addr  = UCS_PTR_BYTE_OFFSET(perf.send_buffer, offset);
    elems[i].remote_addr = perf.ucp.remote_addr + offset;
    elems[i].length      = perf.params.msg_size_list[i];
    offset              += elems[i].length;
}
if (m_has_counter) {
    elems[data_count].field_mask  = UCP_DEVICE_MEM_LIST_ELEM_FIELD_RKEY |
                                    UCP_DEVICE_MEM_LIST_ELEM_FIELD_REMOTE_ADDR |
                                    UCP_DEVICE_MEM_LIST_ELEM_FIELD_LENGTH;
    elems[data_count].rkey        = perf.ucp.rkey;
    elems[data_count].remote_addr = perf.ucp.remote_addr + offset;
    elems[data_count].length      = ONESIDED_SIGNAL_SIZE;
}
So you're suggesting setting rkey and remote_addr twice to the same value?
Otherwise, there will be a condition that is checked at each iteration.
Fixed
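A minimal, self-contained sketch of the loop shape this thread converged on, assuming simplified stand-in types: data elements are filled in a loop with no per-iteration counter check, and the counter element, when present, is appended once afterwards with only rkey, remote_addr and length set. `mem_list_elem_t`, `fill_mem_list`, the `ELEM_FIELD_*` bits (MEMH = bit 0 assumed) and `SIGNAL_SIZE` (standing in for `ONESIDED_SIGNAL_SIZE`, whose real value is not shown here) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIGNAL_SIZE 8 /* hypothetical stand-in for ONESIDED_SIGNAL_SIZE */

/* Hypothetical field bits; MEMH = bit 0 is assumed. */
enum {
    ELEM_FIELD_MEMH        = 1u << 0,
    ELEM_FIELD_RKEY        = 1u << 1,
    ELEM_FIELD_LOCAL_ADDR  = 1u << 2,
    ELEM_FIELD_REMOTE_ADDR = 1u << 3,
    ELEM_FIELD_LENGTH      = 1u << 4
};

typedef struct {
    uint64_t field_mask;
    void     *memh;
    void     *rkey;
    void     *local_addr;
    uint64_t remote_addr;
    size_t   length;
} mem_list_elem_t;

/* Fill one data element per message size, then optionally append a
 * counter element. elems must hold msg_size_cnt + (has_counter ? 1 : 0)
 * entries. Returns the number of elements written. */
static size_t fill_mem_list(mem_list_elem_t *elems, const size_t *msg_sizes,
                            size_t msg_size_cnt, bool has_counter, void *memh,
                            void *rkey, char *send_buffer, uint64_t remote_addr)
{
    size_t offset = 0;
    size_t i;

    for (i = 0; i < msg_size_cnt; ++i) {
        elems[i].field_mask  = ELEM_FIELD_MEMH | ELEM_FIELD_RKEY |
                               ELEM_FIELD_LOCAL_ADDR |
                               ELEM_FIELD_REMOTE_ADDR | ELEM_FIELD_LENGTH;
        elems[i].memh        = memh;
        elems[i].rkey        = rkey;
        elems[i].local_addr  = send_buffer + offset;
        elems[i].remote_addr = remote_addr + offset;
        elems[i].length      = msg_sizes[i];
        offset              += msg_sizes[i];
    }

    if (!has_counter) {
        return msg_size_cnt;
    }

    /* Counter element: remote side only; memh and local_addr stay NULL. */
    elems[msg_size_cnt]             = (mem_list_elem_t){0};
    elems[msg_size_cnt].field_mask  = ELEM_FIELD_RKEY | ELEM_FIELD_REMOTE_ADDR |
                                      ELEM_FIELD_LENGTH;
    elems[msg_size_cnt].rkey        = rkey;
    elems[msg_size_cnt].remote_addr = remote_addr + offset;
    elems[msg_size_cnt].length      = SIGNAL_SIZE;
    return msg_size_cnt + 1;
}
```

Handling has_counter once after the loop is what avoids the per-iteration condition raised in the thread, at the cost of assigning rkey in two places.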
/* +1 for the counter */
size_t count = perf.params.msg_size_cnt + 1;
size_t count = perf.params.msg_size_cnt + (m_has_counter ? 1 : 0);
size_t offset = 0;
Not an issue of this PR: the variable (offset) can be deleted.
What?
Make memh and local_addr optional for counter elements in ucp_device_mem_list_create.

Why?
Counter elements only require remote addressing for atomic operations (ucp_device_counter_inc, or as part of ucp_device_put_multi/put_multi_partial). Requiring local memory registration (memh/local_addr) for these elements is unnecessary overhead.

How?
If memh is not provided, detect local_sys_dev by allocating a temporary buffer on the current CUDA context device (similar to ucp_ep_rma_batch_export).