[Feat] use host-pinned memory with dual CPU/device addresses for transport buffers #1024

Open
yumingyue624 wants to merge 1 commit into ModelEngine-Group:feature_26h1 from yumingyue624:adapt_connection

@yumingyue624


Purpose

Switch ASU send/flag buffers from plain host memory to host-pinned
memory so that CPU code packs SQEs through the local mapping while
HCOMM/RDMA uses the device-visible mapping of the same allocation.

Modifications

  1. BufferManager: allocate host-pinned memory via aclrtMallocHost +
    aclrtHostRegisterV2, obtain device pointer via
    aclrtHostGetDevicePointer. ScatterGatherEntry gains device_addr.
    RegisterMemory uses device address for host-pinned regions.
  2. AsuTransportImpl: send/flag buffers use HOST_PINNED instead of HOST.
  3. asu_submit_flow: pass device_addr to SendIoBatch.
  4. sqe_request: use flagBuffer.device_addr for response_buffer_addr.
  5. Tests: added host-pinned dual-address and device_addr assertions.
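
The dual-address idea in the list above can be sketched as follows. This is a minimal illustration, not the PR's code: the field layout of `ScatterGatherEntry`, the `MemoryType` enum, and the helpers `RegistrationAddress` and `PackAndGetTransportAddress` are hypothetical stand-ins, and no ACL runtime call is made. The one behavior taken from the description is that host-pinned regions are registered with the device address; using the CPU address for plain host memory is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the entry described in the PR: one allocation,
// reachable through two different addresses.
struct ScatterGatherEntry {
    std::uint64_t addr{0};         // CPU-visible mapping, used to pack SQEs
    std::uint64_t device_addr{0};  // device-visible mapping, handed to HCOMM/RDMA
    std::size_t length{0};
};

// Hypothetical memory kinds; the PR names HOST and HOST_PINNED.
enum class MemoryType { HOST, HOST_PINNED };

// Picks the address passed to memory registration. Host-pinned regions use
// the device address (as the PR states); plain host memory is assumed to
// keep using the CPU address.
inline std::uint64_t RegistrationAddress(const ScatterGatherEntry& sge, MemoryType type) {
    return type == MemoryType::HOST_PINNED ? sge.device_addr : sge.addr;
}

// Writes the payload through the CPU mapping and returns the address the
// transport layer should be given for the same bytes.
inline std::uint64_t PackAndGetTransportAddress(const ScatterGatherEntry& sge,
                                                const void* payload, std::size_t size) {
    assert(size <= sge.length);
    void* cpuView = reinterpret_cast<void*>(static_cast<std::uintptr_t>(sge.addr));
    std::memcpy(cpuView, payload, size);
    return sge.device_addr;
}
```

The point of the split is that the CPU never dereferences `device_addr` and the transport never sees `addr`, so mixing the two up is a bug that the tests listed in item 5 can catch with a simple equality assertion.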

Test

  • buffer_manager_test: HostPinnedRegistersDeviceAddress.
  • asu_submit_flow_test: BuildSubBatchSendBuffersUsesHostPinnedDeviceAddresses.
  • sqe_request_test: packed response address matches device_addr.

@yumingyue624 yumingyue624 changed the base branch from develop to feature_26h1 June 12, 2026 03:57
@yumingyue624 yumingyue624 requested a review from nrj868 as a code owner June 12, 2026 03:57
}

- if (subBatchContext.flagBuffer.addr == 0 || subBatchContext.flagBuffer.length == 0) {
+ if (subBatchContext.sendSge.device_addr == 0 || subBatchContext.flagBuffer.addr == 0 ||


What about sendSge.addr and sendSge.length?
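
One way to address this would be to validate every field of both buffers in a single helper. The sketch below is only an illustration of that idea: `BufferView`, `SubBatchContext`, `IsUsable`, and `HasValidBuffers` are hypothetical names, and the assumption that both buffers need a non-zero device address is mine, not something the diff confirms.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical shape of a buffer that carries both mappings.
struct BufferView {
    std::uint64_t addr{0};         // CPU-visible address
    std::uint64_t device_addr{0};  // device-visible address
    std::size_t length{0};
};

// Hypothetical stand-in for the context checked in the diff.
struct SubBatchContext {
    BufferView sendSge;
    BufferView flagBuffer;
};

// A buffer is usable when it has a CPU address, a non-zero length, and,
// if the caller requires it, a device address.
inline bool IsUsable(const BufferView& buffer, bool needsDeviceAddr) {
    if (buffer.addr == 0 || buffer.length == 0) {
        return false;
    }
    return !needsDeviceAddr || buffer.device_addr != 0;
}

// Checks sendSge and flagBuffer with the same rule so no field is skipped.
inline bool HasValidBuffers(const SubBatchContext& context) {
    return IsUsable(context.sendSge, true) && IsUsable(context.flagBuffer, true);
}
```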

TransProvider::MemType providerMemType{TransProvider::MemType::MEM_HOST};
};

class BufferRegionCreator : public Trans::AscendBuffer {


Naming problem: the parent class is a Buffer, but the child class is a Creator?

}
BufferRegionCreator regionCreator;
BufferRegion region;
auto allocStatus = regionCreator.MakeRegion(memory_type_, total, region);


This violates the spirit of inheritance: the child class adds its own public function, which callers then invoke directly.
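
A possible direction, sketched under assumptions: declare the region-making operation on an allocator-style base interface so that callers depend only on the base type and the derived class adds no public surface of its own. All names here (`RegionAllocator`, `HeapRegionAllocator`, `AllocateThroughInterface`) are hypothetical, `BufferRegion` is reduced to a toy struct, and ordinary heap memory stands in for the real host-pinned allocation.

```cpp
#include <cstddef>
#include <memory>
#include <new>

enum class MemoryType { HOST, HOST_PINNED };

// Toy region: owns its bytes and records how it was requested.
struct BufferRegion {
    std::unique_ptr<unsigned char[]> storage;
    std::size_t length{0};
    MemoryType type{MemoryType::HOST};
};

// The operation lives on the interface, so the class name and its role agree.
class RegionAllocator {
public:
    virtual ~RegionAllocator() = default;
    virtual bool MakeRegion(MemoryType type, std::size_t length, BufferRegion& out) = 0;
};

// Stand-in implementation backed by the heap instead of pinned memory.
class HeapRegionAllocator final : public RegionAllocator {
public:
    bool MakeRegion(MemoryType type, std::size_t length, BufferRegion& out) override {
        if (length == 0) {
            return false;
        }
        out.storage.reset(new (std::nothrow) unsigned char[length]());
        if (!out.storage) {
            return false;
        }
        out.length = length;
        out.type = type;
        return true;
    }
};

// Callers see only the base interface, never the concrete class.
inline bool AllocateThroughInterface(RegionAllocator& allocator, std::size_t total,
                                     BufferRegion& region) {
    return allocator.MakeRegion(MemoryType::HOST_PINNED, total, region);
}
```

With this shape, a test can substitute a fake allocator without touching the call site.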
