Commit 788ca51

quic-ashigarg (Ashish Garg) and Ashish Garg authored
[QNN-EP]: Fix inference failures while running with htp_shared_memory (#23892)
### Description
When using the enable_htp_shared_memory feature, the address of the buffer passed to rpcmem_free is incorrect: Free() receives the address handed out to the caller rather than the start of the underlying allocation. As a result, the RPC buffers are never freed, which leads to memory exhaustion.

### Motivation and Context
When using the enable_htp_shared_memory_allocator feature for QNN in GenAI extensions, inference fails during the second prompt. Because GenAI workloads request more memory, the problem surfaces sooner in GenAI use cases.

Co-authored-by: Ashish Garg <[email protected]>
1 parent 9d0dc9f commit 788ca51

1 file changed, +3 -1 lines changed
onnxruntime/core/providers/qnn/qnn_allocator.cc

@@ -181,7 +181,9 @@ void HtpSharedMemoryAllocator::Free(void* allocation_address) {
   // Avoid throwing exceptions as this may be running from a destructor.
   try {
     // take ownership of shared memory and free at end of scope
-    auto shared_memory = WrapSharedMemoryWithUniquePtr(allocation_address, rpcmem_lib_->Api());
+    const size_t allocation_offset = AllocationOffsetFromStartOfHeader();
+    void* raw_allocation_address = (void*)((std::byte*)allocation_address - allocation_offset);
+    auto shared_memory = WrapSharedMemoryWithUniquePtr(raw_allocation_address, rpcmem_lib_->Api());
 
     // destroy header
     allocation_header.~AllocationHeader();
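
The pattern behind this fix can be illustrated with a small standalone sketch. It is not the onnxruntime implementation: `DemoHeader`, `DemoAllocationOffset`, `DemoAlloc`, `DemoFree`, and `g_last_raw_freed` are hypothetical names, `std::malloc`/`std::free` stand in for the rpcmem calls, and the header is assumed to sit at the very start of the raw block. The point is only that an allocator which returns a pointer offset past a header must subtract that same offset before handing the pointer to the underlying deallocator.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical header stored in front of each user-visible allocation.
struct DemoHeader {
  std::size_t requested_size;
};

// Distance from the start of the raw block to the user pointer, rounded up
// so the user pointer keeps the strictest fundamental alignment.
constexpr std::size_t DemoAllocationOffset() {
  constexpr std::size_t align = alignof(std::max_align_t);
  return (sizeof(DemoHeader) + align - 1) / align * align;
}

// Records the last address given to the raw deallocator (for checking only).
inline std::uintptr_t g_last_raw_freed = 0;

// Allocates header + payload in one block and returns the payload address.
void* DemoAlloc(std::size_t size) {
  void* raw = std::malloc(DemoAllocationOffset() + size);
  if (raw == nullptr) return nullptr;
  new (raw) DemoHeader{size};  // assumption: header lives at the raw block start
  return static_cast<std::byte*>(raw) + DemoAllocationOffset();
}

// Frees a pointer previously returned by DemoAlloc.
void DemoFree(void* user_address) {
  if (user_address == nullptr) return;
  // Step back to the raw block start; passing user_address itself to the
  // underlying deallocator would be the class of bug this commit fixes.
  void* raw = static_cast<std::byte*>(user_address) - DemoAllocationOffset();
  static_cast<DemoHeader*>(raw)->~DemoHeader();
  g_last_raw_freed = reinterpret_cast<std::uintptr_t>(raw);
  std::free(raw);
}
```

In this sketch, calling `DemoFree(DemoAlloc(64))` releases the block at the address `std::malloc` originally returned, which mirrors how the patched `Free` derives `raw_allocation_address` before wrapping it for release.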
