
Add QNN EP HTP shared memory allocator #23136

Merged

edgchen1 merged 61 commits into main from edgchen1/qnn_ep_rpcmem on Jan 14, 2025

Conversation

edgchen1 (Contributor) commented on Dec 18, 2024

Description

Adds QNN EP HTP shared memory allocator.

The HTP shared memory allocator (`HtpSharedMemoryAllocator`) calls into the rpcmem shared library (libcdsprpc.so/.dll) to allocate and free memory that can be shared between the HTP and the CPU.
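
For context, here is a minimal sketch of how an allocator might load libcdsprpc.so and call the rpcmem API. This is not the code added in this PR; the `rpcmem_alloc`/`rpcmem_free`/`rpcmem_to_fd` entry points come from the Hexagon SDK's rpcmem.h, and the heap id and flag values below are assumptions taken from that header.

```cpp
// Sketch: load libcdsprpc.so and allocate a CPU/HTP-shareable buffer via rpcmem.
#include <dlfcn.h>
#include <cstdio>

using RpcMemAllocFn = void* (*)(int heap_id, unsigned int flags, int size);
using RpcMemFreeFn = void (*)(void* ptr);
using RpcMemToFdFn = int (*)(void* ptr);

int main() {
  void* lib = dlopen("libcdsprpc.so", RTLD_NOW | RTLD_LOCAL);
  if (lib == nullptr) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 1;
  }

  auto rpcmem_alloc = reinterpret_cast<RpcMemAllocFn>(dlsym(lib, "rpcmem_alloc"));
  auto rpcmem_free = reinterpret_cast<RpcMemFreeFn>(dlsym(lib, "rpcmem_free"));
  auto rpcmem_to_fd = reinterpret_cast<RpcMemToFdFn>(dlsym(lib, "rpcmem_to_fd"));
  if (rpcmem_alloc == nullptr || rpcmem_free == nullptr || rpcmem_to_fd == nullptr) {
    std::fprintf(stderr, "failed to resolve rpcmem functions\n");
    return 1;
  }

  constexpr int kRpcMemHeapIdSystem = 25;          // RPCMEM_HEAP_ID_SYSTEM (assumed value)
  constexpr unsigned int kRpcMemDefaultFlags = 1;  // RPCMEM_DEFAULT_FLAGS (assumed value)

  // Allocate a shared buffer and get the file descriptor used later for QNN registration.
  void* buffer = rpcmem_alloc(kRpcMemHeapIdSystem, kRpcMemDefaultFlags, 4096);
  int fd = rpcmem_to_fd(buffer);
  std::printf("shared buffer %p, fd %d\n", buffer, fd);

  rpcmem_free(buffer);
  dlclose(lib);
  return 0;
}
```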

The allocator can be enabled by setting the QNN EP option `enable_htp_shared_memory_allocator` to `1`. `QNNExecutionProvider::CreatePreferredAllocators()` will then return an instance of `HtpSharedMemoryAllocator`.
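
For example, with the ONNX Runtime C++ API the option is passed as a QNN EP provider option. A minimal sketch, where the model path and the `backend_path` value are placeholders:

```cpp
#include <onnxruntime_cxx_api.h>

#include <string>
#include <unordered_map>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "qnn_htp_shared_mem");
  Ort::SessionOptions session_options;

  // QNN EP provider options: backend_path selects the HTP backend library
  // (QnnHtp.dll on Windows), and enable_htp_shared_memory_allocator enables
  // the HtpSharedMemoryAllocator described in this PR.
  std::unordered_map<std::string, std::string> qnn_options{
      {"backend_path", "libQnnHtp.so"},
      {"enable_htp_shared_memory_allocator", "1"},
  };
  session_options.AppendExecutionProvider("QNN", qnn_options);

  Ort::Session session(env, "model.onnx", session_options);  // placeholder model path
  return 0;
}
```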

For each QNN context, we also need to register and unregister memory handles in order to use the HTP shared memory. This memory handle management is added to `QnnBackendManager`, which also manages the QNN context handles.

For more information about using HTP shared memory with QNN, see: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_shared_buffer_tutorial.html#shared-buffer-tutorial
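
Following that tutorial, registration roughly means wrapping the rpcmem buffer's file descriptor in a memory descriptor and registering it against a QNN context. Below is a rough sketch against the QNN SDK headers; it is not the `QnnBackendManager` code from this PR, and the struct and function names should be checked against the SDK version in use.

```cpp
// Sketch of registering/deregistering an rpcmem-backed buffer with a QNN context,
// following the QNN HTP shared buffer tutorial. Requires the QNN SDK headers.
#include "QnnInterface.h"
#include "QnnMem.h"
#include "QnnTypes.h"

// `context` is an initialized QNN context handle, `fd` comes from rpcmem_to_fd(),
// and `dims`/`rank` describe the tensor that will use the shared buffer.
Qnn_MemHandle_t RegisterSharedBuffer(const QNN_INTERFACE_VER_TYPE& qnn_interface,
                                     Qnn_ContextHandle_t context,
                                     int fd, uint32_t* dims, uint32_t rank) {
  Qnn_MemDescriptor_t descriptor = QNN_MEM_DESCRIPTOR_INIT;
  descriptor.memShape = {rank, dims, nullptr};
  descriptor.dataType = QNN_DATATYPE_FLOAT_32;  // example data type
  descriptor.memType = QNN_MEM_TYPE_ION;        // fd-based (ION/dmabuf-style) sharing
  descriptor.ionInfo.fd = fd;

  Qnn_MemHandle_t mem_handle = nullptr;
  if (qnn_interface.memRegister(context, &descriptor, 1, &mem_handle) != QNN_SUCCESS) {
    return nullptr;
  }
  return mem_handle;
}

void DeregisterSharedBuffer(const QNN_INTERFACE_VER_TYPE& qnn_interface,
                            Qnn_MemHandle_t mem_handle) {
  qnn_interface.memDeRegister(&mem_handle, 1);
}
```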

Limitations:

  • HTP shared memory is only used for graph inputs and outputs. Intermediate values are not supported.
  • Each allocation is backed by its own shared memory buffer. The allocator does not pool multiple allocations into a single shared memory buffer.

Motivation and Context

Improve performance by using HTP shared memory to avoid overhead from copying data between CPU and NPU.

edgchen1 and others added 30 commits November 5, 2024 15:12

yuslepukhin (Member) commented:

Can this be used for LoRA support when the model is modified to have optional inputs, and the data can be fed to override default initializers?

yuslepukhin (Member) left a comment

🕐

edgchen1 (Contributor, Author) commented:

> Can this be used for LoRA support when the model is modified to have optional inputs, and the data can be fed to override default initializers?

I'm not too familiar with that scenario. If it can be done using OrtValues, an input OrtValue can be backed by this new allocator.
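
For illustration, here is a sketch of allocating an input OrtValue from the EP's preferred allocator via the C++ API. The memory info name "QnnHtpShared" is an assumption and should be checked against the name actually reported by the HTP shared memory allocator.

```cpp
#include <onnxruntime_cxx_api.h>

#include <array>
#include <cstdint>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "qnn_htp_shared_mem");
  Ort::SessionOptions session_options;
  session_options.AppendExecutionProvider(
      "QNN", {{"backend_path", "libQnnHtp.so"},
              {"enable_htp_shared_memory_allocator", "1"}});
  Ort::Session session(env, "model.onnx", session_options);  // placeholder model path

  // The memory info name is an assumption; it must match the name of the
  // OrtMemoryInfo associated with the HTP shared memory allocator.
  Ort::MemoryInfo memory_info("QnnHtpShared", OrtDeviceAllocator, /*id*/ 0,
                              OrtMemTypeDefault);
  Ort::Allocator htp_shared_allocator(session, memory_info);

  // Allocate an input tensor directly in HTP shared memory, so Run() can avoid
  // a CPU -> NPU copy for this input.
  std::array<int64_t, 2> shape{1, 256};
  Ort::Value input = Ort::Value::CreateTensor(
      htp_shared_allocator, shape.data(), shape.size(),
      ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT);

  float* data = input.GetTensorMutableData<float>();
  data[0] = 1.0f;  // fill the shared buffer in place
  return 0;
}
```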

edgchen1 requested a review from yuslepukhin on January 13, 2025
yuslepukhin (Member) left a comment

:shipit:

edgchen1 (Contributor, Author) commented:
/azp run Windows GPU WebGPU CI Pipeline

azure-pipelines commented:

Azure Pipelines successfully started running 1 pipeline(s).

edgchen1 merged commit 04030f6 into main on Jan 14, 2025
98 checks passed
edgchen1 deleted the edgchen1/qnn_ep_rpcmem branch on January 14, 2025
guschmue pushed a commit that referenced this pull request Mar 6, 2025
ashrit-ms pushed a commit that referenced this pull request Mar 17, 2025
HectorSVC pushed a commit that referenced this pull request Apr 26, 2025
…24196)

### Description
During inference, setting the QNN EP option `enable_htp_shared_memory_allocator` hints that RPC-allocated buffers are used to avoid buffer copies between the CPU and the NPU.

With this PR, we add hints in the compilation phase so that, if RPC memory is going to be used, any additional allocations on the CPU can be avoided.

### Motivation and Context
This should help reduce peak CPU memory consumption while running AI workloads that use shared memory.

Related PR: #23136

Co-authored-by: Ashish Garg (AISW) <ashigarg@qti.qualcomm.com>
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request May 12, 2025
Labels

ep:QNN (issues related to QNN execution provider)
