Commit 206d4b4

Author: quic_calvnguy
[QNN-EP] Add documentation for optrace profiling
1 parent 2dd7182 commit 206d4b4

File tree

1 file changed: +129 −0 lines

docs/execution-providers/QNN-ExecutionProvider.md

Lines changed: 129 additions & 0 deletions
@@ -62,6 +62,7 @@ The QNN Execution Provider supports a number of configuration options. These pro
|'off'||
|'basic'||
|'detailed'||
|'optrace'|Requires QAIRT 2.39 or later|

|`"profiling_file_path"`|Description|
|---|---|
@@ -442,6 +443,134 @@ g_ort->AddSessionConfigEntry(session_options, kOrtSessionOptionEpContextEmbedMod
options.add_session_config_entry("ep.context_embed_mode", "1")
```

## QNN EP Profiling
Profiling data is available with the HTP backend. Enabling QNN profiling generates a human-readable .csv file containing information from initialization, execution, and de-initialization.

If onnxruntime is compiled against a recent QAIRT SDK (2.39 or later), a _qnn.log file is also generated alongside the .csv file. This .log file can be parsed by [qnn-profile-viewer](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-10/general_tools.html#qnn-profile-viewer), which is provided in the SDK.

## General Usage
To use QNN profiling, set the EP option `profiling_level` to "basic", "detailed", or "optrace". The EP option `profiling_file_path` must also be set to the .csv file path you would like to write profiling data to:
```python
# Python on Windows on Snapdragon device
import onnxruntime as ort
import numpy as np

provider_options = [{
    "htp_performance_mode": "burst",
    "device_id": "0",
    "htp_graph_finalization_optimization_mode": "3",
    "soc_model": "60",
    "htp_arch": "73",
    "vtcm_mb": "8",
    "profiling_level": "basic",
    "profiling_file_path": "output.csv"
}]

sess_options = ort.SessionOptions()

session = ort.InferenceSession(
    "model.onnx",
    sess_options=sess_options,
    providers=["QNNExecutionProvider"],
    provider_options=provider_options
)

input0 = np.ones((1, 2, 3, 4), dtype=np.float32)
result = session.run(None, {"input": input0})
```

With the example above, a file "output.csv" containing the profiling data will be generated. If using the QAIRT 2.39 SDK or later, a second file "output_qnn.log" will also be generated.
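
To take a quick look at the collected data, a minimal sketch like the following can be used; it is not part of the QNN tooling and only assumes that the "output.csv" written above is plain comma-separated text.
```python
# Minimal sketch for inspecting the generated profiling data (assumes
# "output.csv" from the example above and that it is plain CSV text).
import csv

with open("output.csv", newline="") as f:
    for row in csv.reader(f):
        print(row)
```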
"output_qnn.log" can then be parsed with the appropriate qnn-profile-viewer binary:
485+
```console
486+
> qnn-profile-viewer.exe --input_log .\output_qnn.log --output output_2.csv
487+
```

The above outputs basic information, such as the profiling data for the fastest and slowest executions as well as the average case. A .csv file can also be generated this way, though its contents will likely not differ from "output.csv".

If `profiling_level` is set to "detailed" or "optrace", per-network-layer data is also shown.

### Optrace-Level Profiling
[Optrace-level profiling](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-10/htp_backend.html#qnn-htp-profiling) generates a profiling .log file that contains [Qualcomm Hexagon Tensor Processor Analysis Summary (QHAS)](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-10/htp_backend.html#qnn-htp-analysis-summary-qhas-) data. This data can be used to generate chrometraces, providing a web-browser-friendly UI to visualize the data.

**This feature is only available with the QAIRT 2.39 SDK and later.**

### Optrace Setup
To use this feature, a context binary must be generated prior to execution:
```python
# Python on Windows on Snapdragon device
import onnxruntime as ort
import numpy as np

provider_options = [{
    "htp_performance_mode": "burst",
    "device_id": "0",
    "htp_graph_finalization_optimization_mode": "3",
    "soc_model": "60",
    "htp_arch": "73",
    "vtcm_mb": "8",
    "profiling_level": "optrace",  # Set profiling_level to optrace
    "profiling_file_path": "optrace.csv"
}]

sess_options = ort.SessionOptions()

# Enable context bin generation
sess_options.add_session_config_entry("session.disable_cpu_ep_fallback", "1")
sess_options.add_session_config_entry("ep.context_embed_mode", "0")
sess_options.add_session_config_entry("ep.context_enable", "1")

session = ort.InferenceSession(
    "model.onnx",
    sess_options=sess_options,
    providers=["QNNExecutionProvider"],
    provider_options=provider_options
)
```

Upon successful session creation, three files will be generated:
- model_ctx.onnx
- model_qnn.bin
- QNNExecutionProvider_QNN__<number>_schematic.bin

model_ctx.onnx is an ONNX model with a node that points to the model_qnn.bin context binary, which is used by the HTP backend for execution. The _schematic.bin file is used by qnn-profile-viewer to generate QHAS data.
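
As an optional sanity check (not part of the documented workflow), the generated model can be inspected to confirm it contains a context node referencing the binary. This sketch assumes the `onnx` Python package is installed; with embed mode "0", ONNX Runtime's context cache feature is expected to emit an `EPContext` node.
```python
# Optional sanity check: list the nodes in the generated context model.
# Assumes the `onnx` package is installed; the graph is expected to contain
# an EPContext node that references model_qnn.bin.
import onnx

model = onnx.load("model_ctx.onnx")
for node in model.graph.node:
    print(node.op_type, node.name)
```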

### Generating QHAS Data
Previously, under "General Usage", a session was created and executed with "model.onnx". Now there is a _ctx.onnx model that uses the newly generated context binary, so a new inference session must be created with it:
```python
# Continuing from Optrace Setup:

# Disable context binary generation for this run
sess_options.add_session_config_entry("ep.context_enable", "0")

optrace_session = ort.InferenceSession(
    "model_ctx.onnx",
    sess_options=sess_options,
    providers=["QNNExecutionProvider"],
    provider_options=provider_options
)

input0 = np.ones((1, 2, 3, 4), dtype=np.float32)
result = optrace_session.run(None, {"input": input0})
```

As before under "General Usage", a .csv file (optrace.csv) and a _qnn.log file (optrace_qnn.log) are generated. qnn-profile-viewer is used again, but with different parameters:
```console
> qnn-profile-viewer.exe --config .\config.json --reader .\QnnHtpOptraceProfilingReader.dll --input_log .\optrace_qnn.log --schematic .\QNNExecutionProvider_QNN_12345_schematic.bin --output optrace.json
```

Three new files are used:
- config.json: Please refer to the "Post Process (Chrometrace Generation)" section [on this page](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-10/htp_backend.html#qnn-htp-optrace-profiling).
- QnnHtpOptraceProfilingReader.dll: Provided as part of the QAIRT SDK. The corresponding file for Linux is libQnnHtpOptraceProfilingReader.so.
- QNNExecutionProvider_QNN_12345_schematic.bin: The name will vary. This file must be the same one generated alongside the context binary under "Optrace Setup".

The output file is now a .json file containing chrometrace data. This .json file can be opened with either the [Perfetto Trace Visualizer](https://ui.perfetto.dev/) or with chrome://tracing.

After running qnn-profile-viewer, you should see several .json files generated with the same prefix as the --output filename parameter, along with an .html file. The .html file can be opened in Chrome to view the chrometrace in a more user-friendly GUI.
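
For a quick, scriptable look at the trace before opening it in a viewer, a sketch like the following can be used. It assumes the output follows the standard Chrome Trace Event format (either a plain list of events or an object with a "traceEvents" array) and uses the "optrace.json" name from the command above.
```python
# Minimal sketch for inspecting the chrometrace output (assumes the standard
# Chrome Trace Event format and the "optrace.json" name used above).
import json

with open("optrace.json") as f:
    trace = json.load(f)

events = trace.get("traceEvents", []) if isinstance(trace, dict) else trace
print(f"{len(events)} trace events")
print(sorted({str(e.get("name", "")) for e in events})[:10])  # a few event names
```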

### Additional References
For more information on how to interpret QHAS data, please refer to [this page](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-10/htp_backend.html#qnn-htp-analysis-summary-qhas-).

For more information on the data collected with optrace profiling, please refer to [this page](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-10/htp_backend.html#qnn-htp-optrace-profiling).

## QNN EP weight sharing

### Weight sharing in Onnx domain
