Skip to content

Commit fd6f65c

Browse files
HectorSVCedgchen1
andauthored
typo
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
1 parent b70fdce commit fd6f65c

File tree

1 file changed

+1
-1
lines changed

1 file changed

+1
-1
lines changed

docs/execution-providers/EP-Context-Design.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@ redirect_from: /docs/reference/execution-providers/EP-Context-Design
1717

1818
## Background
1919

20-
OnnxRuntime Execution Providers enable users to inference Onnx model on different kinds of hardware accelerators empowered by backend SDKs (like QNN, OpenVINO, Vitis AI, etc). The Execution Providers convert the Onnx model into graph format required by the backend SDK, and compiles it into the format required by the hardware. Specific to NPU world, the converting and compiling process takes a long time to complete, especially for LLM models. The session creation time costs tens of minutes for some cases which impacts the user experience badly.
20+
OnnxRuntime Execution Providers enable users to inference Onnx model on different kinds of hardware accelerators empowered by backend SDKs (like QNN, OpenVINO, Vitis AI, etc). The Execution Providers convert the Onnx model into graph format required by the backend SDK, and compile it into the format required by the hardware. Specific to NPU world, the converting and compiling process takes a long time to complete, especially for LLM models. The session creation time costs tens of minutes for some cases which impacts the user experience badly.
2121
To avoid the converting and compiling cost, most of the backend SDKs provide the feature to dump the pre-compiled model into binary file. The pre-compiled model can be loaded by backend SDK directly and executed on the target device. It improves the session creation time greatly by using this way. In order to achieve this, OnnxRuntime defined a contribute Op called EPContext in MS domain.
2222

2323
## EPContext Op Schema

0 commit comments

Comments
 (0)