[Feat] Support multi-device inference and OCR batch inference #3923
Open
Bobholamovic wants to merge 49 commits into PaddlePaddle:develop from Bobholamovic:feat/optimize_ppstructurev3
Changes from all commits (49)
All commits by Bobholamovic:

1b6d11e  feat/parallel_computing
ce01e2f  Rename functions and parallelize read operations
2bdae90  Add full support for predictor processors
6243f45  Merge branch 'develop' into feat/parallel_computing
aa16d50  Optimize for OCR
346cf5b  Merge remote-tracking branch 'official/develop' into feat/parallel_co…
1dc9d5e  Remove optimization for rotate_image
b5a8ecd  Fix bug
c43df36  Merge remote-tracking branch 'official/develop' into feat/parallel_co…
caf91c1  Merge remote-tracking branch 'official/develop' into feat/parallel_co…
551e289  Merge branch 'feat/parallel_computing' of https://github.com/Bobholam…
90c5170  Merge branch 'develop' into feat/parallel_computing
a524e6a  Support multi-device inference and pipeline batch inference
fdc923b  Fix hpip bug
6e7e7fb  Fix hpip bug
0e6382c  Fix hpip bug
53f7d82  warning->info
6d96879  Fix table recognition v2 bugs
6698697  Merge branch 'feat/parallel_computing' into feat/optimize_ppstructurev3
a40876c  Support seal recognition and PP-StructureV3
983d87f  Update doc
0b4621e  Fix
d400339  PaddlePredictorOption supports copy
c458d94  Fix OCR bug
5b35046  No parallel if iterable has only one element
3e9736a  Add quick return for text det resize
c266c60  Fix xycuts bug
9401d6d  Merge branch 'feat/optimize_ppstructurev3' of https://github.com/Bobh…
25ab205  Revert "Merge branch 'feat/parallel_computing' into feat/optimize_pps…
c7fa3dc  Fix table bug
3bac9ea  Cancel execution permission for regular files in ultra-infer
4f7d590  More pipelines support multi-gpu inference
cbccae0  PP-StructureV3 by default uses bs8
ba9ffe8  Reset ultra-infer
a91c416  Add parallel inference docs
6e0db1e  Add parallel inference doc to mkdocs.yml
95b433a  Update doc
dfc8833  Update opset_version description
8c3bfb1  Fix serving docs
b4d8ee9  PP-FormulaNet-L supports paddle_fp16
40479d2  /workspace -> /app
7bced71  Unset paddle_fp16
4b8a6c2  Add FAQ in hpi doc
3b488d7  Fix docs
40e684c  Update docs
0ba3194  Merge branch 'develop' into feat/optimize_ppstructurev3
76d94a7  Update PP-StructureV3 interface
10eb0aa  Add note on edge deployment
0fc4f9a  Polish docs
@@ -4,7 +4,7 @@ comments: true

# PaddleX High-Performance Inference Guide

- In real production environments, many applications impose strict performance metrics—especially in response time—on deployment strategies to ensure system efficiency and a smooth user experience. To address this, PaddleX offers a high-performance inference plugin that, through automatic configuration and multi-backend inference capabilities, enables users to significantly accelerate model inference without concerning themselves with complex configurations and low-level details.
+ In real production environments, many applications impose strict performance metrics—especially in response time—on deployment strategies to ensure system efficiency and a smooth user experience. To address this, PaddleX offers a high-performance inference plugin that, through automatic configuration and multi-backend inference capabilities, enables users to significantly accelerate model inference without concerning themselves with complex configurations and low-level details. In addition to supporting inference acceleration on pipelines, the PaddleX high-performance inference plugin can also be used to accelerate inference when modules are used standalone.

Review comment: Basically, the various "deployment"-related concepts apply to pipelines, but high-performance inference is a special case that also applies to modules, so this is pointed out here explicitly.
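As a rough illustration of the sentence added above (acceleration for both pipelines and standalone modules), here is a minimal Python sketch. It assumes the PaddleX `create_pipeline`/`create_model` APIs and the `use_hpip` flag described in this guide; the model name and file paths are placeholders and should be checked against the installed version.

```python
# Minimal sketch: enabling the high-performance inference plugin for a full
# pipeline and for a standalone module. Assumes the PaddleX Python API
# (create_pipeline / create_model) and the use_hpip flag described in this
# guide; model and file names are placeholders.
from paddlex import create_pipeline, create_model

# 1) Accelerate an entire pipeline (the general OCR pipeline is shown).
pipeline = create_pipeline(pipeline="OCR", device="gpu:0", use_hpip=True)
for res in pipeline.predict("example.png"):
    res.print()

# 2) Accelerate a single module used standalone (a text detection model here).
model = create_model(model_name="PP-OCRv4_mobile_det", device="gpu:0", use_hpip=True)
for res in model.predict("example.png"):
    res.print()
```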
## Table of Contents
@@ -24,7 +24,7 @@ In real production environments, many applications impose strict performance met

Before using the high-performance inference plugin, please ensure that you have completed the PaddleX installation according to the [PaddleX Local Installation Tutorial](../installation/installation.en.md) and have run the quick inference using the PaddleX pipeline command line or the PaddleX pipeline Python script as described in the usage instructions.

- The high-performance inference plugin supports handling multiple model formats, including **PaddlePaddle static graph (`.pdmodel`, `.json`)**, **ONNX (`.onnx`)** and **Huawei OM (`.om`)**, among others. For ONNX models, you can convert them using the [Paddle2ONNX Plugin](./paddle2onnx.en.md). If multiple model formats are present in the model directory, PaddleX will automatically choose the appropriate one as needed, and automatic model conversion may be performed. **It is recommended to install the Paddle2ONNX plugin first before installing the high-performance inference plugin, so that PaddleX can convert model formats when needed.**
+ The high-performance inference plugin supports handling multiple model formats, including **PaddlePaddle static graph (`.pdmodel`, `.json`)**, **ONNX (`.onnx`)** and **Huawei OM (`.om`)**, among others. For ONNX models, you can convert them using the [Paddle2ONNX Plugin](./paddle2onnx.en.md). If multiple model formats are present in the model directory, PaddleX will automatically choose the appropriate one as needed, and automatic model conversion may be performed.
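Purely as an illustration of the format handling described above (and not PaddleX's actual selection logic), the following sketch checks which of the listed model formats are present in a model directory, judging by file suffix; the directory path is hypothetical.

```python
# Illustration only: report which of the model formats listed above exist in a
# model directory, judging by file suffix. This is NOT the selection logic
# PaddleX itself applies; the directory path is hypothetical.
from pathlib import Path

FORMAT_BY_SUFFIX = {
    ".pdmodel": "PaddlePaddle static graph",
    ".json": "PaddlePaddle static graph",
    ".onnx": "ONNX",
    ".om": "Huawei OM",
}

def detect_model_formats(model_dir: str) -> set:
    found = set()
    for path in Path(model_dir).iterdir():
        fmt = FORMAT_BY_SUFFIX.get(path.suffix.lower())
        if fmt is not None:
            found.add(fmt)
    return found

print(detect_model_formats("./my_model_dir"))
```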
### 1.1 Installing the High-Performance Inference Plugin
@@ -86,12 +86,14 @@ Refer to [Get PaddleX based on Docker](../installation/installation.en.md#21-obt

</tbody>
</table>

- In the official PaddleX Docker image, TensorRT is installed by default. The high-performance inference plugin can then accelerate inference using the Paddle Inference TensorRT subgraph engine.
+ The official PaddleX Docker images come with the Paddle2ONNX plugin pre-installed, allowing PaddleX to convert model formats on demand. In addition, the GPU version of the image includes TensorRT, so the high-performance inference plugin can leverage the Paddle Inference TensorRT subgraph engine for accelerated inference.

**Please note that the aforementioned Docker image refers to the official PaddleX image described in [Get PaddleX via Docker](../installation/installation.en.md#21-get-paddlex-based-on-docker), rather than the PaddlePaddle official image described in [PaddlePaddle Local Installation Tutorial](../installation/paddlepaddle_install.en.md#installing-paddlepaddle-via-docker). For the latter, please refer to the local installation instructions for the high-performance inference plugin.**

#### 1.1.2 Installing the High-Performance Inference Plugin Locally

+ **It is recommended to install the Paddle2ONNX plugin first before installing the high-performance inference plugin, so that PaddleX can convert model formats when needed.**

**To install the CPU version of the high-performance inference plugin:**

Run:
@@ -322,7 +324,7 @@ The available configuration items for `backend_config` vary for different backen

### 2.3 Modifying the High-Performance Inference Configuration

- Due to the diversity of actual deployment environments and requirements, the default configuration might not meet all needs. In such cases, manual adjustment of the high-performance inference configuration may be necessary. Users can modify the configuration by editing the **pipeline/module configuration file** or by passing the `hpi_config` field in the parameters via **CLI** or **Python API**. **Parameters passed via CLI or Python API will override the settings in the pipeline/module configuration file.** Different levels of configurations in the config file are automatically merged, and the deepest-level settings take the highest priority. The following examples illustrate how to modify the configuration.
+ When the model is initialized, the log will, by default, record the high-performance inference configuration that is about to be used. Due to the diversity of actual deployment environments and requirements, the default configuration might not meet all needs. In such cases, manual adjustment of the high-performance inference configuration may be necessary. Users can modify the configuration by editing the pipeline/module configuration file or by passing the `hpi_config` field in the parameters via CLI or Python API. Parameters passed via CLI or Python API will override the settings in the pipeline/module configuration file. Different levels of configurations in the config file are automatically merged, and the deepest-level settings take the highest priority. The following examples illustrate how to modify the configuration.

**For the general OCR pipeline, use the `onnxruntime` backend for all models:**
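The original document follows the line above with a configuration example that is not reproduced in this diff. As a hedged sketch of the same idea via the Python API, the snippet below passes `hpi_config` when creating the general OCR pipeline; the top-level `backend` key is an assumption based on this guide's examples and should be verified against the actual `hpi_config` schema.

```python
# Sketch: use the onnxruntime backend for all models of the general OCR
# pipeline by passing hpi_config through the Python API. The exact hpi_config
# schema (a top-level "backend" key is assumed here) should be checked against
# the guide's full examples; the same setting can instead be placed in the
# pipeline configuration file, which CLI/API parameters override.
from paddlex import create_pipeline

pipeline = create_pipeline(
    pipeline="OCR",
    device="gpu:0",
    use_hpip=True,
    hpi_config={"backend": "onnxruntime"},
)

for res in pipeline.predict("example.png"):
    res.print()
```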
@@ -566,3 +568,11 @@ For the GPU version of the high-performance inference plugin, the official Paddl

**4. Why does the program freeze during runtime or display some "WARNING" and "ERROR" messages after using the high-performance inference feature? What should be done in such cases?**

When initializing the model, operations such as subgraph optimization may take longer and may generate some "WARNING" and "ERROR" messages. However, as long as the program does not exit automatically, it is recommended to wait patiently, as the program usually continues to run to completion.

+ **5. When using GPU for inference, enabling the high-performance inference plugin increases memory usage and causes OOM. How can this be resolved?**
+
+ Some acceleration methods trade off memory usage to support a broader range of inference scenarios. If memory becomes a bottleneck, consider the following optimization strategies:
+
+ * **Adjust pipeline configurations**: Disable unnecessary features to avoid loading redundant models. Appropriately reduce the batch size based on business requirements to balance throughput and memory usage.
+ * **Switch inference backends**: Different inference backends have varying memory management strategies. Try benchmarking various backends to compare memory usage and performance.
+ * **Optimize dynamic shape configurations**: For modules using TensorRT or Paddle Inference TensorRT subgraph engine, narrow the dynamic shape range based on the actual distribution of input data.
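To make the first two bullets above concrete, here is a small, hedged sketch that builds the pipeline with different inference backends so their memory usage and latency can be compared. The backend names and the `hpi_config` schema are assumptions to verify against the installed version, and per-model batch sizes are normally reduced in the pipeline configuration file rather than in this snippet.

```python
# Sketch for the first two strategies above: benchmark alternative inference
# backends and keep batching modest. Backend names ("paddle", "onnxruntime")
# and the hpi_config "backend" key are assumptions to verify; sub-model batch
# sizes are normally lowered in the pipeline configuration file.
from paddlex import create_pipeline

def build_ocr_pipeline(backend: str):
    return create_pipeline(
        pipeline="OCR",
        device="gpu:0",
        use_hpip=True,
        hpi_config={"backend": backend},
    )

for backend in ("paddle", "onnxruntime"):
    pipeline = build_ocr_pipeline(backend)
    # Run a representative input and observe GPU memory (e.g. via nvidia-smi)
    # and latency for each backend before settling on one.
    for res in pipeline.predict("example.png"):
        res.print()
```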
Review comment: The type actually returned is not `BasePredictor` but an internal class whose name begins with a single underscore, so the incorrect statement here has been removed.