[Feat] Support multi-device inference and OCR batch inference #3923
base: develop
Conversation

Bobholamovic commented on Apr 28, 2025 (edited)
- Most OCR-type pipelines now support batch size > 1.
- Added parallel inference support: built-in multi-device inference capability; application code examples for multi-device and multi-instance inference; new and updated documentation.
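The multi-instance pattern described above can be sketched roughly as follows. This is not the actual PaddleX API: `FakePipeline` and its `predict` method are stand-ins, and the device names are illustrative. The idea is one pipeline instance per device, inputs partitioned across workers, and each worker feeding its pipeline in batches:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a pipeline instance bound to one device;
# a real PaddleX pipeline would be created with its own `device` argument.
class FakePipeline:
    def __init__(self, device):
        self.device = device

    def predict(self, batch):
        # Pretend to run OCR on the batch; return one result per input.
        return [f"{self.device}:{item}" for item in batch]

def parallel_predict(inputs, devices, batch_size=2):
    # One pipeline instance per device (multi-instance inference).
    pipelines = [FakePipeline(d) for d in devices]
    # Round-robin partition of the inputs across devices.
    shards = {d: [] for d in devices}
    for i, item in enumerate(inputs):
        shards[devices[i % len(devices)]].append(item)

    def run(pipeline):
        results = []
        shard = shards[pipeline.device]
        # Batch inference: feed batch_size inputs at a time.
        for start in range(0, len(shard), batch_size):
            results.extend(pipeline.predict(shard[start : start + batch_size]))
        return results

    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        return [r for res in pool.map(run, pipelines) for r in res]
```

A real implementation would use one process per device rather than threads, since inference is not GIL-friendly; threads are used here only to keep the sketch self-contained.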
Thanks for your contribution!
@@ -39,7 +39,6 @@ In short, just three steps:
* `use_hpip`: `bool` type, whether to enable the high-performance inference plugin;
* `hpi_config`: `dict | None` type, high-performance inference configuration;
* _`inference hyperparameters`_: used to set common inference hyperparameters. Please refer to the specific model description document for details.
* Return Value: `BasePredictor` type.
The actual return type is not `BasePredictor` but an internal class whose name begins with a single underscore, so the incorrect statement is removed here.
@@ -4,7 +4,7 @@ comments: true

# PaddleX High-Performance Inference Guide

In real production environments, many applications impose strict performance metrics—especially in response time—on deployment strategies to ensure system efficiency and a smooth user experience. To address this, PaddleX offers a high-performance inference plugin that, through automatic configuration and multi-backend inference capabilities, enables users to significantly accelerate model inference without concerning themselves with complex configurations and low-level details.
In real production environments, many applications impose strict performance metrics—especially in response time—on deployment strategies to ensure system efficiency and a smooth user experience. To address this, PaddleX offers a high-performance inference plugin that, through automatic configuration and multi-backend inference capabilities, enables users to significantly accelerate model inference without concerning themselves with complex configurations and low-level details. In addition to supporting inference acceleration on pipelines, the PaddleX high-performance inference plugin can also be used to accelerate inference when modules are used standalone.
Basically, the various deployment-related concepts apply to pipelines, but high-performance inference is special in that it also applies to modules, so that is called out explicitly here.
@@ -1,6 +1,8 @@

pipeline_name: PP-StructureV3

batch_size: 8
Default to the configuration that performed best in testing.
@@ -833,7 +833,7 @@ def _build_ui_runtime(self, backend, backend_config, ui_option=None):
            for name, shapes in backend_config.dynamic_shapes.items():
                ui_option.trt_option.set_shape(name, *shapes)
        else:
            logging.warning(
            logging.info(
This should not be a warning; it is expected behavior, so `info` is used instead.
@@ -335,6 +335,8 @@ def _prepare_pp_option(
        device_info = None
        if pp_option is None:
            pp_option = PaddlePredictorOption(model_name=self.model_name)
        elif pp_option.model_name is None:
Support auto-injection of `model_name`, allowing pipelines to set some basic configuration through `pp_option` as well.
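A minimal sketch of the injection logic described above, with class and field names simplified from the diff (`PredictorOption` stands in for `PaddlePredictorOption`):

```python
class PredictorOption:
    # Simplified stand-in for PaddlePredictorOption.
    def __init__(self, model_name=None):
        self.model_name = model_name

def prepare_option(pp_option, model_name):
    if pp_option is None:
        # No option supplied: create one bound to this model.
        pp_option = PredictorOption(model_name=model_name)
    elif pp_option.model_name is None:
        # Option supplied without a model name: inject it automatically,
        # so a pipeline-level pp_option can carry shared basic settings.
        pp_option.model_name = model_name
    return pp_option
```

An explicitly set `model_name` is never overwritten; only the unset case is filled in.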
@@ -105,6 +104,8 @@ def resize_image_type1(self, img):
        resize_w = ori_w * resize_h / ori_h
        N = math.ceil(resize_w / 32)
        resize_w = N * 32
        if resize_h == ori_h and resize_w == ori_w:
When no resizing is needed, return the original image directly to reduce overhead.
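The shape computation and the new early return, as a simplified sketch. The surrounding preprocessing is omitted, and `resize` is a hypothetical stand-in for the actual image-resize call:

```python
import math

def resize(img, w, h):
    # Stand-in for an actual image resize; returns a new object.
    return ("resized", w, h)

def resize_to_multiple_of_32(img, ori_h, ori_w, resize_h):
    # Keep the aspect ratio, then round the width up to a multiple of 32.
    resize_w = ori_w * resize_h / ori_h
    resize_w = math.ceil(resize_w / 32) * 32
    if resize_h == ori_h and resize_w == ori_w:
        # No-op resize: return the original image and skip the copy.
        return img
    return resize(img, resize_w, resize_h)
```

When the computed target size equals the input size, the resize would copy the image without changing it, so returning the original object saves that work.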
        config["use_hpip"] = use_hpip
        if hpi_config is not None:
            config["hpi_config"] = hpi_config
        if use_hpip is None:
The original logic here was slightly flawed and could cause settings in the config file to not take effect; this fixes it.
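The fixed precedence can be sketched like this (key names follow the diff; the assumed default of `False` is illustrative). The key point is that a `None` argument means "not specified" and must not clobber a value loaded from the config file:

```python
def apply_hpi_settings(config, use_hpip, hpi_config):
    # Explicit caller arguments win; `None` means "not specified",
    # so a value already loaded from the pipeline config file is kept.
    if use_hpip is not None:
        config["use_hpip"] = use_hpip
    elif "use_hpip" not in config:
        config["use_hpip"] = False  # assumed fallback default
    if hpi_config is not None:
        config["hpi_config"] = hpi_config
    return config
```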
@@ -21,7 +21,7 @@
    set_env_for_device,
    update_device_num,
)
from ...utils.flags import FLAGS_json_format_model, DISABLE_CINN_MODEL_WL
from ...utils.flags import DISABLE_CINN_MODEL_WL, FLAGS_json_format_model
Auto-fixed by the linter.
        block.region_label not in mask_labels
        and block.secondary_direction == cut_direction
    ):
        if len(all_boxes) > 0:
Missing handling of the edge case where `all_boxes` is empty.
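A generic sketch of the guard; `union_box` is a hypothetical helper, not the actual function in the diff. Reducing an empty sequence with `min`/`max` raises `ValueError`, so the empty case must be checked first:

```python
def union_box(all_boxes):
    # Guard the empty edge case before reducing with min/max,
    # which would raise ValueError on an empty sequence.
    if len(all_boxes) > 0:
        x1 = min(b[0] for b in all_boxes)
        y1 = min(b[1] for b in all_boxes)
        x2 = max(b[2] for b in all_boxes)
        y2 = max(b[3] for b in all_boxes)
        return (x1, y1, x2, y2)
    return None  # nothing to merge
```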
@@ -101,7 +101,9 @@ def create_model(self, config: Dict, **kwargs) -> BasePredictor:
            model_dir=model_dir,
            device=self.device,
            batch_size=config.get("batch_size", 1),
            pp_option=self.pp_option,
            pp_option=(
                self.pp_option.copy() if self.pp_option is not None else self.pp_option
PaddlePredictorOption and Predictor form a composition (the predictor owns the lifetime of its pp_option), not an aggregation, and dependency injection is used. To prevent a predictor's in-place modification of pp_option from causing different models in one pipeline to wrongly share state such as TRT dynamic shapes, pp_option is copied each time a model is created.
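The sharing bug this copy prevents can be demonstrated with a simplified stand-in for the option class (names are illustrative, not the actual PaddleX classes):

```python
import copy

class PredictorOption:
    # Simplified stand-in for PaddlePredictorOption.
    def __init__(self):
        self.trt_dynamic_shapes = {}

    def copy(self):
        return copy.deepcopy(self)

def create_model(name, shared_option):
    # Copy on injection: each predictor owns its option (composition),
    # so in-place edits cannot leak into sibling models of the pipeline.
    opt = shared_option.copy() if shared_option is not None else None
    return {"name": name, "pp_option": opt}
```

Without the `.copy()`, mutating one model's option (e.g. registering TRT dynamic shapes) would silently change every model built from the same injected instance.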