
Commit 8bfdb52

Merge branch 'develop' into 0423
2 parents 7d452f8 + 7b0244e commit 8bfdb52

6 files changed, +317 -15 lines changed

docs/pipeline_deploy/serving.en.md

+3 -3
@@ -308,13 +308,13 @@ First, pull the Docker image as needed:
 - Image supporting deployment with NVIDIA GPU (the machine must have NVIDIA drivers that support CUDA 11.8 installed):
 
 ```bash
-docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0rc0-gpu
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0rc1-gpu
 ```
 
 - CPU-only Image:
 
 ```bash
-docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0rc0-cpu
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0rc1-cpu
 ```
 
 With the image prepared, navigate to the `server` directory and execute the following command to run the server:
@@ -338,7 +338,7 @@ docker run \
 - If CPU deployment is required, there is no need to specify `--gpus`.
 - If you need to enter the container for debugging, you can replace `/bin/bash server.sh` in the command with `/bin/bash`. Then execute `/bin/bash server.sh` inside the container.
 - If you want the server to run in the background, you can replace `-it` in the command with `-d`. After the container starts, you can view the container logs with `docker logs -f {container ID}`.
-- Add `-e PADDLEX_USE_HPIP=1` to use the PaddleX high-performance inference plugin to accelerate the pipeline inference process. Please refer to the [PaddleX High-Performance Inference Guide](./high_performance_inference.en.md) for more information.
+- Add `-e PADDLEX_HPS_USE_HPIP=1` to use the PaddleX high-performance inference plugin to accelerate the pipeline inference process. Please refer to the [PaddleX High-Performance Inference Guide](./high_performance_inference.en.md) for more information.
 
 You may observe output similar to the following:
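
As a rough illustration of how the options in the `serving.en.md` notes combine, the sketch below only prints a `docker run` command line and never invokes Docker. `build_run_cmd` is a hypothetical helper, `--gpus all` is an assumed value, and the full command in `serving.en.md` takes further arguments that this diff does not show.

```bash
#!/bin/sh
# Sketch only: prints a `docker run` command line; it does not run Docker.
# build_run_cmd is a hypothetical helper, not part of PaddleX. The real command
# in serving.en.md has more arguments than the ones visible in this diff.

IMAGE_PREFIX="ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0rc1"

# Usage: build_run_cmd <gpu|cpu> <fg|bg> <0|1>
build_run_cmd() {
    device="$1"
    mode="$2"
    hpip="$3"
    cmd="docker run"

    # -it keeps the server attached to the terminal; -d runs it in the background.
    if [ "$mode" = "bg" ]; then
        cmd="$cmd -d"
    else
        cmd="$cmd -it"
    fi

    # --gpus is only needed for GPU deployment ("all" is an assumed value).
    if [ "$device" = "gpu" ]; then
        cmd="$cmd --gpus all"
    fi

    # This commit renames the variable from PADDLEX_USE_HPIP.
    if [ "$hpip" = "1" ]; then
        cmd="$cmd -e PADDLEX_HPS_USE_HPIP=1"
    fi

    printf '%s\n' "$cmd ${IMAGE_PREFIX}-${device} /bin/bash server.sh"
}

build_run_cmd gpu bg 1
build_run_cmd cpu fg 0
```

For debugging inside the container, the trailing `/bin/bash server.sh` would become `/bin/bash`, as the notes describe.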

docs/pipeline_deploy/serving.md

+3 -3
@@ -308,13 +308,13 @@ paddlex --serve --pipeline image_classification --use_hpip
 - 支持使用 NVIDIA GPU 部署的镜像(机器上需要安装有支持 CUDA 11.8 的 NVIDIA 驱动):
 
 ```bash
-docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0rc0-gpu
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0rc1-gpu
 ```
 
 - CPU-only 镜像:
 
 ```bash
-docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0rc0-cpu
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/hps:paddlex3.0.0rc1-cpu
 ```
 
 准备好镜像后,切换到 `server` 目录,执行如下命令运行服务器:
@@ -338,7 +338,7 @@ docker run \
 - 如果希望使用 CPU 部署,则不需要指定 `--gpus`
 - 如果需要进入容器内部调试,可以将命令中的 `/bin/bash server.sh` 替换为 `/bin/bash`,然后在容器中执行 `/bin/bash server.sh`
 - 如果希望服务器在后台运行,可以将命令中的 `-it` 替换为 `-d`。容器启动后,可通过 `docker logs -f {容器 ID}` 查看容器日志。
-- 在命令中添加 `-e PADDLEX_USE_HPIP=1` 可以使用 PaddleX 高性能推理插件加速产线推理过程。请参考 [PaddleX 高性能推理指南](./high_performance_inference.md) 获取更多信息。
+- 在命令中添加 `-e PADDLEX_HPS_USE_HPIP=1` 可以使用 PaddleX 高性能推理插件加速产线推理过程。请参考 [PaddleX 高性能推理指南](./high_performance_inference.md) 获取更多信息。
 
 可观察到类似下面的输出信息:

docs/support_list/pipelines_list.en.md

+89 -2
@@ -68,7 +68,7 @@ comments: true
 </td>
 </tr>
 <tr>
-<td rowspan = 7>Document Scene Information Extraction v3</td>
+<td rowspan = 7>PP-ChatOCRv3</td>
 <td>Table Structure Recognition</td>
 <td rowspan = 7><a href="https://aistudio.baidu.com/community/app/182491/webUI?source=appCenter">Online Experience</a></td>
 <td rowspan = 7>Document Image Scene Information Extraction v3 (PP-ChatOCRv3-doc) is a PaddlePaddle-specific intelligent document and image analysis solution that integrates LLM and OCR technologies to solve common complex document information extraction challenges such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition. By integrating the Wenxin large model, it combines vast data and knowledge, providing high accuracy and wide applicability. The open-source version supports local experience and deployment, and fine-tuning training for each module.</td>
@@ -93,11 +93,45 @@ comments: true
 <td>Seal Text Detection</td>
 </tr>
 <tr>
-<td>Text Image Un ra p ping</td>
+<td>Text Image Unrapping</td>
 </tr>
 <tr>
 <td>Document Image Orientation Classification</td>
 </tr>
+<tr>
+<td rowspan="8">PP-ChatOCRv4</td>
+<td>Table Structure Recognition</td>
+<td rowspan="8">Coming Soon</td>
+<td rowspan="8">Document Scene Information Extraction v4 (PP-ChatOCRv4) is a PaddlePaddle-featured intelligent analysis solution for documents and images, combining LLM, MLLM, and OCR technologies. Based on PP-ChatOCRv3, it optimizes common complex document information extraction challenges such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition. It integrates massive data and knowledge with the Ernie model, achieving high accuracy and wide applicability. This pipeline also provides flexible service deployment methods, supporting deployment on various hardware. Furthermore, it offers secondary development capabilities, allowing you to train and optimize on your own dataset, and the trained model can be seamlessly integrated.</td>
+<td rowspan="8">
+<ul>
+<li>Knowledge Graph Construction</li>
+<li>Detection of Information Related to Specific Events in Online News and Social Media</li>
+<li>Extraction and Analysis of Key Information in Academic Literature (especially scenarios requiring recognition of seals, distorted images, and more complex tables)</li>
+</ul>
+</td>
+</tr>
+<tr>
+<td>Layout Detection</td>
+</tr>
+<tr>
+<td>Text Detection</td>
+</tr>
+<tr>
+<td>Text Recognition</td>
+</tr>
+<tr>
+<td>Seal Text Detection</td>
+</tr>
+<tr>
+<td>Text Image Unrapping</td>
+</tr>
+<tr>
+<td>Document Image Orientation Classification</td>
+</tr>
+<tr>
+<td>Document-based Vision-Language Model</td>
+</tr>
 <tr>
 <td rowspan="5">General OCR</td>
 <td>Text Detection</td>
@@ -291,6 +325,59 @@ comments: true
 <tr>
 <td>Seal Text Detection</td>
 </tr>
+<tr>
+<td rowspan="13">General Layout Parsing v3</td>
+<td>Layout Detection Module</td>
+<td rowspan="13">Coming Soon</td>
+<td rowspan="13">Based on the General Layout Parsing v1 pipeline, the General Layout Parsing v3 pipeline enhances the capabilities of layout detection, table recognition, and formula recognition. It adds the ability to restore multi-column reading order and convert results into Markdown files. It performs exceptionally well in various document data and can handle more complex document data. This pipeline also provides flexible service deployment methods, supporting multiple programming languages on various hardware. Furthermore, it offers secondary development capabilities, allowing you to train and optimize on your own dataset, and the trained model can be seamlessly integrated.</td>
+<td rowspan="13">
+<ul>
+<li>Intelligent Document Analysis</li>
+<li>Document Digitization</li>
+<li>Page Structure Parsing</li>
+<li>Complex Table Recognition</li>
+<li>Large Model Data Construction</li>
+<li>RAG</li>
+</ul>
+</td>
+</tr>
+<tr>
+<td>Text Detection Module</td>
+</tr>
+<tr>
+<td>Text Recognition Module</td>
+</tr>
+<tr>
+<td>Doc Img Orientation Classification</td>
+</tr>
+<tr>
+<td>Text Image Unrapping Module</td>
+</tr>
+<tr>
+<td>Wired Table Structure Recognition Module</td>
+</tr>
+<tr>
+<td>Wireless Table Structure Recognition Module</td>
+</tr>
+<tr>
+<td>Table Classification Module</td>
+</tr>
+<tr>
+<td>Wired Table Cell Detection Module</td>
+</tr>
+<tr>
+<td>Wireless Table Cell Detection Module</td>
+</tr>
+<tr>
+<td>Text Line Orientation Classification Module</td>
+</tr>
+<tr>
+<td>Formula Recognition Module</td>
+</tr>
+<tr>
+<td>Seal Text Detection Module</td>
+</tr>
+
 <tr>
 <td rowspan="4">Formula Recognition</td>
 <td>Formula Recognition</td>

docs/support_list/pipelines_list.md

+90 -6
@@ -98,6 +98,41 @@ comments: true
 <tr>
 <td>文档图像方向分类</td>
 </tr>
+<tr>
+<td rowspan = 8>文档场景信息抽取v4</td>
+<td>表格结构识别</td>
+<td rowspan = 8>comming soon</td>
+<td rowspan = 8>文档场景信息抽取v4(PP-ChatOCRv4)是飞桨特色的文档和图像智能分析解决方案,结合了 LLM、MLLM 和 OCR 技术,在文档场景信息抽取v3的基础上,优化了版面分析、生僻字、多页 pdf、表格、印章识别等常见的复杂文档信息抽取难点问题,结合文心大模型将海量数据和知识相融合,准确率高且应用广泛。本产线同时提供了灵活的服务化部署方式,支持在多种硬件上部署。不仅如此,本产线也提供了二次开发的能力,您可以基于本产线在您自己的数据集上训练调优,训练后的模型也可以无缝集成。
+</td>
+<td rowspan="8">
+<ul>
+<li>知识图谱的构建</li>
+<li>在线新闻和社交媒体中特定事件相关信息的检测</li>
+<li>学术文献中关键信息的抽取和分析(特别是需要对印章、扭曲图片、更复杂表格进行识别的场景)</li>
+</ul>
+</td>
+</tr>
+<tr>
+<td>版面区域检测</td>
+</tr>
+<tr>
+<td>文本检测</td>
+</tr>
+<tr>
+<td>文本识别</td>
+</tr>
+<tr>
+<td>印章文本检测</td>
+</tr>
+<tr>
+<td>文本图像矫正</td>
+</tr>
+<tr>
+<td>文档图像方向分类</td>
+</tr>
+<tr>
+<td>文档类视觉语言模型</td>
+</tr>
 <tr>
 <td rowspan = 5>通用OCR</td>
 <td>文本检测</td>
@@ -251,11 +286,11 @@ comments: true
 </ul></td>
 </tr>
 <tr>
-<td rowspan = 10>通用版面解析</td>
-<td>版面区域检测</td>
-<td rowspan = 10>暂无</td>
-<td rowspan = 10>版面解析是一种从文档图像中提取结构化信息的技术,主要用于将复杂的文档版面转换为机器可读的数据格式。这项技术在文档管理、信息提取和数据数字化等领域具有广泛的应用。版面解析通过结合光学字符识别(OCR)、图像处理和机器学习算法,能够识别和提取文档中的文本块、标题、段落、图片、表格以及其他版面元素。此过程通常包括版面分析、元素分析和数据格式化三个主要步骤,最终生成结构化的文档数据,提升数据处理的效率和准确性。</td>
-<td rowspan="10">
+<td rowspan = 9>通用版面解析</td>
+<td>版面区域检测模块</td>
+<td rowspan = 9>暂无</td>
+<td rowspan = 9>版面解析是一种从文档图像中提取结构化信息的技术,主要用于将复杂的文档版面转换为机器可读的数据格式。这项技术在文档管理、信息提取和数据数字化等领域具有广泛的应用。版面解析通过结合光学字符识别(OCR)、图像处理和机器学习算法,能够识别和提取文档中的文本块、标题、段落、图片、表格以及其他版面元素。此过程通常包括版面分析、元素分析和数据格式化三个主要步骤,最终生成结构化的文档数据,提升数据处理的效率和准确性。</td>
+<td rowspan="9">
 <ul>
 <li>金融与法律文档分析</li>
 <li>历史文献和档案数字化</li>
@@ -265,7 +300,44 @@ comments: true
 </td>
 </tr>
 <tr>
+<td>文本检测模块</td>
+</tr>
+<tr>
+<td>文本识别模块</td>
+</tr>
+<tr>
+<td>文档图像方向分类模块</td>
+</tr>
+<tr>
+<td>文本图像矫正模块</td>
+</tr>
+<tr>
+<td>表格结构识别模块</td>
+</tr>
+<tr>
+<td>文本行方向分类模块</td>
+</tr>
+<tr>
+<td>公式识别模块</td>
+</tr>
+<tr>
+<td>印章文本检测模块</td>
+</tr>
+<tr>
+<td rowspan = 13>通用版面解析v3</td>
 <td>版面区域检测模块</td>
+<td rowspan = 13>comming soon</td>
+<td rowspan = 13>通用版面解析v3产线在通用版面解析v1产线的基础上,强化了版面区域检测、表格识别、公式识别的能力,增加了多栏阅读顺序的恢复能力、结果转换 Markdown 文件的能力,在多种文档数据中,表现优异,可以处理较复杂的文档数据。本产线同时提供了灵活的服务化部署方式,支持在多种硬件上使用多种编程语言调用。不仅如此,本产线也提供了二次开发的能力,您可以基于本产线在您自己的数据集上训练调优,训练后的模型也可以无缝集成。</td>
+<td rowspan="13">
+<ul>
+<li>智能文档分析</li>
+<li>文档数字化</li>
+<li>页面结构解析</li>
+<li>复杂表格识别</li>
+<li>大模型数据构建</li>
+<li>RAG</li>
+</ul>
+</td>
 </tr>
 <tr>
 <td>文本检测模块</td>
@@ -280,7 +352,19 @@ comments: true
 <td>文本图像矫正模块</td>
 </tr>
 <tr>
-<td>表格结构识别模块</td>
+<td>有线表表格结构识别模块</td>
+</tr>
+<tr>
+<td>无线表表格结构识别模块</td>
+</tr>
+<tr>
+<td>表格分类模块</td>
+</tr>
+<tr>
+<td>有线表表格单元格检测模块</td>
+</tr>
+<tr>
+<td>无线表表格单元格检测模块</td>
 </tr>
 <tr>
 <td>文本行方向分类模块</td>

docs/support_list/pipelines_list_npu.en.md

+65
@@ -67,6 +67,71 @@ comments: true
 </ul>
 </td>
 </tr>
+<tr>
+<td rowspan = 7>PP-ChatOCRv3</td>
+<td>Table Structure Recognition</td>
+<td rowspan = 7><a href="https://aistudio.baidu.com/community/app/182491/webUI?source=appCenter">Online Experience</a></td>
+<td rowspan = 7>Document Image Scene Information Extraction v3 (PP-ChatOCRv3-doc) is a PaddlePaddle-specific intelligent document and image analysis solution that integrates LLM and OCR technologies to solve common complex document information extraction challenges such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition. By integrating the Wenxin large model, it combines vast data and knowledge, providing high accuracy and wide applicability. The open-source version supports local experience and deployment, and fine-tuning training for each module.</td>
+<td rowspan="7">
+<ul>
+<li>Construction of knowledge graphs</li>
+<li>Detection of information related to specific events in online news and social media</li>
+<li>Extraction and analysis of key information in academic literature (especially in scenarios requiring recognition of seals, distorted images, and more complex tables)</li>
+</ul>
+</td>
+</tr>
+<tr>
+<td>Layout Detection</td>
+</tr>
+<tr>
+<td>Text Detection</td>
+</tr>
+<tr>
+<td>Text Recognition</td>
+</tr>
+<tr>
+<td>Seal Text Detection</td>
+</tr>
+<tr>
+<td>Text Image Unrapping</td>
+</tr>
+<tr>
+<td>Document Image Orientation Classification</td>
+</tr>
+<tr>
+<td rowspan="8">PP-ChatOCRv4</td>
+<td>Table Structure Recognition</td>
+<td rowspan="8">Coming Soon</td>
+<td rowspan="8">Document Scene Information Extraction v4 (PP-ChatOCRv4) is a PaddlePaddle-featured intelligent analysis solution for documents and images, combining LLM, MLLM, and OCR technologies. Based on PP-ChatOCRv3, it optimizes common complex document information extraction challenges such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition. It integrates massive data and knowledge with the Ernie model, achieving high accuracy and wide applicability. This pipeline also provides flexible service deployment methods, supporting deployment on various hardware. Furthermore, it offers secondary development capabilities, allowing you to train and optimize on your own dataset, and the trained model can be seamlessly integrated.</td>
+<td rowspan="8">
+<ul>
+<li>Knowledge Graph Construction</li>
+<li>Detection of Information Related to Specific Events in Online News and Social Media</li>
+<li>Extraction and Analysis of Key Information in Academic Literature (especially scenarios requiring recognition of seals, distorted images, and more complex tables)</li>
+</ul>
+</td>
+</tr>
+<tr>
+<td>Layout Detection</td>
+</tr>
+<tr>
+<td>Text Detection</td>
+</tr>
+<tr>
+<td>Text Recognition</td>
+</tr>
+<tr>
+<td>Seal Text Detection</td>
+</tr>
+<tr>
+<td>Text Image Unrapping</td>
+</tr>
+<tr>
+<td>Document Image Orientation Classification</td>
+</tr>
+<tr>
+<td>Document-based Vision-Language Model</td>
+</tr>
 <tr>
 <td rowspan="5">General OCR</td>
 <td>Text Detection</td>
