Commit 7b0244e

add pipeline to pipeline list (#3899)
1 parent c674d2e commit 7b0244e

4 files changed: +310 -8 lines changed

docs/support_list/pipelines_list.en.md

Lines changed: 89 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -68,7 +68,7 @@ comments: true
6868
</td>
6969
</tr>
7070
<tr>
71-
<td rowspan = 7>Document Scene Information Extraction v3</td>
71+
<td rowspan = 7>PP-ChatOCRv3</td>
7272
<td>Table Structure Recognition</td>
7373
<td rowspan = 7><a href="https://aistudio.baidu.com/community/app/182491/webUI?source=appCenter">Online Experience</a></td>
7474
<td rowspan = 7>Document Image Scene Information Extraction v3 (PP-ChatOCRv3-doc) is PaddlePaddle's intelligent document and image analysis solution. It integrates LLM and OCR technologies to address common challenges in complex document information extraction, such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition. By integrating the Wenxin large model, it combines large-scale data and knowledge, providing high accuracy and wide applicability. The open-source version supports local trial and deployment, as well as fine-tuning of each module.</td>
@@ -93,11 +93,45 @@ comments: true
9393
<td>Seal Text Detection</td>
9494
</tr>
9595
<tr>
96-
<td>Text Image Un ra p ping</td>
96+
<td>Text Image Unwarping</td>
9797
</tr>
9898
<tr>
9999
<td>Document Image Orientation Classification</td>
100100
</tr>
101+
<tr>
102+
<td rowspan="8">PP-ChatOCRv4</td>
103+
<td>Table Structure Recognition</td>
104+
<td rowspan="8">Coming Soon</td>
105+
<td rowspan="8">Document Scene Information Extraction v4 (PP-ChatOCRv4) is PaddlePaddle's intelligent analysis solution for documents and images, combining LLM, MLLM, and OCR technologies. Building on PP-ChatOCRv3, it improves the handling of common challenges in complex document information extraction, such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition. It uses the ERNIE large model to combine large-scale data and knowledge, achieving high accuracy and wide applicability. The pipeline also provides flexible service deployment options on a variety of hardware. In addition, it supports secondary development: you can train and fine-tune on your own dataset, and the trained models can be integrated seamlessly.</td>
106+
<td rowspan="8">
107+
<ul>
108+
<li>Knowledge Graph Construction</li>
109+
<li>Detection of Information Related to Specific Events in Online News and Social Media</li>
110+
<li>Extraction and Analysis of Key Information in Academic Literature (especially scenarios requiring recognition of seals, distorted images, and more complex tables)</li>
111+
</ul>
112+
</td>
113+
</tr>
114+
<tr>
115+
<td>Layout Detection</td>
116+
</tr>
117+
<tr>
118+
<td>Text Detection</td>
119+
</tr>
120+
<tr>
121+
<td>Text Recognition</td>
122+
</tr>
123+
<tr>
124+
<td>Seal Text Detection</td>
125+
</tr>
126+
<tr>
127+
<td>Text Image Unwarping</td>
128+
</tr>
129+
<tr>
130+
<td>Document Image Orientation Classification</td>
131+
</tr>
132+
<tr>
133+
<td>Document-based Vision-Language Model</td>
134+
</tr>
101135
<tr>
102136
<td rowspan="5">General OCR</td>
103137
<td>Text Detection</td>
@@ -291,6 +325,59 @@ comments: true
291325
<tr>
292326
<td>Seal Text Detection</td>
293327
</tr>
328+
<tr>
329+
<td rowspan="13">General Layout Parsing v3</td>
330+
<td>Layout Detection Module</td>
331+
<td rowspan="13">Coming Soon</td>
332+
<td rowspan="13">Building on the General Layout Parsing v1 pipeline, the General Layout Parsing v3 pipeline strengthens layout detection, table recognition, and formula recognition. It adds the ability to restore multi-column reading order and to convert results into Markdown files. It performs well across a variety of document data and can handle more complex documents. The pipeline also provides flexible service deployment options and can be called from multiple programming languages on a variety of hardware. In addition, it supports secondary development: you can train and fine-tune on your own dataset, and the trained models can be integrated seamlessly.</td>
333+
<td rowspan="13">
334+
<ul>
335+
<li>Intelligent Document Analysis</li>
336+
<li>Document Digitization</li>
337+
<li>Page Structure Parsing</li>
338+
<li>Complex Table Recognition</li>
339+
<li>Large Model Data Construction</li>
340+
<li>RAG</li>
341+
</ul>
342+
</td>
343+
</tr>
344+
<tr>
345+
<td>Text Detection Module</td>
346+
</tr>
347+
<tr>
348+
<td>Text Recognition Module</td>
349+
</tr>
350+
<tr>
351+
<td>Document Image Orientation Classification Module</td>
352+
</tr>
353+
<tr>
354+
<td>Text Image Unwarping Module</td>
355+
</tr>
356+
<tr>
357+
<td>Wired Table Structure Recognition Module</td>
358+
</tr>
359+
<tr>
360+
<td>Wireless Table Structure Recognition Module</td>
361+
</tr>
362+
<tr>
363+
<td>Table Classification Module</td>
364+
</tr>
365+
<tr>
366+
<td>Wired Table Cell Detection Module</td>
367+
</tr>
368+
<tr>
369+
<td>Wireless Table Cell Detection Module</td>
370+
</tr>
371+
<tr>
372+
<td>Text Line Orientation Classification Module</td>
373+
</tr>
374+
<tr>
375+
<td>Formula Recognition Module</td>
376+
</tr>
377+
<tr>
378+
<td>Seal Text Detection Module</td>
379+
</tr>
380+
294381
<tr>
295382
<td rowspan="4">Formula Recognition</td>
296383
<td>Formula Recognition</td>

docs/support_list/pipelines_list.md

Lines changed: 90 additions & 6 deletions
@@ -98,6 +98,41 @@ comments: true
9898
<tr>
9999
<td>文档图像方向分类</td>
100100
</tr>
101+
<tr>
102+
<td rowspan = 8>文档场景信息抽取v4</td>
103+
<td>表格结构识别</td>
104+
<td rowspan = 8>coming soon</td>
105+
<td rowspan = 8>文档场景信息抽取v4(PP-ChatOCRv4)是飞桨特色的文档和图像智能分析解决方案,结合了 LLM、MLLM 和 OCR 技术,在文档场景信息抽取v3的基础上,优化了版面分析、生僻字、多页 pdf、表格、印章识别等常见的复杂文档信息抽取难点问题,结合文心大模型将海量数据和知识相融合,准确率高且应用广泛。本产线同时提供了灵活的服务化部署方式,支持在多种硬件上部署。不仅如此,本产线也提供了二次开发的能力,您可以基于本产线在您自己的数据集上训练调优,训练后的模型也可以无缝集成。
106+
</td>
107+
<td rowspan="8">
108+
<ul>
109+
<li>知识图谱的构建</li>
110+
<li>在线新闻和社交媒体中特定事件相关信息的检测</li>
111+
<li>学术文献中关键信息的抽取和分析(特别是需要对印章、扭曲图片、更复杂表格进行识别的场景)</li>
112+
</ul>
113+
</td>
114+
</tr>
115+
<tr>
116+
<td>版面区域检测</td>
117+
</tr>
118+
<tr>
119+
<td>文本检测</td>
120+
</tr>
121+
<tr>
122+
<td>文本识别</td>
123+
</tr>
124+
<tr>
125+
<td>印章文本检测</td>
126+
</tr>
127+
<tr>
128+
<td>文本图像矫正</td>
129+
</tr>
130+
<tr>
131+
<td>文档图像方向分类</td>
132+
</tr>
133+
<tr>
134+
<td>文档类视觉语言模型</td>
135+
</tr>
101136
<tr>
102137
<td rowspan = 5>通用OCR</td>
103138
<td>文本检测</td>
@@ -251,11 +286,11 @@ comments: true
251286
</ul></td>
252287
</tr>
253288
<tr>
254-
<td rowspan = 10>通用版面解析</td>
255-
<td>版面区域检测</td>
256-
<td rowspan = 10>暂无</td>
257-
<td rowspan = 10>版面解析是一种从文档图像中提取结构化信息的技术,主要用于将复杂的文档版面转换为机器可读的数据格式。这项技术在文档管理、信息提取和数据数字化等领域具有广泛的应用。版面解析通过结合光学字符识别(OCR)、图像处理和机器学习算法,能够识别和提取文档中的文本块、标题、段落、图片、表格以及其他版面元素。此过程通常包括版面分析、元素分析和数据格式化三个主要步骤,最终生成结构化的文档数据,提升数据处理的效率和准确性。</td>
258-
<td rowspan="10">
289+
<td rowspan = 9>通用版面解析</td>
290+
<td>版面区域检测模块</td>
291+
<td rowspan = 9>暂无</td>
292+
<td rowspan = 9>版面解析是一种从文档图像中提取结构化信息的技术,主要用于将复杂的文档版面转换为机器可读的数据格式。这项技术在文档管理、信息提取和数据数字化等领域具有广泛的应用。版面解析通过结合光学字符识别(OCR)、图像处理和机器学习算法,能够识别和提取文档中的文本块、标题、段落、图片、表格以及其他版面元素。此过程通常包括版面分析、元素分析和数据格式化三个主要步骤,最终生成结构化的文档数据,提升数据处理的效率和准确性。</td>
293+
<td rowspan="9">
259294
<ul>
260295
<li>金融与法律文档分析</li>
261296
<li>历史文献和档案数字化</li>
@@ -265,7 +300,44 @@ comments: true
265300
</td>
266301
</tr>
267302
<tr>
303+
<td>文本检测模块</td>
304+
</tr>
305+
<tr>
306+
<td>文本识别模块</td>
307+
</tr>
308+
<tr>
309+
<td>文档图像方向分类模块</td>
310+
</tr>
311+
<tr>
312+
<td>文本图像矫正模块</td>
313+
</tr>
314+
<tr>
315+
<td>表格结构识别模块</td>
316+
</tr>
317+
<tr>
318+
<td>文本行方向分类模块</td>
319+
</tr>
320+
<tr>
321+
<td>公式识别模块</td>
322+
</tr>
323+
<tr>
324+
<td>印章文本检测模块</td>
325+
</tr>
326+
<tr>
327+
<td rowspan = 13>通用版面解析v3</td>
268328
<td>版面区域检测模块</td>
329+
<td rowspan = 13>coming soon</td>
330+
<td rowspan = 13>通用版面解析v3产线在通用版面解析v1产线的基础上,强化了版面区域检测、表格识别、公式识别的能力,增加了多栏阅读顺序的恢复能力、结果转换 Markdown 文件的能力,在多种文档数据中,表现优异,可以处理较复杂的文档数据。本产线同时提供了灵活的服务化部署方式,支持在多种硬件上使用多种编程语言调用。不仅如此,本产线也提供了二次开发的能力,您可以基于本产线在您自己的数据集上训练调优,训练后的模型也可以无缝集成。</td>
331+
<td rowspan="13">
332+
<ul>
333+
<li>智能文档分析</li>
334+
<li>文档数字化</li>
335+
<li>页面结构解析</li>
336+
<li>复杂表格识别</li>
337+
<li>大模型数据构建</li>
338+
<li>RAG</li>
339+
</ul>
340+
</td>
269341
</tr>
270342
<tr>
271343
<td>文本检测模块</td>
@@ -280,7 +352,19 @@ comments: true
280352
<td>文本图像矫正模块</td>
281353
</tr>
282354
<tr>
283-
<td>表格结构识别模块</td>
355+
<td>有线表表格结构识别模块</td>
356+
</tr>
357+
<tr>
358+
<td>无线表表格结构识别模块</td>
359+
</tr>
360+
<tr>
361+
<td>表格分类模块</td>
362+
</tr>
363+
<tr>
364+
<td>有线表表格单元格检测模块</td>
365+
</tr>
366+
<tr>
367+
<td>无线表表格单元格检测模块</td>
284368
</tr>
285369
<tr>
286370
<td>文本行方向分类模块</td>

docs/support_list/pipelines_list_npu.en.md

Lines changed: 65 additions & 0 deletions
@@ -67,6 +67,71 @@ comments: true
6767
</ul>
6868
</td>
6969
</tr>
70+
<tr>
71+
<td rowspan = 7>PP-ChatOCRv3</td>
72+
<td>Table Structure Recognition</td>
73+
<td rowspan = 7><a href="https://aistudio.baidu.com/community/app/182491/webUI?source=appCenter">Online Experience</a></td>
74+
<td rowspan = 7>Document Image Scene Information Extraction v3 (PP-ChatOCRv3-doc) is PaddlePaddle's intelligent document and image analysis solution. It integrates LLM and OCR technologies to address common challenges in complex document information extraction, such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition. By integrating the Wenxin large model, it combines large-scale data and knowledge, providing high accuracy and wide applicability. The open-source version supports local trial and deployment, as well as fine-tuning of each module.</td>
75+
<td rowspan="7">
76+
<ul>
77+
<li>Construction of knowledge graphs</li>
78+
<li>Detection of information related to specific events in online news and social media</li>
79+
<li>Extraction and analysis of key information in academic literature (especially in scenarios requiring recognition of seals, distorted images, and more complex tables)</li>
80+
</ul>
81+
</td>
82+
</tr>
83+
<tr>
84+
<td>Layout Detection</td>
85+
</tr>
86+
<tr>
87+
<td>Text Detection</td>
88+
</tr>
89+
<tr>
90+
<td>Text Recognition</td>
91+
</tr>
92+
<tr>
93+
<td>Seal Text Detection</td>
94+
</tr>
95+
<tr>
96+
<td>Text Image Unwarping</td>
97+
</tr>
98+
<tr>
99+
<td>Document Image Orientation Classification</td>
100+
</tr>
101+
<tr>
102+
<td rowspan="8">PP-ChatOCRv4</td>
103+
<td>Table Structure Recognition</td>
104+
<td rowspan="8">Coming Soon</td>
105+
<td rowspan="8">Document Scene Information Extraction v4 (PP-ChatOCRv4) is PaddlePaddle's intelligent analysis solution for documents and images, combining LLM, MLLM, and OCR technologies. Building on PP-ChatOCRv3, it improves the handling of common challenges in complex document information extraction, such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition. It uses the ERNIE large model to combine large-scale data and knowledge, achieving high accuracy and wide applicability. The pipeline also provides flexible service deployment options on a variety of hardware. In addition, it supports secondary development: you can train and fine-tune on your own dataset, and the trained models can be integrated seamlessly.</td>
106+
<td rowspan="8">
107+
<ul>
108+
<li>Knowledge Graph Construction</li>
109+
<li>Detection of Information Related to Specific Events in Online News and Social Media</li>
110+
<li>Extraction and Analysis of Key Information in Academic Literature (especially scenarios requiring recognition of seals, distorted images, and more complex tables)</li>
111+
</ul>
112+
</td>
113+
</tr>
114+
<tr>
115+
<td>Layout Detection</td>
116+
</tr>
117+
<tr>
118+
<td>Text Detection</td>
119+
</tr>
120+
<tr>
121+
<td>Text Recognition</td>
122+
</tr>
123+
<tr>
124+
<td>Seal Text Detection</td>
125+
</tr>
126+
<tr>
127+
<td>Text Image Unwarping</td>
128+
</tr>
129+
<tr>
130+
<td>Document Image Orientation Classification</td>
131+
</tr>
132+
<tr>
133+
<td>Document-based Vision-Language Model</td>
134+
</tr>
70135
<tr>
71136
<td rowspan = 2>General OCR</td>
72137
<td >Text Detection</td>

docs/support_list/pipelines_list_npu.md

Lines changed: 66 additions & 0 deletions
@@ -67,6 +67,72 @@ comments: true
6767
</ul>
6868
</td>
6969
</tr>
70+
<tr>
71+
<td rowspan = 7>文档场景信息抽取v3</td>
72+
<td>表格结构识别</td>
73+
<td rowspan = 7><a href="https://aistudio.baidu.com/community/app/182491/webUI?source=appCenter">在线体验</a></td>
74+
<td rowspan = 7>文档图像场景信息抽取v3(PP-ChatOCRv3-doc)是飞桨特色的文档和图像智能分析解决方案,结合了 LLM 和 OCR 技术,一站式解决版面分析、生僻字、多页 pdf、表格、印章识别等常见的复杂文档信息抽取难点问题,结合文心大模型将海量数据和知识相融合,准确率高且应用广泛。开源版支持本地体验和本地部署,支持各个模块的微调训练。</td>
75+
<td rowspan="7">
76+
<ul>
77+
<li>知识图谱的构建</li>
78+
<li>在线新闻和社交媒体中特定事件相关信息的检测</li>
79+
<li>学术文献中关键信息的抽取和分析(特别是需要对印章、扭曲图片、更复杂表格进行识别的场景)</li>
80+
</ul>
81+
</td>
82+
</tr>
83+
<tr>
84+
<td>版面区域检测</td>
85+
</tr>
86+
<tr>
87+
<td>文本检测</td>
88+
</tr>
89+
<tr>
90+
<td>文本识别</td>
91+
</tr>
92+
<tr>
93+
<td>印章文本检测</td>
94+
</tr>
95+
<tr>
96+
<td>文本图像矫正</td>
97+
</tr>
98+
<tr>
99+
<td>文档图像方向分类</td>
100+
</tr>
101+
<tr>
102+
<td rowspan = 8>文档场景信息抽取v4</td>
103+
<td>表格结构识别</td>
104+
<td rowspan = 8>coming soon</td>
105+
<td rowspan = 8>文档场景信息抽取v4(PP-ChatOCRv4)是飞桨特色的文档和图像智能分析解决方案,结合了 LLM、MLLM 和 OCR 技术,在文档场景信息抽取v3的基础上,优化了版面分析、生僻字、多页 pdf、表格、印章识别等常见的复杂文档信息抽取难点问题,结合文心大模型将海量数据和知识相融合,准确率高且应用广泛。本产线同时提供了灵活的服务化部署方式,支持在多种硬件上部署。不仅如此,本产线也提供了二次开发的能力,您可以基于本产线在您自己的数据集上训练调优,训练后的模型也可以无缝集成。
106+
</td>
107+
<td rowspan="8">
108+
<ul>
109+
<li>知识图谱的构建</li>
110+
<li>在线新闻和社交媒体中特定事件相关信息的检测</li>
111+
<li>学术文献中关键信息的抽取和分析(特别是需要对印章、扭曲图片、更复杂表格进行识别的场景)</li>
112+
</ul>
113+
</td>
114+
</tr>
115+
<tr>
116+
<td>版面区域检测</td>
117+
</tr>
118+
<tr>
119+
<td>文本检测</td>
120+
</tr>
121+
<tr>
122+
<td>文本识别</td>
123+
</tr>
124+
<tr>
125+
<td>印章文本检测</td>
126+
</tr>
127+
<tr>
128+
<td>文本图像矫正</td>
129+
</tr>
130+
<tr>
131+
<td>文档图像方向分类</td>
132+
</tr>
133+
<tr>
134+
<td>文档类视觉语言模型</td>
135+
</tr>
70136
<tr>
71137
<td rowspan = 2>通用OCR</td>
72138
<td>文本检测</td>
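
The tables touched by this commit group each pipeline's modules under a single `rowspan` cell, so the declared `rowspan` has to equal the number of module rows (for example, 8 for PP-ChatOCRv4 and 13 for General Layout Parsing v3). A minimal sketch of a consistency check follows, using only Python's standard `html.parser`. The names `RowCollector` and `check_rowspans` are hypothetical helpers and are not part of this repository. The sketch also assumes that a row with more than one `<td>` starts a new pipeline and that a row with a single `<td>` adds one module to the current pipeline.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect, for every <tr>, the (rowspan, text) pairs of its <td> cells."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of (rowspan, text)
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # [rowspan, text fragments] of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            # A missing rowspan attribute means the cell covers one row.
            rowspan = dict(attrs).get("rowspan") or "1"
            self._cell = [int(rowspan), []]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[1].append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse internal whitespace so multi-line cells compare cleanly.
            text = " ".join("".join(self._cell[1]).split())
            self._row.append((self._cell[0], text))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def check_rowspans(html):
    """Return (name, declared_rowspan, actual_rows) for each pipeline group."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    groups = []
    for cells in parser.rows:
        if len(cells) > 1:
            # Assumption: a multi-cell row opens a new pipeline group.
            declared, name = cells[0]
            groups.append([name, declared, 1])
        elif cells and groups:
            # Assumption: a single-cell row is one more module of that group.
            groups[-1][2] += 1
    return [tuple(group) for group in groups]


if __name__ == "__main__":
    # Hypothetical sample; the second group deliberately under-declares its rowspan.
    sample = """
    <table>
    <tr><td rowspan="2">Pipeline A</td><td>Text Detection</td></tr>
    <tr><td>Text Recognition</td></tr>
    <tr><td rowspan="2">Pipeline B</td><td>Layout Detection</td></tr>
    <tr><td>Text Detection</td></tr>
    <tr><td>Formula Recognition</td></tr>
    </table>
    """
    for name, declared, actual in check_rowspans(sample):
        status = "ok" if declared == actual else "MISMATCH"
        print(f"{name}: rowspan={declared}, rows={actual} -> {status}")
```

Running `check_rowspans` on the table markup of a file such as `docs/support_list/pipelines_list.en.md` and flagging entries whose second and third values differ would catch a module row that was added or removed without updating `rowspan`.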
