Commit ae66f1c

Merge pull request #10 from Unstructured-IO/chore/sync-2.8.0-release
chore: sync with upstream PaddleOCR 2.8.0 release
2 parents 225ff53 + 0182cf2 commit ae66f1c

11 files changed, with 514 additions and 12 deletions.

README.md

Lines changed: 5 additions & 0 deletions
@@ -37,6 +37,11 @@ PaddleOCR is being oversight by a [PMC](https://github.com/PaddlePaddle/PaddleOC
 ⚠️ Note: The [Issues](https://github.com/PaddlePaddle/PaddleOCR/issues) module is only for reporting program 🐞 bugs, for the rest of the questions, please move to the [Discussions](https://github.com/PaddlePaddle/PaddleOCR/discussions). Please note that if the Issue mentioned is not a bug, it will be moved to the Discussions module.
 
 ## 📣 Recent updates
+
+- **🔥2024.7 Added PaddleOCR Algorithm Model Challenge Champion Solutions**:
+  - Challenge One, OCR End-to-End Recognition Task Champion Solution: [Scene Text Recognition Algorithm-SVTRv2](doc/doc_ch/algorithm_rec_svtrv2.md);
+  - Challenge Two, General Table Recognition Task Champion Solution: [Table Recognition Algorithm-SLANet-LCNetV2](doc/doc_ch/algorithm_table_slanet.md).
+
 - **🔥2023.8.7 Release PaddleOCR[release/2.7](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.7)**
   - Release [PP-OCRv4](./doc/doc_ch/PP-OCRv4_introduction.md), support mobile version and server version
     - PP-OCRv4-mobile:When the speed is comparable, the effect of the Chinese scene is improved by 4.5% compared with PP-OCRv3, the English scene is improved by 10%, and the average recognition accuracy of the 80-language multilingual model is increased by more than 8%.

README_ch.md

Lines changed: 5 additions & 0 deletions
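@@ -31,6 +31,11 @@ PaddleOCR 由 [PMC](https://github.com/PaddlePaddle/PaddleOCR/issues/12122) 监
 
 ## 📣 Recent updates
 - **📚 Livestream and hands-on OCR camp announcement**: The course "PP-ChatOCRv2 Empowers Intelligent Information Extraction from Financial Reports, Upgrading New-Finance Efficiency" is now online, tackling OCR parsing problems such as complex layouts, table recognition, and information extraction. Livestream time: June 6 (Thursday), 19:00. A hands-on camp on "Government Procurement Contract Information Extraction" starts on June 11. Registration link: https://www.wjx.top/vm/eBcYmqO.aspx?udsid=197406
+
+- **🔥2024.7 Added PaddleOCR Algorithm Model Challenge champion solutions**
+  - Challenge One: champion solution for the OCR end-to-end recognition task, [Scene Text Recognition Algorithm-SVTRv2](doc/doc_ch/algorithm_rec_svtrv2.md)
+  - Challenge Two: champion solution for the general table recognition task, [Table Recognition Algorithm-SLANet-LCNetV2](doc/doc_ch/algorithm_table_slanet.md)
+
 - **🔥2024.5.10 Launched the Xinghe zero-code pipelines (OCR-related)**: covers the following four core OCR tasks, with convenient badcase analysis and a practical online experience:
   - [General OCR](https://aistudio.baidu.com/community/app/91660) (PP-OCRv4).
   - [General table recognition](https://aistudio.baidu.com/community/app/91661) (SLANet).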
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
+Global:
+  debug: false
+  use_gpu: true
+  epoch_num: 200
+  log_smooth_window: 20
+  print_batch_step: 10
+  save_model_dir: ./output/rec_repsvtr_ch
+  save_epoch_step: 10
+  eval_batch_step: [0, 1000]
+  cal_metric_during_train: False
+  pretrained_model:
+  checkpoints:
+  save_inference_dir:
+  use_visualdl: false
+  infer_img: doc/imgs_words/ch/word_1.jpg
+  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
+  max_text_length: &max_text_length 25
+  infer_mode: false
+  use_space_char: true
+  distributed: true
+  save_res_path: ./output/rec/predicts_repsvtr.txt
+
+Optimizer:
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1.e-8
+  weight_decay: 0.025
+  no_weight_decay_name: norm
+  one_dim_param_no_weight_decay: True
+  lr:
+    name: Cosine
+    learning_rate: 0.001 # 8gpus 192bs
+    warmup_epoch: 5
+
+
+Architecture:
+  model_type: rec
+  algorithm: SVTR_HGNet
+  Transform:
+  Backbone:
+    name: RepSVTR
+  Head:
+    name: MultiHead
+    head_list:
+      - CTCHead:
+          Neck:
+            name: svtr
+            dims: 256
+            depth: 2
+            hidden_dims: 256
+            kernel_size: [1, 3]
+            use_guide: True
+          Head:
+            fc_decay: 0.00001
+      - NRTRHead:
+          nrtr_dim: 384
+          max_text_length: *max_text_length
+          num_decoder_layers: 2
+
+Loss:
+  name: MultiLoss
+  loss_config_list:
+    - CTCLoss:
+    - NRTRLoss:
+
+PostProcess:
+  name: CTCLabelDecode
+
+Metric:
+  name: RecMetric
+  main_indicator: acc
+
+
+Train:
+  dataset:
+    name: MultiScaleDataSet
+    ds_width: false
+    data_dir: ./train_data/
+    ext_op_transform_idx: 1
+    label_file_list:
+    - ./train_data/train_list.txt
+    transforms:
+    - DecodeImage:
+        img_mode: BGR
+        channel_first: false
+    - RecAug:
+    - MultiLabelEncode:
+        gtc_encode: NRTRLabelEncode
+    - KeepKeys:
+        keep_keys:
+        - image
+        - label_ctc
+        - label_gtc
+        - length
+        - valid_ratio
+  sampler:
+    name: MultiScaleSampler
+    scales: [[320, 32], [320, 48], [320, 64]]
+    first_bs: &bs 192
+    fix_bs: false
+    divided_factor: [8, 16] # w, h
+    is_training: True
+  loader:
+    shuffle: true
+    batch_size_per_card: *bs
+    drop_last: true
+    num_workers: 8
+Eval:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data
+    label_file_list:
+    - ./train_data/val_list.txt
+    transforms:
+    - DecodeImage:
+        img_mode: BGR
+        channel_first: false
+    - MultiLabelEncode:
+        gtc_encode: NRTRLabelEncode
+    - RecResizeImg:
+        image_shape: [3, 48, 320]
+    - KeepKeys:
+        keep_keys:
+        - image
+        - label_ctc
+        - label_gtc
+        - length
+        - valid_ratio
+  loader:
+    shuffle: false
+    drop_last: false
+    batch_size_per_card: 128
+    num_workers: 4
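
The `sampler` settings in the config above (`scales`, `first_bs`, `fix_bs`, `divided_factor`) suggest that the batch size varies with the image scale. The sketch below is one plausible reading, not PaddleOCR's actual `MultiScaleSampler` code: it assumes that, when `fix_bs` is false, the sampler keeps the number of pixels per batch roughly constant relative to the first scale. The function name `multiscale_batch_sizes` is hypothetical.

```python
def multiscale_batch_sizes(scales, first_bs, divided_factor, fix_bs=False):
    """Return (width, height, batch_size) for each scale.

    Assumption (not confirmed by the diff): with fix_bs=False the batch
    size shrinks as the image area grows, so that width * height * batch
    stays close to the value for the first scale.
    """
    w_div, h_div = divided_factor  # the config comments this as "w, h"
    # Round every dimension down to a multiple of its divisor.
    dims = [((w // w_div) * w_div, (h // h_div) * h_div) for w, h in scales]
    base_w, base_h = dims[0]
    base_elements = base_w * base_h * first_bs
    pairs = []
    for w, h in dims:
        bs = first_bs if fix_bs else max(1, base_elements // (w * h))
        pairs.append((w, h, bs))
    return pairs


# Values copied from the sampler section of the config.
pairs = multiscale_batch_sizes(
    scales=[[320, 32], [320, 48], [320, 64]],
    first_bs=192,
    divided_factor=[8, 16],
)
for w, h, bs in pairs:
    print(f"{w}x{h}: batch size {bs}")
```

Under that assumption, taller crops get proportionally smaller batches, which would keep memory use per step roughly level across scales.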
Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
+Global:
+  debug: false
+  use_gpu: true
+  epoch_num: 200
+  log_smooth_window: 20
+  print_batch_step: 10
+  save_model_dir: ./output/rec_svtrv2_ch
+  save_epoch_step: 10
+  eval_batch_step: [0, 1000]
+  cal_metric_during_train: False
+  pretrained_model:
+  checkpoints:
+  save_inference_dir:
+  use_visualdl: false
+  infer_img: doc/imgs_words/ch/word_1.jpg
+  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
+  max_text_length: &max_text_length 25
+  infer_mode: false
+  use_space_char: true
+  distributed: true
+  save_res_path: ./output/rec/predicts_svrtv2.txt
+
+
+Optimizer:
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1.e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm
+  one_dim_param_no_weight_decay: True
+  lr:
+    name: Cosine
+    learning_rate: 0.001 # 8gpus 192bs
+    warmup_epoch: 5
+
+
+Architecture:
+  model_type: rec
+  algorithm: SVTR_HGNet
+  Transform:
+  Backbone:
+    name: SVTRv2
+    use_pos_embed: False
+    dims: [128, 256, 384]
+    depths: [6, 6, 6]
+    num_heads: [4, 8, 12]
+    mixer: [['Conv','Conv','Conv','Conv','Conv','Conv'],['Conv','Conv','Global','Global','Global','Global'],['Global','Global','Global','Global','Global','Global']]
+    local_k: [[5, 5], [5, 5], [-1, -1]]
+    sub_k: [[2, 1], [2, 1], [-1, -1]]
+    last_stage: False
+    use_pool: True
+  Head:
+    name: MultiHead
+    head_list:
+      - CTCHead:
+          Neck:
+            name: svtr
+            dims: 256
+            depth: 2
+            hidden_dims: 256
+            kernel_size: [1, 3]
+            use_guide: True
+          Head:
+            fc_decay: 0.00001
+      - NRTRHead:
+          nrtr_dim: 384
+          max_text_length: *max_text_length
+          num_decoder_layers: 2
+
+Loss:
+  name: MultiLoss
+  loss_config_list:
+    - CTCLoss:
+    - NRTRLoss:
+
+PostProcess:
+  name: CTCLabelDecode
+
+Metric:
+  name: RecMetric
+  main_indicator: acc
+
+Train:
+  dataset:
+    name: MultiScaleDataSet
+    ds_width: false
+    data_dir: ./train_data/
+    ext_op_transform_idx: 1
+    label_file_list:
+    - ./train_data/train_list.txt
+    transforms:
+    - DecodeImage:
+        img_mode: BGR
+        channel_first: false
+    - RecAug:
+    - MultiLabelEncode:
+        gtc_encode: NRTRLabelEncode
+    - KeepKeys:
+        keep_keys:
+        - image
+        - label_ctc
+        - label_gtc
+        - length
+        - valid_ratio
+  sampler:
+    name: MultiScaleSampler
+    scales: [[320, 32], [320, 48], [320, 64]]
+    first_bs: &bs 192
+    fix_bs: false
+    divided_factor: [8, 16] # w, h
+    is_training: True
+  loader:
+    shuffle: true
+    batch_size_per_card: *bs
+    drop_last: true
+    num_workers: 8
+Eval:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data
+    label_file_list:
+    - ./train_data/val_list.txt
+    transforms:
+    - DecodeImage:
+        img_mode: BGR
+        channel_first: false
+    - MultiLabelEncode:
+        gtc_encode: NRTRLabelEncode
+    - RecResizeImg:
+        image_shape: [3, 48, 320]
+    - KeepKeys:
+        keep_keys:
+        - image
+        - label_ctc
+        - label_gtc
+        - length
+        - valid_ratio
+  loader:
+    shuffle: false
+    drop_last: false
+    batch_size_per_card: 128
+    num_workers: 4
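
The SVTRv2 `Backbone` settings in the config above are parallel per-stage lists (`dims`, `depths`, `num_heads`, `mixer`), so a quick consistency check can catch a mismatched edit. The sketch below assumes, without confirmation from the diff, that each `mixer` entry describes one block of its stage and that each stage width should divide evenly across its attention heads. The helper name `summarize_stages` is hypothetical.

```python
# Values copied from the Backbone section of the SVTRv2 config.
backbone = {
    "dims": [128, 256, 384],
    "depths": [6, 6, 6],
    "num_heads": [4, 8, 12],
    "mixer": [
        ["Conv"] * 6,
        ["Conv"] * 2 + ["Global"] * 4,
        ["Global"] * 6,
    ],
}


def summarize_stages(cfg):
    """Check the per-stage lists against each other and summarize each stage."""
    stages = []
    rows = zip(cfg["dims"], cfg["depths"], cfg["num_heads"], cfg["mixer"])
    for index, (dim, depth, heads, mixers) in enumerate(rows):
        # Assumed rule: one mixer entry per block in the stage.
        if len(mixers) != depth:
            raise ValueError(f"stage {index}: {len(mixers)} mixers for depth {depth}")
        # Assumed rule: the stage width splits evenly across heads.
        if dim % heads != 0:
            raise ValueError(f"stage {index}: dim {dim} not divisible by {heads} heads")
        stages.append({
            "dim": dim,
            "head_dim": dim // heads,
            "conv_blocks": mixers.count("Conv"),
            "global_blocks": mixers.count("Global"),
        })
    return stages


stages = summarize_stages(backbone)
for index, stage in enumerate(stages):
    print(index, stage)
```

With the values in this config, every stage works out to the same per-head width, and the mixers move from all `Conv` in the first stage to all `Global` in the last.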
