AddingLanguage - Hebrew #11625

Closed
wants to merge 43 commits into from
43 commits
32d2a5f
Update PP-OCRv4_introduction.md
tink2123 Aug 11, 2023
e8401fd
Update PP-OCRv4_introduction.md (#10616)
Topdu Aug 14, 2023
fe5e02e
Update README.md
dyning Aug 16, 2023
f5ac6e9
Cherrypicking GH-10217 and GH-10216 to PaddlePaddle:Release/2.7 (#10655)
UserUnknownFactor Aug 16, 2023
f2ad7aa
Update requirements.txt (#10656)
itasli Aug 17, 2023
521a640
[TIPC]update xpu tipc script (#10658)
USTCKAY Aug 17, 2023
afd02e5
fix-typo (#10642)
dvorst Aug 17, 2023
234757f
Fix fitz camelCase deprecation and .PDF not being recognized as pdf f…
itasli Aug 23, 2023
38ba5f7
fix undefined save_model_dir when using wandb (#10251) (#10709)
itasli Aug 23, 2023
865f6a8
Update custom.md
shiyutang Aug 24, 2023
b1b92cd
fix: release memory after predict (#10688)
chungchamchi19 Aug 30, 2023
cbc31f5
Update setup.py (#10749)
xiezheng-XD Sep 1, 2023
5aa6cff
Update v4 det model path in README.md (#10653)
tengxniu Sep 1, 2023
0a4f470
Update algorithm_kie_vi_layoutxlm_en.md (#10715)
sagarjgb Sep 5, 2023
2521f91
Update algorithm_kie_vi_layoutxlm_en.md (#10716)
sagarjgb Sep 5, 2023
8cf4147
update paddlex intro (#10827)
tink2123 Sep 7, 2023
e3ae45c
update whl (#10916)
andyjiang1116 Sep 15, 2023
0a65ba0
🐞 fix(cpp_infer): fix the rebuild_table bug (#10810)
JIANG3330 Sep 21, 2023
10434c1
add finnish language files (#10850)
savikko Sep 22, 2023
65247c0
update url for aistudio
tink2123 Sep 25, 2023
13444f7
Update README_en.md
tink2123 Sep 25, 2023
aaec152
Update README.md
tink2123 Sep 28, 2023
07a483f
Update README_en.md
tink2123 Sep 28, 2023
0ee4452
Update README.md
tink2123 Sep 28, 2023
d20d4ff
Update README_en.md
tink2123 Sep 28, 2023
7e6efcd
Update ONNX conversion readme_ch.md (#11030)
greyovo Oct 12, 2023
1bf04ad
Create pull_request_template.md
shiyutang Oct 12, 2023
0839dc4
[TIPC]update tipc scripts and rm fluid api (#11098)
USTCKAY Oct 19, 2023
1ef71d8
Update README.md
tink2123 Nov 3, 2023
cd09b96
Update README.md
tink2123 Nov 3, 2023
8003222
Update README.md
tink2123 Nov 16, 2023
19aac33
Update README.md
tink2123 Nov 16, 2023
60950d4
Update README.md
tink2123 Nov 16, 2023
ac819aa
Update README.md
dyning Nov 17, 2023
a3a07f4
Update README.md
dyning Nov 17, 2023
468f482
Update README.md
tink2123 Nov 22, 2023
ed2ca9c
Update README.md
tink2123 Dec 1, 2023
aa8b4c2
Update README.md
tink2123 Dec 6, 2023
dbe0857
Update README.md
tink2123 Dec 11, 2023
231218a
Modify readme 27 (#11424)
zhangyubo0722 Dec 28, 2023
dc6559a
fix:layout recovery image:xxx.png,err msg: list index out of range (#…
santlchogva Jan 2, 2024
7c89799
Update README.md
tink2123 Jan 17, 2024
cc8f9e6
Create heb_dict.txt
MatufA Feb 26, 2024
19 changes: 19 additions & 0 deletions .github/ISSUE_TEMPLATE/custom.md
@@ -0,0 +1,19 @@
---
name: Issue template
about: Issue template for code error.
title: ''
labels: ''
assignees: ''

---

请提供下述完整信息以便快速定位问题/Please provide the following information to quickly locate the problem

- 系统环境/System Environment:
- 版本号/Version:Paddle: PaddleOCR: 问题相关组件/Related components:
- 运行指令/Command Code:
- 完整报错/Complete Error Message:

我们提供了AceIssueSolver来帮助你解答问题,你是否想要它来解答(请填写yes/no)?/We provide AceIssueSolver to solve issues, do you want it? (Please write yes/no):

请尽量不要包含图片在问题中/Please try to not include the image in the issue.
15 changes: 15 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,15 @@
### PR 类型 PR types
<!-- One of [ New features | Bug fixes | Function optimization | Performance optimization | Breaking changes | Others ] -->

### PR 变化内容类型 PR changes
<!-- One of [ Models | APIs | Docs | Others ] -->

### 描述 Description
<!-- Describe what this PR does -->

### 提PR之前的检查 Check-list

- [ ] 这个 PR 是提交到dygraph分支或者是一个cherry-pick,否则请先提交到dygraph分支。
This PR is pushed to the dygraph branch or cherry-picked from the dygraph branch. Otherwise, please push your changes to the dygraph branch.
- [ ] 这个PR清楚描述了功能,帮助评审能提升效率。This PR fully describes what it does, so that reviewers can work efficiently.
- [ ] 这个PR已经经过本地测试。This PR is covered by the existing tests or has already been tested locally by you.
240 changes: 198 additions & 42 deletions README.md

Large diffs are not rendered by default.

23 changes: 22 additions & 1 deletion README_en.md
Expand Up @@ -73,9 +73,30 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel

> It is recommended to start with the “quick experience” in the document tutorial

## ⚡ [Quick Start](https://paddlepaddle.github.io/PaddleOCR/latest/en/quick_start.html)

## ⚡ Quick Experience

- Web online experience
  - PP-OCRv4 online experience: https://aistudio.baidu.com/application/detail/7658
  - PP-ChatOCR online experience: https://aistudio.baidu.com/application/detail/7709

- One line of code quick use: [Quick Start (Chinese/English/Multilingual/Document Analysis)](./doc/doc_en/quickstart_en.md)
- Full-process experience of training, inference, and high-performance deployment in the Paddle AI suite (PaddleX):
  - PP-OCRv4: https://aistudio.baidu.com/projectdetail/paddlex/6796224
  - PP-ChatOCR: https://aistudio.baidu.com/projectdetail/paddlex/6796372
- Mobile demo experience: [Installation DEMO](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) (based on EasyEdge and Paddle-Lite; supports iOS and Android)

<a name="Technical exchange and cooperation"></a>

## 📖 Technical exchange and cooperation
- PaddleX: a one-stop development platform for practical, industry-selected models. It includes the following features:
  * [High-quality algorithm library] Contains 36 selected models across 10 major task areas, so models for different tasks can be developed on one platform, with more domain models being added. PaddleX also provides complete training and inference benchmark data, so developers can choose the most appropriate model for their business needs.
  * [Simple development method] Toolbox and developer modes work together, combining no-code and low-code development to cover the full AI workflow of data, training, verification, and deployment in four steps.
  * [Efficient training deployment] Builds in the best tuning strategies of the Baidu algorithm team to achieve the fastest and best convergence for each model. A complete deployment SDK enables rapid industrial-grade deployment across platforms and hardware (service-based deployment capabilities are being improved).
  * [Rich domestic hardware support] Besides running on the AIStudio cloud, PaddleX is available as a local Windows version, and Linux, Kunlun Core, Ascend, and Cambricon versions are being added.
  * [Win-win joint creation and co-construction] Besides making AI applications convenient to develop, PaddleX gives developers opportunities to earn business benefits and helps enterprises explore more business space.

PaddleX official website: https://www.paddlepaddle.org.cn/paddle/paddleX

PaddleX provides a one-stop, full-process, high-efficiency development platform for model training, compression, and inference in the PaddlePaddle ecosystem. Its mission is to help AI technology reach production quickly, and its vision is to make everyone an AI developer!

Expand Down
4 changes: 2 additions & 2 deletions deploy/cpp_infer/src/paddlestructure.cpp
Expand Up @@ -152,9 +152,9 @@ std::string PaddleStructure::rebuild_table(
ocr_box[3] += 1;
std::vector<std::vector<float>> dis_list(structure_boxes.size(),
std::vector<float>(3, 100000.0));
- for (size_t j = 0; j < structure_boxes.size(); ++j) {
+ for (int j = 0; j < structure_boxes.size(); j++) {
if (structure_boxes[j].size() == 8) {
- structure_box = std::move(Utility::xyxyxyxy2xyxy(structure_boxes[j]));
+ structure_box = Utility::xyxyxyxy2xyxy(structure_boxes[j]);
} else {
structure_box = structure_boxes[j];
}
Expand Down
3 changes: 2 additions & 1 deletion deploy/paddle2onnx/readme.md
Expand Up @@ -75,7 +75,8 @@ paddle2onnx --model_dir ./inference/ch_ppocr_mobile_v2.0_cls_infer \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file ./inference/cls_onnx/model.onnx \
- --opset_version 11 \
+ --opset_version 10 \
+ --input_shape_dict="{'x':[-1,3,-1,-1]}" \
--enable_onnx_checker True
```
After execution, the ONNX model will be saved in `./inference/det_onnx/`, `./inference/rec_onnx/`, `./inference/cls_onnx/` paths respectively
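
For readers unfamiliar with the `--input_shape_dict` value that appears in this hunk, the following is a minimal Python sketch of how such a string can be read. It assumes only that the value is a Python-style dict literal mapping an input name to a list of dimensions, and that `-1` marks a dimension left dynamic; `describe_shape` is a hypothetical helper for illustration and is not part of paddle2onnx.

```python
import ast


def describe_shape(shape_arg: str) -> dict:
    """Parse a dict-literal string and label every -1 dimension as dynamic."""
    # literal_eval only accepts Python literals, so it is safe for untrusted strings.
    shapes = ast.literal_eval(shape_arg)
    described = {}
    for name, dims in shapes.items():
        # Assumption: -1 means "size not fixed at export time".
        described[name] = ["dynamic" if d == -1 else d for d in dims]
    return described


print(describe_shape("{'x':[-1,3,-1,-1]}"))
```

Read this way, the value in the hunk fixes only the channel dimension of input `x` at 3 and leaves batch size, height, and width open.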
Expand Down
3 changes: 2 additions & 1 deletion deploy/paddle2onnx/readme_ch.md
Expand Up @@ -74,7 +74,8 @@ paddle2onnx --model_dir ./inference/ch_ppocr_mobile_v2.0_cls_infer \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file ./inference/cls_onnx/model.onnx \
- --opset_version 11 \
+ --opset_version 10 \
+ --input_shape_dict="{'x':[-1,3,-1,-1]}" \
--enable_onnx_checker True
```

Expand Down
8 changes: 6 additions & 2 deletions docs/algorithm/kie/algorithm_kie_vi_layoutxlm.en.md
Expand Up @@ -17,11 +17,15 @@ On XFUND_zh dataset, the algorithm reproduction Hmean is as follows.

## 2. Environment

- Please refer to ["Environment Preparation"](../../ppocr/environment.en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](../../ppocr/blog/clone.en.md)to clone the project code.
+ ## 2. Environment
+
+ Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
+

## 3. Model Training / Evaluation / Prediction

- Please refer to [KIE tutorial](../../ppocr/model_train/kie.en.md). PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different models.
+ Please refer to [KIE tutorial](./kie_en.md). PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different models.
+

## 4. Inference and Deployment

Expand Down
11 changes: 9 additions & 2 deletions docs/ppocr/blog/PP-OCRv4_introduction.md
Expand Up @@ -109,15 +109,22 @@ Lite-Neck整体结构沿用PP-OCRv3版本的结构,在参数上稍作精简,

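GTC (Guided Training of CTC) is one of the most effective strategies in the PP-OCRv3 recognition model: it fuses multiple text feature representations and effectively improves text recognition accuracy. PP-OCRv4 uses NRTR, a Transformer model that trains more stably, as the guidance branch. Compared with SAR in V3, which is built on a recurrent neural network, NRTR decodes with a Transformer and generalizes better, so it can effectively guide the CTC branch and resolve rapid overfitting in simple scenarios. With the Lite-Neck and GTC-NRTR strategies together, recognition accuracy rises to 73.21% (+0.5%).

![img](./images/ppocrv4_gtc.png)
GTC (Guided Training of CTC) is one of the most effective strategies in the PP-OCRv3 recognition model: it fuses multiple text feature representations and effectively improves text recognition accuracy. PP-OCRv4 uses NRTR, a Transformer model that trains more stably, as the guidance branch. Compared with SAR in V3, which is built on a recurrent neural network, NRTR decodes with a Transformer and generalizes better, so it can effectively guide the CTC branch and resolve rapid overfitting in simple scenarios. With the Lite-Neck and GTC-NRTR strategies together, recognition accuracy rises to 73.21% (+0.5%).

### (5) Multi-Scale: multi-scale training strategy

The dynamic-scale training strategy randomly resizes the height of the input image during training, to make the recognition model more robust when it is chained end to end. During training, each iteration randomly picks one of three heights (32, 48, 64) for the resize. Experiments show that although this strategy does not improve accuracy on the recognition test set, it improves the metric by 0.5% in end-to-end chained evaluation.

![img](./images/multi_scale.png)

### (6) DKD: distillation strategy
The dynamic-scale training strategy randomly resizes the height of the input image during training, to make the recognition model more robust when it is chained end to end. During training, each iteration randomly picks one of three heights (32, 48, 64) for the resize. Experiments show that although this strategy does not improve accuracy on the recognition test set, it improves the metric by 0.5% in end-to-end chained evaluation.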

<div align="center">
<img src="../ppocr_v4/multi_scale.png" width="500">
</div>
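
The multi-scale strategy in this hunk (each training iteration resizes the input to a height drawn at random from 32, 48, and 64) can be sketched roughly as follows. This is an illustration under stated assumptions, not the PaddleOCR training code: `pick_target_size` is a hypothetical helper, and scaling the width to preserve the aspect ratio is an assumption, since the text only says the height is resized.

```python
import random

HEIGHTS = (32, 48, 64)  # the three candidate heights named in the text


def pick_target_size(orig_w, orig_h, rng=random):
    """Pick one candidate height at random and scale the width to match."""
    target_h = rng.choice(HEIGHTS)
    # Assumption: keep the aspect ratio; the source does not say how width is handled.
    target_w = max(1, round(orig_w * target_h / orig_h))
    return target_w, target_h


rng = random.Random(0)  # seeded only so repeated runs pick the same heights
for step in range(3):
    print(step, pick_target_size(320, 48, rng))
```

Passing a seeded `random.Random` makes the draw reproducible for debugging; a training loop would normally use an unseeded generator.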


**(6) DKD: distillation strategy**

Distillation of the recognition model has two parts: NRTRHead distillation and CTCHead distillation.

Expand Down
96 changes: 39 additions & 57 deletions paddleocr.py
Expand Up @@ -62,6 +62,8 @@ def _import_file(module_name, file_path, make_importable=False):
confirm_model_dir_url,
)
from tools.infer import predict_system
+ from ppocr.utils.utility import check_and_read, get_image_file_list, alpha_to_color, binarize_img
+ from ppocr.utils.network import maybe_download, download_with_progressbar, is_link, confirm_model_dir_url
from tools.infer.utility import draw_ocr, str2bool, check_gpu
from ppstructure.utility import init_args, draw_structure_result
from ppstructure.predict_system import StructureSystem, save_structure_res, to_excel
Expand All @@ -83,9 +85,10 @@ def _import_file(module_name, file_path, make_importable=False):
"convert_info_markdown",
]

- SUPPORT_DET_MODEL = ["DB"]
- SUPPORT_REC_MODEL = ["CRNN", "SVTR_LCNet"]
- BASE_DIR = os.environ.get("PADDLE_OCR_BASE_DIR", os.path.expanduser("~/.paddleocr/"))
+ SUPPORT_DET_MODEL = ['DB']
+ VERSION = '2.7.0.3'
+ SUPPORT_REC_MODEL = ['CRNN', 'SVTR_LCNet']
+ BASE_DIR = os.path.expanduser("~/.paddleocr/")

DEFAULT_OCR_MODEL_VERSION = "PP-OCRv4"
SUPPORT_OCR_MODEL_VERSION = ["PP-OCR", "PP-OCRv2", "PP-OCRv3", "PP-OCRv4"]
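
The two `BASE_DIR` lines in this hunk differ in one respect: one reads the `PADDLE_OCR_BASE_DIR` environment variable and falls back to `~/.paddleocr/`, the other always uses `~/.paddleocr/`. A minimal sketch of the lookup, with `resolve_base_dir` as a hypothetical wrapper introduced here only so the behavior can be exercised without touching the real environment:

```python
import os


def resolve_base_dir(environ=None):
    """Return the model cache directory, preferring the PADDLE_OCR_BASE_DIR override."""
    environ = os.environ if environ is None else environ
    default = os.path.expanduser("~/.paddleocr/")
    # dict.get returns the default only when the variable is absent.
    return environ.get("PADDLE_OCR_BASE_DIR", default)


print(resolve_base_dir({}))
print(resolve_base_dir({"PADDLE_OCR_BASE_DIR": "/data/ocr_models/"}))
```

The path `/data/ocr_models/` is a placeholder. With the hard-coded variant, the second call would have no counterpart: the directory could not be redirected without editing the module.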
Expand Down Expand Up @@ -693,44 +696,24 @@ def __init__(self, **kwargs):
super().__init__(params)
self.page_num = params.page_num

- def ocr(
- self,
- img,
- det=True,
- rec=True,
- cls=True,
- bin=False,
- inv=False,
- alpha_color=(255, 255, 255),
- slice={},
- ):
+ def ocr(self,
+ img,
+ det=True,
+ rec=True,
+ cls=True,
+ bin=False,
+ inv=False,
+ alpha_color=(255, 255, 255)):
"""
OCR with PaddleOCR
-
- Args:
- img: Image for OCR. It can be an ndarray, img_path, or a list of ndarrays.
- det: Use text detection or not. If False, only text recognition will be executed. Default is True.
- rec: Use text recognition or not. If False, only text detection will be executed. Default is True.
- cls: Use angle classifier or not. Default is True. If True, the text with a rotation of 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False to get better performance.
- bin: Binarize image to black and white. Default is False.
- inv: Invert image colors. Default is False.
- alpha_color: Set RGB color Tuple for transparent parts replacement. Default is pure white.
- slice: Use sliding window inference for large images. Both det and rec must be True. Requires int values for slice["horizontal_stride"], slice["vertical_stride"], slice["merge_x_thres"], slice["merge_y_thres"] (See doc/doc_en/slice_en.md). Default is {}.
-
- Returns:
- If both det and rec are True, returns a list of OCR results for each image. Each OCR result is a list of bounding boxes and recognized text for each detected text region.
- If det is True and rec is False, returns a list of detected bounding boxes for each image.
- If det is False and rec is True, returns a list of recognized text for each image.
- If both det and rec are False, returns a list of angle classification results for each image.
-
- Raises:
- AssertionError: If the input image is not of type ndarray, list, str, or bytes.
- SystemExit: If det is True and the input is a list of images.
-
- Note:
- - If the angle classifier is not initialized (use_angle_cls=False), it will not be used during the forward process.
- - For PDF files, if the input is a list of images and the page_num is specified, only the first page_num images will be processed.
- - The preprocess_image function is used to preprocess the input image by applying alpha color replacement, inversion, and binarization if specified.
+ args:
+ img: img for OCR, support ndarray, img_path and list or ndarray
+ det: use text detection or not. If False, only rec will be exec. Default is True
+ rec: use text recognition or not. If False, only det will be exec. Default is True
+ cls: use angle classifier or not. Default is True. If True, the text with rotation of 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False to get better performance. Text with rotation of 90 or 270 degrees can be recognized even if cls=False.
+ bin: binarize image to black and white. Default is False.
+ inv: invert image colors. Default is False.
+ alpha_color: set RGB color Tuple for transparent parts replacement. Default is pure white.
"""
assert (
det or rec or cls
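
The longer docstring in this hunk says `slice` needs int values for four keys, or is left as `{}` to disable sliding-window inference. A rough sketch of that contract follows; `check_slice_config` is a hypothetical validator written for illustration (the hunk does not show how the library itself checks the argument), and the numbers in `example` are placeholders, not recommended settings.

```python
# The four key names are taken from the docstring in the hunk above.
REQUIRED_SLICE_KEYS = (
    "horizontal_stride",
    "vertical_stride",
    "merge_x_thres",
    "merge_y_thres",
)


def check_slice_config(slice_cfg):
    """Return True for an empty dict (slicing off) or a dict with all four int keys."""
    if not slice_cfg:
        return True
    return all(
        # bool is a subclass of int in Python, so it is excluded explicitly.
        isinstance(slice_cfg.get(key), int) and not isinstance(slice_cfg.get(key), bool)
        for key in REQUIRED_SLICE_KEYS
    )


example = {
    "horizontal_stride": 300,  # placeholder value
    "vertical_stride": 500,    # placeholder value
    "merge_x_thres": 50,       # placeholder value
    "merge_y_thres": 35,       # placeholder value
}
print(check_slice_config(example))
```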
Expand All @@ -741,7 +724,7 @@ def ocr(
exit(0)
if cls == True and self.use_angle_cls == False:
logger.warning(
- "Since the angle classifier is not initialized, it will not be used during the forward process"
+ 'Since the angle classifier is not initialized, it will not be used during the forward process'
)

img, flag_gif, flag_pdf = check_img(img, alpha_color)
Expand All @@ -764,21 +747,22 @@ def preprocess_image(_image):

if det and rec:
ocr_res = []
- for img in imgs:
+ for idx, img in enumerate(imgs):
img = preprocess_image(img)
- dt_boxes, rec_res, _ = self.__call__(img, cls, slice)
+ dt_boxes, rec_res, _ = self.__call__(img, cls)
if not dt_boxes and not rec_res:
ocr_res.append(None)
continue
- tmp_res = [[box.tolist(), res] for box, res in zip(dt_boxes, rec_res)]
+ tmp_res = [[box.tolist(), res]
+ for box, res in zip(dt_boxes, rec_res)]
ocr_res.append(tmp_res)
return ocr_res
elif det and not rec:
ocr_res = []
- for img in imgs:
+ for idx, img in enumerate(imgs):
img = preprocess_image(img)
dt_boxes, elapse = self.text_detector(img)
- if dt_boxes.size == 0:
+ if not dt_boxes:
ocr_res.append(None)
continue
ocr_res.append(None)
continue
tmp_res = [box.tolist() for box in dt_boxes]
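
Because the det-and-rec branch in this hunk appends `None` for an image with no detections and a list of `[box, res]` pairs otherwise, callers need to handle both cases. A minimal sketch of consuming such a result is below. `iter_text_lines` is a hypothetical helper, the sample data is hand-written, and treating each `res` as a `(text, score)` pair is an assumption that is not shown in the hunk.

```python
def iter_text_lines(ocr_result):
    """Yield (image_index, text, score) from a det+rec style result, skipping empty images."""
    for image_index, image_result in enumerate(ocr_result):
        if image_result is None:  # images with no detections are stored as None
            continue
        for box, (text, score) in image_result:
            # Assumption: each recognition result is a (text, score) pair.
            yield image_index, text, score


# Hand-written stand-in data; real results would come from the OCR engine.
sample = [
    [[[[0, 0], [10, 0], [10, 5], [0, 5]], ("hello", 0.98)]],
    None,
]
print(list(iter_text_lines(sample)))
```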
Expand Down Expand Up @@ -973,18 +957,16 @@ def main():
raise NotImplementedError

for img_path in image_file_list:
- img_name = os.path.basename(img_path).split(".")[0]
- logger.info("{}{}{}".format("*" * 10, img_path, "*" * 10))
- if args.type == "ocr":
- result = engine.ocr(
- img_path,
- det=args.det,
- rec=args.rec,
- cls=args.use_angle_cls,
- bin=args.binarize,
- inv=args.invert,
- alpha_color=args.alphacolor,
- )
+ img_name = os.path.basename(img_path).split('.')[0]
+ logger.info('{}{}{}'.format('*' * 10, img_path, '*' * 10))
+ if args.type == 'ocr':
+ result = engine.ocr(img_path,
+ det=args.det,
+ rec=args.rec,
+ cls=args.use_angle_cls,
+ bin=args.binarize,
+ inv=args.invert,
+ alpha_color=args.alphacolor)
if result is not None:
lines = []
for res in result:
Expand Down