PaddlePaddle · MatufA · Aug 11, 2023 · Aug 14, 2023 · Aug 16, 2023 · Aug 16, 2023
diff --git a/.github/ISSUE_TEMPLATE/custom.md b/.github/ISSUE_TEMPLATE/custom.md
@@ -0,0 +1,19 @@
+---
+name: Issue template
+about: Issue template for code error.
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+请提供下述完整信息以便快速定位问题/Please provide the following information to quickly locate the problem
+
+- 系统环境/System Environment：
+- 版本号/Version：Paddle：  PaddleOCR： 问题相关组件/Related components：
+- 运行指令/Command Code：
+- 完整报错/Complete Error Message：
+
+我们提供了AceIssueSolver来帮助你解答问题，你是否想要它来解答(请填写yes/no)?/We provide AceIssueSolver to help answer issues. Would you like it to respond? (Please write yes/no):
+
+请尽量不要包含图片在问题中/Please avoid including images in the issue.
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -0,0 +1,15 @@
+### PR 类型 PR types
+<!-- One of [ New features | Bug fixes | Function optimization | Performance optimization | Breaking changes | Others ] -->
+
+### PR 变化内容类型 PR changes
+<!-- One of [ Models | APIs | Docs | Others ] -->
+
+### 描述 Description
+<!-- Describe what this PR does -->
+
+### 提PR之前的检查 Check-list
+
+- [ ] 这个 PR 是提交到dygraph分支或者是一个cherry-pick，否则请先提交到dygraph分支。
+      This PR is pushed to the dygraph branch or cherry-picked from the dygraph branch. Otherwise, please push your changes to the dygraph branch first.
+- [ ] 这个PR清楚描述了功能，帮助评审能提升效率。This PR fully describes what it does so that reviewers can work efficiently.
+- [ ] 这个PR已经经过本地测试。This PR is covered by existing tests or has been tested locally by you.
diff --git a/README.md b/README.md
diff --git a/README_en.md b/README_en.md
@@ -73,9 +73,30 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
 
 > It is recommended to start with the “quick experience” in the document tutorial
 
-## ⚡ [Quick Start](https://paddlepaddle.github.io/PaddleOCR/latest/en/quick_start.html)
+
+## ⚡ Quick Experience
+
+- Web online experience
+    - PP-OCRv4 online experience：https://aistudio.baidu.com/application/detail/7658
+    - PP-ChatOCR online experience：https://aistudio.baidu.com/application/detail/7709
+
+- One-line quick use: [Quick Start (Chinese/English/Multilingual/Document Analysis)](./doc/doc_en/quickstart_en.md)
+- Full-process experience of training, inference, and high-performance deployment in the Paddle AI suite (PaddleX)：
+    - PP-OCRv4：https://aistudio.baidu.com/projectdetail/paddlex/6796224
+    - PP-ChatOCR：https://aistudio.baidu.com/projectdetail/paddlex/6796372
+- Mobile demo experience：[Installation DEMO](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) (based on EasyEdge and Paddle-Lite, supports iOS and Android)
+
+<a name="Technical exchange and cooperation"></a>
 
 ## 📖 Technical exchange and cooperation
+- PaddleX: a one-stop development platform for practical, industry-selected models. It includes the following features:
+* [High-quality algorithm library] Contains 36 selected models across 10 major task areas, so models for different tasks can be developed on one platform. More domain models are being added. PaddleX also provides complete model training and inference benchmark data, so developers can choose the most appropriate model for their business needs.
+* [Simple development method] Toolbox and developer modes work together. With no-code and low-code development, the full AI development process of data, training, validation, and deployment takes four steps.
+* [Efficient training and deployment] Incorporates the best tuning strategies of the Baidu algorithm team so that each model converges quickly and well. Complete deployment SDK support enables rapid industrial-grade deployment across platforms and hardware (service-based deployment capabilities are being improved).
+* [Rich domestic hardware support] In addition to running on the AIStudio cloud, PaddleX is available as a local Windows version, and Linux, Kunlun Core, Ascend, and Cambricon versions are being added.
+* [Win-win co-creation] Besides making AI application development convenient, PaddleX gives everyone opportunities to earn business returns and helps enterprises explore more business opportunities.
+
+PaddleX official website: https://www.paddlepaddle.org.cn/paddle/paddleX
 
 PaddleX provides a one-stop, full-process, high-efficiency development platform for training, compressing, and running inference with PaddlePaddle ecosystem models. Its mission is to help AI technology reach production quickly, and its vision is to make everyone an AI developer!
 

diff --git a/deploy/cpp_infer/src/paddlestructure.cpp b/deploy/cpp_infer/src/paddlestructure.cpp
@@ -152,9 +152,9 @@ std::string PaddleStructure::rebuild_table(
     ocr_box[3] += 1;
     std::vector<std::vector<float>> dis_list(structure_boxes.size(),
                                              std::vector<float>(3, 100000.0));
-    for (size_t j = 0; j < structure_boxes.size(); ++j) {
+    for (int j = 0; j < structure_boxes.size(); j++) {
       if (structure_boxes[j].size() == 8) {
-        structure_box = std::move(Utility::xyxyxyxy2xyxy(structure_boxes[j]));
+        structure_box = Utility::xyxyxyxy2xyxy(structure_boxes[j]);
       } else {
         structure_box = structure_boxes[j];
       }

diff --git a/deploy/paddle2onnx/readme.md b/deploy/paddle2onnx/readme.md
@@ -75,7 +75,8 @@ paddle2onnx --model_dir ./inference/ch_ppocr_mobile_v2.0_cls_infer \
 --model_filename inference.pdmodel \
 --params_filename inference.pdiparams \
 --save_file ./inference/cls_onnx/model.onnx \
---opset_version 11 \
+--opset_version 10 \
+--input_shape_dict="{'x':[-1,3,-1,-1]}" \
 --enable_onnx_checker True
 ```
 After execution, the ONNX model will be saved in `./inference/det_onnx/`, `./inference/rec_onnx/`, `./inference/cls_onnx/` paths respectively

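Review note on the `--input_shape_dict` value added above: the argument is written as a Python dict literal. The sketch below is not how paddle2onnx parses the flag (that is not shown in this diff). It only reads the same literal with the standard library to show which axes are given as `-1`, assuming `-1` marks an axis left dynamic. The helper name `dynamic_axes` is hypothetical.

```python
import ast

# The literal passed to --input_shape_dict in the command above.
raw = "{'x':[-1,3,-1,-1]}"


def dynamic_axes(shape_dict_literal):
    """Return {input_name: [indices of axes written as -1]}.

    Hypothetical helper for illustration only; treating -1 as
    "dynamic" is an assumption, not taken from this diff.
    """
    shapes = ast.literal_eval(shape_dict_literal)
    return {
        name: [i for i, dim in enumerate(dims) if dim == -1]
        for name, dims in shapes.items()
    }


if __name__ == "__main__":
    # Only the channel axis (index 1) is pinned, to 3.
    print(dynamic_axes(raw))
```

Under that assumption, batch size, height, and width stay free while the channel count is fixed at 3.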
diff --git a/deploy/paddle2onnx/readme_ch.md b/deploy/paddle2onnx/readme_ch.md
@@ -74,7 +74,8 @@ paddle2onnx --model_dir ./inference/ch_ppocr_mobile_v2.0_cls_infer \
 --model_filename inference.pdmodel \
 --params_filename inference.pdiparams \
 --save_file ./inference/cls_onnx/model.onnx \
---opset_version 11 \
+--opset_version 10 \
+--input_shape_dict="{'x':[-1,3,-1,-1]}" \
 --enable_onnx_checker True
 ```
 

diff --git a/docs/algorithm/kie/algorithm_kie_vi_layoutxlm.en.md b/docs/algorithm/kie/algorithm_kie_vi_layoutxlm.en.md
@@ -17,11 +17,15 @@ On XFUND_zh dataset, the algorithm reproduction Hmean is as follows.
 
 ## 2. Environment
 
-Please refer to ["Environment Preparation"](../../ppocr/environment.en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](../../ppocr/blog/clone.en.md)to clone the project code.
+## 2. Environment
+
+Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
+
 
 ## 3. Model Training / Evaluation / Prediction
 
-Please refer to [KIE tutorial](../../ppocr/model_train/kie.en.md). PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different models.
+Please refer to [KIE tutorial](./kie_en.md). PaddleOCR has modularized the code structure, so that you only need to **replace the configuration file** to train different models.
+
 
 ## 4. Inference and Deployment
 

diff --git a/docs/ppocr/blog/PP-OCRv4_introduction.md b/docs/ppocr/blog/PP-OCRv4_introduction.md
@@ -109,15 +109,22 @@ The overall Lite-Neck structure follows that of PP-OCRv3, with the parameters slightly trimmed,
 
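 GTC (Guided Training of CTC) is one of the most effective strategies in the PP-OCRv3 recognition model: it fuses multiple text-feature representations and effectively improves recognition accuracy. PP-OCRv4 uses NRTR, a Transformer model that trains more stably, as the guidance branch. Compared with the RNN-based SAR used in V3, NRTR's Transformer-based decoding generalizes better, guides the CTC branch effectively, and mitigates fast overfitting on simple scenes. With the Lite-Neck and GTC-NRTR strategies, recognition accuracy rises to 73.21% (+0.5%).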
 
-![img](./images/ppocrv4_gtc.png)
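+GTC (Guided Training of CTC) is one of the most effective strategies in the PP-OCRv3 recognition model: it fuses multiple text-feature representations and effectively improves recognition accuracy. PP-OCRv4 uses NRTR, a Transformer model that trains more stably, as the guidance branch. Compared with the RNN-based SAR used in V3, NRTR's Transformer-based decoding generalizes better, guides the CTC branch effectively, and mitigates fast overfitting on simple scenes. With the Lite-Neck and GTC-NRTR strategies, recognition accuracy rises to 73.21% (+0.5%).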
 
 ### (5) Multi-Scale: multi-scale training strategy
 
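 The dynamic-scale training strategy randomly resizes the height of input images during training, to make the recognition model more robust when it is used in the chained end-to-end pipeline. At each training iteration, one height is chosen at random from (32, 48, 64) and the image is resized to it. Experiments show that although this strategy does not improve accuracy on the recognition test set, it improves the end-to-end pipeline metric by 0.5%.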
 
 ![img](./images/multi_scale.png)
 
-### (6) DKD: distillation strategy
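+The dynamic-scale training strategy randomly resizes the height of input images during training, to make the recognition model more robust when it is used in the chained end-to-end pipeline. At each training iteration, one height is chosen at random from (32, 48, 64) and the image is resized to it. Experiments show that although this strategy does not improve accuracy on the recognition test set, it improves the end-to-end pipeline metric by 0.5%.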
+
+<div align="center">
+    <img src="../ppocr_v4/multi_scale.png" width="500">
+</div>
+
+
+**(6) DKD: distillation strategy**
 
 Distillation of the recognition model has two parts: NRTRhead distillation and CTCHead distillation.
 

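Review note on the Multi-Scale paragraph in the doc diff above: the per-iteration height choice can be sketched roughly as follows. This is a minimal illustration, not the PaddleOCR sampler. Only the candidate heights (32, 48, 64) come from the text. The helper name `pick_train_size` and the choice to scale the width so the aspect ratio is kept are assumptions.

```python
import random

# Candidate heights named in the text; everything else here is illustrative.
CANDIDATE_HEIGHTS = (32, 48, 64)


def pick_train_size(orig_w, orig_h, rng=random):
    """Pick a random target height and a matching width.

    Hypothetical helper: preserving the aspect ratio is an assumption,
    the text only says that the image height is resized.
    """
    target_h = rng.choice(CANDIDATE_HEIGHTS)
    target_w = max(1, round(orig_w * target_h / orig_h))
    return target_w, target_h


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs pick the same heights
    for step in range(3):
        # One draw per training iteration, as the paragraph describes.
        print(step, pick_train_size(320, 48, rng))
```

Drawing a new height on every iteration exposes the recognizer to the range of crop sizes a detector might hand it, which matches the robustness motivation given in the paragraph.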
diff --git a/paddleocr.py b/paddleocr.py
@@ -62,6 +62,8 @@ def _import_file(module_name, file_path, make_importable=False):
     confirm_model_dir_url,
 )
 from tools.infer import predict_system
+from ppocr.utils.utility import check_and_read, get_image_file_list, alpha_to_color, binarize_img
+from ppocr.utils.network import maybe_download, download_with_progressbar, is_link, confirm_model_dir_url
 from tools.infer.utility import draw_ocr, str2bool, check_gpu
 from ppstructure.utility import init_args, draw_structure_result
 from ppstructure.predict_system import StructureSystem, save_structure_res, to_excel
@@ -83,9 +85,10 @@ def _import_file(module_name, file_path, make_importable=False):
     "convert_info_markdown",
 ]
 
-SUPPORT_DET_MODEL = ["DB"]
-SUPPORT_REC_MODEL = ["CRNN", "SVTR_LCNet"]
-BASE_DIR = os.environ.get("PADDLE_OCR_BASE_DIR", os.path.expanduser("~/.paddleocr/"))
+SUPPORT_DET_MODEL = ['DB']
+VERSION = '2.7.0.3'
+SUPPORT_REC_MODEL = ['CRNN', 'SVTR_LCNet']
+BASE_DIR = os.path.expanduser("~/.paddleocr/")
 
 DEFAULT_OCR_MODEL_VERSION = "PP-OCRv4"
 SUPPORT_OCR_MODEL_VERSION = ["PP-OCR", "PP-OCRv2", "PP-OCRv3", "PP-OCRv4"]
@@ -693,44 +696,24 @@ def __init__(self, **kwargs):
         super().__init__(params)
         self.page_num = params.page_num
 
-    def ocr(
-        self,
-        img,
-        det=True,
-        rec=True,
-        cls=True,
-        bin=False,
-        inv=False,
-        alpha_color=(255, 255, 255),
-        slice={},
-    ):
+    def ocr(self,
+            img,
+            det=True,
+            rec=True,
+            cls=True,
+            bin=False,
+            inv=False,
+            alpha_color=(255, 255, 255)):
         """
         OCR with PaddleOCR
-
-        Args:
-            img: Image for OCR. It can be an ndarray, img_path, or a list of ndarrays.
-            det: Use text detection or not. If False, only text recognition will be executed. Default is True.
-            rec: Use text recognition or not. If False, only text detection will be executed. Default is True.
-            cls: Use angle classifier or not. Default is True. If True, the text with a rotation of 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False to get better performance.
-            bin: Binarize image to black and white. Default is False.
-            inv: Invert image colors. Default is False.
-            alpha_color: Set RGB color Tuple for transparent parts replacement. Default is pure white.
-            slice: Use sliding window inference for large images. Both det and rec must be True. Requires int values for slice["horizontal_stride"], slice["vertical_stride"], slice["merge_x_thres"], slice["merge_y_thres"] (See doc/doc_en/slice_en.md). Default is {}.
-
-        Returns:
-            If both det and rec are True, returns a list of OCR results for each image. Each OCR result is a list of bounding boxes and recognized text for each detected text region.
-            If det is True and rec is False, returns a list of detected bounding boxes for each image.
-            If det is False and rec is True, returns a list of recognized text for each image.
-            If both det and rec are False, returns a list of angle classification results for each image.
-
-        Raises:
-            AssertionError: If the input image is not of type ndarray, list, str, or bytes.
-            SystemExit: If det is True and the input is a list of images.
-
-        Note:
-            - If the angle classifier is not initialized (use_angle_cls=False), it will not be used during the forward process.
-            - For PDF files, if the input is a list of images and the page_num is specified, only the first page_num images will be processed.
-            - The preprocess_image function is used to preprocess the input image by applying alpha color replacement, inversion, and binarization if specified.
+        Args:
+            img: img for OCR, support ndarray, img_path and list of ndarray
+            det: use text detection or not. If False, only rec will be exec. Default is True
+            rec: use text recognition or not. If False, only det will be exec. Default is True
+            cls: use angle classifier or not. Default is True. If True, the text with rotation of 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False to get better performance. Text with rotation of 90 or 270 degrees can be recognized even if cls=False.
+            bin: binarize image to black and white. Default is False.
+            inv: invert image colors. Default is False.
+            alpha_color: set RGB color Tuple for transparent parts replacement. Default is pure white.
         """
         assert (
             det or rec or cls
@@ -741,7 +724,7 @@ def ocr(
             exit(0)
         if cls == True and self.use_angle_cls == False:
             logger.warning(
-                "Since the angle classifier is not initialized, it will not be used during the forward process"
+                'Since the angle classifier is not initialized, it will not be used during the forward process'
             )
 
         img, flag_gif, flag_pdf = check_img(img, alpha_color)
@@ -764,21 +747,22 @@ def preprocess_image(_image):
 
         if det and rec:
             ocr_res = []
-            for img in imgs:
+            for idx, img in enumerate(imgs):
                 img = preprocess_image(img)
-                dt_boxes, rec_res, _ = self.__call__(img, cls, slice)
+                dt_boxes, rec_res, _ = self.__call__(img, cls)
                 if not dt_boxes and not rec_res:
                     ocr_res.append(None)
                     continue
-                tmp_res = [[box.tolist(), res] for box, res in zip(dt_boxes, rec_res)]
+                tmp_res = [[box.tolist(), res]
+                           for box, res in zip(dt_boxes, rec_res)]
                 ocr_res.append(tmp_res)
             return ocr_res
         elif det and not rec:
             ocr_res = []
-            for img in imgs:
+            for idx, img in enumerate(imgs):
                 img = preprocess_image(img)
                 dt_boxes, elapse = self.text_detector(img)
-                if dt_boxes.size == 0:
+                if not dt_boxes:
                     ocr_res.append(None)
                     continue
                 tmp_res = [box.tolist() for box in dt_boxes]
@@ -973,18 +957,16 @@ def main():
         raise NotImplementedError
 
     for img_path in image_file_list:
-        img_name = os.path.basename(img_path).split(".")[0]
-        logger.info("{}{}{}".format("*" * 10, img_path, "*" * 10))
-        if args.type == "ocr":
-            result = engine.ocr(
-                img_path,
-                det=args.det,
-                rec=args.rec,
-                cls=args.use_angle_cls,
-                bin=args.binarize,
-                inv=args.invert,
-                alpha_color=args.alphacolor,
-            )
+        img_name = os.path.basename(img_path).split('.')[0]
+        logger.info('{}{}{}'.format('*' * 10, img_path, '*' * 10))
+        if args.type == 'ocr':
+            result = engine.ocr(img_path,
+                                det=args.det,
+                                rec=args.rec,
+                                cls=args.use_angle_cls,
+                                bin=args.binarize,
+                                inv=args.invert,
+                                alpha_color=args.alphacolor)
             if result is not None:
                 lines = []
                 for res in result: