Commit 3963b96

Merge pull request #2091 from opendatalab/release-1.3.0
Release 1.3.0
2 parents 41d96cd + 1cd5012 commit 3963b96

2 files changed: +25 -20 lines

README.md

+15 -12
@@ -47,20 +47,23 @@ Easier to use: Just grab MinerU Desktop. No coding, no login, just a simple inte
 </div>
 
 # Changelog
-- 2025/04/03 Release of version 1.3.0, with many changes in this version:
+- 2025/04/03 Release of 1.3.0, in this version we made many optimizations and improvements:
 - Installation and compatibility optimization
-- By using paddleocr2torch, completely replaced the paddle framework and paddleocr used in the project, resolving conflicts between paddle and torch.
-- Removed the use of layoutlmv3 in layout, solving compatibility issues caused by `detectron2`.
-- Extended torch version compatibility to 2.2~2.6.
-- CUDA compatibility extended to 11.8~12.6 (CUDA version determined by torch), addressing compatibility issues for some users with 50-series and H-series Nvidia GPUs.
-- Python compatible versions extended to 3.10~3.12, resolving the issue of automatic downgrade to 0.6.1 during installation in non-3.10 environments.
-- Performance optimization (compared to version 1.0.1, formula parsing speed improved by over 1400%, and overall parsing speed improved by over 500%)
-- Improved parsing speed for batch processing of multiple small PDF files ([script example](demo/batch_demo.py)).
-- Optimized the loading and usage of the mfr model, reducing memory usage and improving parsing speed. (requires re-executing the [model download process](docs/how_to_download_models_en.md) to obtain incremental updates of model files)
-- Optimized memory usage, allowing the project to run with as little as 6GB.
-- Improved running speed on mps devices.
+- By removing the use of `layoutlmv3` in layout, resolved compatibility issues caused by `detectron2`.
+- Torch version compatibility extended to 2.2~2.6 (excluding 2.5).
+- CUDA compatibility supports 11.8/12.4/12.6 (CUDA version determined by torch), resolving compatibility issues for some users with 50-series and H-series GPUs.
+- Python compatible versions expanded to 3.10~3.12, solving the problem of automatic downgrade to 0.6.1 during installation in non-3.10 environments.
+- Offline deployment process optimized; no internet connection required after successful deployment to download any model files.
+- Performance optimization
+- By supporting batch processing of multiple PDF files ([script example](demo/batch_demo.py)), improved parsing speed for small files in batches (compared to version 1.0.1, formula parsing speed increased by over 1400%, overall parsing speed increased by over 500%).
+- Optimized loading and usage of the mfr model, reducing GPU memory usage and improving parsing speed (requires re-execution of the [model download process](docs/how_to_download_models_en.md) to obtain incremental updates of model files).
+- Optimized GPU memory usage, requiring only a minimum of 6GB to run this project.
+- Improved running speed on MPS devices.
 - Parsing effect optimization
-- Updated the mfr model to unimernet(2503), solving the issue of missing line breaks in multi-line formulas.
+- Updated the mfr model to `unimernet(2503)`, solving the issue of lost line breaks in multi-line formulas.
+- Usability Optimization
+- By using `paddleocr2torch`, completely replaced the use of the `paddle` framework and `paddleocr` in the project, resolving conflicts between `paddle` and `torch`, as well as thread safety issues caused by the `paddle` framework.
+- Added a real-time progress bar during the parsing process to accurately track progress, making the wait less painful.
 - 2025/03/03 1.2.1 released, fixed several bugs:
 - Fixed the impact on punctuation marks during full-width to half-width conversion of letters and numbers
 - Fixed caption matching inaccuracies in certain scenarios

README_zh-CN.md

+10 -8
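@@ -46,21 +46,23 @@
 </div>
 
 # Changelog
-- 2025/04/03 1.3.0 released; in this version we made many changes
+- 2025/04/03 1.3.0 released; in this version we made many optimizations and improvements
 - Installation and compatibility optimization
-- By using paddleocr2torch, completely replaced the use of the paddle framework and paddleocr in the project, resolving the conflict between paddle and torch
-- By removing the use of layoutlmv3 in layout, resolved compatibility issues caused by `detectron2`
-- torch version compatibility extended to 2.2~2.6
-- CUDA compatibility extended to 11.8~12.6 (CUDA version determined by torch), resolving compatibility issues for some users with 50-series and H-series GPUs
+- By removing the use of `layoutlmv3` in layout, resolved compatibility issues caused by `detectron2`
+- torch version compatibility extended to 2.2~2.6 (excluding 2.5)
+- CUDA compatibility supports 11.8/12.4/12.6 (CUDA version determined by torch), resolving compatibility issues for some users with 50-series and H-series GPUs
 - Python compatible versions extended to 3.10~3.12, resolving the automatic downgrade to 0.6.1 during installation in non-3.10 environments
 - Optimized the offline deployment process; after a successful deployment, no internet connection is needed to download any model files
-- Performance optimization (compared with version 1.0.1, a peak improvement of over 1400% in formula parsing speed and an overall parsing speed improvement of over 500%)
-- By supporting batch processing of multiple PDF files ([script example](demo/batch_demo.py)), improved parsing speed for batches of small files
+- Performance optimization
+- By supporting batch processing of multiple PDF files ([script example](demo/batch_demo.py)), improved parsing speed for batches of small files (compared with version 1.0.1, peak improvements of over 1400% in formula parsing speed and over 500% in overall parsing speed)
 - By optimizing the loading and usage of the mfr model, reduced GPU memory usage and improved parsing speed (requires re-running the [model download process](docs/how_to_download_models_zh_cn.md) to obtain incremental updates of model files)
 - Optimized GPU memory usage; a minimum of only 6GB is needed to run this project
 - Optimized running speed on mps devices
 - Parsing effect optimization
-- Updated the mfr model to unimernet(2503), solving the problem of lost line breaks in multi-line formulas
+- Updated the mfr model to `unimernet(2503)`, solving the problem of lost line breaks in multi-line formulas
+- Usability optimization
+- By using `paddleocr2torch`, completely replaced the use of the `paddle` framework and `paddleocr` in the project, resolving the conflict between `paddle` and `torch`, as well as thread-safety issues caused by the `paddle` framework
+- Added a real-time progress bar during parsing to track parsing progress precisely, making the wait less painful
 - 2025/03/03 1.2.1 released, fixed several issues:
 - Fixed the effect on punctuation marks during full-width to half-width conversion of letters and numbers
 - Fixed inaccurate caption matching in certain scenarios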
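The Python, torch, and CUDA ranges in the 1.3.0 changelog entry can be checked before installing. What follows is a minimal, hypothetical sketch and not part of MinerU: the names `major_minor`, `python_ok`, `torch_ok`, `cuda_ok`, and `status` are made up for illustration, the bounds are copied from the changelog text (Python 3.10~3.12, torch 2.2~2.6 excluding 2.5, CUDA 11.8/12.4/12.6), and treating a torch build without CUDA (`torch.version.cuda` is `None`) as acceptable is an assumption.

```python
# Hypothetical environment check based on the MinerU 1.3.0 changelog;
# not part of MinerU itself. Bounds are copied from the changelog text.
import sys
from typing import Optional, Tuple

PYTHON_RANGE = ((3, 10), (3, 12))          # Python 3.10~3.12, inclusive
TORCH_RANGE = ((2, 2), (2, 6))             # torch 2.2~2.6, inclusive...
TORCH_EXCLUDED = {(2, 5)}                  # ...excluding 2.5
CUDA_VERSIONS = {"11.8", "12.4", "12.6"}   # CUDA versions listed in the changelog


def major_minor(version: str) -> Tuple[int, int]:
    """Parse e.g. '2.6.0+cu124' into (2, 6), ignoring patch and local build tags."""
    core = version.split("+", 1)[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def python_ok(major: int, minor: int) -> bool:
    low, high = PYTHON_RANGE
    return low <= (major, minor) <= high


def torch_ok(version: str) -> bool:
    mm = major_minor(version)
    low, high = TORCH_RANGE
    return low <= mm <= high and mm not in TORCH_EXCLUDED


def cuda_ok(cuda_version: Optional[str]) -> bool:
    # Assumption: None (a torch build without CUDA, e.g. CPU or MPS) is not a CUDA mismatch.
    return cuda_version is None or cuda_version in CUDA_VERSIONS


def status(ok: bool) -> str:
    return "ok" if ok else "outside the 1.3.0 ranges"


if __name__ == "__main__":
    py = sys.version_info
    print(f"python {py.major}.{py.minor}: {status(python_ok(py.major, py.minor))}")
    try:
        import torch
    except ImportError:
        print("torch: not installed")
    else:
        print(f"torch {torch.__version__}: {status(torch_ok(torch.__version__))}")
        print(f"cuda {torch.version.cuda}: {status(cuda_ok(torch.version.cuda))}")
```

Run it in the target environment before installing. It only reports mismatches against the listed ranges and never installs, upgrades, or downgrades anything.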
