feat(readers): add PaddleOCRAPIReader in llama-index-readers-paddle-ocr by alialiaaa0712-cell · Pull Request #22147 · run-llama/llama_index

alialiaaa0712-cell · 2026-06-26T03:49:25Z

Description

Enhances the existing llama-index-readers-paddle-ocr package:

PaddleOCRAPIReader (new class): a reader that supports both image files (jpg, png, etc.) and PDF documents via the official PaddleOCR SDK. Key features:

use_parse=False (default): calls ocr(), returns plain text per page
use_parse=True + parse-capable model (PP-StructureV3, PaddleOCR-VL-*): calls parse_document(), returns Markdown text per page
use_parse=True + OCR-only model: automatically falls back to ocr()
Configurable model, use_doc_orientation_classify, use_doc_unwarping parameters
True async support via AsyncPaddleOCRClient
Input validation: model name checked against SDK enum, file existence checked before API call
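
The routing rules listed above can be sketched as a small pure function. This is only an illustration of the decision table, not the reader's actual code: `select_method`, `PARSE_CAPABLE_EXACT`, and `PARSE_CAPABLE_PREFIX` are hypothetical names, and `PP-OCRv5` is used merely as an example of an OCR-only model name.

```python
from typing import Literal

# Assumption: parse-capable models as named in the description
# (PP-StructureV3 exactly, and anything matching PaddleOCR-VL-*).
PARSE_CAPABLE_EXACT = {"PP-StructureV3"}
PARSE_CAPABLE_PREFIX = "PaddleOCR-VL-"


def select_method(model: str, use_parse: bool = False) -> Literal["ocr", "parse_document"]:
    """Return the name of the SDK call the reader would route to (hypothetical helper)."""
    parse_capable = model in PARSE_CAPABLE_EXACT or model.startswith(PARSE_CAPABLE_PREFIX)
    if use_parse and parse_capable:
        # Markdown text per page
        return "parse_document"
    # Default path, and the fallback for OCR-only models: plain text per page
    return "ocr"
```

Under these assumptions, `select_method("PP-OCRv5", use_parse=True)` falls back to `"ocr"`, while `select_method("PP-StructureV3", use_parse=True)` selects `"parse_document"`.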

New Package?

Yes
No

Version Bump?

Yes
No

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

I added new unit tests to cover this change
I believe this change is already covered by existing unit tests

All tests in tests/test_readers_paddle_ocr.py pass (45 tests total), covering:

PaddleOCRAPIReader: model validation, options construction, parse vs OCR routing, fallback behavior, file-not-found handling, async load
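
For context, the model-validation and file-not-found cases could look roughly like the sketch below. It assumes the validation described in this PR (model name checked against an enum, file existence checked before any API call); `ExampleModel` is a hypothetical stand-in for the SDK's enum and `validate_inputs` is a hypothetical helper, not part of the package.

```python
from enum import Enum
from pathlib import Path


class ExampleModel(str, Enum):
    # Hypothetical stand-in for the SDK's model enum
    PP_OCRV5 = "PP-OCRv5"
    PP_STRUCTUREV3 = "PP-StructureV3"


def validate_inputs(model: str, file_path: str) -> Path:
    """Fail fast on bad input, before any network call would be made."""
    try:
        ExampleModel(model)  # lookup by value; raises ValueError if unknown
    except ValueError:
        valid = ", ".join(m.value for m in ExampleModel)
        raise ValueError(f"Unknown model {model!r}; expected one of: {valid}") from None
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"File not found: {path}")
    return path
```

A unit test can then assert that an unknown model raises `ValueError` and a missing path raises `FileNotFoundError`, without mocking the client at all.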

Suggested Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added Google Colab support for the newly added notebooks.
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I ran uv run make format; uv run make lint to appease the lint gods

alialiaaa0712-cell · 2026-06-26T06:02:41Z

PTAL

alialiaaa0712-cell added 2 commits June 25, 2026 17:46

migration

99e94f5

fix some nits

d0c08ab

dosubot Bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) Jun 26, 2026

fix docs

b203021

alialiaaa0712-cell wants to merge 3 commits into run-llama:main from alialiaaa0712-cell:feat/paddleocr
