feat: add docTR OCR engine adapter with absolute coordinate conversion #558
VishnuPrasath-S-20 wants to merge 3 commits into
Conversation
maziyarpanahi
left a comment
Thank you @VishnuPrasath-S-20. I reviewed this against #440 / OM-241 and added one maintainer follow-up commit: feat: expose lazy docTR OCR contract.
What I changed:
- added the public OCR entry point `ocr(image, engine="doctr")` plus `engine="auto"` selection;
- kept docTR imports lazy so importing `openmed.multimodal.ocr` does not import or require `python-doctr`;
- made `OcrResult` an immutable per-word contract with text, absolute pixel bbox, confidence, and page;
- preserved docTR relative-to-absolute bbox conversion without downloading models in tests;
- added actionable missing-dependency errors that name `openmed[multimodal]`;
- pinned `python-doctr>=1.0` under the `multimodal` extra and added its Apache-2.0 license entry to the release policy;
- added the missing `Closes #440` PR body link and copied #440's labels onto the PR.
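For readers following along, the relative-to-absolute bbox conversion and the immutable per-word result can be sketched roughly as below. This is an illustration under stated assumptions, not the code in this PR: `OcrWord`, `to_absolute_bbox`, and `words_from_export` are hypothetical names, and the sketch assumes docTR's exported dictionary layout, where each page reports `dimensions` as `(height, width)` and each word `geometry` as `((xmin, ymin), (xmax, ymax))` in the 0 to 1 range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen=True makes instances immutable
class OcrWord:  # hypothetical stand-in for the PR's OcrResult
    text: str
    bbox: tuple[int, int, int, int]  # absolute pixels: x0, y0, x1, y1
    confidence: float
    page: int


def to_absolute_bbox(geometry, page_height, page_width):
    """Scale a relative ((xmin, ymin), (xmax, ymax)) box to pixel coordinates."""
    (xmin, ymin), (xmax, ymax) = geometry
    return (
        round(xmin * page_width),
        round(ymin * page_height),
        round(xmax * page_width),
        round(ymax * page_height),
    )


def words_from_export(exported):
    """Flatten a docTR-style export dict into a list of OcrWord."""
    words = []
    for page_index, page in enumerate(exported["pages"]):
        height, width = page["dimensions"]  # assumed (height, width) order
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    words.append(OcrWord(
                        text=word["value"],
                        bbox=to_absolute_bbox(word["geometry"], height, width),
                        confidence=float(word["confidence"]),
                        page=page_index,
                    ))
    return words
```

Because the helper works on a plain dictionary, it can be exercised with hand-written data and needs no model weights.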
Verification on the current PR checkout:
- `PYTHONPATH=/private/tmp/openmed-pr-558 /Users/maziyar/Developer/openmed/.venv/bin/python -m pytest tests/unit/multimodal/test_ocr_doctr.py tests/unit/release/test_license_policy.py -q` -> 12 passed
- `/Users/maziyar/Developer/openmed/.venv/bin/ruff check openmed/multimodal/__init__.py openmed/multimodal/ocr.py scripts/release/check_license_policy.py tests/unit/multimodal/test_ocr_doctr.py pyproject.toml` -> passed
- `/Users/maziyar/Developer/openmed/.venv/bin/ruff format --check openmed/multimodal/__init__.py openmed/multimodal/ocr.py scripts/release/check_license_policy.py tests/unit/multimodal/test_ocr_doctr.py` -> passed
The branch is mergeable with no conflicts; GitHub has not attached hosted checks to the new head commit yet, so I verified the touched behavior locally.
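As a rough illustration of the lazy-import and missing-dependency behavior described in this comment, the pattern could look like the sketch below. The function names are hypothetical and are not taken from the PR. The sketch also assumes that `engine="auto"` simply resolves to docTR, since that is the only engine discussed here; the real selection logic may differ.

```python
import importlib
import importlib.util


def doctr_available():
    """Check for docTR without importing it (find_spec only locates the package)."""
    return importlib.util.find_spec("doctr") is not None


def resolve_engine(engine="auto"):
    """Map the requested engine name to a concrete one, or raise a helpful error."""
    if engine not in ("auto", "doctr"):
        raise ValueError(f"Unknown OCR engine: {engine!r}")
    if not doctr_available():
        # Name the extra so the user knows exactly what to install.
        raise ImportError(
            "The docTR OCR engine requires python-doctr. "
            "Install it with: pip install 'openmed[multimodal]'"
        )
    return "doctr"  # assumption: "auto" falls back to the only engine here


def load_doctr_predictor():
    """Import docTR only when OCR is actually requested."""
    resolve_engine("doctr")
    models = importlib.import_module("doctr.models")
    return models.ocr_predictor(pretrained=True)  # downloads weights on first use
```

Keeping the import inside `load_doctr_predictor` means that importing the surrounding module never touches `doctr`, so users without the extra only see the error when they actually ask for OCR.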
…dapter # Conflicts: # openmed/multimodal/__init__.py # scripts/release/check_license_policy.py
Thank you @VishnuPrasath-S-20. I rechecked this PR against #440 after What changed:
Verification:
The issue labels are already on the PR. The branch is mergeable with no conflicts. GitHub has not reported hosted checks for this fork head, so the validation above is local.
Thank you, @maziyarpanahi! I much appreciate you handling the merge resolution and keeping the core multimodal dependencies clean. Glad to help improve the docTR OCR pipeline.
Pull Request
Description
This PR introduces a docTR OCR engine adapter to the multimodal suite. It provides a structured wrapper that processes input images, extracts text and bounding boxes using `python-doctr`, and standardizes the output into the project's native `OcrResult` schema.
Type of Change
Changes Made
- `openmed/multimodal/ocr.py` implementing the `run_doctr_ocr` adapter.
- `DOCTR_AVAILABLE` to gracefully handle `python-doctr` as an optional dependency.
- `pyproject.toml` to register `python-doctr` under a new `multimodal` optional dependency group.
- `tests/unit/multimodal/test_ocr_doctr.py` using mock components to validate coordinate mapping and dependency handling without requiring model weight downloads.
Testing Done
Validated the changes locally inside the virtual environment using pytest:
`pytest tests/unit/multimodal/test_ocr_doctr.py`
Output:
`2 passed, 1 warning in 2.11s`
Related Issues
Closes #440
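The mock-based testing approach mentioned under Changes Made can be illustrated with a small sketch. Everything here is hypothetical: `run_ocr` is a simplified stand-in rather than the PR's `run_doctr_ocr`, and the fake export dictionary assumes docTR's layout of `dimensions` as `(height, width)` and relative word `geometry`. The point is only that a `MagicMock` predictor lets the coordinate mapping be checked without downloading any model weights.

```python
from unittest.mock import MagicMock


def run_ocr(image, predictor):
    """Hypothetical adapter: call the predictor, return (text, pixel bbox) pairs."""
    exported = predictor([image]).export()
    results = []
    for page in exported["pages"]:
        height, width = page["dimensions"]
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    (x0, y0), (x1, y1) = word["geometry"]
                    bbox = (round(x0 * width), round(y0 * height),
                            round(x1 * width), round(y1 * height))
                    results.append((word["value"], bbox))
    return results


def make_fake_predictor():
    """Build a mock that mimics predictor(...).export() without loading a model."""
    fake_export = {"pages": [{"dimensions": (100, 200), "blocks": [{"lines": [{"words": [
        {"value": "aspirin", "confidence": 0.9,
         "geometry": ((0.25, 0.5), (0.75, 1.0))},
    ]}]}]}]}
    predictor = MagicMock()
    # predictor(...) returns a mock whose export() yields the fake dictionary.
    predictor.return_value.export.return_value = fake_export
    return predictor
```

A test can then call `run_ocr(object(), make_fake_predictor())` and compare the returned boxes against values computed by hand from the fake page size.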