feat: add docTR OCR engine adapter with absolute coordinate conversion #558

Open
VishnuPrasath-S-20 wants to merge 3 commits into maziyarpanahi:master from VishnuPrasath-S-20:feature/doctr-ocr-adapter

Conversation

@VishnuPrasath-S-20 VishnuPrasath-S-20 commented Jun 21, 2026

Pull Request

Description

This PR introduces a docTR OCR engine adapter to the multimodal suite. It provides a structured wrapper that processes input images, extracts text and bounding boxes using python-doctr, and standardizes the output into the project's native OcrResult schema.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Test addition/improvement

Changes Made

  • Created openmed/multimodal/ocr.py implementing the run_doctr_ocr adapter.
  • Implemented coordinate conversion that scales docTR's relative floating-point bounds (in the range 0.0 to 1.0) into absolute integer pixel coordinates based on the page dimensions.
  • Added an import guard flag (DOCTR_AVAILABLE) to gracefully handle python-doctr as an optional dependency.
  • Updated pyproject.toml to register python-doctr under a new multimodal optional dependency group.
  • Added automated unit tests in tests/unit/multimodal/test_ocr_doctr.py using mock components to validate mapping coordinates and dependency handling without requiring model weight downloads.
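
The relative-to-absolute conversion listed above can be sketched as follows. This is a minimal illustration, not the code in openmed/multimodal/ocr.py: the helper name to_absolute_bbox is hypothetical, and it assumes the geometry layout of docTR exports, ((xmin, ymin), (xmax, ymax)), with page dimensions given as (height, width). The clamping and rounding policy is also an assumption.

```python
from typing import Tuple

RelBox = Tuple[Tuple[float, float], Tuple[float, float]]


def to_absolute_bbox(
    geometry: RelBox, page_dims: Tuple[int, int]
) -> Tuple[int, int, int, int]:
    """Scale a relative box in [0.0, 1.0] to integer pixel coordinates.

    geometry:  ((xmin, ymin), (xmax, ymax)), relative to the page size.
    page_dims: (height, width) in pixels (assumed order, as in docTR exports).
    Returns (x_min, y_min, x_max, y_max) in pixels.
    """
    (xmin, ymin), (xmax, ymax) = geometry
    height, width = page_dims

    def scale(value: float, size: int) -> int:
        # Clamp first so slightly out-of-range predictions stay on the page.
        return int(round(min(max(value, 0.0), 1.0) * size))

    # x values scale with the page width, y values with the page height.
    return (
        scale(xmin, width),
        scale(ymin, height),
        scale(xmax, width),
        scale(ymax, height),
    )


# A box on a page that is 1000 px high and 800 px wide.
print(to_absolute_bbox(((0.1, 0.2), (0.5, 0.25)), (1000, 800)))
# -> (80, 200, 400, 250)
```

Mixing up the (height, width) order is the easy mistake here, which is why the sketch names both values before scaling.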

Testing Done

Validated the changes locally inside the virtual environment using pytest:
pytest tests/unit/multimodal/test_ocr_doctr.py
Output: 2 passed, 1 warning in 2.11s
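
For readers unfamiliar with testing an OCR adapter without model weights, the general approach looks roughly like this. It is a sketch under stated assumptions, not the contents of test_ocr_doctr.py: extract_words is a hypothetical stand-in for the adapter, and the fake predictor returns a dictionary shaped like a docTR export (pages, blocks, lines, words).

```python
from unittest.mock import MagicMock


def extract_words(predictor, image):
    """Hypothetical stand-in for the adapter: flatten a docTR-style export."""
    exported = predictor([image]).export()
    words = []
    for page_idx, page in enumerate(exported["pages"]):
        height, width = page["dimensions"]  # assumed (height, width) order
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    (x0, y0), (x1, y1) = word["geometry"]
                    words.append(
                        {
                            "text": word["value"],
                            "bbox": (
                                round(x0 * width),
                                round(y0 * height),
                                round(x1 * width),
                                round(y1 * height),
                            ),
                            "confidence": word["confidence"],
                            "page": page_idx,
                        }
                    )
    return words


def test_extract_words_scales_boxes():
    # The mock replaces the real predictor, so no weights are downloaded.
    fake_predictor = MagicMock()
    fake_predictor.return_value.export.return_value = {
        "pages": [
            {
                "dimensions": (200, 400),
                "blocks": [
                    {
                        "lines": [
                            {
                                "words": [
                                    {
                                        "value": "aspirin",
                                        "confidence": 0.98,
                                        "geometry": ((0.25, 0.5), (0.5, 0.75)),
                                    }
                                ]
                            }
                        ]
                    }
                ],
            }
        ]
    }
    words = extract_words(fake_predictor, image=object())
    assert words == [
        {"text": "aspirin", "bbox": (100, 100, 200, 150), "confidence": 0.98, "page": 0}
    ]
    fake_predictor.assert_called_once()
```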

Related Issues

Closes #440

@maziyarpanahi added the labels feature (New capability), good first issue (Good for newcomers), help wanted (Extra attention is needed), P2 (Medium), and roadmap-v2 (OpenMed V2 roadmap backlog) on Jun 22, 2026

@maziyarpanahi maziyarpanahi left a comment

Owner

Thank you @VishnuPrasath-S-20. I reviewed this against #440 / OM-241 and added one maintainer follow-up commit: feat: expose lazy docTR OCR contract.

What I changed:

  • added the public OCR entry point ocr(image, engine="doctr") plus engine="auto" selection;
  • kept docTR imports lazy so importing openmed.multimodal.ocr does not import or require python-doctr;
  • made OcrResult an immutable per-word contract with text, absolute pixel bbox, confidence, and page;
  • preserved docTR relative-to-absolute bbox conversion without downloading models in tests;
  • added actionable missing-dependency errors that name openmed[multimodal];
  • pinned python-doctr>=1.0 under the multimodal extra and added its Apache-2.0 license entry to the release policy;
  • added the missing Closes #440 PR body link and copied #440's labels onto the PR.
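
Two of the points above, the immutable per-word result and the lazy import with an actionable error, might look roughly like the sketch below. The field names follow the list above, but the exact types, the helper name load_optional, and the error wording are assumptions rather than the merged implementation.

```python
import importlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OcrResult:
    """One recognized word; frozen so callers cannot mutate results."""

    text: str
    bbox: Tuple[int, int, int, int]  # absolute pixels: x_min, y_min, x_max, y_max
    confidence: float
    page: int


def load_optional(module_name: str = "doctr.models", extra: str = "openmed[multimodal]"):
    """Import an optional dependency only when an engine actually needs it.

    Hypothetical helper: because the import happens at call time, importing
    the module that defines it never requires python-doctr to be installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Name the extra so the user knows exactly what to install.
        raise ImportError(
            f"This OCR engine needs '{module_name}', which is not installed. "
            f"Install it with: pip install '{extra}'"
        ) from exc


word = OcrResult(text="aspirin", bbox=(80, 200, 400, 250), confidence=0.98, page=0)
print(word.bbox)
```

Freezing the dataclass also makes instances hashable, which can be convenient when deduplicating words across engines.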

Verification on the current PR checkout:

  • PYTHONPATH=/private/tmp/openmed-pr-558 /Users/maziyar/Developer/openmed/.venv/bin/python -m pytest tests/unit/multimodal/test_ocr_doctr.py tests/unit/release/test_license_policy.py -q -> 12 passed
  • /Users/maziyar/Developer/openmed/.venv/bin/ruff check openmed/multimodal/__init__.py openmed/multimodal/ocr.py scripts/release/check_license_policy.py tests/unit/multimodal/test_ocr_doctr.py pyproject.toml -> passed
  • /Users/maziyar/Developer/openmed/.venv/bin/ruff format --check openmed/multimodal/__init__.py openmed/multimodal/ocr.py scripts/release/check_license_policy.py tests/unit/multimodal/test_ocr_doctr.py -> passed

The branch is mergeable with no conflicts; GitHub has not attached hosted checks to the new head commit yet, so I verified the touched behavior locally.

…dapter

# Conflicts:
#	openmed/multimodal/__init__.py
#	scripts/release/check_license_policy.py
@maziyarpanahi

Owner

Thank you @VishnuPrasath-S-20. I rechecked this PR against #440 after master moved and added a merge-resolution commit: Merge remote-tracking branch 'origin/master' into feature/doctr-ocr-adapter.

What changed:

  • resolved the new multimodal package conflict by keeping both the shared document-ingest contract exports and the docTR OCR adapter exports;
  • consolidated the multimodal optional dependency group so pdfplumber, python-docx, Pillow, and python-doctr live in one valid TOML list;
  • kept both reviewed license entries in the release license policy allowlist.

Verification:

  • /Users/maziyar/Developer/openmed/.venv/bin/python -m pytest tests/unit/multimodal/test_ocr_doctr.py tests/unit/multimodal/test_base_contract.py tests/unit/multimodal/test_multimodal_extra.py -q -> 19 passed
  • /Users/maziyar/Developer/openmed/.venv/bin/ruff check openmed/multimodal/__init__.py openmed/multimodal/ocr.py openmed/multimodal/base.py openmed/multimodal/exceptions.py tests/unit/multimodal/test_ocr_doctr.py tests/unit/multimodal/test_base_contract.py tests/unit/multimodal/test_multimodal_extra.py scripts/release/check_license_policy.py -> passed
  • /Users/maziyar/Developer/openmed/.venv/bin/ruff format --check openmed/multimodal/__init__.py openmed/multimodal/ocr.py openmed/multimodal/base.py openmed/multimodal/exceptions.py tests/unit/multimodal/test_ocr_doctr.py tests/unit/multimodal/test_base_contract.py tests/unit/multimodal/test_multimodal_extra.py scripts/release/check_license_policy.py -> passed
  • /Users/maziyar/Developer/openmed/.venv/bin/python scripts/release/check_license_policy.py -> passed

The issue labels are already on the PR. The branch is mergeable with no conflicts. GitHub has not reported hosted checks for this fork head, so the validation above is local.

@VishnuPrasath-S-20

Author

Thank you, @maziyarpanahi! I much appreciate you handling the merge resolution and keeping the core multimodal dependencies clean. Glad to help improve the docTR OCR pipeline.

Development

Successfully merging this pull request may close these issues.

Add docTR engine adapter to the OCR contract

2 participants