Add Tesseract and PaddleOCR engine adapters with a common OCR result contract #226

Description

@maziyarpanahi

Summary

OCR is the shared backbone for scanned documents, plain images, and DICOM burned-in text, but OM-045 bundles it into the epic without a discrete, swappable adapter layer. We need one OCR contract (word text + bbox + confidence + page) with two interchangeable backends so all image/scan paths reuse it and tests can stub OCR deterministically.
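
One possible shape for the contract named above, as a minimal sketch only. The class names `OcrWord` and `OcrResult.words`, the `(x0, y0, x1, y1)` pixel bbox convention, the zero-based page index, and the `[0.0, 1.0]` confidence range are all assumptions for illustration; the issue fixes only the four fields (text, bbox, confidence, page).

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed bbox convention: (x0, y0, x1, y1) in pixel coordinates.
BBox = Tuple[int, int, int, int]


@dataclass(frozen=True)
class OcrWord:
    """One recognised word (hypothetical name)."""
    text: str
    bbox: BBox
    confidence: float  # assumed normalised to [0.0, 1.0] by each adapter
    page: int = 0      # assumed zero-based page index


@dataclass(frozen=True)
class OcrResult:
    """Engine-independent OCR output: an ordered tuple of words."""
    words: Tuple[OcrWord, ...]
    engine: str = "unknown"

    @property
    def text(self) -> str:
        # Reading-order text, words separated by single spaces.
        return " ".join(w.text for w in self.words)
```

Freezing both dataclasses keeps results hashable and safe to share between the image, scan, and DICOM paths; an adapter whose engine reports confidences on a different scale would rescale before building `OcrWord`.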

Scope

  • Create openmed/multimodal/ocr.py defining an OcrResult contract (per-word text, bbox, confidence, page) and an ocr(image, *, engine=...) entry point.
  • Implement a Tesseract adapter (pytesseract) and a PaddleOCR adapter, each import-guarded behind the multimodal extra and selectable by name; default to whichever is installed.
  • Expose an OCR->ExtractedDocument bridge so OCR'd text flows into redact_document and detected PHI projects back to pixel bboxes.
  • Add a deterministic in-memory fake OCR engine for unit tests so the suite does not require Tesseract/Paddle binaries.
  • Document the system-level Tesseract binary requirement in the extra's install notes.

Acceptance criteria

  • ocr() returns an OcrResult with word bboxes and confidences from a synthetic image (fake engine in CI).
  • Both Tesseract and PaddleOCR adapters expose the same OcrResult shape (verified via the fake-engine contract test).
  • OCR'd text feeds redact_document and a PHI word projects to the correct pixel bbox.
  • A missing OCR engine raises an error with a clear, actionable message.
  • Test suite is green: .venv/bin/python -m pytest tests/ -q
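
The projection criterion above (a PHI word mapping back to its pixel bbox) can be illustrated with character offsets. This sketch assumes the bridge joins words with single spaces and that the redaction step reports PHI as `[start, end)` character spans over that text. `join_words`, `project_span`, and `union_bbox` are hypothetical helpers, and the actual `redact_document` and `ExtractedDocument` interfaces are not modelled here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

BBox = Tuple[int, int, int, int]  # assumed (x0, y0, x1, y1) pixels


@dataclass(frozen=True)
class Word:
    text: str
    bbox: BBox
    page: int = 0


def join_words(words: Sequence[Word]) -> Tuple[str, List[Tuple[int, int]]]:
    """Join words with single spaces; record each word's [start, end) span."""
    parts: List[str] = []
    spans: List[Tuple[int, int]] = []
    cursor = 0
    for w in words:
        if parts:
            cursor += 1  # account for the separating space
        spans.append((cursor, cursor + len(w.text)))
        parts.append(w.text)
        cursor += len(w.text)
    return " ".join(parts), spans


def project_span(words: Sequence[Word], spans: Sequence[Tuple[int, int]],
                 start: int, end: int) -> List[Tuple[int, BBox]]:
    """Return (page, bbox) for every word overlapping the span [start, end)."""
    return [
        (w.page, w.bbox)
        for w, (s, e) in zip(words, spans)
        if s < end and start < e
    ]


def union_bbox(boxes: Sequence[BBox]) -> BBox:
    """Smallest box covering all given boxes (same page assumed)."""
    return (
        min(b[0] for b in boxes), min(b[1] for b in boxes),
        max(b[2] for b in boxes), max(b[3] for b in boxes),
    )


words = [
    Word("Patient", (10, 10, 80, 30)),
    Word("Jane", (90, 10, 130, 30)),
    Word("Doe", (140, 10, 175, 30)),
]
text, spans = join_words(words)
start = text.index("Jane Doe")
hits = project_span(words, spans, start, start + len("Jane Doe"))
```

Keeping the per-word offsets alongside the joined text means a detected span never has to be searched for again in the image; overlapping words are looked up directly, and `union_bbox` merges them when a single rectangle per entity is wanted.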

Out of scope

  • Training or fine-tuning OCR models.
  • Pixel redaction rendering (image-redaction task).
  • Non-English OCR language packs (separate config task).

Files

  • openmed/multimodal/ocr.py
  • pyproject.toml
  • tests/unit/multimodal/test_ocr_contract.py
  • tests/unit/multimodal/test_ocr_engines.py
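
For the contract test file listed above, a shared shape check that every engine's output must pass might look like the sketch below. The helper name `contract_violations` and the specific rules (non-empty text, positive-area bbox, confidence in `[0.0, 1.0]`, non-negative integer page) are assumptions chosen for illustration; the issue does not spell out the exact invariants.

```python
from types import SimpleNamespace
from typing import Iterable, List


def contract_violations(words: Iterable[object]) -> List[str]:
    """Return human-readable problems; an empty list means the shape is valid."""
    problems: List[str] = []
    for i, w in enumerate(words):
        if not isinstance(w.text, str) or not w.text:
            problems.append(f"word {i}: text must be a non-empty string")
        x0, y0, x1, y1 = w.bbox
        if not (x0 < x1 and y0 < y1):
            problems.append(f"word {i}: bbox {w.bbox} has no area")
        if not 0.0 <= w.confidence <= 1.0:
            problems.append(f"word {i}: confidence {w.confidence} outside [0, 1]")
        if not isinstance(w.page, int) or w.page < 0:
            problems.append(f"word {i}: page must be a non-negative int")
    return problems


# Stand-in objects; any adapter output with these attributes would work.
good = [SimpleNamespace(text="Jane", bbox=(90, 10, 130, 30), confidence=0.97, page=0)]
bad = [SimpleNamespace(text="Doe", bbox=(5, 5, 5, 9), confidence=1.4, page=0)]
```

Running the same check against the fake engine in CI and against real adapters where binaries are available is one way to confirm that both backends expose the same shape.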

Task: OM-061 · Milestone: v1.7 · Priority: P1 · Size: M
Depends on: OM-045, OM-058 · Blocks: OM-065, OM-075
Roadmap: OPENMED_V2_UNIFIED_ROADMAP.md sec 5.8 (PaddleOCR/Tesseract); RESEARCH_APPENDIX sec 7; OM-045 scope (c)
Spec: PLANS/V2/EXECUTION/tasks/OM-061.md

Metadata

Labels

P1 (High) · feature (New capability) · roadmap-v2 (OpenMed V2 roadmap backlog)
