Add Tesseract and PaddleOCR engine adapters with a common OCR result contract

## Summary
OCR is the shared backbone for scanned documents, plain images, and DICOM burned-in text, but OM-045 bundles it into the epic without a discrete, swappable adapter layer. We need one OCR contract (word text + bbox + confidence + page) with two interchangeable backends so all image/scan paths reuse it and tests can stub OCR deterministically.

## Scope
- [ ] Create `openmed/multimodal/ocr.py` defining an `OcrResult` contract (per-word text, bbox, confidence, page) and an `ocr(image, *, engine=...)` entry point.
- [ ] Implement a Tesseract adapter (pytesseract) and a PaddleOCR adapter, each import-guarded behind the multimodal extra and selectable by name; default to whichever is installed.
- [ ] Expose an OCR->ExtractedDocument bridge so OCR'd text flows into redact_document and detected PHI projects back to pixel bboxes.
- [ ] Add a deterministic in-memory fake OCR engine for unit tests so the suite does not require Tesseract/Paddle binaries.
- [ ] Document the system-level Tesseract binary requirement in the extra's install notes.
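
A minimal sketch of how the contract, the engine registry, and the fake engine from the scope items above could fit together. Only `OcrResult`, `ocr(image, *, engine=...)`, and the `multimodal` extra come from this issue; `OcrWord`, `OcrEngineUnavailable`, `register_engine`, `make_fake_engine`, the bbox convention, and the 0-based page index are hypothetical names and assumptions for illustration. The Tesseract adapter uses `pytesseract.image_to_data`, and a PaddleOCR adapter would register the same way.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Assumed convention: (left, top, right, bottom) in pixels.
BBox = Tuple[int, int, int, int]


@dataclass(frozen=True)
class OcrWord:  # hypothetical name
    text: str
    bbox: BBox
    confidence: float  # assumed to be normalised to 0.0-1.0
    page: int          # assumed to be 0-based


@dataclass(frozen=True)
class OcrResult:
    words: Tuple[OcrWord, ...]
    engine: str


class OcrEngineUnavailable(RuntimeError):
    """Raised when the requested OCR engine is not registered or installed."""


Engine = Callable[[object], List[OcrWord]]
_ENGINES: Dict[str, Engine] = {}


def register_engine(name: str, engine: Engine) -> None:
    _ENGINES[name] = engine


def _tesseract_engine() -> Optional[Engine]:
    """Return a pytesseract-backed engine, or None if pytesseract is missing."""
    try:
        import pytesseract  # import-guarded: only present with the extra
    except ImportError:
        return None

    def run(image: object) -> List[OcrWord]:
        data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
        words = []
        for i, text in enumerate(data["text"]):
            if not text.strip():
                continue  # skip layout rows that carry no word
            left, top = data["left"][i], data["top"][i]
            bbox = (left, top, left + data["width"][i], top + data["height"][i])
            conf = float(data["conf"][i]) / 100.0  # Tesseract reports 0-100
            words.append(OcrWord(text, bbox, conf, data["page_num"][i] - 1))
        return words

    return run


_tess = _tesseract_engine()
if _tess is not None:
    register_engine("tesseract", _tess)


def make_fake_engine(words: List[OcrWord]) -> Engine:
    """Deterministic test double: ignores the image and returns canned words."""
    return lambda image: list(words)


def ocr(image: object, *, engine: Optional[str] = None) -> OcrResult:
    if engine is None and _ENGINES:
        engine = next(iter(_ENGINES))  # default: first engine that registered
    if engine is None or engine not in _ENGINES:
        raise OcrEngineUnavailable(
            f"OCR engine {engine!r} is not available. Install the 'multimodal' "
            "extra (Tesseract also needs its system binary). "
            f"Registered engines: {sorted(_ENGINES) or 'none'}."
        )
    return OcrResult(tuple(_ENGINES[engine](image)), engine)
```

In unit tests, the suite would call `register_engine("fake", make_fake_engine([...]))` and then `ocr(image, engine="fake")`, so no Tesseract or Paddle binary is needed.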

## Acceptance criteria
- [ ] ocr() returns an OcrResult with word bboxes and confidences from a synthetic image (fake engine in CI).
- [ ] Both Tesseract and PaddleOCR adapters expose the same OcrResult shape (verified via the fake-engine contract test).
- [ ] OCR'd text feeds redact_document and a PHI word projects to the correct pixel bbox.
- [ ] Requesting an OCR engine that is not installed yields a clear, actionable message.
- [ ] Test suite green: `.venv/bin/python -m pytest tests/ -q`
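
One way the "PHI word projects to the correct pixel bbox" criterion could be checked, as a sketch only. It assumes the bridge joins OCR words with single spaces and records each word's character offsets; `Word`, `build_text`, and `project_span` are hypothetical helpers, and the real `redact_document` and `ExtractedDocument` interfaces are not shown here because this issue does not specify them.

```python
from typing import List, NamedTuple, Tuple

BBox = Tuple[int, int, int, int]  # assumed (left, top, right, bottom) in pixels


class Word(NamedTuple):  # hypothetical stand-in for one OCR word
    text: str
    bbox: BBox
    page: int


def build_text(words: List[Word]) -> Tuple[str, List[Tuple[int, int]]]:
    """Join words with single spaces; return the text and (start, end) per word."""
    parts, offsets, cursor = [], [], 0
    for w in words:
        offsets.append((cursor, cursor + len(w.text)))
        parts.append(w.text)
        cursor += len(w.text) + 1  # +1 for the joining space
    return " ".join(parts), offsets


def project_span(
    words: List[Word], offsets: List[Tuple[int, int]], start: int, end: int
) -> List[Tuple[int, BBox]]:
    """Return (page, bbox) for every word overlapping the span [start, end)."""
    return [
        (w.page, w.bbox)
        for w, (s, e) in zip(words, offsets)
        if s < end and start < e
    ]


words = [
    Word("Patient", (10, 10, 80, 30), 0),
    Word("John", (90, 10, 130, 30), 0),
    Word("Doe", (140, 10, 175, 30), 0),
    Word("admitted", (185, 10, 270, 30), 0),
]
text, offsets = build_text(words)
start = text.index("John Doe")  # stands in for a span reported by PHI detection
boxes = project_span(words, offsets, start, start + len("John Doe"))
```

Here `boxes` holds the two boxes for "John" and "Doe", which is what a pixel-redaction step would later consume.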

## Out of scope
- Training or fine-tuning OCR models.
- Pixel redaction rendering (image-redaction task).
- Non-English OCR language packs (separate config task).

## Files
- openmed/multimodal/ocr.py
- pyproject.toml
- tests/unit/multimodal/test_ocr_contract.py
- tests/unit/multimodal/test_ocr_engines.py

---
Task: OM-061  ·  Milestone: v1.7  ·  Priority: P1  ·  Size: M
Depends on: OM-045, OM-058  ·  Blocks: OM-065, OM-075
Roadmap: OPENMED_V2_UNIFIED_ROADMAP.md sec 5.8 (PaddleOCR/Tesseract); RESEARCH_APPENDIX sec 7; OM-045 scope (c)
Spec: PLANS/V2/EXECUTION/tasks/OM-061.md


Issue: #226