You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
-**SGLang support**: Added `SGLangVLMEngine` for serving VLMs with SGLang.
29
-
-**Optional prompts**: `OCREngine` now accepts `system_prompt=False` / `user_prompt=False` for models that don't need them (e.g., PaddleOCR, LightOn-OCR).
30
-
-**Graceful shutdown**: `concurrent_ocr()` cancels in-flight VLM calls when the consumer stops iterating. CLI Ctrl+C and the web app Stop button now abort cleanly.
-**VLM-based rotation correction**: `rotate_correction` now accepts `"tesseract"`, `"vlm"`, or `False`. Use `"vlm"` when Tesseract isn't installed or struggles with noisy scans.
-**BBox output mode**: New `output_mode="bbox"` returns OCR text with bounding-box coordinates and labels per region. Leave `user_prompt` empty for full-text bbox OCR or set it to a free-text instruction (e.g., `"patient name and DOB"`) for targeted extraction. Built-in format registry covers Qwen3-VL, Gemma 3/4, and GPT-4.1.
33
33
34
34
## Table of Contents
35
35
-[Overview](#overview)
@@ -243,6 +243,49 @@ async def run_ocr():
243
243
asyncio.run(run_ocr())
244
244
```
245
245
246
+
Run OCR with bounding boxes (`output_mode="bbox"`). Leave `user_prompt` empty for full-text bbox OCR, or set it to a free-text instruction for targeted extraction:
0 commit comments