Serverless OCR for PDF files using EasyOCR on Runpod. Provide public PDF URLs and get back the extracted text for each page, with bounding boxes (normalized to 0..1) and confidence scores. Rendered page images are auto-rotated using Tesseract OSD before OCR.
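A minimal client sketch, assuming the worker is deployed as a Runpod serverless endpoint and called through Runpod's synchronous /runsync route, with the request fields wrapped in an "input" object as Runpod serverless endpoints expect. The endpoint ID and API key are placeholders you supply. build_ocr_request and run_ocr are hypothetical helper names and are not part of this repo.

```python
import json
import urllib.request

# Runpod's synchronous serverless route; endpoint_id is filled in per deployment.
RUNSYNC_URL = "https://api.runpod.ai/v2/{endpoint_id}/runsync"


def build_ocr_request(pdf_urls, languages=None, page_limit=None, **options):
    """Wrap OCR options in the {"input": ...} envelope Runpod serverless expects."""
    if isinstance(pdf_urls, str):
        pdf_urls = [pdf_urls]  # normalize a single URL string to a list
    payload = {"pdf_urls": list(pdf_urls)}
    if languages is not None:
        payload["languages"] = list(languages)
    if page_limit is not None:
        payload["page_limit"] = page_limit
    # Forward other documented fields (gpu, detail, dpi, batched, ...) unless None.
    payload.update({k: v for k, v in options.items() if v is not None})
    return {"input": payload}


def run_ocr(endpoint_id, api_key, request, timeout=300):
    """POST the request to the endpoint's /runsync route and return the decoded JSON."""
    req = urllib.request.Request(
        RUNSYNC_URL.format(endpoint_id=endpoint_id),
        data=json.dumps(request).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The worker's own result (the object with "results", "languages", and so on) typically sits under the "output" key of the /runsync reply. A long job may come back with an IN_QUEUE or IN_PROGRESS status instead, in which case Runpod's asynchronous /run and /status routes are a better fit.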
Input parameters:

- pdf_url or pdf_urls: A single URL string or a list of public PDF URLs.
- languages: List of language codes for EasyOCR (default ['ch_sim', 'en']).
- gpu: Boolean; use the GPU if available (default true).
- detail: 1 for boxes, text, and confidence; 0 for text only (default 1).
- dpi: Rendering DPI when converting PDF pages to images (default 200).
- page_indices: List of zero-based page indices to process.
- page_from / page_to: Page range (inclusive) to process.
- page_limit: Maximum number of pages to process.
- batched: Use EasyOCR readtext_batched to process pages in a batch (default false).
- n_width / n_height: When batching, resize all pages to these exact dimensions. If they are not provided and pages differ in size, the largest width and height are used.
- cudnn_benchmark: Enables cuDNN benchmark mode, which can speed up inference when input sizes stay consistent across batches (default false).
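When batched is true, every page must share one size. The sketch below shows one plausible way to pick that size under the rule above: explicit n_width / n_height win, otherwise the largest rendered width and height are used. It assumes the rendered pixel size is the PDF page size in points (1/72 inch) scaled by dpi / 72, which is how PyMuPDF scales pages when rendering at a given DPI; the worker's exact rounding may differ. Falling back per dimension when only one of n_width / n_height is set is an assumption, and both function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def rendered_size(width_pt: float, height_pt: float, dpi: int = 200) -> Tuple[int, int]:
    """Approximate pixel size of a PDF page rendered at `dpi` (PDF units are 1/72 inch)."""
    scale = dpi / 72.0
    return round(width_pt * scale), round(height_pt * scale)


def batch_target_size(
    page_sizes: Iterable[Tuple[int, int]],
    n_width: Optional[int] = None,
    n_height: Optional[int] = None,
) -> Tuple[int, int]:
    """Pick one (width, height) that every page is resized to before batching.

    Explicit n_width / n_height take precedence; otherwise use the largest
    page dimension. The per-dimension fallback is an assumption of this sketch.
    """
    sizes = list(page_sizes)
    if not sizes and (n_width is None or n_height is None):
        raise ValueError("need page sizes unless both n_width and n_height are set")
    width = n_width if n_width is not None else max(w for w, _ in sizes)
    height = n_height if n_height is not None else max(h for _, h in sizes)
    return width, height
```

For example, a US Letter page (612 x 792 pt) rendered at the default 200 DPI comes out at about 1700 x 2200 pixels.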
Orientation: The worker uses Tesseract OSD to detect and correct page orientation before OCR. This helps with multilingual documents (e.g., Thai, Chinese, English). Minimal orientation metadata (rotate and script, as in the example response) is attached to each page.
Example request:

```json
{
  "pdf_urls": [
    "https://arxiv.org/pdf/1708.01204.pdf"
  ],
  "languages": ["ch_sim", "en"],
  "gpu": true,
  "detail": 1,
  "dpi": 200,
  "page_limit": 1,
  "batched": true,
  "n_width": 1200,
  "n_height": 1600,
  "cudnn_benchmark": true
}
```

Example response:

```json
{
  "results": [
    {
      "url": "https://...",
      "pages": [
        {
          "index": 0,
          "results": [
            { "box": [[x,y],...], "text": "...", "confidence": 0.99 }
          ],
          "orientation": { "rotate": 0, "script": "Latin" }
        }
      ]
    }
  ],
  "languages": ["ch_sim","en"],
  "gpu": true,
  "detail": 1,
  "dpi": 200
}
```

You can run the handler locally by setting INPUT_JSON and executing the file, or by using the Runpod testing CLI. This repo also includes .runpod/tests.json for Hub automated tests.
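To consume a response shaped like the example response, a client can walk results, then pages, then each page's results. This sketch joins each page's detected text in the order the worker returns it and drops low-confidence detections. The page_text name and the 0.5 threshold are illustrative choices, not part of the worker. It assumes detail is 1; keeping plain strings as-is is a guess at the detail 0 shape, since that output is not shown above.

```python
from typing import Dict, List


def page_text(response: dict, min_confidence: float = 0.5) -> Dict[str, List[str]]:
    """Map each PDF URL to one text string per processed page, ordered by page index.

    Assumes the response layout shown in the example. Plain-string entries
    (assumed shape for detail=0) carry no confidence, so they are always kept.
    """
    out: Dict[str, List[str]] = {}
    for doc in response.get("results", []):
        pages = []
        for page in sorted(doc.get("pages", []), key=lambda p: p.get("index", 0)):
            lines = []
            for item in page.get("results", []):
                if isinstance(item, str):
                    lines.append(item)
                elif item.get("confidence", 0.0) >= min_confidence:
                    lines.append(item["text"])
            pages.append("\n".join(lines))
        out[doc["url"]] = pages
    return out
```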
Deployment:

- Ensure Docker is available and build the image:
  docker build -t runpod-easyocr .
- Push the image to your registry, or connect the repo to Runpod Hub.
- Create a release on GitHub to trigger Hub ingestion.
Notes:

- The worker uses PyMuPDF to render PDF pages to images, so PDF rendering needs no external system dependencies.
- The EasyOCR Reader is cached between requests to avoid reloading weights.
- Set default languages via the READER_LANGS environment variable (e.g., ch_sim,en).
- Output box coordinates are normalized: x is in [0.0, 1.0] relative to image width, and y is in [0.0, 1.0] relative to image height.
- Pages are orientation-corrected using Tesseract OSD (pytesseract). The base image contains Tesseract and language packs for Thai, Simplified/Traditional Chinese, and English.
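The notes on normalized boxes and READER_LANGS map onto a few lines of code. This sketch scales a normalized box back to pixel coordinates for an image of known size and splits a comma-separated READER_LANGS value into a list suitable for easyocr.Reader. The helper names, and falling back to ['ch_sim', 'en'] when the variable is unset or empty, are assumptions that mirror the documented default rather than code taken from the worker.

```python
import os
from typing import List, Optional, Sequence


def denormalize_box(box: Sequence[Sequence[float]], width: int, height: int) -> List[List[int]]:
    """Scale a box of [x, y] points in 0..1 back to pixel coordinates."""
    return [[round(x * width), round(y * height)] for x, y in box]


def reader_langs(env_value: Optional[str] = None) -> List[str]:
    """Parse a READER_LANGS value such as "ch_sim,en" into a language list."""
    raw = env_value if env_value is not None else os.environ.get("READER_LANGS", "")
    langs = [code.strip() for code in raw.split(",") if code.strip()]
    return langs or ["ch_sim", "en"]  # assumed fallback, mirroring the documented default
```

For a page rendered at 1700 x 2200 pixels, the normalized point [0.5, 0.25] maps back to pixel [850, 550].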