IndicPhotoOCR Optimization & Stabilization Report

This report summarizes the comprehensive series of performance optimizations, bug fixes, and architectural improvements applied to the IndicPhotoOCR repository up to the current commit string.

🚀 Performance Metrics & Improvements

Significant architectural changes (specifically Model Caching and Batch Inference) have dramatically accelerated the pipeline's execution times.

Metric	Older Version	Current Version	Improvement
Sequential Time	89.61 seconds	17.80 seconds	~5.0x Faster (80% reduction)
Batched Time	92.93 seconds	14.10 seconds	~6.6x Faster (85% reduction)

Key Takeaways

The 5x Sequential Speedup is primarily driven by the Model Caching implementation. Previously, models were re-loaded from disk into memory for every single cropped word loop, which heavily bottlenecked inference. By preserving models in memory, the baseline sequential execution time drastically collapsed from 89.61s to 17.80s.
The additional 21% Batched Speedup (17.80s → 14.10s) is achieved via the newly implemented Batch Inference method, which groups multiple bounding box crops into a single combined tensor/pipeline array, maximizing GPU efficiency and lowering loop overheads.

🛠️ Summary of All Codebase Improvements

1. Architectural Optimizations

Model In-Memory Caching: Added caching functionality to both VIT_identifier and PARseqrecogniser. Models are now loaded once upon request and persisted in memory for consecutive word detections, completely eliminating redundant I/O disk bottlenecks.
Batch Inference Pipeline (feat/batch-inference):
- ViT Identifier: Integrated HuggingFace pipeline batched processing natively into identify_batch().
- PARseq Recogniser: Created custom get_model_output_batch() and recognise_batch() methods utilizing torch.stack() and torch.no_grad() to compute parallel logit extraction across sequence predictions.
- Orchestration: Updated OCR.ocr(image_path, batch_size=X) to natively support batched detection grouping, cleanly reorganizing bounding boxes into language-specific batch tensors before restoring their original associative structure.

2. Stabilization & Safety Hooks

Atomic Model Checkpoint Downloads: Prevented corrupted network downloads! Modified the ensure_model() helper utilities (textbpn, parseq, vit, CLIP). Model shards are strictly written into progressive .tmp suffixes and only explicitly moved via os.rename() to the final endpoint upon fully completing a 200 OK fetch.
Relative to Absolute Paths: Resolved significant routing fragility by detaching hard-coded path invocations (e.g., IndicPhotoOCR/recognition/). Dynamic resolution relies on os.path.dirname(os.path.abspath(__file__)), meaning end-users can invoke IndicPhotoOCR gracefully from anywhere within their root OS directory.
Robust Temporary Crop Handling: Converted hardcoded transient folders (IndicPhotoOCR/script_identification/images/) over to globally compliant temp directories (import tempfile.mkstemp()). Reconciled garbage collection bugs to ensure .jpg crop trails are accurately deleted off the OS without bleeding.

3. Feature Extensions

Confidence Scoring (feat/confidence-scores): Implemented logit pooling to calculate mean token probability arrays during extraction. OCR.recognise now has the capability of emitting the word along with its scalar certainty.

4. Backwards Compatibility & Automated Testing

100% Core Functionality Preserved:
- OCR.recognise() enforces backward API parity using return_confidence=False by default so existing dependent scripts do not suddenly encounter tuple unpacking TypeErrors.
- OCR.ocr() preserves default batch_size=0, guaranteeing existing loops will continue running identically in sequential mode without disruption unless explicitly elevated.
Test Suite Introduction (origin/test):
- Added ~106 extensive pytest conditions checking mocking validations, dataset layout parsing logic (detect_para), and network retry policies to protect against future regressions.

All implementations verified safe & green via the fix/quick-wins branch.

User Guide: New Features & Performance

Welcome to the latest version of IndicPhotoOCR! We have upgraded the pipeline under the hood so it runs exponentially faster, while adding a few requested features to the public API.

Here is a quick breakdown of what is new for users.

🚀 1. Massive Performance Upgrade

You do not need to change any of your code to experience the new performance upgrades. The underlying models are now intelligently cached in memory, which means sequential scanning will naturally run 5x faster on average.

Mode	Previous Version	Current Version
Out-of-the-box (Sequential)	~90 seconds	~18 seconds

⚡ 2. Batch Inference (New Feature)

If you are running OCR on an image containing a large paragraph of text (many words), you can explicitly turn on Batch Inference to process them concurrently for even faster extraction.

How to use it: Pass the batch_size parameter to the main .ocr() method:

from IndicPhotoOCR.ocr import OCR

ocr = OCR()

# Old Way (Still works natively!)
results = ocr.ocr("image.jpg")

# New Way (Enable Batching for faster large-document processing)
results = ocr.ocr("image.jpg", batch_size=32)

Mode	Previous Version	Current Batched
Batched Processing	~93 seconds	~14 seconds

🎯 3. Confidence Scores (New Feature)

You can now extract the neural net's confidence score representing the certainty of the extracted text. This is highly useful for downstream NLP evaluation, filtering bad predictions, or generating Word Error Rate (WER) metrics.

Through the main `ocr()` pipeline:

The confidence score is automatically injected into the default dictionary output for each bounding box:

results = ocr.ocr("image.jpg")

# Example Output
# {
#   "img_0": {
#     "txt": "नमस्ते", 
#     "bbox": [531, 237, 725, 290], 
#     "confidence": 0.985  <--- NEW!
#   }
# }

Through the direct `recognise()` function:

If you are passing your own cropped images directly to the recogniser, you can optionally request the confidence score by passing return_confidence=True:

# Old behavior (returns string text)
text = ocr.recognise("crop.jpg", "hindi")

# New behavior (returns tuple with confidence float)
text, conf_score = ocr.recognise("crop.jpg", "hindi", return_confidence=True)

print(f"I am {conf_score * 100}% confident the text is: {text}")

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IndicPhotoOCR Optimization & Stabilization Report

🚀 Performance Metrics & Improvements

Key Takeaways

🛠️ Summary of All Codebase Improvements

1. Architectural Optimizations

2. Stabilization & Safety Hooks

3. Feature Extensions

4. Backwards Compatibility & Automated Testing

User Guide: New Features & Performance

🚀 1. Massive Performance Upgrade

⚡ 2. Batch Inference (New Feature)

🎯 3. Confidence Scores (New Feature)

Through the main `ocr()` pipeline:

Through the direct `recognise()` function:

FilesExpand file tree

optimazation_report_april2026.md

Latest commit

History

optimazation_report_april2026.md

File metadata and controls

IndicPhotoOCR Optimization & Stabilization Report

🚀 Performance Metrics & Improvements

Key Takeaways

🛠️ Summary of All Codebase Improvements

1. Architectural Optimizations

2. Stabilization & Safety Hooks

3. Feature Extensions

4. Backwards Compatibility & Automated Testing

User Guide: New Features & Performance

🚀 1. Massive Performance Upgrade

⚡ 2. Batch Inference (New Feature)

🎯 3. Confidence Scores (New Feature)

Through the main ocr() pipeline:

Through the direct recognise() function:

Through the main `ocr()` pipeline:

Through the direct `recognise()` function: