Fix Docker permission instability + optimize memory usage in advanced_ocr.py
⸻
This patch brings two major improvements to the Versatile-OCR-Program:
-
Fixes a Docker permission loss issue on Vertex AI / Jupyter environments
-
Optimizes memory usage in
advanced_ocr.pyto handle large, image-heavy PDFs more efficiently
Problem
- Docker commands would fail with
Permission denied, after a Jupyter kernel interruption (due to memory spikes or manual stops).
Root Cause
-
The
jupyteruser was not persistently recognized as a member of thedockergroup unless the machine was rebooted. -
This behavior is specific to Jupyter-based environments (e.g., Vertex AI, Colab Pro VMs) where group permissions are reset per session.
Failed Attempts
-
Adding
sudoinsidesubprocess.run()failed due to the absence of a TTY. -
Using
shell=Truecaused unpredictable behavior and was ultimately removed.
Final Fix
- The
jupyteruser was permanently added to thedockergroup:sudo usermod -aG docker jupyter sudo reboot • All subprocess calls to Docker now use plain docker run without sudo.
Impact • Prevents permission loss on session or kernel restart. • Ensures stable and persistent Docker access inside Jupiter Notebooks. • Simplifies code and avoids reliance on elevated permissions.
⸻
[2] Feature: Memory Optimization in advanced_ocr.py
The advanced_ocr.py module was refactored to significantly reduce memory usage without changing core functionality or output format.
Key Optimizations:
-
Garbage Collection • Added gc.collect() after large memory operations. • Imported the gc module for explicit cleanup.
-
Image Processing • Resized large images before feeding them into OCR pipelines. • Applied JPEG compression with quality 85 to reduce in-memory buffer size. • Used downscaled thumbnails for hash operations. • Released all image buffers immediately after use.
-
Memory Management • Explicitly used del to release large objects. • Used .copy() after cropping to avoid memory leaks from image views. • Switched to page-by-page PDF parsing instead of loading entire files.
-
Efficient String Building • Replaced inefficient += concatenations with list-based string assembly using ''.join(). • Split large text blocks into smaller, manageable chunks.
-
API Handling Improvements • Reduced request payload size for external API calls (e.g., Gemini). • Cleaned up response objects immediately after use to free memory.
Impact • Handles high-resolution, multi-page PDFs (100–200+ pages) without exceeding memory limits. • Prevents kernel crashes on large inputs. • Keeps behavior and output identical to the original.
⸻
Files Affected • ocr_stage1.py (Docker execution logic) • advanced_ocr.py (OCR core logic)
⸻
Recommendation
Use this update if you’re running the Versatile-OCR-Program in a Jupyter-based cloud environment (e.g., Vertex AI, GCP Notebook, Colab Pro). It ensures both system stability and memory efficiency — especially when processing large, image-rich PDF documents.