v2.0_initial Update

Fix Docker permission instability + optimize memory usage in advanced_ocr.py

⸻

Summary

This patch brings two major improvements to the Versatile-OCR-Program:

• Fixes a Docker permission loss issue in Vertex AI / Jupyter environments.
• Optimizes memory usage in advanced_ocr.py to handle large, image-heavy PDFs more efficiently.

[1] Fix: Docker Permission Instability After Kernel Interruptions

Problem

Docker commands would fail with "Permission denied" after a Jupyter kernel interruption (caused by memory spikes or manual stops).

Root Cause

The jupyter user was not persistently recognized as a member of the docker group unless the machine was rebooted.
This behavior is specific to Jupyter-based environments (e.g., Vertex AI, Colab Pro VMs) where group permissions are reset per session.
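
To confirm this root cause from inside a notebook, one option is to check whether the running kernel process actually carries the docker group. The following is a minimal diagnostic sketch, not part of the patch itself; the helper name process_has_group is hypothetical, and it assumes a Unix host where Python's standard grp module is available.

```python
import grp
import os


def process_has_group(group_name: str) -> bool:
    """Return True if the current process carries `group_name` in its group list."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group does not exist on this machine at all.
        return False
    # os.getgroups() lists supplementary group IDs; os.getegid() is the
    # effective primary group, which may not appear in that list.
    return gid in os.getgroups() or gid == os.getegid()


if __name__ == "__main__":
    print("docker group active in this process:", process_has_group("docker"))
```

If this prints False while the jupyter user is listed in the docker group on disk, the session was most likely started before the membership took effect.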

Failed Attempts

Adding sudo inside subprocess.run() failed due to the absence of a TTY.
Using shell=True caused unpredictable behavior and was ultimately removed.

Final Fix

The jupyter user was permanently added to the docker group:

sudo usermod -aG docker jupyter
sudo reboot

  
•	All subprocess calls to Docker now use plain docker run without sudo.
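
As an illustration of a sudo-free call, the sketch below builds the Docker command as an argument list and runs it without a shell. It is not the code in ocr_stage1.py: the function names, the --rm flag, and the image name in the usage note are assumptions made for the example.

```python
import subprocess
from typing import List, Sequence


def build_docker_command(image: str, args: Sequence[str] = ()) -> List[str]:
    """Build an argument list for `docker run` with no sudo prefix."""
    # --rm (remove the container on exit) is an assumption for this sketch.
    return ["docker", "run", "--rm", image, *args]


def run_docker(image: str, args: Sequence[str] = ()) -> subprocess.CompletedProcess:
    """Run the container; raises CalledProcessError on a non-zero exit code."""
    # Passing a list keeps shell=False (the default), so no shell parsing occurs.
    return subprocess.run(
        build_docker_command(image, args),
        capture_output=True,
        text=True,
        check=True,
    )
```

For example, run_docker("example/ocr:latest", ["--help"]) would execute docker run --rm example/ocr:latest --help, which only succeeds if the calling user already has access to the Docker daemon.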

Impact
• Prevents permission loss on session or kernel restart.
• Ensures stable and persistent Docker access inside Jupyter Notebooks.
• Simplifies code and avoids reliance on elevated permissions.

⸻

[2] Feature: Memory Optimization in advanced_ocr.py

The advanced_ocr.py module was refactored to significantly reduce memory usage without changing core functionality or output format.

Key Optimizations:

Garbage Collection
• Added gc.collect() after large memory operations.
• Imported the gc module for explicit cleanup.

Image Processing
• Resized large images before feeding them into OCR pipelines.
• Applied JPEG compression with quality 85 to reduce in-memory buffer size.
• Used downscaled thumbnails for hash operations.
• Released all image buffers immediately after use.

Memory Management
• Explicitly used del to release large objects.
• Used .copy() after cropping to avoid memory leaks from image views.
• Switched to page-by-page PDF parsing instead of loading entire files.

Efficient String Building
• Replaced inefficient += concatenations with list-based string assembly using ''.join().
• Split large text blocks into smaller, manageable chunks.

API Handling Improvements
• Reduced request payload size for external API calls (e.g., Gemini).
• Cleaned up response objects immediately after use to free memory.
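
Several of the optimizations above (page-by-page processing, del plus gc.collect(), list-based string assembly, and chunking) can be sketched together in a few lines. This is an illustrative pattern only, not the actual code in advanced_ocr.py; load_page and ocr_page are hypothetical callables standing in for the real PDF and OCR steps.

```python
import gc
from typing import Callable, Iterator, List


def iter_pages(page_count: int, load_page: Callable[[int], bytes]) -> Iterator[bytes]:
    """Yield one page buffer at a time instead of loading the whole file."""
    for index in range(page_count):
        yield load_page(index)


def ocr_document(
    page_count: int,
    load_page: Callable[[int], bytes],
    ocr_page: Callable[[bytes], str],
) -> str:
    """OCR a document page by page, keeping only one page buffer alive."""
    parts: List[str] = []  # collect pieces and join once, instead of using +=
    for page in iter_pages(page_count, load_page):
        parts.append(ocr_page(page))
        del page      # drop the reference to the large buffer
        gc.collect()  # explicit collection after a large operation
    return "".join(parts)


def split_text(text: str, size: int) -> List[str]:
    """Split a large text block into chunks of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]
```

Calling gc.collect() on every page trades some speed for a lower peak memory footprint; on small documents it may be unnecessary.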

Impact
• Handles high-resolution, multi-page PDFs (100–200+ pages) without exceeding memory limits.
• Prevents kernel crashes on large inputs.
• Keeps behavior and output identical to the original.

⸻

Files Affected
• ocr_stage1.py (Docker execution logic)
• advanced_ocr.py (OCR core logic)

⸻

Recommendation

Use this update if you’re running the Versatile-OCR-Program in a Jupyter-based cloud environment (e.g., Vertex AI, GCP Notebook, Colab Pro). It improves both system stability and memory efficiency, especially when processing large, image-rich PDF documents.
