Skip to content

Latest commit

 

History

History
94 lines (61 loc) · 3.22 KB

File metadata and controls

94 lines (61 loc) · 3.22 KB

v2.0_initial Update

Fix Docker permission instability + optimize memory usage in advanced_ocr.py

Summary

This patch brings two major improvements to the Versatile-OCR-Program:

  1. Fixes a Docker permission loss issue on Vertex AI / Jupyter environments

  2. Optimizes memory usage in advanced_ocr.py to handle large, image-heavy PDFs more efficiently


[1] Fix: Docker Permission Instability After Kernel Interruptions

Problem

  • Docker commands would fail with Permission denied, after a Jupyter kernel interruption (due to memory spikes or manual stops).

Root Cause

  • The jupyter user was not persistently recognized as a member of the docker group unless the machine was rebooted.

  • This behavior is specific to Jupyter-based environments (e.g., Vertex AI, Colab Pro VMs) where group permissions are reset per session.

Failed Attempts

  • Adding sudo inside subprocess.run() failed due to the absence of a TTY.

  • Using shell=True caused unpredictable behavior and was ultimately removed.

Final Fix

  • The jupyter user was permanently added to the docker group:
    sudo usermod -aG docker jupyter
    sudo reboot
    
      
    •	All subprocess calls to Docker now use plain docker run without sudo.
    

Impact • Prevents permission loss on session or kernel restart. • Ensures stable and persistent Docker access inside Jupiter Notebooks. • Simplifies code and avoids reliance on elevated permissions.

[2] Feature: Memory Optimization in advanced_ocr.py

The advanced_ocr.py module was refactored to significantly reduce memory usage without changing core functionality or output format.

Key Optimizations:

  1. Garbage Collection • Added gc.collect() after large memory operations. • Imported the gc module for explicit cleanup.

  2. Image Processing • Resized large images before feeding them into OCR pipelines. • Applied JPEG compression with quality 85 to reduce in-memory buffer size. • Used downscaled thumbnails for hash operations. • Released all image buffers immediately after use.

  3. Memory Management • Explicitly used del to release large objects. • Used .copy() after cropping to avoid memory leaks from image views. • Switched to page-by-page PDF parsing instead of loading entire files.

  4. Efficient String Building • Replaced inefficient += concatenations with list-based string assembly using ''.join(). • Split large text blocks into smaller, manageable chunks.

  5. API Handling Improvements • Reduced request payload size for external API calls (e.g., Gemini). • Cleaned up response objects immediately after use to free memory.

Impact • Handles high-resolution, multi-page PDFs (100–200+ pages) without exceeding memory limits. • Prevents kernel crashes on large inputs. • Keeps behavior and output identical to the original.

Files Affected • ocr_stage1.py (Docker execution logic) • advanced_ocr.py (OCR core logic)

Recommendation

Use this update if you’re running the Versatile-OCR-Program in a Jupyter-based cloud environment (e.g., Vertex AI, GCP Notebook, Colab Pro). It ensures both system stability and memory efficiency — especially when processing large, image-rich PDF documents.