OCR System Setup Guide

This guide provides step-by-step instructions for setting up the EJU OCR system, including environment configuration, NVIDIA setup, API key requirements, and file organization.

By default,the files are stored in the user’s directory (/home/jupyter), but you should modify the path according to your own environment.

Important update If you are using the v2.0_initial version, please enter the following bash code in your terminal.

sudo usermod -aG docker jupyter
  sudo reboot

1. Environment File Setup

Create a .env file in your project directory with the following content. Replace the placeholder values with your actual API keys and credentials:

OPENAI_API_KEY=your_openai_api_key_here
MATHPIX_APP_ID=your_mathpix_app_id_here
MATHPIX_APP_KEY=your_mathpix_app_key_here
GOOGLE_SHEETS_SPREADSHEET_ID=your_google_sheets_id_here
GOOGLE_APPLICATION_CREDENTIALS=/home/jupyter/credentials/Vision_S.Account.json
GEMINI_API_KEY=your_gemini_api_key_here

2. Required Python Packages

Install the required Python packages:

pip install google-genai
pip install openai

3. NVIDIA Setup

Follow these steps to set up NVIDIA for GPU acceleration:

3.1. Install NVIDIA Container Toolkit

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

3.2. Configure Docker to Use NVIDIA Runtime

Check if the Docker daemon configuration file exists:

cat /etc/docker/daemon.json

If the file doesn't exist or doesn't contain NVIDIA runtime configuration, create or edit it:

sudo nano /etc/docker/daemon.json

Add the following content (make sure to maintain proper indentation):

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

3.3. Verify GPU Recognition

Test if Docker can access the GPU:

docker run --gpus all nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 nvidia-smi

3.4. Check CUDA Version

Verify the CUDA version:

docker run --gpus all --rm nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 nvcc --version

If both commands display output without errors, your NVIDIA setup is complete!

4. API Key Requirements

You need to obtain API keys from the following services:

OpenAI API Key: Register at OpenAI Platform to get your API key.
Gemini API Key: Get your API key from Google AI Studio.
MathPix API Key and App ID: Register at MathPix to get your API key and App ID.
Google Cloud Service Account: Create a service account with Vision API and Storage permissions in the Google Cloud Console.

5. File Organization

The following files must be in the same directory (e.g., in a docker folder):

Dockerfile
advanced_ocr.py
custom_doclayout_yolo.py

6. Google Cloud Storage (GCS) Bucket Setup

Create a GCS bucket in the Google Cloud Console.
Make sure your service account has the necessary permissions to access this bucket.
Update the GCS_BUCKET_NAME environment variable in your .env file with your bucket name.

7. Credentials Setup

Create a credentials directory to store your Google service account JSON files:

mkdir -p /home/jupyter/credentials

Place your service account JSON files in this directory:

Vision_S.Account.json - For Google Vision API
Sheets_S.Account.json - For Google Sheets API

8. Running the OCR System

After completing all the setup steps, you can run the OCR system using the Docker container:

python ocr_stage1.py

This will:

Build the Docker image if it doesn't exist
Mount the input, output, and credentials directories
Run the OCR processing on your PDF files

Troubleshooting

If you encounter GPU-related errors, make sure your NVIDIA drivers are properly installed and compatible with the CUDA version.
If API calls fail, verify that your API keys are correctly set in the .env file.
For Docker-related issues, check that the Docker daemon is running and properly configured for NVIDIA runtime.

Additional Notes

The OCR system processes PDF files from the input directory specified in the OCR_stage1.py script.
Results are saved to the output directory and also uploaded to your GCS bucket.
To customize the output language, modify the prompt templates in the OCR scripts.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OCR System Setup Guide

1. Environment File Setup

2. Required Python Packages

3. NVIDIA Setup

3.1. Install NVIDIA Container Toolkit

3.2. Configure Docker to Use NVIDIA Runtime

3.3. Verify GPU Recognition

3.4. Check CUDA Version

4. API Key Requirements

5. File Organization

6. Google Cloud Storage (GCS) Bucket Setup

7. Credentials Setup

8. Running the OCR System

Troubleshooting

Additional Notes

FilesExpand file tree

setup_guide.md

Latest commit

History

setup_guide.md

File metadata and controls

OCR System Setup Guide

1. Environment File Setup

2. Required Python Packages

3. NVIDIA Setup

3.1. Install NVIDIA Container Toolkit

3.2. Configure Docker to Use NVIDIA Runtime

3.3. Verify GPU Recognition

3.4. Check CUDA Version

4. API Key Requirements

5. File Organization

6. Google Cloud Storage (GCS) Bucket Setup

7. Credentials Setup

8. Running the OCR System

Troubleshooting

Additional Notes