This guide provides step-by-step instructions for setting up the EJU OCR system, including environment configuration, NVIDIA setup, API key requirements, and file organization.
By default, files are stored in the user's home directory (/home/jupyter); adjust the paths to match your own environment.
Important update: if you are using the v2.0_initial version, run the following commands in your terminal to add the jupyter user to the docker group and then reboot.
sudo usermod -aG docker jupyter
sudo reboot
Create a .env file in your project directory with the following content. Replace the placeholder values with your actual API keys and credentials:
OPENAI_API_KEY=your_openai_api_key_here
MATHPIX_APP_ID=your_mathpix_app_id_here
MATHPIX_APP_KEY=your_mathpix_app_key_here
GOOGLE_SHEETS_SPREADSHEET_ID=your_google_sheets_id_here
GOOGLE_APPLICATION_CREDENTIALS=/home/jupyter/credentials/Vision_S.Account.json
GEMINI_API_KEY=your_gemini_api_key_here
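Before moving on, it can help to confirm that no placeholder was left in the .env file. The following is a minimal sketch, not part of the EJU OCR code: the helper names `parse_env` and `unset_keys` are hypothetical, the parser only handles plain KEY=VALUE lines, and the key list is copied from the listing above plus the GCS_BUCKET_NAME variable mentioned in the bucket setup step.

```python
from pathlib import Path

# Keys from the .env listing in this guide, plus GCS_BUCKET_NAME
# from the bucket setup step. Adjust if your setup differs.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "MATHPIX_APP_ID",
    "MATHPIX_APP_KEY",
    "GOOGLE_SHEETS_SPREADSHEET_ID",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "GEMINI_API_KEY",
    "GCS_BUCKET_NAME",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unset_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are missing, empty, or still a your_..._here placeholder."""
    problems = []
    for key in required:
        value = values.get(key, "")
        if not value or (value.startswith("your_") and value.endswith("_here")):
            problems.append(key)
    return problems


if __name__ == "__main__":
    env_path = Path(".env")  # assumed location: the project directory
    if env_path.is_file():
        print("Keys still to fill in:", unset_keys(parse_env(env_path.read_text())))
    else:
        print("No .env file found in", Path.cwd())
```

Run it from the project directory; an empty list means every key in the list has a non-placeholder value. It does not check that the keys are actually valid with each service.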
Install the required Python packages:
pip install google-genai
pip install openai
Follow these steps to set up NVIDIA for GPU acceleration:
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
Check if the Docker daemon configuration file exists:
cat /etc/docker/daemon.json
If the file doesn't exist or doesn't contain the NVIDIA runtime configuration, create or edit it:
sudo nano /etc/docker/daemon.json
Add the following content (the file must remain valid JSON; indentation is only for readability):
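{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
Restart Docker so the new runtime configuration is loaded (on systemd-based systems):
sudo systemctl restart docker
Test if Docker can access the GPU: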
docker run --gpus all nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 nvidia-smi
Verify the CUDA version:
docker run --gpus all --rm nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 nvcc --version
If both commands display output without errors, your NVIDIA setup is complete!
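If the GPU test fails, one thing worth checking is whether /etc/docker/daemon.json really registers the NVIDIA runtime described earlier. The sketch below is only an illustration: the function name `has_nvidia_runtime` is hypothetical, and it only inspects the JSON structure. It does not confirm that the runtime binary is installed or that Docker was restarted.

```python
import json
from pathlib import Path


def has_nvidia_runtime(config_text):
    """Return True if the daemon.json text registers an 'nvidia' runtime with a path."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError:
        return False  # a syntax error in the file also counts as "not configured"
    if not isinstance(config, dict):
        return False
    runtimes = config.get("runtimes")
    if not isinstance(runtimes, dict):
        return False
    nvidia = runtimes.get("nvidia")
    return isinstance(nvidia, dict) and bool(nvidia.get("path"))


if __name__ == "__main__":
    daemon_json = Path("/etc/docker/daemon.json")
    try:
        print("NVIDIA runtime configured:", has_nvidia_runtime(daemon_json.read_text()))
    except OSError as error:
        print("Could not read", daemon_json, "-", error)
```

Because a JSON syntax error makes the function return False, it also catches a stray comma or missing brace introduced while editing the file.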
You need to obtain API keys from the following services:
- OpenAI API Key: Register at OpenAI Platform to get your API key.
- Gemini API Key: Get your API key from Google AI Studio.
- MathPix API Key and App ID: Register at MathPix to get your API key and App ID.
- Google Cloud Service Account: Create a service account with Vision API and Storage permissions in the Google Cloud Console.
The following files must be in the same directory (e.g., in a docker folder):
- Dockerfile
- advanced_ocr.py
- custom_doclayout_yolo.py
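A quick way to confirm the layout is to check that the three files sit side by side. This is a minimal sketch: the helper `missing_files` is hypothetical, and the folder name docker is just the example used above.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = ("Dockerfile", "advanced_ocr.py", "custom_doclayout_yolo.py")


def missing_files(directory, required=REQUIRED_FILES):
    """Return the required file names that are not present in the directory."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    # "docker" is the example folder name; change it to your own path.
    print("Missing files:", missing_files("docker"))
```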
- Create a GCS bucket in the Google Cloud Console.
- Make sure your service account has the necessary permissions to access this bucket.
- Update the GCS_BUCKET_NAME environment variable in your .env file with your bucket name.
Create a credentials directory to store your Google service account JSON files:
mkdir -p /home/jupyter/credentials
Place your service account JSON files in this directory:
- Vision_S.Account.json - For Google Vision API
- Sheets_S.Account.json - For Google Sheets API
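To catch a misplaced or wrongly downloaded key early, the credentials directory can be inspected before running anything. The sketch below assumes the two file names given above and relies on the fact that Google service account key files are JSON with a "type" field set to "service_account". The helper `check_credentials` is hypothetical, and it does not verify permissions on the Vision or Sheets APIs.

```python
import json
from pathlib import Path

# File names taken from this guide.
EXPECTED_FILES = ("Vision_S.Account.json", "Sheets_S.Account.json")


def check_credentials(directory, expected=EXPECTED_FILES):
    """Map each expected file name to 'ok' or a short description of the problem."""
    report = {}
    for name in expected:
        path = Path(directory) / name
        if not path.is_file():
            report[name] = "missing"
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            report[name] = "not valid JSON"
            continue
        if isinstance(data, dict) and data.get("type") == "service_account":
            report[name] = "ok"
        else:
            report[name] = "not a service account key"
    return report


if __name__ == "__main__":
    for name, status in check_credentials("/home/jupyter/credentials").items():
        print(f"{name}: {status}")
```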
After completing all the setup steps, you can run the OCR system using the Docker container:
python ocr_stage1.py
This will:
- Build the Docker image if it doesn't exist
- Mount the input, output, and credentials directories
- Run the OCR processing on your PDF files
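To make the mounting step concrete, here is a rough sketch of how a `docker run` command with GPU access and three mounted directories could be assembled. It is not the code in ocr_stage1.py: the image name `eju-ocr`, the container paths under /app, the host input and output paths, and the function `build_run_command` are all assumptions for illustration. The script's real values may differ.

```python
import os
import shlex


def build_run_command(image, input_dir, output_dir, credentials_dir, env_file=".env"):
    """Assemble a docker run command that mounts the three host directories."""
    command = ["docker", "run", "--rm", "--gpus", "all", "--env-file", env_file]
    # Container-side paths are hypothetical; check the Dockerfile for the real ones.
    mounts = (
        (input_dir, "/app/input"),
        (output_dir, "/app/output"),
        (credentials_dir, "/app/credentials"),
    )
    for host_dir, container_dir in mounts:
        # Docker bind mounts need an absolute host path.
        command += ["-v", f"{os.path.abspath(host_dir)}:{container_dir}"]
    command.append(image)
    return command


if __name__ == "__main__":
    command = build_run_command(
        image="eju-ocr",  # hypothetical image name
        input_dir="/home/jupyter/input",  # hypothetical host path
        output_dir="/home/jupyter/output",  # hypothetical host path
        credentials_dir="/home/jupyter/credentials",
    )
    # Print instead of executing, so the command can be reviewed first.
    print(shlex.join(command))
```

Building the command as a list rather than one string avoids shell-quoting problems if it is later passed to `subprocess.run`.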
- If you encounter GPU-related errors, make sure your NVIDIA drivers are properly installed and compatible with the CUDA version.
- If API calls fail, verify that your API keys are correctly set in the .env file.
- For Docker-related issues, check that the Docker daemon is running and properly configured for the NVIDIA runtime.
- The OCR system processes PDF files from the input directory specified in the ocr_stage1.py script.
- Results are saved to the output directory and also uploaded to your GCS bucket.
- To customize the output language, modify the prompt templates in the OCR scripts.