Ocular OCR Platform

Version 3.0 - Production Ready with Enhanced Features 🚀

Ocular is an enterprise-grade platform for OCR and AI model deployment. It provides Optical Character Recognition through multiple providers and adds cloud GPU deployment, batch processing, cost optimization, and scalable infrastructure management.

✨ Key Features

Core OCR Capabilities

🚀 7 OCR Providers: Mistral AI, Google Vision, AWS Textract, Azure Document Intelligence, Tesseract, OLM OCR, RoLM OCR
🎯 4 Processing Strategies: Single, Fallback Chain, Ensemble, Best Quality
📄 Document Support: PDF, images (PNG, JPG, JPEG, BMP, TIFF, WebP)
⚡ Async Processing: Full async/await API, so documents can be processed concurrently
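Because the API is async, several documents can be in flight at once. The sketch below is a minimal illustration, not Ocular's code: `fake_process_document` is a hypothetical stand-in for `UnifiedDocumentProcessor.process_document`, and the concurrency limit of 4 is an arbitrary example value used to cap simultaneous provider calls.

```python
import asyncio
from pathlib import Path


async def fake_process_document(path: Path) -> str:
    """Hypothetical stand-in for `await processor.process_document(path)`."""
    await asyncio.sleep(0.01)  # simulate network latency to an OCR provider
    return f"text of {path.name}"


async def process_many(paths: list[Path], max_concurrency: int = 4) -> list[str]:
    """Process documents concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(path: Path) -> str:
        async with semaphore:  # limit simultaneous provider requests
            return await fake_process_document(path)

    # gather returns results in the same order as the input paths
    return await asyncio.gather(*(worker(p) for p in paths))


if __name__ == "__main__":
    docs = [Path(f"page_{i}.png") for i in range(10)]
    results = asyncio.run(process_many(docs))
    print(len(results), results[0])
```

The semaphore matters in practice because hosted OCR APIs typically enforce rate limits; unbounded `gather` over a large batch would fire every request at once.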

Cloud GPU Deployment

☁️ 4 GPU Providers: RunPod, Vast.ai, Lambda Labs, Replicate
🎛️ Advanced Configuration: Environment variables, scaling policies, health checks, resource limits
📊 Version Management: A/B testing, rollback, performance tracking
💰 Cost Optimization: Provider comparison, smart recommendations, spot instances
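A/B testing between model versions usually means routing a fixed share of traffic to a candidate version. One common approach is deterministic hash-based bucketing, sketched below. The `choose_version` helper, the version labels `v1`/`v2`, and the use of a per-request ID are assumptions for illustration, not Ocular's actual routing logic.

```python
import hashlib


def choose_version(request_id: str, candidate_share: float,
                   stable: str = "v1", candidate: str = "v2") -> str:
    """Route a request to `candidate` for roughly `candidate_share` of IDs.

    Hashing the request ID (instead of calling random()) keeps the
    assignment stable: the same ID always lands on the same version.
    """
    if not 0.0 <= candidate_share <= 1.0:
        raise ValueError("candidate_share must be between 0 and 1")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # First 6 bytes (48 bits) fit exactly in a float, so bucket is in [0, 1)
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return candidate if bucket < candidate_share else stable
```

In this sketch, a rollback is simply setting `candidate_share` to 0, which sends every request back to the stable version.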

Batch Processing & Optimization (NEW in v3.0)

📦 Batch OCR Processing: Job queue with priority system, scheduled jobs (CRON support)
🔧 Token Optimization: 60-85% image size reduction, 30-60% prompt optimization
🚄 vLLM Integration: Continuous batching, prefix caching, 40-70% cost savings
📈 Performance: 2-5x throughput improvement with vLLM
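A job queue with a priority system can be sketched with Python's `heapq`. This assumes lower numbers mean higher priority and uses an insertion counter so jobs of equal priority run first-in, first-out. `BatchQueue`, the default priority of 5, and the file names are hypothetical; CRON scheduling is not shown.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class Job:
    priority: int                     # lower value = processed sooner
    seq: int                          # tie-breaker: preserves insertion order
    name: str = field(compare=False)  # excluded from ordering


class BatchQueue:
    """Minimal priority queue for OCR jobs (illustrative only)."""

    def __init__(self) -> None:
        self._heap: list[Job] = []
        self._counter = itertools.count()

    def submit(self, name: str, priority: int = 5) -> None:
        heapq.heappush(self._heap, Job(priority, next(self._counter), name))

    def next_job(self) -> Optional[str]:
        """Pop the highest-priority job name, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).name

    def __len__(self) -> int:
        return len(self._heap)
```

Submitting `archive_scan.pdf` at priority 9, then `urgent_invoice.pdf` at priority 1, then `report.pdf` at the default priority yields them in the order invoice, report, archive.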

Enterprise Features

🔐 Authentication: Clerk.com integration, RBAC with 26 permissions, API keys
💳 Billing: Stripe integration, automated invoicing, cost tracking, budget alerts
📊 Analytics: Real-time dashboards, deployment metrics, performance trends
🔍 Observability: Logs, distributed tracing, metrics, intelligent alerts
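RBAC maps each role to a set of permissions and checks membership before an action runs. A minimal sketch, assuming hypothetical role names and four hypothetical permission strings (the platform's actual 26 permissions are not listed in this README):

```python
from enum import Enum


class Permission(str, Enum):
    # Hypothetical permission names for illustration only
    DOCUMENT_PROCESS = "document:process"
    MODEL_DEPLOY = "model:deploy"
    BILLING_VIEW = "billing:view"
    USER_MANAGE = "user:manage"


ROLE_PERMISSIONS: dict[str, frozenset[Permission]] = {
    "viewer": frozenset({Permission.BILLING_VIEW}),
    "developer": frozenset({Permission.DOCUMENT_PROCESS, Permission.MODEL_DEPLOY}),
    "admin": frozenset(Permission),  # every defined permission
}


def has_permission(roles: list[str], needed: Permission) -> bool:
    """True if any of the user's roles grants `needed`; unknown roles grant nothing."""
    return any(needed in ROLE_PERMISSIONS.get(role, frozenset()) for role in roles)
```

Treating unknown roles as granting nothing keeps the check fail-closed.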

Web Application

🌐 Modern UI: FastAPI backend, Bootstrap 5 frontend, Chart.js visualizations
🎨 Interactive Dashboards: Models, Analytics, Billing, Observability, Settings
🧪 Model Testing: Interactive test interface with performance metrics
📤 Bulk Operations: Batch upload, comparison, export, status updates

Installation

# Install the package
uv pip install ocular-ocr

# Or for development
git clone https://github.com/your-repo/ocular.git
cd ocular
uv pip install -e .

Quick Start

Basic Usage

import asyncio
from pathlib import Path
from ocular import UnifiedDocumentProcessor

async def basic_ocr():
    """Basic OCR example."""
    print("=== Basic OCR with Ocular ===")
    
    # Initialize processor (uses environment variables)
    processor = UnifiedDocumentProcessor()
    
    # Process a document
    file_path = Path("sample_document.pdf")  # or .jpg, .png
    
    if file_path.exists():
        result = await processor.process_document(file_path)
        print(f"Processed: {result.file_path.name}")
        print(f"Processing time: {result.processing_time:.2f}s")
        print(f"Provider used: {result.provider_used}")
        print(f"Extracted text:\n{result.get_full_text()}")
    else:
        print(f"File not found: {file_path}")

if __name__ == "__main__":
    # Set MISTRAL_API_KEY in your environment
    asyncio.run(basic_ocr())

Web Interface

Start the web application:

# From the project root
cd app
python ocular_app.py

# Or with uvicorn (from the project root, not from app/)
uvicorn app.ocular_app:app --reload

Visit http://localhost:8000 to use the web interface.

Configuration

Create a .env file in the project root:

# Primary provider (required)
MISTRAL_API_KEY=your_mistral_api_key_here

# Optional providers
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
AWS_ACCESS_KEY_ID=your_aws_key
AWS_SECRET_ACCESS_KEY=your_aws_secret
AZURE_DOC_INTEL_API_KEY=your_azure_key

# Provider settings
MISTRAL_MODEL=pixtral-12b-2409
TIMEOUT_SECONDS=30
MAX_RETRIES=3
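The settings file is plain `KEY=value` lines. As a sketch of how such a file could be read with only the standard library: the helpers below are hypothetical, the fallback values are taken from the example above as an assumption rather than confirmed defaults, and the real `OcularSettings` class may load configuration differently.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. KEY="value" -> value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: Path = Path(".env")) -> dict[str, str]:
    """Read and parse a .env file (defaults to the current directory)."""
    return parse_env(path.read_text(encoding="utf-8"))


def provider_settings(env: dict[str, str]) -> dict[str, object]:
    """Merge file values with the process environment; the environment wins."""
    merged = {**env, **os.environ}
    if not merged.get("MISTRAL_API_KEY"):
        raise RuntimeError("MISTRAL_API_KEY is required")
    return {
        "mistral_api_key": merged["MISTRAL_API_KEY"],
        # Fallbacks mirror the example values above (assumed, not confirmed)
        "mistral_model": merged.get("MISTRAL_MODEL", "pixtral-12b-2409"),
        "timeout_seconds": int(merged.get("TIMEOUT_SECONDS", "30")),
        "max_retries": int(merged.get("MAX_RETRIES", "3")),
    }
```

Letting real environment variables override the file matches the common convention, so deployment platforms can inject secrets without editing `.env`.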

Deployment

Google Cloud Functions

The project includes a ready-to-deploy Google Cloud Functions configuration:

# Deploy to staging
gcloud functions deploy ocular-ocr-service-staging \
  --source . \
  --entry-point ocular_ocr \
  --runtime python311 \
  --trigger-http \
  --allow-unauthenticated \
  --set-env-vars MISTRAL_API_KEY=your_key

# Or use the GitHub Actions workflow
git push origin main  # Auto-deploys to staging

Local Development

# Install dependencies
uv pip install -r requirements.txt

# Run locally
python main.py

# Or with functions framework
functions-framework --target=ocular_ocr --debug

Available Providers

Mistral AI - Vision LLM with PDF support (primary)
Google Cloud Vision - Enterprise OCR API
AWS Textract - Document analysis with forms/tables
Azure Document Intelligence - Microsoft's document AI
Tesseract - Local open-source OCR
Custom RunPod Models - OLM/RoLM OCR

Processing Strategies

Single: Use one specific provider
Fallback: Try providers in order until success
Ensemble: Use multiple providers and combine results
Best: Select best result from multiple providers
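The fallback and best-result strategies can be sketched as follows. This is an illustration under stated assumptions, not Ocular's implementation: providers are modeled as hypothetical async callables returning `(text, confidence)`, "best" is assumed to mean highest confidence, and the ensemble strategy (combining results) is omitted.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical provider signature: path -> (extracted_text, confidence in [0, 1])
Provider = Callable[[str], Awaitable[tuple[str, float]]]


async def run_fallback(path: str, providers: list[Provider]) -> tuple[str, float]:
    """Try providers in order; return the first successful result."""
    errors: list[Exception] = []
    for provider in providers:
        try:
            return await provider(path)
        except Exception as exc:  # a real implementation would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")


async def run_best(path: str, providers: list[Provider]) -> tuple[str, float]:
    """Run all providers concurrently and keep the highest-confidence result."""
    results = await asyncio.gather(*(p(path) for p in providers), return_exceptions=True)
    successes = [r for r in results if not isinstance(r, BaseException)]
    if not successes:
        raise RuntimeError("all providers failed")
    return max(successes, key=lambda r: r[1])
```

The trade-off: fallback stops at the first provider that succeeds (usually one call per document), while best-result pays for every provider on each document in exchange for a choice among their outputs.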

Examples

import asyncio
from ocular import UnifiedDocumentProcessor, ProcessingStrategy, ProviderType

async def main():
    processor = UnifiedDocumentProcessor()

    # Use a specific provider
    result = await processor.process_document(
        "document.pdf",
        strategy=ProcessingStrategy.SINGLE,
        providers=[ProviderType.MISTRAL]
    )

    # Fallback strategy: try Mistral first, then Tesseract
    result = await processor.process_document(
        "document.pdf",
        strategy=ProcessingStrategy.FALLBACK,
        providers=[ProviderType.MISTRAL, ProviderType.TESSERACT]
    )

    # With a custom prompt
    result = await processor.process_document(
        "invoice.pdf",
        prompt="Extract the invoice number, date, and total amount"
    )
    print(result.get_full_text())

if __name__ == "__main__":
    # `await` is only valid inside a coroutine, so run main() on an event loop
    asyncio.run(main())

API Reference

Core Classes

UnifiedDocumentProcessor: Main processing interface
OCRResult: Processing result with text and metadata
OcularSettings: Configuration management
ProcessingStrategy: Processing strategy enum
ProviderType: Available provider types

Web API Endpoints

GET / - Web interface
POST /process - Process files
GET /health - Health check
GET /providers - Available providers
GET /debug - Debug information
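To check a running instance from a script, the `/health` endpoint can be queried with the standard library. A minimal sketch, assuming the server listens on `http://localhost:8000` as in the Web Interface section; the response body format is not documented here, so the hypothetical `check_health` helper returns the raw text.

```python
import urllib.error
import urllib.request


def check_health(base_url: str = "http://localhost:8000",
                 timeout: float = 5.0) -> tuple[int, str]:
    """GET {base_url}/health and return (status_code, body_text).

    Non-2xx responses are returned rather than raised, so callers can
    inspect the status; connection failures (URLError) still propagate.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")
```

A deployment script could call `check_health()` in a loop after startup and proceed once the status is 200.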

Contributing

Create a new branch: git checkout -b feature/your-feature
Make your changes
Run tests: python -m pytest
Submit a pull request

License

MIT License - see LICENSE file for details.

Support

Documentation: See CLAUDE.md for a detailed development guide
Issues: Report bugs via GitHub Issues
Examples: Check the examples/ directory for more use cases

EmanueleCannizzaro/ocular

Folders and files

Latest commit

History

Repository files navigation

Ocular OCR Platform

✨ Key Features

Core OCR Capabilities

Cloud GPU Deployment

Batch Processing & Optimization (NEW in v3.0)

Enterprise Features

Web Application

Installation

Quick Start

Basic Usage

Web Interface

Configuration

Deployment

Google Cloud Functions

Local Development

Available Providers

Processing Strategies

Examples

API Reference

Core Classes

Web API Endpoints

Contributing

License

Support

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages