EmanueleCannizzaro/ocular

Ocular OCR Platform

Version 3.0 - Production Ready with Enhanced Features 🚀

Ocular is an enterprise-grade OCR and AI model deployment platform. It provides Optical Character Recognition through multiple providers, together with cloud GPU deployment, batch processing, and cost optimization, on top of scalable infrastructure management.

✨ Key Features

Core OCR Capabilities

  • 🚀 7 OCR Providers: Mistral AI, Google Vision, AWS Textract, Azure Document Intelligence, Tesseract, OLM OCR, RoLM OCR
  • 🎯 4 Processing Strategies: Single, Fallback Chain, Ensemble, Best Quality
  • 📄 Document Support: PDF, images (PNG, JPG, JPEG, BMP, TIFF, WebP)
  • ⚡ Async Processing: Full async/await support for optimal performance

Cloud GPU Deployment

  • ☁️ 4 GPU Providers: RunPod, Vast.ai, Lambda Labs, Replicate
  • 🎛️ Advanced Configuration: Environment vars, scaling policies, health checks, resource limits
  • 📊 Version Management: A/B testing, rollback, performance tracking
  • 💰 Cost Optimization: Provider comparison, smart recommendations, spot instances

Batch Processing & Optimization (NEW in v3.0)

  • 📦 Batch OCR Processing: Job queue with priority system, scheduled jobs (CRON support)
  • 🔧 Token Optimization: 60-85% image size reduction, 30-60% prompt optimization
  • 🚄 vLLM Integration: Continuous batching, prefix caching, 40-70% cost savings
  • 📈 Performance: 2-5x throughput improvement with vLLM
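
The priority-based job queue mentioned above can be pictured with a small in-memory sketch. This is not Ocular's batch API: `QueuedJob`, `BatchJobQueue`, `submit`, and `next_job` are hypothetical names chosen for illustration, and the convention that a lower number means higher priority is an assumption.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class QueuedJob:
    # Assumption: lower number = higher priority.
    priority: int
    # Monotonic counter so jobs with equal priority keep submission order.
    sequence: int
    document: str = field(compare=False)


class BatchJobQueue:
    """Hypothetical in-memory priority queue for OCR jobs (illustration only)."""

    def __init__(self) -> None:
        self._heap: list = []
        self._counter = itertools.count()

    def submit(self, document: str, priority: int = 5) -> None:
        """Add a document to the queue with the given priority."""
        job = QueuedJob(priority, next(self._counter), document)
        heapq.heappush(self._heap, job)

    def next_job(self) -> Optional[str]:
        """Return the highest-priority document, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).document


queue = BatchJobQueue()
queue.submit("report.pdf", priority=5)
queue.submit("urgent_invoice.pdf", priority=1)
queue.submit("scan.png", priority=5)

order = [queue.next_job() for _ in range(3)]
print(order)  # ['urgent_invoice.pdf', 'report.pdf', 'scan.png']
```

A heap keeps both submission and retrieval at O(log n), and the sequence counter avoids comparing document names when priorities tie.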

Enterprise Features

  • 🔐 Authentication: Clerk.com integration, RBAC with 26 permissions, API keys
  • 💳 Billing: Stripe integration, automated invoicing, cost tracking, budget alerts
  • 📊 Analytics: Real-time dashboards, deployment metrics, performance trends
  • 🔍 Observability: Logs, distributed tracing, metrics, intelligent alerts

Web Application

  • 🌐 Modern UI: FastAPI backend, Bootstrap 5 frontend, Chart.js visualizations
  • 🎨 Interactive Dashboards: Models, Analytics, Billing, Observability, Settings
  • 🧪 Model Testing: Interactive test interface with performance metrics
  • 📤 Bulk Operations: Batch upload, comparison, export, status updates

Installation

# Install the package
uv pip install ocular-ocr

# Or for development
git clone https://github.com/EmanueleCannizzaro/ocular.git
cd ocular
uv pip install -e .

Quick Start

Basic Usage

import asyncio
from pathlib import Path
from ocular import UnifiedDocumentProcessor

async def basic_ocr():
    """Basic OCR example."""
    print("=== Basic OCR with Ocular ===")
    
    # Initialize processor (uses environment variables)
    processor = UnifiedDocumentProcessor()
    
    # Process a document
    file_path = Path("sample_document.pdf")  # or .jpg, .png
    
    if file_path.exists():
        result = await processor.process_document(file_path)
        print(f"Processed: {result.file_path.name}")
        print(f"Processing time: {result.processing_time:.2f}s")
        print(f"Provider used: {result.provider_used}")
        print(f"Extracted text:\n{result.get_full_text()}")
    else:
        print(f"File not found: {file_path}")

if __name__ == "__main__":
    # Set MISTRAL_API_KEY in your environment
    asyncio.run(basic_ocr())

Web Interface

Start the web application:

# From the project root, enter app/ and run the script
cd app
python ocular_app.py

# Or with uvicorn (run from the project root so that app.ocular_app resolves)
uvicorn app.ocular_app:app --reload

Visit http://localhost:8000 to use the web interface.

Configuration

Create a .env file in the project root:

# Primary provider (required)
MISTRAL_API_KEY=your_mistral_api_key_here

# Optional providers
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
AWS_ACCESS_KEY_ID=your_aws_key
AWS_SECRET_ACCESS_KEY=your_aws_secret
AZURE_DOC_INTEL_API_KEY=your_azure_key

# Provider settings
MISTRAL_MODEL=pixtral-12b-2409
TIMEOUT_SECONDS=30
MAX_RETRIES=3
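
As a rough illustration of how these variables might be read at startup, the sketch below loads them with the standard library only. The project's own configuration class is `OcularSettings`; the `EnvConfig` dataclass and `load_env_config` function here are hypothetical, and the fallback defaults simply reuse the sample values above rather than any documented defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical settings container (the project's own class is OcularSettings)."""

    mistral_api_key: str
    mistral_model: str
    timeout_seconds: int
    max_retries: int


def load_env_config(env: Optional[Mapping[str, str]] = None) -> EnvConfig:
    """Build an EnvConfig from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env

    api_key = env.get("MISTRAL_API_KEY", "")
    if not api_key:
        # The primary provider key is marked as required in the .env sample.
        raise ValueError("MISTRAL_API_KEY is required")

    return EnvConfig(
        mistral_api_key=api_key,
        # Assumed defaults: taken from the sample .env values, not from the library.
        mistral_model=env.get("MISTRAL_MODEL", "pixtral-12b-2409"),
        timeout_seconds=int(env.get("TIMEOUT_SECONDS", "30")),
        max_retries=int(env.get("MAX_RETRIES", "3")),
    )


config = load_env_config({"MISTRAL_API_KEY": "test-key", "TIMEOUT_SECONDS": "45"})
print(config.timeout_seconds, config.max_retries)  # 45 3
```

Passing the mapping explicitly keeps the loader easy to test without touching the real process environment.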

Deployment

Google Cloud Functions

The project includes ready-to-deploy Google Cloud Functions configuration:

# Deploy to staging
gcloud functions deploy ocular-ocr-service-staging \
  --source . \
  --entry-point ocular_ocr \
  --runtime python311 \
  --trigger-http \
  --allow-unauthenticated \
  --set-env-vars MISTRAL_API_KEY=your_key

# Or use the GitHub Actions workflow
git push origin main  # Auto-deploys to staging

Local Development

# Install dependencies
uv add -r requirements.txt

# Run locally
python main.py

# Or with functions framework
functions-framework --target=ocular_ocr --debug

Available Providers

  1. Mistral AI - Vision LLM with PDF support (primary)
  2. Google Cloud Vision - Enterprise OCR API
  3. AWS Textract - Document analysis with forms/tables
  4. Azure Document Intelligence - Microsoft's document AI
  5. Tesseract - Local open-source OCR
  6. Custom RunPod Models - OLM/RoLM OCR

Processing Strategies

  • Single: Use one specific provider
  • Fallback: Try providers in order until success
  • Ensemble: Use multiple providers and combine results
  • Best: Select best result from multiple providers
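
To make the difference between the Fallback and Best strategies concrete, here is a minimal sketch built on stand-in providers. It does not reproduce Ocular's internals: `SketchResult`, `run_fallback`, `run_best`, and the three fake providers are hypothetical, and ranking results by a `confidence` score is an assumption about how "best" could be decided.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Sequence


@dataclass
class SketchResult:
    provider: str
    text: str
    confidence: float  # Assumed quality score used to rank results.


Provider = Callable[[str], Awaitable[SketchResult]]


async def run_fallback(path: str, providers: Sequence[Provider]) -> SketchResult:
    """Try providers in order and return the first successful result."""
    errors = []
    for provider in providers:
        try:
            return await provider(path)
        except Exception as exc:  # Any failure moves on to the next provider.
            errors.append(exc)
    raise RuntimeError(f"All {len(errors)} providers failed")


async def run_best(path: str, providers: Sequence[Provider]) -> SketchResult:
    """Run all providers concurrently and keep the highest-confidence result."""
    results = await asyncio.gather(
        *(provider(path) for provider in providers), return_exceptions=True
    )
    successes = [r for r in results if isinstance(r, SketchResult)]
    if not successes:
        raise RuntimeError("All providers failed")
    return max(successes, key=lambda r: r.confidence)


# Stand-in providers for the demonstration.
async def failing_provider(path: str) -> SketchResult:
    raise ConnectionError("provider unavailable")


async def local_provider(path: str) -> SketchResult:
    return SketchResult("local", f"text from {path}", 0.72)


async def cloud_provider(path: str) -> SketchResult:
    return SketchResult("cloud", f"text from {path}", 0.91)


fallback_result = asyncio.run(
    run_fallback("document.pdf", [failing_provider, local_provider])
)
best_result = asyncio.run(
    run_best("document.pdf", [failing_provider, local_provider, cloud_provider])
)
print(fallback_result.provider, best_result.provider)  # local cloud
```

Fallback stops at the first success, so it costs one provider call in the common case, while Best pays for every provider in exchange for a comparison.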

Examples

import asyncio

from ocular import UnifiedDocumentProcessor, ProcessingStrategy, ProviderType


async def main():
    processor = UnifiedDocumentProcessor()

    # Use a specific provider
    result = await processor.process_document(
        "document.pdf",
        strategy=ProcessingStrategy.SINGLE,
        providers=[ProviderType.MISTRAL],
    )

    # Fallback strategy: try Mistral first, then Tesseract
    result = await processor.process_document(
        "document.pdf",
        strategy=ProcessingStrategy.FALLBACK,
        providers=[ProviderType.MISTRAL, ProviderType.TESSERACT],
    )

    # With a custom prompt
    result = await processor.process_document(
        "invoice.pdf",
        prompt="Extract the invoice number, date, and total amount",
    )
    print(result.get_full_text())


if __name__ == "__main__":
    # await is only valid inside a coroutine, so run everything through asyncio.run
    asyncio.run(main())

API Reference

Core Classes

  • UnifiedDocumentProcessor: Main processing interface
  • OCRResult: Processing result with text and metadata
  • OcularSettings: Configuration management
  • ProcessingStrategy: Processing strategy enum
  • ProviderType: Available provider types

Web API Endpoints

  • GET / - Web interface
  • POST /process - Process files
  • GET /health - Health check
  • GET /providers - Available providers
  • GET /debug - Debug information
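
The read-only endpoints can be queried from a script with the standard library alone. The sketch below assumes the server runs at the default http://localhost:8000 address and that `/health` and `/providers` return JSON; the response schema is not documented here, so both `endpoint_url` and `get_json` are hypothetical helpers rather than part of Ocular.

```python
import json
import urllib.request
from urllib.parse import urljoin

# Default address from the Web Interface section.
BASE_URL = "http://localhost:8000"


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path without doubling slashes."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 5.0):
    """GET an endpoint and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(endpoint_url(path, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(endpoint_url("/health"))     # http://localhost:8000/health
print(endpoint_url("providers"))   # http://localhost:8000/providers

# With the web application running, the calls would look like:
# get_json("/health")
# get_json("/providers")
```

`POST /process` is left out on purpose, since the expected upload field names are not specified in this README.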

Contributing

  1. Create a new branch: git checkout -b feature/your-feature
  2. Make your changes
  3. Run tests: python -m pytest
  4. Submit a pull request

License

MIT License - see LICENSE file for details.

Support

  • Documentation: See CLAUDE.md for a detailed development guide
  • Issues: Report bugs via GitHub Issues
  • Examples: Check the examples/ directory for more use cases
