robertguss/convex-precision-pdf

Precision PDF

AI-powered PDF data extraction tool with visual verification and confidence

Precision PDF is an open-source document processing platform that extracts structured data from PDFs while showing you exactly where every piece of data comes from. Built with Next.js 15, Convex, and Clerk authentication.
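One way to picture "showing you exactly where every piece of data comes from" is an extracted value that carries a pointer back to a region of a page. The sketch below is illustrative only: the `ExtractedField` shape, the normalized 0..1 coordinate convention, and the `toPixelRect` helper are assumptions made for this example, not the project's actual schema.

```typescript
// Hypothetical shape for an extracted value that remembers its source region.
// Coordinates are assumed to be normalized to 0..1 relative to the page size.
interface BoundingBox {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

interface ExtractedField {
  text: string;
  page: number; // zero-based page index (assumption)
  box: BoundingBox;
}

interface PixelRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Convert a normalized box into pixel coordinates for a rendered page,
// for example to position a highlight overlay on top of the PDF preview.
function toPixelRect(box: BoundingBox, pageWidth: number, pageHeight: number): PixelRect {
  return {
    x: box.left * pageWidth,
    y: box.top * pageHeight,
    width: (box.right - box.left) * pageWidth,
    height: (box.bottom - box.top) * pageHeight,
  };
}

// Made-up sample field, used only to exercise the helper.
const field: ExtractedField = {
  text: "Invoice total",
  page: 0,
  box: { left: 0.5, top: 0.25, right: 0.75, bottom: 0.5 },
};

const rect = toPixelRect(field.box, 800, 1000);
console.log(rect); // { x: 400, y: 250, width: 200, height: 250 }
```

Keeping coordinates normalized means the same field can be highlighted at any zoom level by passing the current rendered page size.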

✨ Key Features

  • 🔍 Visual Data Verification - See exactly where extracted data comes from in the original PDF
  • ⚡ Real-time Processing - Live updates as documents are processed
  • 📊 Smart Table Recognition - Automatic table detection and CSV export
  • 📄 Multiple Export Formats - JSON, CSV, DOCX, Markdown, Text, XLSX
  • 🏥 Document Type Support - Invoices, medical records, bank statements, forms
  • 📱 Multi-page Documents - Handle complex documents with multiple pages
  • 🔌 API Access - Full REST API for developers
  • 🎯 Interactive Demo - Try 8 real examples without signing up
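
As a rough illustration of the table-to-CSV export listed above, the sketch below serializes a detected table into CSV text. The `toCsv` and `escapeCsvCell` helpers and the rows-of-strings table shape are assumptions made for this example; the project's own exporter may work differently.

```typescript
// Quote a cell if it contains a comma, double quote, or line break,
// doubling any embedded quotes (the convention described in RFC 4180).
function escapeCsvCell(cell: string): string {
  if (/[",\r\n]/.test(cell)) {
    return `"${cell.replace(/"/g, '""')}"`;
  }
  return cell;
}

// Serialize a table (array of rows, each an array of cell strings) to CSV.
function toCsv(rows: string[][]): string {
  return rows.map((row) => row.map(escapeCsvCell).join(",")).join("\r\n");
}

// Made-up sample table, used only to exercise the helpers.
const table: string[][] = [
  ["Item", "Amount"],
  ["Consulting, March", "1200.00"],
  ['Widget 5" bracket', "12.50"],
];

console.log(toCsv(table));
```

Cells containing commas or quotes are wrapped in double quotes so that spreadsheet tools read them back as a single column.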

🚨 Security Notice (Important for Developers)

This repository is currently configured for easy local development with ALL AUTHENTICATION AND SECURITY FEATURES DISABLED.

For production deployment, you MUST:

See Security Documentation for complete details.

🚀 Quick Start

Prerequisites

  • Node.js (Latest LTS recommended)
  • pnpm package manager
  • Convex CLI (npm install -g convex)

5-Minute Setup

# Clone the repository
git clone https://github.com/robertguss/convex-precision-pdf.git
cd convex-precision-pdf

# Install dependencies
pnpm install

# Set up environment variables
cp .env.example .env.local

# Initialize Convex (creates a new deployment)
npx convex dev

# Start the development server
pnpm run dev

Your app will be running at http://localhost:3000

Note: The FastAPI processing service is optional for local development. Example documents work without it.

🏗 Architecture Overview

graph TD
    A[Next.js Frontend] --> B[Convex Backend]
    A --> C[API Routes]
    C --> D[FastAPI Service]
    D --> E[Landing AI]
    B --> F[Document Storage]
    B --> G[User Management]
    A --> H[Clerk Auth - DISABLED]
    C --> I[Stripe Payments]

Core Components:

  • Frontend: Next.js 15 with App Router and Tailwind CSS
  • Backend: Convex for real-time database and serverless functions
  • Authentication: Clerk (currently disabled for local development)
  • Processing: External FastAPI service with Landing AI
  • Payments: Stripe integration
  • UI Components: shadcn/ui component library
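
The path in the diagram from a Next.js API route to the FastAPI service could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the `/process` path, the `Authorization: Bearer` header scheme, and the `buildProcessingRequest` helper are hypothetical and not taken from the project's code. Only the `FAST_API_URL` and `FAST_API_SECRET_KEY` variable names come from this README.

```typescript
// Hypothetical description of an outbound call to the processing service.
interface ProcessingRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
}

interface ServiceConfig {
  fastApiUrl: string; // value of FAST_API_URL
  fastApiSecretKey: string; // value of FAST_API_SECRET_KEY
}

// Build the request an API route might forward to the FastAPI service.
// The "/process" path and the bearer-token header are assumptions.
function buildProcessingRequest(config: ServiceConfig): ProcessingRequest {
  const base = config.fastApiUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/process`,
    method: "POST",
    headers: { Authorization: `Bearer ${config.fastApiSecretKey}` },
  };
}

// Placeholder values copied from the example environment file.
const request = buildProcessingRequest({
  fastApiUrl: "http://localhost:8000/",
  fastApiSecretKey: "your-secret-key",
});

console.log(request.url); // http://localhost:8000/process
```

Keeping the secret key on the server side (in the API route rather than in browser code) is the reason the diagram routes processing calls through API routes instead of calling the FastAPI service directly from the frontend.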

📚 Documentation

For Developers

| Topic | Description | Link |
| --- | --- | --- |
| Getting Started | Complete setup guide | 📖 Getting Started |
| Security Config | ⚠️ Critical: Auth setup | 🔐 Security Guide |
| Architecture | System design & diagrams | 🏗 Architecture |
| API Reference | All endpoints & examples | 📡 API Docs |
| Components | UI components & styling | 🎨 Components |
| Testing | Writing & running tests | 🧪 Testing |
| Deployment | Production deployment | 🚀 Deployment |

For End Users

| Topic | Description | Link |
| --- | --- | --- |
| Getting Started | How to use the app | 👤 User Guide |
| Uploading Documents | PDF upload process | 📄 Upload Guide |
| Export Formats | Available export options | 💾 Export Guide |
| Troubleshooting | Common issues | 🔧 Troubleshooting |

API Integration

| Resource | Description | Link |
| --- | --- | --- |
| curl Examples | Command-line usage | 💻 curl Examples |
| JavaScript SDK | JS/TS integration | ⚛️ JavaScript |
| Python Examples | Python integration | 🐍 Python |

🛠 Development Commands

# Start development servers (both frontend and backend)
pnpm run dev

# Run only frontend (Next.js)
pnpm run dev:frontend

# Run only backend (Convex)
pnpm run dev:backend

# Build for production
pnpm run build

# Run tests
pnpm run test              # Unit tests with Vitest
pnpm run pw:test          # E2E tests with Playwright
pnpm run pw:test:ui       # Playwright UI mode

# Linting and formatting
pnpm run lint

🌍 Environment Variables

Copy .env.example to .env.local and configure:

# Core Services (Required)
NEXT_PUBLIC_CONVEX_URL="https://your-deployment.convex.cloud"
NEXT_PUBLIC_APP_URL="http://localhost:3000"

# Authentication (Clerk) - Currently disabled
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY="pk_test_your-clerk-key"
CLERK_SECRET_KEY="sk_test_your-clerk-secret"

# Document Processing (Optional for local dev)
FAST_API_URL="http://localhost:8000"
FAST_API_SECRET_KEY="your-secret-key"

# Payments (Stripe) - Optional for local dev
STRIPE_PUBLISHABLE_KEY="pk_test_your-stripe-key"
STRIPE_SECRET_KEY="sk_test_your-stripe-secret"

See Environment Variables Guide for complete reference.
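
A small startup check can catch a missing variable before the app fails in a less obvious way. The sketch below is one possible approach, not part of the project: the `checkEnv` helper and the `EnvReport` type are hypothetical, and the required/optional split simply mirrors the comments in the list above. The Clerk keys are left out because authentication is currently disabled.

```typescript
// Variable names are taken from the list above; the grouping follows its
// "Required" and "Optional for local dev" comments.
const REQUIRED_VARS = ["NEXT_PUBLIC_CONVEX_URL", "NEXT_PUBLIC_APP_URL"] as const;
const OPTIONAL_VARS = [
  "FAST_API_URL",
  "FAST_API_SECRET_KEY",
  "STRIPE_PUBLISHABLE_KEY",
  "STRIPE_SECRET_KEY",
] as const;

interface EnvReport {
  missingRequired: string[];
  missingOptional: string[];
}

// Report which variables are unset or blank in the given environment object.
function checkEnv(env: Record<string, string | undefined>): EnvReport {
  const isMissing = (name: string): boolean => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  };
  return {
    missingRequired: REQUIRED_VARS.filter(isMissing),
    missingOptional: OPTIONAL_VARS.filter(isMissing),
  };
}

// Example with only one required variable set.
const report = checkEnv({
  NEXT_PUBLIC_CONVEX_URL: "https://your-deployment.convex.cloud",
});

console.log(report.missingRequired); // [ 'NEXT_PUBLIC_APP_URL' ]
```

In a Node.js context the same function could be called with `process.env`, failing fast when `missingRequired` is non-empty and only logging a warning for `missingOptional`.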

🔌 External Dependencies

Required Services

  1. Convex - Backend database and serverless functions

  2. FastAPI Service (Optional for local development)

    • Repository: precision_pdf_fast_api
    • Handles PDF processing with Landing AI
    • Can run locally or deploy to Render

Optional Services (For production)

  1. Clerk - Authentication (currently disabled)
  2. Stripe - Payment processing
  3. Landing AI - Document processing AI
  4. Sentry - Error monitoring

🧪 Testing

The project ships with test tooling configured for unit tests (Vitest) and end-to-end tests (Playwright):

# Unit Tests (Vitest)
pnpm run test          # Run once
pnpm run test:watch    # Watch mode
pnpm run test:ui       # UI interface

# E2E Tests (Playwright)
pnpm run pw:test       # Headless
pnpm run pw:test:ui    # UI mode
pnpm run pw:test:debug # Debug mode

No tests are implemented yet, but the infrastructure is in place. See the Testing Guide.

📦 Tech Stack

Frontend

  • Next.js 15 - React framework with App Router
  • React 19 - UI library
  • Tailwind CSS - Utility-first styling
  • shadcn/ui - Component library
  • TypeScript - Type safety

Backend

  • Convex - Real-time database and serverless functions
  • Clerk - Authentication (currently disabled)
  • Stripe - Payment processing

External Services

  • FastAPI - Document processing service
  • Landing AI - AI-powered document extraction

DevOps & Monitoring

  • Vercel - Frontend hosting
  • Render - FastAPI hosting
  • Sentry - Error monitoring
  • PostHog - Analytics

📄 Example Documents

The app includes 8 pre-processed example documents:

  • 📧 Invoice
  • 🏦 Bank Statements (2)
  • 🏥 Medical Reports (2)
  • 📑 Medical Journal Article
  • 🏠 Mortgage Application
  • 📋 Settlement Statement

Examples are stored in /public/examples/ and can be explored without authentication.

🀝 Contributing

We welcome contributions! Please see our Contributing Guide for details on:

  • Code style and standards
  • Development workflow
  • Pull request process
  • Issue reporting

Quick Contribution Setup

# Fork the repo and clone your fork
git clone https://github.com/yourusername/convex-precision-pdf.git

# Create a feature branch
git checkout -b feature/your-feature-name

# Make your changes and test
pnpm run test
pnpm run lint

# Submit a pull request

🐛 Troubleshooting

Common Issues

"User not authenticated" errors in development:

  • This is expected since authentication is disabled
  • Check the security configuration guide

Documents not processing:

Build errors:

  • Ensure you're using the latest Node.js LTS
  • Delete node_modules and run pnpm install

📊 Project Status

  • ✅ Core document processing
  • ✅ Visual verification interface
  • ✅ Multiple export formats
  • ✅ Real-time processing updates
  • ⚠️ Authentication (disabled for local dev)
  • ⚠️ Testing (infrastructure ready)
  • 🔄 Documentation (in progress)

📜 License

This project is open source. License details coming soon.

🆘 Support

📞 Contact

For questions about this project:


⭐ Star this repository if you find it useful! ⭐
