AI-powered PDF data extraction tool with visual verification and confidence
Precision PDF is an open-source document processing platform that extracts structured data from PDFs while showing you exactly where every piece of data comes from. Built with Next.js 15, Convex, and Clerk authentication.
- Visual Data Verification - See exactly where extracted data comes from in the original PDF
- Real-time Processing - Live updates as documents are processed
- Smart Table Recognition - Automatic table detection and CSV export
- Multiple Export Formats - JSON, CSV, DOCX, Markdown, Text, XLSX
- Document Type Support - Invoices, medical records, bank statements, forms
- Multi-page Documents - Handle complex documents with multiple pages
- API Access - Full REST API for developers
- Interactive Demo - Try 8 real examples without signing up
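To make the visual verification and CSV export features concrete, here is a minimal sketch of how an extracted table could carry its source location and be serialized. The type names, field names, and the fractional coordinate convention are assumptions for illustration, not the project's actual data model.

```typescript
// Hypothetical shapes for illustration; the project's real types may differ.
interface SourceRegion {
  page: number;   // 1-based page number in the original PDF
  x: number;      // left edge as a fraction of page width (0..1)
  y: number;      // top edge as a fraction of page height (0..1)
  width: number;  // fraction of page width
  height: number; // fraction of page height
}

interface ExtractedCell {
  text: string;
  source: SourceRegion; // where this value was found in the PDF
}

type ExtractedTable = ExtractedCell[][];

// Quote a value (RFC 4180 style) when it contains a comma, quote, or line break.
function escapeCsv(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serialize the table text to CSV, one table row per line.
function tableToCsv(table: ExtractedTable): string {
  return table
    .map((row) => row.map((cell) => escapeCsv(cell.text)).join(","))
    .join("\r\n");
}

// Collect the regions a viewer would highlight on a given page.
function regionsOnPage(table: ExtractedTable, page: number): SourceRegion[] {
  const regions: SourceRegion[] = [];
  for (const row of table) {
    for (const cell of row) {
      if (cell.source.page === page) regions.push(cell.source);
    }
  }
  return regions;
}

// Sample data: a two-row table located on page 1.
const at = (x: number, y: number): SourceRegion => ({ page: 1, x, y, width: 0.3, height: 0.03 });
const sampleTable: ExtractedTable = [
  [{ text: "Item", source: at(0.1, 0.2) }, { text: "Amount", source: at(0.6, 0.2) }],
  [{ text: "Widget, large", source: at(0.1, 0.25) }, { text: "12.50", source: at(0.6, 0.25) }],
];

const csv = tableToCsv(sampleTable);
const highlights = regionsOnPage(sampleTable, 1);
```

Keeping the source region next to each value means the export and the highlight overlay read from the same record, so a value can always be traced back to its place on the page.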
This repository is currently configured for easy local development with ALL AUTHENTICATION AND SECURITY FEATURES DISABLED.
For production deployment, you MUST:
- Re-enable authentication in `middleware.ts`
- Configure all environment variables properly
- Follow the Security Configuration Guide
See Security Documentation for complete details.
- Node.js (Latest LTS recommended)
- pnpm package manager
- Convex CLI (`npm install -g convex`)
# Clone the repository
git clone https://github.com/yourusername/precision-pdf.git
cd precision-pdf
# Install dependencies
pnpm install
# Set up environment variables
cp .env.example .env.local
# Initialize Convex (creates a new deployment)
npx convex dev
# Start the development server
pnpm run dev

Your app will be running at http://localhost:3000
Note: The FastAPI processing service is optional for local development. Example documents work without it.
graph TD
A[Next.js Frontend] --> B[Convex Backend]
A --> C[API Routes]
C --> D[FastAPI Service]
D --> E[Landing AI]
B --> F[Document Storage]
B --> G[User Management]
A --> H[Clerk Auth - DISABLED]
C --> I[Stripe Payments]
Core Components:
- Frontend: Next.js 15 with App Router and Tailwind CSS
- Backend: Convex for real-time database and serverless functions
- Authentication: Clerk (currently disabled for local development)
- Processing: External FastAPI service with Landing AI
- Payments: Stripe integration
- UI Components: shadcn/ui component library
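The hop from API routes to the FastAPI service in the diagram above can be sketched as follows. This is a minimal illustration, assuming the route reads `FAST_API_URL` and `FAST_API_SECRET_KEY` from the environment; the `/process` path and the `X-API-Key` header name are hypothetical and not taken from the project.

```typescript
interface ProcessingEnv {
  FAST_API_URL?: string;
  FAST_API_SECRET_KEY?: string;
}

interface ForwardRequest {
  url: string;
  headers: Record<string, string>;
}

// Build the request an API route could forward to the processing service.
// Returns null when the service is not configured, so the caller can fall
// back to the pre-processed example documents.
function buildForwardRequest(env: ProcessingEnv): ForwardRequest | null {
  if (!env.FAST_API_URL || !env.FAST_API_SECRET_KEY) return null;
  const base = env.FAST_API_URL.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/process`, // hypothetical endpoint path
    headers: { "X-API-Key": env.FAST_API_SECRET_KEY }, // hypothetical header name
  };
}

// Local development with the service running on port 8000.
const configured = buildForwardRequest({
  FAST_API_URL: "http://localhost:8000/",
  FAST_API_SECRET_KEY: "your-secret-key",
});

// Local development without the optional service.
const unconfigured = buildForwardRequest({});
```

Returning `null` instead of throwing reflects that the processing service is optional for local development.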
| Topic | Description | Link |
|---|---|---|
| Getting Started | Complete setup guide | Getting Started |
| Security Config | | Security Guide |
| Architecture | System design & diagrams | Architecture |
| API Reference | All endpoints & examples | API Docs |
| Components | UI components & styling | Components |
| Testing | Writing & running tests | Testing |
| Deployment | Production deployment | Deployment |
| Topic | Description | Link |
|---|---|---|
| Getting Started | How to use the app | User Guide |
| Uploading Documents | PDF upload process | Upload Guide |
| Export Formats | Available export options | Export Guide |
| Troubleshooting | Common issues | Troubleshooting |
| Resource | Description | Link |
|---|---|---|
| curl Examples | Command-line usage | curl Examples |
| JavaScript SDK | JS/TS integration | JavaScript |
| Python Examples | Python integration | Python |
# Start development servers (both frontend and backend)
pnpm run dev
# Run only frontend (Next.js)
pnpm run dev:frontend
# Run only backend (Convex)
pnpm run dev:backend
# Build for production
pnpm run build
# Run tests
pnpm run test # Unit tests with Vitest
pnpm run pw:test # E2E tests with Playwright
pnpm run pw:test:ui # Playwright UI mode
# Linting and formatting
pnpm run lint

Copy .env.example to .env.local and configure:
# Core Services (Required)
NEXT_PUBLIC_CONVEX_URL="https://your-deployment.convex.cloud"
NEXT_PUBLIC_APP_URL="http://localhost:3000"
# Authentication (Clerk) - Currently disabled
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY="pk_test_your-clerk-key"
CLERK_SECRET_KEY="sk_test_your-clerk-secret"
# Document Processing (Optional for local dev)
FAST_API_URL="http://localhost:8000"
FAST_API_SECRET_KEY="your-secret-key"
# Payments (Stripe) - Optional for local dev
STRIPE_PUBLISHABLE_KEY="pk_test_your-stripe-key"
STRIPE_SECRET_KEY="sk_test_your-stripe-secret"

See Environment Variables Guide for complete reference.
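A startup check can catch missing variables before they surface as runtime errors. The sketch below is a hypothetical helper, not part of the repository; it assumes the two core variables are always required and that the Clerk, FastAPI, and Stripe variables all become required in production.

```typescript
type Mode = "development" | "production";

// Marked "Required" in the variable list above.
const REQUIRED_ALWAYS = ["NEXT_PUBLIC_CONVEX_URL", "NEXT_PUBLIC_APP_URL"];

// Assumption: everything optional or disabled locally is needed in production.
const REQUIRED_IN_PRODUCTION = [
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
  "FAST_API_URL",
  "FAST_API_SECRET_KEY",
  "STRIPE_PUBLISHABLE_KEY",
  "STRIPE_SECRET_KEY",
];

// Return the names of variables that are unset or blank for the given mode.
function missingEnvVars(env: Record<string, string | undefined>, mode: Mode): string[] {
  const names =
    mode === "production"
      ? [...REQUIRED_ALWAYS, ...REQUIRED_IN_PRODUCTION]
      : REQUIRED_ALWAYS;
  return names.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// A minimal local setup: only the core services are configured.
const localEnv: Record<string, string | undefined> = {
  NEXT_PUBLIC_CONVEX_URL: "https://your-deployment.convex.cloud",
  NEXT_PUBLIC_APP_URL: "http://localhost:3000",
};

const missingLocally = missingEnvVars(localEnv, "development");
const missingInProduction = missingEnvVars(localEnv, "production");
```

In a real app the first argument would typically be `process.env`, and a non-empty result in production would be a reason to abort startup.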
- Convex - Backend database and serverless functions
  - Sign up at convex.dev
  - Free tier available
- FastAPI Service (Optional for local development)
  - Repository: precision_pdf_fast_api
  - Handles PDF processing with Landing AI
  - Can run locally or deploy to Render
- Clerk - Authentication (currently disabled)
- Stripe - Payment processing
- Landing AI - Document processing AI
- Sentry - Error monitoring
The project includes comprehensive testing infrastructure:
# Unit Tests (Vitest)
pnpm run test # Run once
pnpm run test:watch # Watch mode
pnpm run test:ui # UI interface
# E2E Tests (Playwright)
pnpm run pw:test # Headless
pnpm run pw:test:ui # UI mode
pnpm run pw:test:debug # Debug mode

Currently no tests are implemented, but infrastructure is ready. See Testing Guide.
- Next.js 15 - React framework with App Router
- React 19 - UI library
- Tailwind CSS - Utility-first styling
- shadcn/ui - Component library
- TypeScript - Type safety
- Convex - Real-time database and serverless functions
- Clerk - Authentication (currently disabled)
- Stripe - Payment processing
- FastAPI - Document processing service
- Landing AI - AI-powered document extraction
- Vercel - Frontend hosting
- Render - FastAPI hosting
- Sentry - Error monitoring
- PostHog - Analytics
The app includes 8 pre-processed example documents:
- Invoice
- Bank Statements (2)
- Medical Reports (2)
- Medical Journal Article
- Mortgage Application
- Settlement Statement
Examples are stored in /public/examples/ and can be explored without authentication.
We welcome contributions! Please see our Contributing Guide for details on:
- Code style and standards
- Development workflow
- Pull request process
- Issue reporting
# Fork the repo and clone your fork
git clone https://github.com/yourusername/precision-pdf.git
# Create a feature branch
git checkout -b feature/your-feature-name
# Make your changes and test
pnpm run test
pnpm run lint
# Submit a pull request

"User not authenticated" errors in development:
- This is expected since authentication is disabled
- Check the security configuration guide
Documents not processing:
- Ensure FastAPI service is running
- Check environment variable configuration
- See Troubleshooting Guide
Build errors:
- Ensure you're using the latest Node.js LTS
- Delete `node_modules` and run `pnpm install`
- ✅ Core document processing
- ✅ Visual verification interface
- ✅ Multiple export formats
- ✅ Real-time processing updates
- ⚠️ Authentication (disabled for local dev)
- ⚠️ Testing (infrastructure ready)
- Documentation (in progress)
This project is open source. License details coming soon.
- Documentation: Browse the `/docs` folder
- Issues: GitHub Issues
- Discussions: GitHub Discussions
For questions about this project:
- GitHub: @robertguss
- Website: precisionpdf.com
⭐ Star this repository if you find it useful! ⭐
