PageMonk 🧠📄

Transform your documents into structured, intelligent data with AI-powered processing

Features • Quick Start • Documentation • API • Contributing

🌟 Overview

PageMonk is a modern document intelligence platform that combines advanced OCR capabilities with LLM-powered content structuring. Inspired by LlamaIndex and designed with Hex.tech's aesthetic principles, PageMonk transforms any document into clean, searchable markdown and structured data.

Why PageMonk?

AI-First: Uses the Qwen2.5 model, served locally through Ollama, for intelligent content understanding
Developer-Friendly: Clean REST API with comprehensive documentation
Beautiful UI: Modern, intuitive interface that makes document processing enjoyable
Self-Hosted: Run everything locally with full control over your data
Flexible: Custom schema extraction for any document type

✨ Features

🤖 Intelligent Processing

Advanced OCR: Powered by Docling for accurate text extraction and structure recognition
AI Structuring: LLM-powered markdown generation using Qwen2.5:0.5b running on Ollama
Multi-Format Support: Process PDFs, images, and various document formats seamlessly

🎯 Custom Data Extraction

Schema Builder: Define custom extraction templates for any document type
Flexible Fields: Support for text, numbers, dates, and complex nested structures
Real-Time Processing: Instant extraction with live preview and status updates

🎨 Modern Interface

Hex.tech Inspired: Clean, data-focused design with beautiful typography
Dark Mode: Full dark mode support with system-aware theme switching
Responsive: Layouts adapt to desktop, tablet, and mobile screens
Accessibility: WCAG AAA compliant with full keyboard navigation

⚡ Developer Experience

REST API: Comprehensive endpoints for easy integration
Auto Documentation: Interactive API docs at /docs
Type Safety: Full TypeScript support in frontend
Easy Setup: One-command installation and startup

🏗️ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  React Frontend │────│  FastAPI Backend │────│   Ollama LLM    │
│   (Port 3000)   │    │   (Port 8000)    │    │  (Qwen2.5:0.5b) │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                       ┌──────────────────┐
                       │   SQLite DB      │
                       │  (Documents &    │
                       │    Schemas)      │
                       └──────────────────┘

Tech Stack:

Backend: FastAPI, SQLAlchemy, Docling OCR, Ollama
Frontend: React 18, Tailwind CSS, Axios, React Router
Database: SQLite for simplicity and portability
AI: Ollama with Qwen2.5:0.5b model

🚀 Quick Start

Prerequisites

Python 3.8+
Node.js 16+
Ollama installed

One-Command Setup

# Clone the repository
git clone https://github.com/yourusername/PageMonk.git
cd PageMonk

# Make the startup script executable and run
chmod +x start.sh
./start.sh

This script will:

Pull the Qwen2.5:0.5b model
Start the Ollama service
Launch the backend server
Start the frontend development server

Access Points

Frontend: http://localhost:3000
Backend API: http://localhost:8000
API Docs: http://localhost:8000/docs
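To confirm that the services are listening after startup, a small check like the one below may help. This is a minimal sketch, not part of PageMonk: the names `SERVICES`, `is_port_open`, and `check_services` are hypothetical, the frontend and backend ports come from the access points above, and 11434 is assumed to be the Ollama port (it matches the `OLLAMA_BASE_URL` example later in this README).

```python
import socket

# Ports from this README; 11434 is assumed from the OLLAMA_BASE_URL example.
SERVICES = {
    "frontend": ("localhost", 3000),
    "backend": ("localhost", 8000),
    "ollama": ("localhost", 11434),
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(services=SERVICES) -> dict:
    """Map each service name to True (reachable) or False (not reachable)."""
    return {
        name: is_port_open(host, port)
        for name, (host, port) in services.items()
    }
```

Calling `check_services()` returns a dictionary keyed by service name, so a `False` entry points at the component that did not start. An open port only shows that something is listening; it does not prove the service is healthy.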

📖 Usage Guide

1. Document Parsing

# Upload a document via CLI
curl -X POST "http://localhost:8000/documents" \
  -F "file=@document.pdf"

# Or use the web interface
# 1. Navigate to http://localhost:3000
# 2. Drag & drop your document
# 3. Click "Parse" to process

The AI will:

Extract text using advanced OCR
Identify document structure (headings, lists, tables)
Generate clean, formatted markdown
Preserve semantic meaning

2. Schema Extraction

Create custom extraction patterns for your documents:

{
  "name": "Invoice Extractor",
  "description": "Extract invoice details",
  "schema_definition": {
    "invoice_number": "string",
    "date": "date",
    "total": "number",
    "items": [
      {
        "description": "string",
        "amount": "number"
      }
    ]
  }
}

Apply schemas via API or UI to extract structured data automatically.
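One way to read the schema format above is as a template that extracted data should match. The sketch below checks a result against a `schema_definition` on the client side. It is an illustration under stated assumptions, not PageMonk's own validation: the helper names `_matches` and `validate` are hypothetical, dates are assumed to be ISO 8601 strings, and a one-element list is assumed to mean "a list of items with this shape".

```python
from datetime import date


def _matches(type_name: str, value) -> bool:
    """Check a single value against a type name from the schema."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "number":
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "date":
        # Assumption: dates arrive as ISO 8601 strings such as "2024-03-01".
        if not isinstance(value, str):
            return False
        try:
            date.fromisoformat(value)
            return True
        except ValueError:
            return False
    return False  # unknown type names never match


def validate(schema, data, path="$"):
    """Return a list of readable mismatches between data and schema."""
    errors = []
    if isinstance(schema, dict):
        if not isinstance(data, dict):
            return [f"{path}: expected object"]
        for key, sub_schema in schema.items():
            if key not in data:
                errors.append(f"{path}.{key}: missing")
            else:
                errors.extend(validate(sub_schema, data[key], f"{path}.{key}"))
    elif isinstance(schema, list):
        if not isinstance(data, list):
            return [f"{path}: expected list"]
        if schema:  # an empty template list accepts any items
            for i, item in enumerate(data):
                errors.extend(validate(schema[0], item, f"{path}[{i}]"))
    elif not _matches(schema, data):
        errors.append(f"{path}: expected {schema}")
    return errors
```

An empty list from `validate(schema_definition, extracted)` means every declared field is present with a matching type. A small model can omit or mistype fields, so a check of this kind may be worth running before the data is used downstream.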

3. API Integration

import requests

BASE_URL = 'http://localhost:8000'

# Upload document
with open('document.pdf', 'rb') as f:
    response = requests.post(f'{BASE_URL}/documents', files={'file': f})
response.raise_for_status()  # stop early if the upload failed
doc_id = response.json()['id']

# Parse document
requests.post(f'{BASE_URL}/parse/{doc_id}').raise_for_status()

# Get parsed content
content = requests.get(f'{BASE_URL}/documents/{doc_id}')
content.raise_for_status()
print(content.json()['markdown_content'])

📊 API Endpoints

Documents

GET /documents - List all documents with pagination
POST /documents - Upload new document
GET /documents/{id} - Get document details and content
DELETE /documents/{id} - Delete document
POST /parse/{id} - Parse document with AI

Schemas

GET /schemas - List all extraction schemas
POST /schemas - Create new schema
GET /schemas/{id} - Get schema details
PUT /schemas/{id} - Update schema
DELETE /schemas/{id} - Delete schema
POST /extract - Extract data using schema

For complete interactive documentation, visit http://localhost:8000/docs
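The endpoint list above can be captured in a small lookup table so that client code does not hard-code paths. This is a sketch only: `ENDPOINTS` and `build_request` are hypothetical names, the paths and methods are copied from the list above, and request bodies are left out because this README does not specify them.

```python
BASE_URL = "http://localhost:8000"

# (HTTP method, path template) pairs copied from the endpoint list above.
ENDPOINTS = {
    "list_documents": ("GET", "/documents"),
    "upload_document": ("POST", "/documents"),
    "get_document": ("GET", "/documents/{id}"),
    "delete_document": ("DELETE", "/documents/{id}"),
    "parse_document": ("POST", "/parse/{id}"),
    "list_schemas": ("GET", "/schemas"),
    "create_schema": ("POST", "/schemas"),
    "get_schema": ("GET", "/schemas/{id}"),
    "update_schema": ("PUT", "/schemas/{id}"),
    "delete_schema": ("DELETE", "/schemas/{id}"),
    "extract": ("POST", "/extract"),
}


def build_request(name, base_url=BASE_URL, **params):
    """Return (method, absolute_url) for a named endpoint.

    Raises KeyError for an unknown endpoint name or a missing path parameter.
    """
    method, template = ENDPOINTS[name]
    return method, base_url.rstrip("/") + template.format(**params)
```

The returned pair can be passed to an HTTP client, for example `requests.request(method, url)`. The interactive docs at /docs remain the reference for payload shapes.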

🔧 Manual Setup

1. Install Ollama and Model

# Install Ollama from https://ollama.ai/
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the model
ollama pull qwen2.5:0.5b

# Start Ollama service
ollama serve

2. Backend Setup

cd backend
pip install -r requirements.txt
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

3. Frontend Setup

cd frontend
npm install
npm start

4. Environment Variables (Optional)

Create .env files for custom configuration:

Backend .env:

DATABASE_URL=sqlite:///./documents.db
OLLAMA_BASE_URL=http://localhost:11434
DEFAULT_MODEL=qwen2.5:0.5b
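
For illustration, the backend variables above could be read into a typed settings object as sketched below. How PageMonk actually loads its configuration is not shown in this README, so `Settings` and `load_settings` are hypothetical, and using the example values as fallbacks is an assumption. The sketch reads the process environment only; loading a .env file would need a separate loader.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    ollama_base_url: str
    default_model: str


def load_settings(env=None) -> Settings:
    """Read backend settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        # Fallbacks mirror the example values above (assumed defaults).
        database_url=env.get("DATABASE_URL", "sqlite:///./documents.db"),
        # Strip a trailing slash so paths can be appended safely.
        ollama_base_url=env.get(
            "OLLAMA_BASE_URL", "http://localhost:11434"
        ).rstrip("/"),
        default_model=env.get("DEFAULT_MODEL", "qwen2.5:0.5b"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test, and the frozen dataclass prevents accidental changes to settings at runtime.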

Frontend .env:

REACT_APP_API_URL=http://localhost:8000

📁 Project Structure

PageMonk/
├── backend/                    # FastAPI backend
│   ├── app/
│   │   ├── main.py            # Application entry point
│   │   ├── database.py        # SQLAlchemy models
│   │   ├── models.py          # Pydantic schemas
│   │   └── processor.py       # Document processing logic
│   └── requirements.txt
├── frontend/                   # React frontend
│   ├── src/
│   │   ├── components/        # Reusable UI components
│   │   │   ├── layout/        # Layout components
│   │   │   └── ui/            # Base UI components
│   │   ├── pages/             # Application pages
│   │   │   ├── Home.js        # Dashboard
│   │   │   ├── Parse.js       # Document parsing
│   │   │   ├── Extract.js     # Schema extraction
│   │   │   ├── Documents.js   # Document management
│   │   │   └── Schemas.js     # Schema management
│   │   ├── services/          # API services
│   │   └── App.js             # Main component
│   ├── tailwind.config.js
│   └── package.json
├── start.sh                    # Quick start script
└── README.md

🎨 Design System

PageMonk features a comprehensive design system inspired by modern data platforms:

Typography: Inter font family with optimized scales
Colors: Sophisticated indigo/purple gradient with semantic meanings
Components: 50+ reusable components with consistent API
Animations: Subtle, purposeful transitions for better UX
Accessibility: WCAG AAA compliant with keyboard navigation

🤝 Contributing

We welcome contributions! Here's how you can help:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Guidelines

Follow existing code style and conventions
Add tests for new features
Update documentation as needed
Ensure all tests pass before submitting PR

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Design Inspiration: Hex.tech for UI/UX patterns
AI Processing: Ollama community for local LLM capabilities
OCR Engine: Docling for document understanding
Icons: Heroicons for beautiful iconography

📞 Support & Community

Issues: GitHub Issues
Discussions: GitHub Discussions
Documentation: Check our Wiki

Made with ❤️ for the document processing community

⭐ Star us on GitHub if you find PageMonk useful!
