Transform your documents into structured, intelligent data with AI-powered processing
Features • Quick Start • Documentation • API • Contributing
PageMonk is a modern document intelligence platform that combines advanced OCR capabilities with LLM-powered content structuring. Inspired by LlamaIndex and designed with Hex.tech's aesthetic principles, PageMonk transforms any document into clean, searchable markdown and structured data.
- AI-First: Leverages Ollama's Qwen2.5 for intelligent content understanding
- Developer-Friendly: Clean REST API with comprehensive documentation
- Beautiful UI: Modern, intuitive interface that makes document processing enjoyable
- Self-Hosted: Run everything locally with full control over your data
- Flexible: Custom schema extraction for any document type
- Advanced OCR: Powered by Docling for accurate text extraction and structure recognition
- AI Structuring: LLM-powered markdown generation using Ollama Qwen2.5:0.5b
- Multi-Format Support: Process PDFs, images, and various document formats seamlessly
- Schema Builder: Define custom extraction templates for any document type
- Flexible Fields: Support for text, numbers, dates, and complex nested structures
- Real-Time Processing: Instant extraction with live preview and status updates
- Hex.tech Inspired: Clean, data-focused design with beautiful typography
- Dark Mode: Full dark mode support with system-aware theme switching
- Responsive: Perfect experience across desktop, tablet, and mobile devices
- Accessibility: WCAG AAA compliant with full keyboard navigation
- REST API: Comprehensive endpoints for easy integration
- Auto Documentation: Interactive API docs at /docs
- Type Safety: Full TypeScript support in frontend
- Easy Setup: One-command installation and startup
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ React Frontend  │────│ FastAPI Backend  │────│   Ollama LLM    │
│   (Port 3000)   │    │   (Port 8000)    │    │ (Qwen2.5:0.5b)  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                       ┌──────────────────┐
                       │    SQLite DB     │
                       │   (Documents &   │
                       │     Schemas)     │
                       └──────────────────┘
Tech Stack:
- Backend: FastAPI, SQLAlchemy, Docling OCR, Ollama
- Frontend: React 18, Tailwind CSS, Axios, React Router
- Database: SQLite for simplicity and portability
- AI: Ollama with Qwen2.5:0.5b model
- Python 3.8+
- Node.js 16+
- Ollama installed
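The backend-to-Ollama hop in the architecture above can be illustrated with a minimal sketch. It assumes the backend talks to Ollama's standard `/api/generate` endpoint on the default port 11434; the prompt wording and the function names `build_generate_request` and `structure_text` are hypothetical, since PageMonk's actual processing code is not shown in this README.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"  # Ollama's default address
MODEL = "qwen2.5:0.5b"


def build_generate_request(ocr_text: str) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {
        "model": MODEL,
        # Hypothetical prompt; the real backend prompt is not documented here.
        "prompt": "Rewrite the following OCR text as clean markdown:\n\n" + ocr_text,
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def structure_text(ocr_text: str) -> str:
    """Send the request and return the generated text (requires Ollama running)."""
    with urllib.request.urlopen(build_generate_request(ocr_text)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `structure_text("raw OCR output")` only works once `ollama serve` is running and the model has been pulled.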
# Clone the repository
git clone https://github.com/yourusername/PageMonk.git
cd PageMonk
# Make the startup script executable and run
chmod +x start.sh
./start.sh

This script will:
- Install and pull the Qwen2.5:0.5b model
- Start the Ollama service
- Launch the backend server
- Start the frontend development server
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
# Upload a document via CLI
curl -X POST "http://localhost:8000/documents" \
  -F "file=@document.pdf"
# Or use the web interface
# 1. Navigate to http://localhost:3000
# 2. Drag & drop your document
# 3. Click "Parse" to process

The AI will:
- Extract text using advanced OCR
- Identify document structure (headings, lists, tables)
- Generate clean, formatted markdown
- Preserve semantic meaning
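Because the parsed result is plain markdown, downstream code can work with its structure directly. The sketch below is one possible way to pull a heading outline out of a parsed document; the helper name `outline` is hypothetical and is not part of PageMonk, and it only recognizes `#`-style headings.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def outline(markdown: str) -> List[Tuple[int, str]]:
    """Return (level, title) pairs for '#'-style headings, skipping fenced code."""
    headings = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on every fence line
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


sample = "# Invoice\n\nSome text\n\n## Items\n- Widget\n"
print(outline(sample))  # → [(1, 'Invoice'), (2, 'Items')]
```

The input would typically be the `markdown_content` field returned for a parsed document.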
Create custom extraction patterns for your documents:
{
  "name": "Invoice Extractor",
  "description": "Extract invoice details",
  "schema_definition": {
    "invoice_number": "string",
    "date": "date",
    "total": "number",
    "items": [
      {
        "description": "string",
        "amount": "number"
      }
    ]
  }
}

Apply schemas via API or UI to extract structured data automatically.
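Before submitting a schema, it can help to check its shape locally. The following is a minimal sketch, assuming the only scalar type names are the ones visible in the invoice example ("string", "number", "date") and that a list holds exactly one item template; PageMonk's real validation rules are not documented here, and `find_schema_errors` is a hypothetical helper.

```python
from typing import Any, List

# Assumed scalar type names, inferred from the invoice example.
SCALAR_TYPES = {"string", "number", "date"}


def find_schema_errors(node: Any, path: str = "schema_definition") -> List[str]:
    """Return human-readable problems; an empty list means the shape looks fine."""
    if isinstance(node, str):
        return [] if node in SCALAR_TYPES else [f"{path}: unknown type {node!r}"]
    if isinstance(node, dict):
        errors = []
        for key, value in node.items():
            errors.extend(find_schema_errors(value, f"{path}.{key}"))
        return errors
    if isinstance(node, list):
        if len(node) != 1:
            return [f"{path}: a list must contain exactly one item template"]
        return find_schema_errors(node[0], f"{path}[]")
    return [f"{path}: expected a type name, object, or list"]


invoice_schema = {
    "invoice_number": "string",
    "date": "date",
    "total": "number",
    "items": [{"description": "string", "amount": "number"}],
}
print(find_schema_errors(invoice_schema))  # → []
```

Returning a list of messages rather than raising on the first problem makes it possible to report every issue in a schema at once.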
import requests

# Upload document
with open('document.pdf', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/documents',
        files={'file': f}
    )
doc_id = response.json()['id']

# Parse document
requests.post(f'http://localhost:8000/parse/{doc_id}')

# Get parsed content
content = requests.get(f'http://localhost:8000/documents/{doc_id}')
print(content.json()['markdown_content'])

- GET /documents - List all documents with pagination
- POST /documents - Upload new document
- GET /documents/{id} - Get document details and content
- DELETE /documents/{id} - Delete document
- POST /parse/{id} - Parse document with AI
- GET /schemas - List all extraction schemas
- POST /schemas - Create new schema
- GET /schemas/{id} - Get schema details
- PUT /schemas/{id} - Update schema
- DELETE /schemas/{id} - Delete schema
- POST /extract - Extract data using schema
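The endpoints listed above could be wrapped in a small helper like the one sketched here. It uses only the standard library to stay dependency-free. The JSON body for `POST /extract` is an assumption: the field names `document_id` and `schema_id` are hypothetical, so check the interactive docs at /docs for the real request shape.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"


def api_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build a request for a PageMonk endpoint, JSON-encoding the body if given."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request: urllib.request.Request):
    """Send the request and decode a JSON response (requires the backend running)."""
    with urllib.request.urlopen(request) as resp:
        raw = resp.read()
        return json.loads(raw) if raw else None


list_schemas = api_request("GET", "/schemas")
# Assumed body: these field names are illustrative, not confirmed by the README.
extract = api_request("POST", "/extract", {"document_id": 1, "schema_id": 1})
```

With the backend up, `send(list_schemas)` would return the decoded JSON response.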
For complete interactive documentation, visit http://localhost:8000/docs
Manual setup instructions:
# Install Ollama from https://ollama.ai/
curl -fsSL https://ollama.ai/install.sh | sh
# Pull the model
ollama pull qwen2.5:0.5b
# Start Ollama service
ollama serve

cd backend
pip install -r requirements.txt
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

cd frontend
npm install
npm start

Create .env files for custom configuration:
Backend .env:
DATABASE_URL=sqlite:///./documents.db
OLLAMA_BASE_URL=http://localhost:11434
DEFAULT_MODEL=qwen2.5:0.5b

Frontend .env:
REACT_APP_API_URL=http://localhost:8000

PageMonk/
├── backend/                 # FastAPI backend
│   ├── app/
│   │   ├── main.py          # Application entry point
│   │   ├── database.py      # SQLAlchemy models
│   │   ├── models.py        # Pydantic schemas
│   │   └── processor.py     # Document processing logic
│   └── requirements.txt
├── frontend/                # React frontend
│   ├── src/
│   │   ├── components/      # Reusable UI components
│   │   │   ├── layout/      # Layout components
│   │   │   └── ui/          # Base UI components
│   │   ├── pages/           # Application pages
│   │   │   ├── Home.js      # Dashboard
│   │   │   ├── Parse.js     # Document parsing
│   │   │   ├── Extract.js   # Schema extraction
│   │   │   ├── Documents.js # Document management
│   │   │   └── Schemas.js   # Schema management
│   │   ├── services/        # API services
│   │   └── App.js           # Main component
│   ├── tailwind.config.js
│   └── package.json
├── start.sh                 # Quick start script
└── README.md
PageMonk features a comprehensive design system inspired by modern data platforms:
- Typography: Inter font family with optimized scales
- Colors: Sophisticated indigo/purple gradient with semantic meanings
- Components: 50+ reusable components with consistent API
- Animations: Subtle, purposeful transitions for better UX
- Accessibility: WCAG AAA compliant with keyboard navigation
We welcome contributions! Here's how you can help:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow existing code style and conventions
- Add tests for new features
- Update documentation as needed
- Ensure all tests pass before submitting PR
This project is licensed under the MIT License - see the LICENSE file for details.
- Design Inspiration: Hex.tech for UI/UX patterns
- AI Processing: Ollama community for local LLM capabilities
- OCR Engine: Docling for document understanding
- Icons: Heroicons for beautiful iconography
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Check our Wiki
Made with ❤️ for the document processing community
⭐ Star us on GitHub if you find PageMonk useful!