A powerful OCR application for extracting structured markdown from images and PDFs
Explore the docs »
View Demo
·
Report Bug
·
Request Feature
Table of Contents
Typhoon OCR is a model for extracting structured markdown from images or PDFs. It supports document layout analysis and table extraction, returning results in markdown or HTML.
This fork provides a modern Next.js web application alongside the original Gradio demo, featuring:
- 🚀 Modern UI with dark theme and premium aesthetics
- 📄 Multi-page PDF support with interactive page selection
- 🔗 URL import for loading documents directly from the web
- 📊 Real-time progress indicators during OCR processing
- 🎨 Compare mode to view original and extracted text side-by-side
This fork focuses on Windows 10/11. For macOS/Linux setup, please refer to the official Typhoon OCR repository.
📝 See CHANGELOG.md for latest updates.
To get a local copy up and running follow these steps.
- Windows 10/11 with Python 3.10+
- Node.js 18+ with npm
- Poppler (for PDF processing)
Install Poppler using PowerShell:
iwr -useb https://github.com/oschwartz10612/poppler-windows/releases/download/v25.07.0-0/Release-25.07.0-0.zip -OutFile $env:TEMP\poppler.zip; rm C:\poppler -Recurse -Force -ErrorAction SilentlyContinue; Expand-Archive $env:TEMP\poppler.zip C:\poppler -Force; $bin=(Get-ChildItem C:\poppler -Recurse -Filter pdfinfo.exe | Select-Object -First 1).DirectoryName; if(-not $bin){throw "pdfinfo.exe not found under C:\poppler"}; $u=[Environment]::GetEnvironmentVariable('Path','User'); if([string]::IsNullOrEmpty($u)){$u=''}; if($u -notlike "*$bin*"){[Environment]::SetEnvironmentVariable('Path', ($u.TrimEnd(';')+';'+$bin).Trim(';'), 'User')}; $env:Path+=';'+$bin; pdfinfo -vVerify installation:
pdfinfo -v
pdftoppm -v-
Clone the repo
git clone https://github.com/naravid19/typhoon-ocr.git cd typhoon-ocr -
Configure environment
Create a
.envfile in the project root:TYPHOON_BASE_URL=https://api.opentyphoon.ai/v1 TYPHOON_API_KEY=YOUR_API_KEY TYPHOON_OCR_MODEL=typhoon-ocr
-
Set up Backend (Python)
python -m venv venv .\venv\Scripts\activate pip install -r backend/requirements.txt
-
Set up Frontend (Next.js)
cd frontend npm install -
Run the application
Simply double-click the
start_app.batfile in the project root.The script automatically detects your virtual environment and opens the browser for you.
Terminal 1 - Backend:
python -m uvicorn backend.main:app --reload --port 8000
Terminal 2 - Frontend:
cd frontend npm run dev -
Open in browser
Navigate to http://localhost:3000/ocr
- Upload a document - Drag & drop or click to upload PDF/images
- Import from URL - Paste a URL to load documents directly from the web
- Select pages - For multi-page PDFs, select specific pages or use quick actions (Select All, Odd/Even, Range)
- Configure parameters - Adjust temperature, top_p, and other OCR settings
- Run OCR - Click "Run OCR" and monitor progress
- View results - Switch between Combined and Compare views
- ✅ Upload PDF or images (PNG, JPG, WebP)
- ✅ Multi-page PDF selection with visual grid preview
- ✅ Import documents from URL (with CORS proxy)
- ✅ Shift-click for range selection
- ✅ Quick actions: Select All, Odd/Even pages, Custom range
- ✅ Two task types:
default(Markdown) andstructure(HTML tables) - ✅ Real-time progress indicator
- ✅ Compare mode: Original image vs. extracted text
- ✅ Copy extracted text with one click
- ✅ Code generator for API integration
- Modern Next.js frontend
- Multi-page PDF selection with preview
- URL import with proxy
- Progress indicators
- Compare view mode
- Batch processing
- Export to Markdown/HTML file
- Support for more document types
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (
git checkout -b feature/AmazingFeature) - Commit your Changes (
git commit -m 'Add some AmazingFeature') - Push to the Branch (
git push origin feature/AmazingFeature) - Open a Pull Request
Distributed under the Apache 2.0 License. See LICENSE for more information.
Project Link: https://github.com/naravid19/typhoon-ocr
- SCB10X Typhoon OCR - Original project
- OpenAI - API compatibility
- Best-README-Template - README template
- Shields.io - Badges
