Skip to content

Windows-focused fork of Typhoon OCR featuring a modern Next.js web app. Supports multi-page PDF/image OCR to Markdown/HTML, interactive preview, and URL import.

License

Notifications You must be signed in to change notification settings

naravid19/typhoon-ocr

 
 

Repository files navigation

English ภาษาไทย

Contributors Forks Stargazers Issues Apache 2.0 License


Logo

Typhoon OCR

A powerful OCR application for extracting structured markdown from images and PDFs
Explore the docs »

View Demo · Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Features
  5. Roadmap
  6. Contributing
  7. License
  8. Changelog
  9. Contact
  10. Acknowledgments

About The Project

Product Name Screen Shot

Typhoon OCR is a model for extracting structured markdown from images or PDFs. It supports document layout analysis and table extraction, returning results in markdown or HTML.

This fork provides a modern Next.js web application alongside the original Gradio demo, featuring:

  • 🚀 Modern UI with dark theme and premium aesthetics
  • 📄 Multi-page PDF support with interactive page selection
  • 🔗 URL import for loading documents directly from the web
  • 📊 Real-time progress indicators during OCR processing
  • 🎨 Compare mode to view original and extracted text side-by-side

This fork focuses on Windows 10/11. For macOS/Linux setup, please refer to the official Typhoon OCR repository.

📝 See CHANGELOG.md for latest updates.

(back to top)

Built With

  • Next
  • React
  • TailwindCSS
  • FastAPI
  • Python

(back to top)

Getting Started

To get a local copy up and running follow these steps.

Prerequisites

  • Windows 10/11 with Python 3.10+
  • Node.js 18+ with npm
  • Poppler (for PDF processing)

Install Poppler using PowerShell:

iwr -useb https://github.com/oschwartz10612/poppler-windows/releases/download/v25.07.0-0/Release-25.07.0-0.zip -OutFile $env:TEMP\poppler.zip; rm C:\poppler -Recurse -Force -ErrorAction SilentlyContinue; Expand-Archive $env:TEMP\poppler.zip C:\poppler -Force; $bin=(Get-ChildItem C:\poppler -Recurse -Filter pdfinfo.exe | Select-Object -First 1).DirectoryName; if(-not $bin){throw "pdfinfo.exe not found under C:\poppler"}; $u=[Environment]::GetEnvironmentVariable('Path','User'); if([string]::IsNullOrEmpty($u)){$u=''}; if($u -notlike "*$bin*"){[Environment]::SetEnvironmentVariable('Path', ($u.TrimEnd(';')+';'+$bin).Trim(';'), 'User')}; $env:Path+=';'+$bin; pdfinfo -v

Verify installation:

pdfinfo -v
pdftoppm -v

Installation

  1. Clone the repo

    git clone https://github.com/naravid19/typhoon-ocr.git
    cd typhoon-ocr
  2. Configure environment

    Create a .env file in the project root:

    TYPHOON_BASE_URL=https://api.opentyphoon.ai/v1
    TYPHOON_API_KEY=YOUR_API_KEY
    TYPHOON_OCR_MODEL=typhoon-ocr
  3. Set up Backend (Python)

    python -m venv venv
    .\venv\Scripts\activate
    pip install -r backend/requirements.txt
  4. Set up Frontend (Next.js)

    cd frontend
    npm install
  5. Run the application

    Option A: One-Click Start (Recommended)

    Simply double-click the start_app.bat file in the project root.

    The script automatically detects your virtual environment and opens the browser for you.

    Option B: Manual Start

    Terminal 1 - Backend:

    python -m uvicorn backend.main:app --reload --port 8000

    Terminal 2 - Frontend:

    cd frontend
    npm run dev
  6. Open in browser

    Navigate to http://localhost:3000/ocr

(back to top)

Usage

  1. Upload a document - Drag & drop or click to upload PDF/images
  2. Import from URL - Paste a URL to load documents directly from the web
  3. Select pages - For multi-page PDFs, select specific pages or use quick actions (Select All, Odd/Even, Range)
  4. Configure parameters - Adjust temperature, top_p, and other OCR settings
  5. Run OCR - Click "Run OCR" and monitor progress
  6. View results - Switch between Combined and Compare views

(back to top)

Features

  • ✅ Upload PDF or images (PNG, JPG, WebP)
  • ✅ Multi-page PDF selection with visual grid preview
  • ✅ Import documents from URL (with CORS proxy)
  • ✅ Shift-click for range selection
  • ✅ Quick actions: Select All, Odd/Even pages, Custom range
  • ✅ Two task types: default (Markdown) and structure (HTML tables)
  • ✅ Real-time progress indicator
  • ✅ Compare mode: Original image vs. extracted text
  • ✅ Copy extracted text with one click
  • ✅ Code generator for API integration

(back to top)

Roadmap

  • Modern Next.js frontend
  • Multi-page PDF selection with preview
  • URL import with proxy
  • Progress indicators
  • Compare view mode
  • Batch processing
  • Export to Markdown/HTML file
  • Support for more document types

See the open issues for a full list of proposed features (and known issues).

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

(back to top)

Contact

Project Link: https://github.com/naravid19/typhoon-ocr

(back to top)

Acknowledgments

(back to top)

About

Windows-focused fork of Typhoon OCR featuring a modern Next.js web app. Supports multi-page PDF/image OCR to Markdown/HTML, interactive preview, and URL import.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 68.5%
  • TypeScript 24.4%
  • Batchfile 3.9%
  • CSS 3.0%
  • Other 0.2%