🚀 OmniDocs

Unified, modern, and blazing-fast Document AI for Python

OmniDocs is your all-in-one Python toolkit for extracting tables, text, and math, and for running OCR on PDFs and images. It is powered by both classic libraries and state-of-the-art deep learning models, so you can build robust document workflows with a single, consistent API.

🧩 Unified, production-ready API for all tasks
🏎️ Fast, GPU-accelerated, and easy to extend

⚡ Quick Start

Get started with practical examples for common document-processing tasks in the Quick Start Guide.

🏁 Get Started

See the Getting Started Guide
Dive into the API Reference

📖 Tutorials

Table Extraction: Overview
Text Extraction: Overview
- PyMuPDF
- PDFPlumber
- PyPDF2
- PDFText
- Surya Text
- Docling Parse
Math Extraction: Overview
- UniMERNet
- SuryaMath
- Nougat
- Donut
OCR Extraction: Overview
- PaddleOCR
- Tesseract
- EasyOCR
- SuryaOCR

🛠️ Installation

Choose your preferred method:

PyPI (Recommended):
```
pip install omnidocs
```
uv pip (Fastest):
```
uv pip install omnidocs
```

From Source:

```
git clone https://github.com/adithya-s-k/OmniDocs.git
cd OmniDocs
pip install .
# or, with uv:
uv sync
```

Conda (if available):
```
conda install -c conda-forge omnidocs
```

🏗️ How It Works

OmniDocs organizes document processing tasks into modular components. Each component corresponds to a specific task and offers:

A Unified Interface: Consistent input and output formats.
Model Independence: Switch between libraries or models without rewriting the surrounding code.
Pipeline Flexibility: Combine components to create custom workflows.
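The three properties above can be illustrated with a minimal, self-contained sketch. This is not OmniDocs' actual API: the names `TextOutput`, `BaseTextExtractor`, `WhitespaceExtractor`, `UppercaseExtractor`, and `run_pipeline` are hypothetical, and the toy backends work on plain strings instead of wrapping real libraries such as PyMuPDF or Tesseract.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class TextOutput:
    """Hypothetical shared output type: every backend returns this same shape."""
    text: str
    source: str


class BaseTextExtractor(ABC):
    """Hypothetical common interface that every backend implements."""

    name: str = "base"

    @abstractmethod
    def extract(self, raw: str) -> TextOutput:
        """Turn raw input into a TextOutput."""


class WhitespaceExtractor(BaseTextExtractor):
    """Toy backend: collapses runs of whitespace (stands in for a real library)."""

    name = "whitespace"

    def extract(self, raw: str) -> TextOutput:
        return TextOutput(text=" ".join(raw.split()), source=self.name)


class UppercaseExtractor(BaseTextExtractor):
    """Second toy backend exposing the exact same interface."""

    name = "uppercase"

    def extract(self, raw: str) -> TextOutput:
        return TextOutput(text=raw.strip().upper(), source=self.name)


def run_pipeline(raw: str, steps: list[BaseTextExtractor]) -> TextOutput:
    """Chain components: each step consumes the previous step's text."""
    result = TextOutput(text=raw, source="input")
    for step in steps:
        result = step.extract(result.text)
    return result


if __name__ == "__main__":
    doc = "  Hello   OmniDocs \n world  "
    # Model independence: any backend can be dropped in behind the same call.
    for backend in (WhitespaceExtractor(), UppercaseExtractor()):
        print(backend.name, "->", repr(backend.extract(doc).text))
    # Pipeline flexibility: chain components into a custom workflow.
    print(run_pipeline(doc, [WhitespaceExtractor(), UppercaseExtractor()]).text)
    # prints: HELLO OMNIDOCS WORLD
```

Because every backend returns the same `TextOutput` type, calling code never needs to know which backend produced a result, and `run_pipeline` can compose any sequence of components.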

📈 Roadmap

Add support for semantic understanding tasks (e.g., entity extraction).
Integrate pre-trained transformer models for context-aware document analysis.
Expand pipelines for multilingual document processing.
Add CLI support for batch processing.

🤝 Contributing

We welcome contributions to OmniDocs! Here's how you can help:

Fork the repository.
Create a new branch for your feature or bug fix.
Commit your changes and open a pull request.

For more details, refer to our CONTRIBUTING.md.

🛡️ License

This project is licensed under multiple licenses, depending on the models and libraries you use in your pipeline. Please refer to the individual licenses of each component for specific terms and conditions.

🌟 Support the Project

If you find OmniDocs helpful, please give us a ⭐ on GitHub and share it with others in the community.

🗨️ Join the Community

For discussions, questions, or feedback:

Issues: Report bugs or suggest features on the GitHub Issues page.
Email: Reach out at [email protected], [email protected]
