Unified, modern, and blazing-fast Document AI for Python
OmniDocs is your all-in-one Python toolkit for extracting tables, text, math, and OCR output from PDFs and images, powered by classic libraries and state-of-the-art deep learning models. Build robust document workflows with a single, consistent API.
- 🧩 Unified, production-ready API for all tasks
- 🏎️ Fast, GPU-accelerated, and easy to extend
Get started quickly with practical examples for various document processing tasks in the Quick Start Guide.
- See the Getting Started Guide
- Dive into the API Reference
- Table Extraction: Overview
- Text Extraction: Overview
- Math Extraction: Overview
- OCR Extraction: Overview
Choose your preferred method:
- PyPI (Recommended):
pip install omnidocs
- uv pip (Fastest):
uv pip install omnidocs
- From Source:
git clone https://github.com/adithya-s-k/OmniDocs.git
cd OmniDocs
pip install .  # or: uv sync
- Conda (if available):
conda install -c conda-forge omnidocs
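Whichever method you choose, you may want to confirm that the package is visible to your interpreter. The following is a minimal sketch using only the standard library; it assumes the distribution name is `omnidocs` (as in the install commands above), and the helper `installed_version` is a hypothetical name, not part of OmniDocs.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "omnidocs" is the distribution name used in the install commands above.
    found = installed_version("omnidocs")
    if found is None:
        print("omnidocs is not installed in this environment")
    else:
        print(f"omnidocs {found} is installed")
```

If the check reports that the package is missing, make sure the environment you installed into is the one you are running.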
OmniDocs organizes document processing tasks into modular components. Each component corresponds to a specific task and offers:
- A Unified Interface: Consistent input and output formats.
- Model Independence: Switch between libraries or models effortlessly.
- Pipeline Flexibility: Combine components to create custom workflows.
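To make these three properties concrete, here is a small, self-contained sketch of the design pattern in plain Python. It does not use the OmniDocs API: every name in it (`ExtractionResult`, `Extractor`, `StubTextExtractor`, `StubTableExtractor`, `run_pipeline`) is hypothetical and exists only to illustrate the idea. Refer to the API Reference for the real class and method names.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class ExtractionResult:
    """Hypothetical common output format shared by every component."""
    task: str
    backend: str
    content: str


class Extractor(Protocol):
    """Hypothetical unified interface: one method, same input and output types."""

    def extract(self, source: str) -> ExtractionResult:
        ...


class StubTextExtractor:
    """Stand-in for a text backend; a real one would wrap a library or model."""

    def extract(self, source: str) -> ExtractionResult:
        return ExtractionResult("text", "stub-text", f"text from {source}")


class StubTableExtractor:
    """Stand-in for a table backend with the same interface."""

    def extract(self, source: str) -> ExtractionResult:
        return ExtractionResult("table", "stub-table", f"tables from {source}")


def run_pipeline(source: str, extractors: List[Extractor]) -> List[ExtractionResult]:
    """Run each component on the same source and collect the results in order."""
    return [extractor.extract(source) for extractor in extractors]


if __name__ == "__main__":
    for result in run_pipeline("sample.pdf", [StubTextExtractor(), StubTableExtractor()]):
        print(result.task, result.backend, result.content)
```

Because every component satisfies the same interface, swapping one backend for another only changes which object is passed to the pipeline; the calling code stays the same.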
- Add support for semantic understanding tasks (e.g., entity extraction).
- Integrate pre-trained transformer models for context-aware document analysis.
- Expand pipelines for multilingual document processing.
- Add CLI support for batch processing.
We welcome contributions to OmniDocs! Here's how you can help:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Commit your changes and open a pull request.
For more details, refer to our CONTRIBUTING.md.
This project is licensed under multiple licenses, depending on the models and libraries you use in your pipeline. Please refer to the individual licenses of each component for specific terms and conditions.
If you find OmniDocs helpful, please give us a ⭐ on GitHub and share it with others in the community.
For discussions, questions, or feedback:
- Issues: Report bugs or suggest features here.
- Email: Reach out at [email protected], [email protected]
