PyMuPDF

High-performance Python libraries for PDF processing, data extraction, and LLM document pipelines.

What we build

This organisation maintains Python libraries for working with PDF and other document formats — from low-level manipulation to LLM-ready data extraction.

Repositories

PyMuPDF — core library

The foundation of everything here. PyMuPDF wraps the MuPDF C engine and exposes a full Python API for reading, rendering, editing, and converting PDF, XPS, EPUB, MOBI, CBZ, and image files.

pip install pymupdf

import pymupdf

doc = pymupdf.open("report.pdf")
page = doc[0]
print(page.get_text())           # extract text
pix = page.get_pixmap(dpi=150)   # render to image
pix.save("page.png")

Key capabilities: text and image extraction · page rendering at any DPI · annotation create/edit/delete · redaction · PDF creation and merging · encryption · OCR via Tesseract · form fields · 10+ output formats

→ Documentation · PyPI · Changelog

pymupdf4llm — PDF → LLM-ready data

Turn any document into clean, structured data for RAG pipelines, vector stores, and LLM ingestion — in one line. No GPU, no cloud, no tokens required.

pip install pymupdf4llm

import pymupdf4llm

md   = pymupdf4llm.to_markdown("paper.pdf")       # Markdown
data = pymupdf4llm.to_json("paper.pdf")           # JSON with bboxes
text = pymupdf4llm.to_text("paper.pdf")           # plain text

Key capabilities: layout-aware extraction · multi-column reading order · table detection → Markdown · smart hybrid OCR (only where needed) · page chunking with metadata · LlamaIndex and LangChain integrations · 10–250× cheaper than vision-LLM approaches
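
Once a document is available as Markdown, a RAG pipeline usually splits it into smaller pieces before embedding. The sketch below shows one simple way to do that on heading boundaries, assuming the input string came from `to_markdown`. `split_by_heading` is a hypothetical helper and not part of pymupdf4llm, and it uses only the standard library.

```python
import re

# Matches the start of an ATX heading line ("# ", "## ", ... up to six "#").
# Limitation: a "#" line inside a fenced code block would also match.
HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def split_by_heading(markdown: str) -> list[str]:
    """Split Markdown into chunks that each begin at a heading (hypothetical helper)."""
    starts = [m.start() for m in HEADING.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first heading
    ends = starts[1:] + [len(markdown)]
    chunks = [markdown[s:e].strip() for s, e in zip(starts, ends)]
    return [c for c in chunks if c]


# Stand-in for the string returned by pymupdf4llm.to_markdown(...)
sample = "# Title\n\nIntro text.\n\n## Method\n\nDetails.\n"
chunks = split_by_heading(sample)
```

Heading-based splitting keeps each section's title together with its body, which tends to give retrieval results more context than fixed-size windows.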

→ Documentation · PyPI · Live demo

langchain-pymupdf4llm — LangChain integration

A drop-in LangChain document loader and parser backed by pymupdf4llm. Extracts PDF content as Markdown and feeds it directly into any LangChain retrieval chain.

pip install langchain-pymupdf4llm

from langchain_pymupdf4llm import PyMuPDF4LLMLoader

loader = PyMuPDF4LLMLoader("document.pdf", mode="single")
docs = loader.load()

→ PyPI

pymupdf4llm-mcp — MCP server

An MCP (Model Context Protocol) server exposing pymupdf4llm as a tool. Gives any MCP-compatible AI client (Claude, Cursor, Windsurf) direct access to PDF-to-Markdown extraction.

uvx pymupdf4llm-mcp@latest stdio
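
Many MCP clients register servers through a JSON configuration file. The sketch below generates such an entry from the command above. The `mcpServers` layout is an assumption about the client's config format, and the label `pymupdf4llm` is an arbitrary name chosen for this example, so check your client's documentation for the exact file location and schema.

```python
import json

# Command and arguments taken from `uvx pymupdf4llm-mcp@latest stdio`.
# Assumption: the client reads a top-level "mcpServers" mapping.
# The key "pymupdf4llm" is an arbitrary label, not a required name.
config = {
    "mcpServers": {
        "pymupdf4llm": {
            "command": "uvx",
            "args": ["pymupdf4llm-mcp@latest", "stdio"],
        }
    }
}

config_json = json.dumps(config, indent=2)
print(config_json)
```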

→ PyPI

PyMuPDF-Utilities — demos & examples

A collection of working example scripts, Jupyter notebooks, and GUI demos built on PyMuPDF. Covers image handling, annotation workflows, data extraction patterns, and more — useful as a recipe book alongside the official documentation.

→ Browse examples

pymupdf-fonts — optional font collection

An optional font package extending the fonts available for text output in PyMuPDF. Includes additional Unicode-compatible typefaces beyond the 14 standard PDF fonts.

pip install pymupdf-fonts

Supported document formats

Format	PyMuPDF	pymupdf4llm
PDF (all versions, encrypted)	✅	✅
XPS / OpenXPS	✅	✅
EPUB / MOBI / FB2	✅	✅
CBZ / CBT (comic book)	✅	—
Images (PNG, JPG, TIFF…)	✅	✅ (with OCR)
DOCX / XLSX / PPTX / HWP	—	✅ (Pro only)
SVG	✅ (limited)	—

Documentation & community

Resource	Link
Full documentation	https://pymupdf.readthedocs.io?utm_source=github&utm_medium=referral&utm_campaign=pymupdf_github&utm_content=documentation_community&utm_term=docs
pymupdf4llm docs	https://pymupdf.readthedocs.io/en/latest/pymupdf4llm?utm_source=github&utm_medium=referral&utm_campaign=pymupdf_github&utm_content=documentation_community&utm_term=docs
Live demo	https://demo.pymupdf.io?utm_source=github&utm_medium=referral&utm_campaign=pymupdf_github&utm_content=documentation_community&utm_term=demo
Discord	https://pymupdf.io/discord/artifex?utm_source=github&utm_medium=referral&utm_campaign=pymupdf_github&utm_content=documentation_community&utm_term=discord
Forum	https://forum.mupdf.com?utm_source=github&utm_medium=referral&utm_campaign=pymupdf_github&utm_content=documentation_community&utm_term=forum
PyPI · PyMuPDF	https://pypi.org/project/pymupdf
PyPI · pymupdf4llm	https://pypi.org/project/pymupdf4llm
Bug reports & issues	https://github.com/pymupdf/PyMuPDF/issues

License

All repositories in this organisation are available under the GNU AGPL v3 for open-source use, and under a commercial licence for proprietary applications. Commercial licences are available from Artifex Software — the creators and maintainers of MuPDF, PyMuPDF, and this organisation.

Maintained by Artifex Software, Inc. · pymupdf.io
