PyMuPDF

The central home of PyMuPDF and its related repositories.

High-performance Python libraries for PDF processing, data extraction, and LLM document pipelines.


What we build

This organisation maintains Python libraries for working with PDF and other document formats — from low-level manipulation to LLM-ready data extraction.


Repositories

PyMuPDF — core library

The foundation of everything here. PyMuPDF wraps the MuPDF C engine and exposes a full Python API for reading, rendering, editing, and converting PDF, XPS, EPUB, MOBI, CBZ, and image files.

pip install pymupdf

import pymupdf

doc = pymupdf.open("report.pdf")
page = doc[0]
print(page.get_text())           # extract text
pix = page.get_pixmap(dpi=150)   # render to image
pix.save("page.png")

Key capabilities: text and image extraction · page rendering at any DPI · annotation create/edit/delete · redaction · PDF creation and merging · encryption · OCR via Tesseract · form fields · 10+ output formats
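To make "page rendering at any DPI" concrete: PDF page coordinates are measured in points (72 per inch), so a target resolution corresponds to a zoom factor of dpi / 72. The sketch below shows that relationship and applies it through a scaling matrix. The helper names `dpi_to_zoom` and `render_page` are hypothetical and exist only for illustration; the result should be broadly equivalent to passing `dpi=` to `get_pixmap` as in the snippet above.

```python
def dpi_to_zoom(dpi: float) -> float:
    """Scale factor that maps PDF points (72 per inch) to pixels at `dpi`."""
    return dpi / 72.0


def render_page(pdf_path: str, page_number: int, dpi: float, out_path: str) -> None:
    """Render one page of a document to an image file at the given DPI."""
    import pymupdf  # imported lazily so dpi_to_zoom has no dependencies

    zoom = dpi_to_zoom(dpi)
    with pymupdf.open(pdf_path) as doc:
        page = doc[page_number]
        # The matrix scales the page in x and y before it is rasterised.
        pix = page.get_pixmap(matrix=pymupdf.Matrix(zoom, zoom))
        pix.save(out_path)
```

For example, `render_page("report.pdf", 0, 300, "page.png")` would write the first page at 300 DPI; both file names are placeholders.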

Documentation · PyPI · Changelog


pymupdf4llm — PDF → LLM-ready data

Turn any document into clean, structured data for RAG pipelines, vector stores, and LLM ingestion — in one line. No GPU, no cloud, no tokens required.

pip install pymupdf4llm

import pymupdf4llm

md   = pymupdf4llm.to_markdown("paper.pdf")       # Markdown
data = pymupdf4llm.to_json("paper.pdf")           # JSON with bboxes
text = pymupdf4llm.to_text("paper.pdf")           # plain text

Key capabilities: layout-aware extraction · multi-column reading order · table detection → Markdown · smart hybrid OCR (only where needed) · page chunking with metadata · LlamaIndex and LangChain integrations · 10–250× cheaper than vision-LLM approaches
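As an illustration of "page chunking with metadata", the sketch below turns per-page output into flat records that could be embedded and stored in a vector database. It assumes that `to_markdown(..., page_chunks=True)` returns one dict per page with a "text" string and a "metadata" dict carrying a 1-based "page" number; those key names may differ between versions, so check them against the pymupdf4llm documentation. The helper names `chunks_to_records` and `load_records` are hypothetical.

```python
from typing import Any


def chunks_to_records(chunks: list[dict[str, Any]], source: str) -> list[dict[str, Any]]:
    """Flatten per-page chunks into records ready for embedding.

    Assumes each chunk has a "text" string and, optionally, a "metadata"
    dict with a 1-based "page" number (an assumption, not a guarantee).
    """
    records = []
    for index, chunk in enumerate(chunks):
        text = chunk.get("text", "").strip()
        if not text:
            continue  # skip pages that produced no text
        # Fall back to the chunk position if no page number is present.
        page = chunk.get("metadata", {}).get("page", index + 1)
        records.append({"id": f"{source}#page={page}", "page": page, "text": text})
    return records


def load_records(pdf_path: str) -> list[dict[str, Any]]:
    """Extract a PDF page by page and convert the result to records."""
    import pymupdf4llm  # lazy import: only needed when a real PDF is processed

    chunks = pymupdf4llm.to_markdown(pdf_path, page_chunks=True)
    return chunks_to_records(chunks, source=pdf_path)
```

Keeping the conversion separate from the extraction call means the record format can be tested and adjusted without a PDF at hand.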

Documentation · PyPI · Live demo


langchain-pymupdf4llm — LangChain integration

A drop-in LangChain document loader and parser backed by pymupdf4llm. Extracts PDF content as Markdown and feeds it directly into any LangChain retrieval chain.

pip install langchain-pymupdf4llm

from langchain_pymupdf4llm import PyMuPDF4LLMLoader

loader = PyMuPDF4LLMLoader("document.pdf", mode="single")
docs = loader.load()


pymupdf4llm-mcp — MCP server

An MCP (Model Context Protocol) server exposing pymupdf4llm as a tool. Gives any MCP-compatible AI client (Claude, Cursor, Windsurf) direct access to PDF-to-Markdown extraction.

uvx pymupdf4llm-mcp@latest stdio
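Many MCP clients register stdio servers through a JSON file with an `mcpServers` mapping of server name to launch command. The sketch below generates such an entry from the command shown above. The `mcpServers` layout and the server name are assumptions based on common client conventions, and the file name and location vary by client, so consult your client's documentation before using it.

```python
import json


def mcp_client_config(server_name: str = "pymupdf4llm-mcp") -> dict:
    """Build a client config entry that launches the server over stdio."""
    return {
        "mcpServers": {
            server_name: {
                # Same command line as above, split into program and arguments.
                "command": "uvx",
                "args": ["pymupdf4llm-mcp@latest", "stdio"],
            }
        }
    }


# Serialise the entry so it can be pasted into a client's config file.
config_json = json.dumps(mcp_client_config(), indent=2)
print(config_json)
```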


PyMuPDF-Utilities — demos & examples

A collection of working example scripts, Jupyter notebooks, and GUI demos built on PyMuPDF. Covers image handling, annotation workflows, data extraction patterns, and more — useful as a recipe book alongside the official documentation.

Browse examples


pymupdf-fonts — optional font collection

An optional font package extending the fonts available for text output in PyMuPDF. Includes additional Unicode-compatible typefaces beyond the 14 standard PDF fonts.

pip install pymupdf-fonts

Supported document formats

Format PyMuPDF pymupdf4llm
PDF (all versions, encrypted)
XPS / OpenXPS
EPUB / MOBI / FB2
CBZ / CBT (comic book)
Images (PNG, JPG, TIFF…) ✅ (with OCR)
DOCX / XLSX / PPTX / HWP ✅ (Pro only)
SVG ✅ (limited)

Documentation & community

Resource Link
Full documentation https://pymupdf.readthedocs.io?utm_source=github&utm_medium=referral&utm_campaign=pymupdf_github&utm_content=documentation_community&utm_term=docs
pymupdf4llm docs https://pymupdf.readthedocs.io/en/latest/pymupdf4llm?utm_source=github&utm_medium=referral&utm_campaign=pymupdf_github&utm_content=documentation_community&utm_term=docs
Live demo https://demo.pymupdf.io?utm_source=github&utm_medium=referral&utm_campaign=pymupdf_github&utm_content=documentation_community&utm_term=demo
Discord https://pymupdf.io/discord/artifex?utm_source=github&utm_medium=referral&utm_campaign=pymupdf_github&utm_content=documentation_community&utm_term=discord
Forum https://forum.mupdf.com?utm_source=github&utm_medium=referral&utm_campaign=pymupdf_github&utm_content=documentation_community&utm_term=forum
PyPI · PyMuPDF https://pypi.org/project/pymupdf
PyPI · pymupdf4llm https://pypi.org/project/pymupdf4llm
Bug reports & issues https://github.com/pymupdf/PyMuPDF/issues

License

All repositories in this organisation are available under the GNU AGPL v3 for open-source use, and under a commercial licence for proprietary applications. Commercial licences are available from Artifex Software — the creators and maintainers of MuPDF, PyMuPDF, and this organisation.


Maintained by Artifex Software, Inc. · pymupdf.io