JustusRijke/DiffPDF

DiffPDF

Python 3.10+ | License: MIT

CLI tool for detecting structural, textual, and visual differences between PDF files, intended for automated regression tests.

How It Works

DiffPDF uses a fail-fast sequential pipeline to compare PDFs:

  1. Hash Check - Compare the SHA-256 hashes of both files. If they are identical, exit immediately with a pass.
  2. Page Count - Verify both PDFs have the same number of pages.
  3. Text Content - Extract and compare text from all pages (ignoring whitespace).
  4. Visual Check - Render pages to images and compare using pixelmatch.

Each stage only runs if all previous stages pass.
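
The fail-fast behaviour described above can be sketched with the standard library alone. This is an illustration, not DiffPDF's actual code: the names `sha256_of`, `normalize_text`, `first_failing_stage`, and `compare` are hypothetical, the PDF-specific stages are passed in as plain callables, and removing all whitespace is only one possible reading of "ignoring whitespace".

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional, Sequence, Tuple

# A stage is a (name, check) pair; check() returns True when the stage passes.
Stage = Tuple[str, Callable[[], bool]]


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def normalize_text(text: str) -> str:
    """Drop all whitespace (an assumption) so layout-only changes compare equal."""
    return "".join(text.split())


def first_failing_stage(stages: Sequence[Stage]) -> Optional[str]:
    """Run stages in order; return the name of the first failure, else None."""
    for name, check in stages:
        if not check():
            return name  # fail fast: later stages never run
    return None


def compare(reference: Path, actual: Path, stages: Sequence[Stage]) -> Optional[str]:
    """Apply the hash shortcut first, then the remaining stages in order."""
    if sha256_of(reference) == sha256_of(actual):
        return None  # byte-identical files pass immediately
    return first_failing_stage(stages)
```

In this sketch the page count, text, and visual checks would be supplied as the `stages` list, ordered from cheapest to most expensive, so the slow pixel comparison only runs when everything before it has passed.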

⚠️ Performance Warning: The Python port of pixelmatch is extremely slow.

Installation

pip install diffpdf

CLI Usage

Usage: diffpdf [OPTIONS] REFERENCE ACTUAL

  Compare two PDF files for structural, textual, and visual differences.

Options:
  --threshold FLOAT       Pixelmatch threshold (0.0-1.0)
  --dpi INTEGER           Render resolution
  --output-dir DIRECTORY  Diff image output directory
  -v, --verbose           Increase verbosity (-v for INFO, -vv for DEBUG)
  --save-log              Write log output to log.txt
  --version               Show the version and exit.
  --help                  Show this message and exit.

Exit Codes

  • 0 — Pass (PDFs are equivalent)
  • 1 — Fail (differences detected)
  • 2 — Error (invalid input or processing error)
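
A test harness might turn these exit codes into readable outcomes roughly as follows. This is a sketch that assumes the `diffpdf` executable is on `PATH`. The flags are taken from the usage text above, while `build_command`, `describe_exit_code`, and `run_diffpdf` are hypothetical helper names.

```python
import subprocess
from typing import List, Optional

# Meanings taken from the exit code list above.
EXIT_MEANINGS = {0: "pass", 1: "fail", 2: "error"}


def build_command(
    reference: str,
    actual: str,
    threshold: Optional[float] = None,
    dpi: Optional[int] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Assemble a diffpdf argument list, adding only the options that were set."""
    cmd = ["diffpdf"]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    cmd += [reference, actual]
    return cmd


def describe_exit_code(code: int) -> str:
    """Map a diffpdf exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_diffpdf(reference: str, actual: str, **options) -> str:
    """Run diffpdf and return 'pass', 'fail', or 'error'."""
    # check=False: a non-zero exit code is an expected result, not an exception.
    completed = subprocess.run(build_command(reference, actual, **options), check=False)
    return describe_exit_code(completed.returncode)
```

Treating 1 and 2 separately lets a regression suite distinguish a genuine PDF difference from a broken test setup such as a missing file.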

Library Usage

Call the CLI from Python:

from diffpdf import main
main(["-vv", "foo.pdf", "bar.pdf"])
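
CLI entry points often finish by raising `SystemExit` instead of returning a value. Whether `main` does so is not stated here, so the sketch below handles both cases. `call_cli` is a hypothetical helper, not part of DiffPDF, and takes the entry point as a parameter so it can be tried with any callable.

```python
from typing import Callable, Optional, Sequence


def call_cli(
    entry_point: Callable[[Sequence[str]], Optional[int]],
    argv: Sequence[str],
) -> int:
    """Call a CLI entry point and return its exit status as an int."""
    try:
        result = entry_point(argv)
    except SystemExit as exc:
        code = exc.code
        if code is None:
            return 0  # a bare exit means success
        if isinstance(code, int):
            return code
        return 1  # non-integer payloads (e.g. a message string) denote failure
    # The entry point returned normally; treat a non-integer result as success.
    return result if isinstance(result, int) else 0


# Possible usage, assuming diffpdf is installed:
#   from diffpdf import main
#   status = call_cli(main, ["foo.pdf", "bar.pdf"])
```

Catching `SystemExit` keeps a calling script or test runner alive after the comparison, so the status can be checked against the exit codes listed above.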

Development

pip install -e .[dev]
pytest tests/ -v
ruff check .

Acknowledgements

Built with PyMuPDF for PDF parsing and pixelmatch-py (Python port of pixelmatch) for visual comparison.
