CLI tool for detecting structural, textual, and visual differences between PDF files, intended for automated regression tests.
DiffPDF uses a fail-fast sequential pipeline to compare PDFs:
- Hash Check - SHA-256 comparison. If identical, exit immediately with pass.
- Page Count - Verify both PDFs have the same number of pages.
- Text Content - Extract and compare text from all pages (ignoring whitespace).
- Visual Check - Render pages to images and compare using pixelmatch.
Each stage only runs if all previous stages pass.
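The fail-fast behavior described above can be sketched as follows. This is a minimal illustration using only the standard library, not DiffPDF's actual implementation: `run_pipeline`, `normalize_text`, and the stage tuples are hypothetical names, and "ignoring whitespace" is assumed here to mean removing all whitespace before comparing.

```python
import hashlib
from typing import Callable, Iterable, Tuple

# A stage is a (name, check) pair; check() returns True when the stage passes.
Stage = Tuple[str, Callable[[], bool]]


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def normalize_text(text: str) -> str:
    """Remove all whitespace (an assumed reading of 'ignoring whitespace')."""
    return "".join(text.split())


def run_pipeline(
    reference: bytes, actual: bytes, stages: Iterable[Stage]
) -> Tuple[bool, str]:
    """Run the hash check, then each stage in order, stopping at the first failure.

    Returns (passed, name of the stage that decided the outcome).
    """
    # Identical bytes: pass immediately without running later stages.
    if sha256_of(reference) == sha256_of(actual):
        return True, "hash"
    for name, check in stages:
        if not check():
            # Fail fast: later (more expensive) stages are never called.
            return False, name
    return True, "all stages"


# Hypothetical usage with stand-in stages instead of real PDF parsing.
ref_text, act_text = "Hello  world\n", "Hello world"
stages = [
    ("page count", lambda: 3 == 3),
    ("text content", lambda: normalize_text(ref_text) == normalize_text(act_text)),
]
result = run_pipeline(b"reference bytes", b"actual bytes", stages)
```

Ordering the stages from cheapest to most expensive means that rendering and pixel comparison only happen when the cheaper checks cannot settle the result.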
pip install diffpdf

Usage: diffpdf [OPTIONS] REFERENCE ACTUAL
Compare two PDF files for structural, textual, and visual differences.
Options:
  --threshold FLOAT       Pixelmatch threshold (0.0-1.0)
  --dpi INTEGER           Render resolution
  --output-dir DIRECTORY  Diff image output directory
  -v, --verbose           Increase verbosity (-v for INFO, -vv for DEBUG)
  --save-log              Write log output to log.txt
  --version               Show the version and exit.
  --help                  Show this message and exit.
Exit Codes
- 0: Pass (PDFs are equivalent)
- 1: Fail (differences detected)
- 2: Error (invalid input or processing error)
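In a test harness, the exit code is usually the only result that matters. The sketch below shows one way to wrap the CLI with `subprocess` and map the documented exit codes to labels. The helper names (`build_command`, `interpret_exit_code`, `compare`) are hypothetical, and it assumes the `diffpdf` executable is on `PATH`; only the options listed above are used.

```python
import subprocess
from typing import List, Optional

# Exit codes as documented: 0 = pass, 1 = fail, 2 = error.
EXIT_MEANINGS = {0: "pass", 1: "fail", 2: "error"}


def build_command(
    reference: str,
    actual: str,
    threshold: Optional[float] = None,
    dpi: Optional[int] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Assemble a diffpdf command line from the documented options."""
    cmd = ["diffpdf"]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    # Positional arguments come last: REFERENCE, then ACTUAL.
    cmd += [reference, actual]
    return cmd


def interpret_exit_code(code: int) -> str:
    """Translate a diffpdf exit code into a short label."""
    return EXIT_MEANINGS.get(code, "unknown")


def compare(reference: str, actual: str, **options) -> str:
    """Run diffpdf (assumed to be installed) and return 'pass', 'fail', or 'error'."""
    completed = subprocess.run(build_command(reference, actual, **options))
    return interpret_exit_code(completed.returncode)
```

A regression test could then assert `compare("expected.pdf", "generated.pdf", dpi=150) == "pass"`, treating "error" separately from "fail" so that a broken input file is not mistaken for a visual difference.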
Call the CLI from Python:
from diffpdf import main
main(["-vv", "foo.pdf", "bar.pdf"])

pip install -e .[dev]
pytest tests/ -v
ruff check .

Built with PyMuPDF for PDF parsing and pixelmatch-py (Python port of pixelmatch) for visual comparison.