A Python utility for trimming and editing PDF documents via a text search cutoff or explicit page deletion (ranges, before/after), with batch processing and automatic blank-page removal.
I needed to print a lot of PDFs that all shared the same structure: each had different text content, followed by identical image pages at the end of the file that I did not want printed. Opening each file individually to select which pages to print was tedious and error-prone.
The solution started as a quick Python script: find a specific text string in each PDF (the title where the images started) and automatically remove everything after it, in bulk. That one-file Python prototype worked perfectly. A single command, python3 pdftrim "Foto's" && lp output/*.pdf, processed and printed more than 50 PDFs. My work was done in a blink!
But I saw potential to make it more useful: explicit page ranges, blank-page detection, and a cleaner architecture. What began as a one-day automation hack evolved into a flexible tool for common PDF manipulation tasks I occasionally need, or will need in the future.
- Text-based trimming: Remove pages starting from a specific search string
- Page deletion: Delete specific pages/ranges or pages before/after a page number
- Blank page detection: Automatically identify and remove blank or decorative pages
- Batch processing: Process multiple PDFs in a directory or single files
- Flexible output: Configurable output directory and file naming
- Debug mode: Detailed logging for troubleshooting
- Environment configuration: Customize behavior via environment variables
- Python 3.10+ (tested with Python 3.13.7)
- PyMuPDF library for PDF processing
pip install -r requirements.txt
Or install directly:
pip install "PyMuPDF>=1.24.0,<2.0.0"
You can run via python pdftrim.py ..., or use the wrapper script ./pdftrim.sh ... to automatically use the local virtualenv at .venv.
# Process all PDFs in current directory (batch mode)
python pdftrim.py --delete --search "search_string"
# or:
./pdftrim.sh --delete --search "search_string"
# Process a specific PDF file
python pdftrim.py --file input.pdf --delete --search "search_string"
# or:
./pdftrim.sh --file input.pdf --delete --search "search_string"
# Delete specific pages (1-based)
python pdftrim.py --file input.pdf --delete "1-4,7"
# Keep only specific pages (1-based) - inverse of delete-by-spec
python pdftrim.py --file input.pdf --keep "1-4,7"
# Invert before/after behavior (keep instead of delete)
python pdftrim.py --file input.pdf --keep --before 10 # keeps pages 1-9
# Delete pages before a page number (1-based)
python pdftrim.py --file input.pdf --delete --before 10 # deletes pages 1-9
# Delete pages after a page number (1-based)
python pdftrim.py --file input.pdf --delete --after 10 # deletes pages 11-end
# Combine before + after (allowed)
python pdftrim.py --file input.pdf --delete --before 10 --after 12
# Invert text-based trimming (keep content starting at the match)
python pdftrim.py --file input.pdf --keep --search "search_string"
# Remove pages after "Chapter 5" from all PDFs in directory
python pdftrim.py -d -s "Chapter 5"
# Process specific document, remove pages after "Appendix A"
python pdftrim.py -f document.pdf -d -s "Appendix A"
# Process with custom output directory
PDF_TRIMMER_OUTPUT_DIR=processed python pdftrim.py -d -s "References"
# Delete pages 1-4 and 7
python pdftrim.py -f document.pdf -d "1-4,7"
# Keep only pages 1-4 and 7
python pdftrim.py -f document.pdf -k "1-4,7"
# Remove everything before page 10
python pdftrim.py -f document.pdf -d -b 10
# Remove everything after page 10
python pdftrim.py -f document.pdf -d -a 10
- Page numbers are 1-based for all page deletion flags (including --keep).
- For --search, --before, and --after, you must specify a mode flag: --delete or --keep.
- --before and --after can be combined; other operations are mutually exclusive.
- The tool refuses to create an empty PDF (if an operation would delete all pages).

- -f, --file: Input PDF file path (omit for batch mode in current directory)
- -s, --search: Trim based on the first occurrence of this search string (requires --delete or --keep)
- -d, --delete: Delete mode; with a spec deletes pages/ranges (e.g. 1-4,7)
- -k, --keep: Keep mode; with a spec keeps pages/ranges (e.g. 1-4,7)
- -b, --before: Before-page selection (requires --delete or --keep)
- -a, --after: After-page selection (requires --delete or --keep)
- --help, -h: Show help message
- --version, -v: Show version information
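
The page-selection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual code: the helper names `parse_page_spec`, `select_before_after`, and `pages_to_delete` are hypothetical, and rejecting out-of-range pages with an error is an assumed behavior.

```python
from typing import Optional


def parse_page_spec(spec: str, page_count: int) -> set:
    """Parse a 1-based spec such as "1-4,7" into a set of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Assumption: out-of-range or reversed ranges are rejected.
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range {part!r} for {page_count} pages")
        pages.update(range(start, end + 1))
    return pages


def select_before_after(page_count: int,
                        before: Optional[int] = None,
                        after: Optional[int] = None) -> set:
    """Select pages strictly before `before` and strictly after `after`."""
    selected = set()
    if before is not None:
        selected |= set(range(1, min(before, page_count + 1)))
    if after is not None:
        selected |= set(range(after + 1, page_count + 1))
    return selected


def pages_to_delete(page_count: int, selected: set, keep: bool = False) -> list:
    """Turn a selection into the pages to remove; keep mode inverts it."""
    all_pages = set(range(1, page_count + 1))
    doomed = (all_pages - selected) if keep else (selected & all_pages)
    if doomed == all_pages:
        # Mirrors the rule that the tool refuses to create an empty PDF.
        raise ValueError("operation would delete every page")
    return sorted(doomed)
```

For a 12-page document, `pages_to_delete(12, select_before_after(12, before=10))` yields pages 1 through 9, matching the `--delete --before 10` example, and passing `keep=True` inverts the same selection.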
| Variable | Default | Description |
|---|---|---|
| PDF_TRIMMER_DEBUG | false | Enable debug output |
| PDF_TRIMMER_OUTPUT_DIR | output | Set output directory |
| PDF_TRIMMER_OUTPUT_SUFFIX | _edit | Set output file suffix |
| PDF_TRIMMER_PDF_PATTERN | *.pdf | PDF file search pattern |
| PDF_TRIMMER_PROCESSED_SUFFIX | _edit.pdf | Skip files with this suffix |
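
As a rough illustration of how these variables could be consumed, the sketch below loads them into a small settings object. The class `TrimmerConfig` and its methods are hypothetical names, the set of strings treated as "true" is an assumption, and so is the output naming (`report.pdf` becoming `output/report_edit.pdf`); only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class TrimmerConfig:
    """Hypothetical settings object; defaults mirror the table above."""
    debug: bool = False
    output_dir: str = "output"
    output_suffix: str = "_edit"
    pdf_pattern: str = "*.pdf"
    processed_suffix: str = "_edit.pdf"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "TrimmerConfig":
        # Assumption: "1", "true", and "yes" (any case) enable debug output.
        raw_debug = env.get("PDF_TRIMMER_DEBUG", "false").strip().lower()
        return cls(
            debug=raw_debug in {"1", "true", "yes"},
            output_dir=env.get("PDF_TRIMMER_OUTPUT_DIR", "output"),
            output_suffix=env.get("PDF_TRIMMER_OUTPUT_SUFFIX", "_edit"),
            pdf_pattern=env.get("PDF_TRIMMER_PDF_PATTERN", "*.pdf"),
            processed_suffix=env.get("PDF_TRIMMER_PROCESSED_SUFFIX", "_edit.pdf"),
        )

    def output_path(self, source: Path) -> Path:
        # Assumed naming scheme: <output_dir>/<stem><suffix><extension>.
        return Path(self.output_dir) / f"{source.stem}{self.output_suffix}{source.suffix}"

    def is_already_processed(self, source: Path) -> bool:
        # Files ending in the processed suffix are skipped in batch mode.
        return source.name.endswith(self.processed_suffix)
```

Passing the environment as a mapping keeps the loader easy to test: `TrimmerConfig.from_env({"PDF_TRIMMER_OUTPUT_DIR": "processed"})` needs no real environment changes.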
- Text Search: Locates the specified search string in the PDF
- Page Trimming: Removes all pages after the search string location
- Blank Detection: Analyzes remaining pages for meaningful content
- Content Filtering: Removes pages with minimal or decorative-only content
- Output Generation: Saves processed PDF with configurable naming
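
The search and trim steps above can be sketched on plain page text, independent of any PDF library. The function names are hypothetical, and two behaviors are assumptions rather than documented facts: the page containing the match is itself removed in delete mode, and a document without a match is left unchanged. With PyMuPDF, the page texts would typically come from `page.get_text()` and the resulting indices could be passed to `Document.select()`.

```python
from typing import Optional


def find_first_match(page_texts: list, needle: str) -> Optional[int]:
    """Return the 0-based index of the first page containing needle."""
    for index, text in enumerate(page_texts):
        if needle in text:
            return index
    return None


def pages_to_keep(page_texts: list, needle: str, keep_mode: bool = False) -> list:
    """Return 0-based indices of the pages that survive the trim."""
    match = find_first_match(page_texts, needle)
    if match is None:
        # Assumption: no match means nothing is trimmed.
        return list(range(len(page_texts)))
    if keep_mode:
        # Keep mode: keep content starting at the match.
        kept = list(range(match, len(page_texts)))
    else:
        # Delete mode (assumed): drop the matching page and everything after.
        kept = list(range(match))
    if not kept:
        raise ValueError("operation would delete every page")
    return kept
```

For four pages where the third contains "Foto's", delete mode keeps indices 0 and 1, and keep mode keeps indices 2 and 3.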
The tool combines several heuristics to identify blank pages:
- Text Analysis: Checks for meaningful text content (>20 characters)
- Content Blocks: Analyzes text block structure and substance
- Decorative Filtering: Distinguishes between content and page decorations
- Size Heuristics: Considers page dimensions and content density
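
A simplified version of the text-based checks might look like the sketch below. It covers only the character threshold and a basic decorative filter; the size and density heuristics are not modeled. The names `meaningful_text` and `is_blank_page` and the page-number pattern are assumptions for illustration, while the 20-character threshold comes from the list above.

```python
import re

# Threshold from the list above: more than 20 characters counts as content.
MIN_MEANINGFUL_CHARS = 20

# Hypothetical decoration pattern: bare page numbers such as "7",
# "3/10", or "Page 3 of 10".
DECORATIVE_LINE = re.compile(r"^\s*(page\s+)?\d+(\s*(of|/)\s*\d+)?\s*$", re.IGNORECASE)


def meaningful_text(page_text: str) -> str:
    """Strip empty and decorative lines, then join what is left."""
    lines = [line.strip() for line in page_text.splitlines()]
    kept = [line for line in lines if line and not DECORATIVE_LINE.match(line)]
    return " ".join(kept)


def is_blank_page(page_text: str) -> bool:
    """Treat a page as blank when little or no meaningful text remains."""
    return len(meaningful_text(page_text)) <= MIN_MEANINGFUL_CHARS
```

A page containing only "Page 3 of 10" is classified as blank under these rules, while a page with a full sentence of body text is not.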
The PDF Trimmer follows a clean architecture with dependency injection:
src/
├── core/ # Core processing logic
├── models/ # Data models and PDF wrappers
├── services/ # File and workflow management
├── ui/ # User interface and CLI handling
├── di/ # Dependency injection container
└── config/ # Configuration management
- PDFProcessor: Core PDF processing and trimming logic
- TextSearchEngine: Text location and analysis
- FileService: File operations and validation
- WorkflowManager: Orchestrates processing workflow
- DisplayManager: Output formatting and logging
- DependencyContainer: Component wiring and lifecycle
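
To show the general idea behind the dependency injection wiring, here is a minimal sketch. It is not the project's actual `DependencyContainer`: the classes `SearchEngine`, `SubstringSearch`, `Processor`, and `Container` and their methods are hypothetical stand-ins, assuming components are registered as factories and created lazily once.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class SearchEngine(Protocol):
    """Interface the processor depends on (hypothetical)."""
    def find_page(self, page_texts: list, needle: str) -> Optional[int]: ...


class SubstringSearch:
    """Default engine: first page whose text contains the needle."""
    def find_page(self, page_texts: list, needle: str) -> Optional[int]:
        for index, text in enumerate(page_texts):
            if needle in text:
                return index
        return None


@dataclass
class Processor:
    """Receives its search engine from outside instead of creating it."""
    search: SearchEngine

    def trim(self, page_texts: list, needle: str) -> list:
        index = self.search.find_page(page_texts, needle)
        return list(page_texts) if index is None else page_texts[:index]


class Container:
    """Tiny container: named factories, each built once on first use."""
    def __init__(self) -> None:
        self._factories: dict = {}
        self._instances: dict = {}

    def register(self, name: str, factory: Callable) -> None:
        self._factories[name] = factory

    def resolve(self, name: str):
        if name not in self._instances:
            self._instances[name] = self._factories[name](self)
        return self._instances[name]


container = Container()
container.register("search", lambda c: SubstringSearch())
container.register("processor", lambda c: Processor(search=c.resolve("search")))
```

Because `Processor` only depends on the `SearchEngine` interface, a test can construct it with a stub engine directly, without touching the container or any PDF file.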
pip install -r requirements-dev.txt
pytest
pdftrim/
├── pdftrim.py # Main entry point
├── requirements.txt # Dependencies
├── src/ # Source code
├── planning/ # Development documentation
└── output/ # Default output directory
- Type Hints: Comprehensive type annotations throughout
- Clean Architecture: Separation of concerns with clear interfaces
- Dependency Injection: Testable and modular component design
- Error Handling: Robust error management with user-friendly messages
# Enable debug logging
PDF_TRIMMER_DEBUG=true python pdftrim.py --delete --search "search_string"
# Debug with custom settings
PDF_TRIMMER_DEBUG=true PDF_TRIMMER_OUTPUT_DIR=debug python pdftrim.py --file document.pdf --delete --search "text"
- Fork the repository
- Create a feature branch
- Make your changes with appropriate tests
- Ensure code follows project style guidelines
- Submit a pull request
For development work, you may want additional tools:
# Type checking
pip install mypy
# Code formatting
pip install black
# Linting
pip install ruff
# Testing (when tests are added)
pip install pytest pytest-cov
MIT License. See LICENSE.
See CHANGELOG.md for version history and changes.
For issues, questions, or contributions, please create an issue or start a discussion.