PDF Trimmer

A Python utility for trimming and editing PDF documents via a text search cutoff or explicit page deletion (ranges, before/after), with batch processing and automatic blank-page removal.

Why

I needed to print a lot of PDFs that all shared the same structure: each had different text content, followed by identical image pages at the end of the file that I did not want printed. Opening each file individually to select which pages to print was tedious and error-prone.

The solution started as a quick Python script: search a batch of PDFs for a specific text string (the title where the images started) and automatically remove everything after it. That one-file Python prototype worked perfectly: with a single command, python3 pdftrim "Foto's" && lp output/*.pdf, more than 50 PDFs were processed and printed. My work was done in a blink!

But I saw potential to make it more useful: explicit page ranges, blank-page detection, and a cleaner architecture. What began as a one-day automation hack evolved into a flexible tool for common PDF manipulation tasks I occasionally need, or will need in the future.

Features

Text-based trimming: Remove pages starting from a specific search string
Page deletion: Delete specific pages/ranges or pages before/after a page number
Blank page detection: Automatically identify and remove blank or decorative pages
Batch processing: Process multiple PDFs in a directory or single files
Flexible output: Configurable output directory and file naming
Debug mode: Detailed logging for troubleshooting
Environment configuration: Customize behavior via environment variables

Installation

Requirements

Python 3.10+ (tested with Python 3.13.7)
PyMuPDF library for PDF processing

Install Dependencies

pip install -r requirements.txt

Or install directly:

pip install "PyMuPDF>=1.24.0,<2.0.0"

Usage

You can run via python pdftrim.py ..., or use the wrapper script ./pdftrim.sh ... to automatically use the local virtualenv at .venv.

Basic Usage

# Process all PDFs in current directory (batch mode)
python pdftrim.py --delete --search "search_string"
# or:
./pdftrim.sh --delete --search "search_string"

# Process a specific PDF file
python pdftrim.py --file input.pdf --delete --search "search_string"
# or:
./pdftrim.sh --file input.pdf --delete --search "search_string"

# Delete specific pages (1-based)
python pdftrim.py --file input.pdf --delete "1-4,7"

# Keep only specific pages (1-based) - inverse of delete-by-spec
python pdftrim.py --file input.pdf --keep "1-4,7"

# Invert before/after behavior (keep instead of delete)
python pdftrim.py --file input.pdf --keep --before 10   # keeps pages 1-9

# Delete pages before a page number (1-based)
python pdftrim.py --file input.pdf --delete --before 10   # deletes pages 1-9

# Delete pages after a page number (1-based)
python pdftrim.py --file input.pdf --delete --after 10    # deletes pages 11-end

# Combine before + after (allowed)
python pdftrim.py --file input.pdf --delete --before 10 --after 12

# Invert text-based trimming (keep content starting at the match)
python pdftrim.py --file input.pdf --keep --search "search_string"

Examples

# Remove pages after "Chapter 5" from all PDFs in directory
python pdftrim.py -d -s "Chapter 5"

# Process specific document, remove pages after "Appendix A"
python pdftrim.py -f document.pdf -d -s "Appendix A"

# Process with custom output directory
PDF_TRIMMER_OUTPUT_DIR=processed python pdftrim.py -d -s "References"

# Delete pages 1-4 and 7
python pdftrim.py -f document.pdf -d "1-4,7"

# Keep only pages 1-4 and 7
python pdftrim.py -f document.pdf -k "1-4,7"

# Remove everything before page 10
python pdftrim.py -f document.pdf -d -b 10

# Remove everything after page 10
python pdftrim.py -f document.pdf -d -a 10

Notes

Page numbers are 1-based for all page-selection flags (--delete, --keep, --before, --after).
For --search, --before, and --after, you must specify a mode flag: --delete or --keep.
--before and --after can be combined; other operations are mutually exclusive.
The tool refuses to create an empty PDF (if an operation would delete all pages).

Command Line Options

-f, --file: Input PDF file path (omit for batch mode in current directory)
-s, --search: Trim based on the first occurrence of this search string (requires --delete or --keep)
-d, --delete: Delete mode; with a spec deletes pages/ranges (e.g. 1-4,7)
-k, --keep: Keep mode; with a spec keeps pages/ranges (e.g. 1-4,7)
-b, --before: Before-page selection (requires --delete or --keep)
-a, --after: After-page selection (requires --delete or --keep)
-h, --help: Show help message
-v, --version: Show version information
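
One way to model these options is with argparse from the standard library, shown here as a rough sketch rather than the project's real parser. The flag names come from the list above; build_parser and validate are hypothetical helpers, and rejecting --delete together with --keep, or a mode flag with no operation, is an assumption. The optional spec after --delete/--keep is handled with nargs="?" and const=True.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pdftrim.py")
    parser.add_argument("-f", "--file", help="input PDF (omit for batch mode)")
    parser.add_argument("-s", "--search", help="trim at first occurrence of this text")
    # nargs="?" lets the flag appear bare (value True) or with a spec like "1-4,7".
    parser.add_argument("-d", "--delete", nargs="?", const=True, default=None, metavar="SPEC")
    parser.add_argument("-k", "--keep", nargs="?", const=True, default=None, metavar="SPEC")
    parser.add_argument("-b", "--before", type=int, help="1-based page number")
    parser.add_argument("-a", "--after", type=int, help="1-based page number")
    return parser


def validate(args: argparse.Namespace) -> None:
    """Check the combination rules described in the Notes section."""
    if args.delete is not None and args.keep is not None:
        raise ValueError("choose either --delete or --keep, not both")  # assumption
    mode_value = args.delete if args.delete is not None else args.keep
    if mode_value is None:
        raise ValueError("a mode flag is required: --delete or --keep")
    has_spec = isinstance(mode_value, str)
    has_search = args.search is not None
    has_position = args.before is not None or args.after is not None
    # --before and --after count as one operation because they may be combined.
    if sum([has_spec, has_search, has_position]) != 1:
        raise ValueError("use exactly one of: a page spec, --search, or --before/--after")
```

With this layout, -d -s "Chapter 5" parses as a bare delete mode plus a search string, while -d "1-4,7" parses the spec as the value of --delete.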

Environment Variables

Variable	Default	Description
`PDF_TRIMMER_DEBUG`	`false`	Enable debug output
`PDF_TRIMMER_OUTPUT_DIR`	`output`	Set output directory
`PDF_TRIMMER_OUTPUT_SUFFIX`	`_edit`	Set output file suffix
`PDF_TRIMMER_PDF_PATTERN`	`*.pdf`	PDF file search pattern
`PDF_TRIMMER_PROCESSED_SUFFIX`	`_edit.pdf`	Skip files with this suffix

How It Works

Text Search: Locates the specified search string in the PDF
Page Trimming: Removes all pages after the search string location
Blank Detection: Analyzes remaining pages for meaningful content
Content Filtering: Removes pages with minimal or decorative-only content
Output Generation: Saves processed PDF with configurable naming

Blank Page Detection

The tool combines several heuristics to identify blank pages:

Text Analysis: Checks for meaningful text content (>20 characters)
Content Blocks: Analyzes text block structure and substance
Decorative Filtering: Distinguishes between content and page decorations
Size Heuristics: Considers page dimensions and content density
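
A text-only version of this check might look like the sketch below. Only the 20-character threshold comes from the list above; the function names are hypothetical, and the rule for what counts as decorative (lines with no letters, such as bare page numbers or separator rules) is an assumption. The block and size heuristics are not modeled.

```python
import re

MIN_MEANINGFUL_CHARS = 20  # pages need more than this many characters of content


def is_decorative_line(line: str) -> bool:
    """Treat lines without any letters as decoration (an assumption)."""
    return re.fullmatch(r"[\W\d_]*", line.strip()) is not None


def is_blank_page(text: str) -> bool:
    """Return True when the page text has no meaningful content."""
    content = [
        line.strip() for line in text.splitlines() if not is_decorative_line(line)
    ]
    return len("".join(content)) <= MIN_MEANINGFUL_CHARS
```

In practice the text argument would come from a call such as PyMuPDF's page.get_text().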

Architecture

The PDF Trimmer follows a clean architecture with dependency injection:

src/
├── core/           # Core processing logic
├── models/         # Data models and PDF wrappers
├── services/       # File and workflow management
├── ui/            # User interface and CLI handling
├── di/            # Dependency injection container
└── config/        # Configuration management

Key Components

PDFProcessor: Core PDF processing and trimming logic
TextSearchEngine: Text location and analysis
FileService: File operations and validation
WorkflowManager: Orchestrates processing workflow
DisplayManager: Output formatting and logging
DependencyContainer: Component wiring and lifecycle

Development

Running Tests

pip install -r requirements-dev.txt
pytest

Project Structure

pdftrim/
├── pdftrim.py          # Main entry point
├── requirements.txt    # Dependencies
├── src/               # Source code
├── planning/          # Development documentation
└── output/           # Default output directory

Code Quality

Type Hints: Comprehensive type annotations throughout
Clean Architecture: Separation of concerns with clear interfaces
Dependency Injection: Testable and modular component design
Error Handling: Robust error management with user-friendly messages

Running in Debug Mode

# Enable debug logging
PDF_TRIMMER_DEBUG=true python pdftrim.py --delete --search "search_string"

# Debug with custom settings
PDF_TRIMMER_DEBUG=true PDF_TRIMMER_OUTPUT_DIR=debug python pdftrim.py --file document.pdf --delete --search "text"

Contributing

Fork the repository
Create a feature branch
Make your changes with appropriate tests
Ensure code follows project style guidelines
Submit a pull request

Development Dependencies

For development work, you may want additional tools:

# Type checking
pip install mypy

# Code formatting
pip install black

# Linting
pip install ruff

# Testing with coverage
pip install pytest pytest-cov

License

MIT License. See LICENSE.

Changelog

See CHANGELOG.md for version history and changes.

Support

For issues, questions, or contributions, please create an issue or start a discussion.
