A production-ready utility for applying Bates numbers to PDF documents and converting spreadsheets, images, emails, and Word documents to PDF. It features a GUI with real-time preview, file conversion, and configurable stamp positioning.
- Universal File Processing: Automatically processes PDFs and converts supported file types to PDF
- Intelligent Bates Numbering: Sequential numbering with customizable prefixes and zero-padding
- Original File Preservation: Maintains original files alongside processed versions
- Directory Structure Preservation: Keeps folder hierarchy intact in output
- Multi-page Support: Proper numbering across multi-page documents
- Secured PDF Handling: Automatically unlocks password-protected PDFs for processing
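The numbering scheme described above (a prefix followed by a zero-padded sequential number) can be sketched in a few lines. This is a minimal illustration, not the utility's actual code; the names `format_bates` and `bates_sequence` are hypothetical.

```python
from typing import Iterator


def format_bates(prefix: str, number: int, digits: int = 5) -> str:
    """Return the prefix followed by the zero-padded number, e.g. 'BDT14323'."""
    if number < 0:
        raise ValueError("Bates numbers must be non-negative")
    # Numbers wider than `digits` are kept whole rather than truncated.
    return f"{prefix}{number:0{digits}d}"


def bates_sequence(prefix: str, start: int, digits: int = 5) -> Iterator[str]:
    """Yield consecutive Bates labels starting at `start` (hypothetical helper)."""
    number = start
    while True:
        yield format_bates(prefix, number, digits)
        number += 1
```

For example, `format_bates("LEGAL", 1000, digits=6)` returns `"LEGAL001000"`.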
Direct PDF Processing:
- Native PDF documents with full page-by-page numbering
Automatic PDF Conversion:
- Spreadsheets: `.xlsx`, `.xls`, `.xlsm`, `.xlsb`, `.csv`
- Images: `.png`, `.jpg`, `.jpeg`, `.gif`, `.bmp`
- Email: `.eml` files with full content extraction
- Documents: `.docx`, `.doc` (with unoconv dependency)
Smart Conversion Features:
- High-quality Excel/CSV to PDF with proper formatting and table styling
- Intelligent column width calculation and page layout optimization
- Email content extraction with headers (From, To, Subject, Date)
- Image optimization with proper scaling and aspect ratio preservation
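As an illustration of the email content extraction mentioned above, the following sketch pulls the four listed headers and the plain-text body from raw `.eml` content using only the standard library. It is an assumption about how such extraction could work, not the utility's implementation; `summarize_eml` is a hypothetical name.

```python
from email import message_from_string
from email.policy import default
from typing import Dict

# Headers named in the feature list above.
HEADER_NAMES = ("From", "To", "Subject", "Date")


def summarize_eml(raw: str) -> Dict[str, str]:
    """Return the listed headers plus the plain-text body of an .eml message."""
    msg = message_from_string(raw, policy=default)
    # Missing headers become empty strings instead of None.
    summary = {name: str(msg.get(name, "")) for name in HEADER_NAMES}
    body = msg.get_body(preferencelist=("plain",))
    summary["Body"] = body.get_content() if body is not None else ""
    return summary
```

The resulting dictionary could then be laid out on a PDF page by whatever rendering step follows.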
Modern User Experience:
- Real-time Preview: Live preview of files with stamp positioning
- File Navigation: Browse through processable files with arrow keys or mouse wheel
- PDF Page Navigation: Navigate multi-page PDFs page by page
- Drag & Drop Support: Drop files or folders directly into the interface
- Visual Feedback: See exactly how stamps will appear before processing
Preview Capabilities:
- PDF Visual Preview: Actual page rendering when pdf2image is available
- File Content Preview: Text preview for spreadsheets, emails, and other formats
- Real-time Stamp Overlay: See stamp positioning, colors, and effects instantly
- Navigation Hints: Clear keyboard shortcuts and navigation instructions
Positioning & Layout:
- Precision Positioning: X/Y coordinates (0.0-1.0 range) for exact placement
- Visual Position Editor: Real-time adjustment with immediate preview
- Vertical Offset: Fine-tune background and border positioning
- Multi-page Consistency: Stamps appear identically across all pages
Appearance Options:
- Colors: Black, red, blue, green, gray with opacity control (0-100%)
- Font Sizing: Configurable font size (6-20pt) with preview scaling
- Background Support: Optional colored backgrounds with transparency
- Border Options: Optional borders with configurable width and colors
- Multiple Background Colors: White, light gray, gray, yellow, light blue, light green
Professional Styling:
- Text Opacity: Full transparency control for subtle watermarking
- Background Opacity: Separate opacity control for backgrounds
- Smart Centering: Automatic text centering with configurable offsets
- PDF Coordinate Mapping: Proper coordinate conversion for accurate placement
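The coordinate conversion mentioned above can be sketched as follows. PDF user space places its origin at the bottom-left corner of the page and measures in points; whether the utility's 0.0-1.0 values are measured from the top or the bottom of the page is not stated, so the `origin_top_left` flag below is an assumption, and `to_pdf_points` is a hypothetical name.

```python
from typing import Tuple


def to_pdf_points(
    x_norm: float,
    y_norm: float,
    page_width: float,
    page_height: float,
    origin_top_left: bool = True,  # assumption: GUI measures y from the top
) -> Tuple[float, float]:
    """Map normalized (0.0-1.0) coordinates to PDF points (origin bottom-left)."""
    if not (0.0 <= x_norm <= 1.0 and 0.0 <= y_norm <= 1.0):
        raise ValueError("coordinates must be within 0.0-1.0")
    x = x_norm * page_width
    # Flip the y axis only when the normalized origin is at the top.
    y = (1.0 - y_norm) * page_height if origin_top_left else y_norm * page_height
    return x, y
```

On a US Letter page (612 x 792 points), `to_pdf_points(0.5, 0.5, 612, 792)` returns `(306.0, 396.0)`.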
Excel Report Generation:
- File Inventory: Complete list of processed files with Bates numbers
- Page Counting: Accurate page counts for all documents
- Processing Metadata: Creation dates, processing timestamps, file paths
- Auto-formatted Columns: Professional Excel formatting with proper column widths
Combined PDF Creation:
- Master Document: Single PDF containing all processed files in Bates order
- Error Handling: Graceful handling of corrupted or problematic files
- Sorting Logic: Proper numerical sorting of Bates numbers
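Numerical sorting matters because plain string sorting would place `BDT100` before `BDT9`. One way to get Bates order is a sort key that splits each label into its prefix and trailing number. This is a sketch under the assumption that labels end in digits; `bates_sort_key` is a hypothetical name.

```python
import re
from typing import Tuple

# Lazily match the prefix so the trailing run of digits is captured whole.
_BATES_RE = re.compile(r"^(?P<prefix>.*?)(?P<number>\d+)$")


def bates_sort_key(label: str) -> Tuple[str, int]:
    """Sort by prefix first, then by the numeric value of the trailing digits."""
    match = _BATES_RE.match(label)
    if match is None:
        # Labels without a trailing number sort by their text alone.
        return (label, -1)
    return (match.group("prefix"), int(match.group("number")))
```

Usage: `sorted(labels, key=bates_sort_key)`.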
Robust Processing:
- Graceful Failures: Continues processing even when individual files fail
- Issues Directory: Automatically moves problematic files to the `_FILES WITH ISSUES` folder
- Detailed Logging: Comprehensive logs for troubleshooting and auditing
- PDF Unlocking: Automatic handling of secured/encrypted PDFs
- Dependency Management: Automatic installation of required packages
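The "continue on failure" behavior described above can be sketched as a loop that catches per-file errors, logs them, and moves the offending file into the issues folder. This is an illustrative assumption, not the utility's code: `process_all` and the `process_one` callback are hypothetical, and using the exception class name as the "reason" in the filename is a guess modeled on the `problematic_file_ISSUE_reason.pdf` pattern.

```python
import logging
import shutil
from pathlib import Path
from typing import Callable, Iterable, List

ISSUES_DIR_NAME = "_FILES WITH ISSUES"
log = logging.getLogger("bates_sketch")


def process_all(
    files: Iterable[Path],
    output_dir: Path,
    process_one: Callable[[Path], None],  # hypothetical per-file worker
) -> List[Path]:
    """Process every file; move failures aside instead of aborting the batch."""
    failed: List[Path] = []
    for src in files:
        try:
            process_one(src)
        except Exception as exc:  # broad on purpose: one bad file must not stop the run
            log.error("Failed to process %s: %s", src, exc)
            issues_dir = output_dir / ISSUES_DIR_NAME
            issues_dir.mkdir(parents=True, exist_ok=True)
            reason = type(exc).__name__
            target = issues_dir / f"{src.stem}_ISSUE_{reason}{src.suffix}"
            shutil.move(str(src), str(target))
            failed.append(target)
    return failed
```

The returned list makes it easy to report which files need manual attention once the batch finishes.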
Smart File Filtering:
- System File Exclusion: Automatically ignores `.DS_Store`, `Thumbs.db`, and temp files
- Extension Validation: Only processes supported file types
- Duplicate Prevention: Avoids processing the same file multiple times
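The three filtering rules above could be combined along these lines. The extension set is taken from the supported formats listed earlier; the temp-file patterns (`~$` prefix, `.tmp` suffix) are assumptions, since the text does not say which temp files are skipped, and both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, List

SUPPORTED_EXTENSIONS = {
    ".pdf", ".xlsx", ".xls", ".xlsm", ".xlsb", ".csv",
    ".png", ".jpg", ".jpeg", ".gif", ".bmp", ".eml", ".docx", ".doc",
}
SYSTEM_FILE_NAMES = {".ds_store", "thumbs.db"}
TEMP_PREFIXES = ("~$",)    # assumption: Office lock files
TEMP_SUFFIXES = (".tmp",)  # assumption: generic temp files


def is_processable(path: Path) -> bool:
    """True if the file is not a system/temp file and has a supported extension."""
    name = path.name.lower()
    if name in SYSTEM_FILE_NAMES:
        return False
    if name.startswith(TEMP_PREFIXES) or name.endswith(TEMP_SUFFIXES):
        return False
    return path.suffix.lower() in SUPPORTED_EXTENSIONS


def unique_processable(paths: Iterable[str]) -> List[Path]:
    """Filter paths and drop duplicates that point at the same absolute location."""
    seen = set()
    result: List[Path] = []
    for raw in paths:
        path = Path(raw)
        key = os.path.normcase(os.path.abspath(str(path)))
        if key in seen or not is_processable(path):
            continue
        seen.add(key)
        result.append(path)
    return result
```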
Default Settings:
- Prefix: "BDT" (customizable)
- Starting Number: 14323 (customizable)
- Digits: 5 (1-10 range)
- Position: Bottom-right corner
- Color: Black with 100% opacity
Advanced Options:
- Command Line Support: Full CLI interface for automation
- Batch Processing: Handle entire directory trees
- Output Control: Timestamped output directories
- Platform Support: Cross-platform compatibility (Windows, macOS, Linux)
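The timestamped output directories follow the `BATES_[PREFIX]_[TIMESTAMP]` naming used in the output layout. A sketch of building such a path is shown here; the exact timestamp format is not documented, so `%Y%m%d_%H%M%S` is an assumption, and `output_dir_for` is a hypothetical name.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def output_dir_for(base: Path, prefix: str, now: Optional[datetime] = None) -> Path:
    """Build the BATES_[PREFIX]_[TIMESTAMP] directory path under `base`."""
    # Assumed timestamp format; the real utility may use a different one.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return base / f"BATES_{prefix}_{stamp}"


def log_path_for(output_dir: Path) -> Path:
    """The processing log shares the directory's name, per the output layout."""
    return output_dir / f"{output_dir.name}_processing.log"
```

Accepting `now` as a parameter keeps the function deterministic in tests while defaulting to the current time in normal use.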
git clone [repository-url]
cd bates-master
pip install -r requirements.txt

Core Requirements:
- Python 3.6+
- PyPDF2 / PyPDF4 (PDF processing)
- reportlab (PDF generation)
- openpyxl (Excel reports)
- tkinter (GUI - usually included with Python)
Optional Enhancements:
- `pdf2image` + `poppler` (for PDF visual previews)
- `tkinterdnd2` (for drag & drop support)
- `unoconv` (for additional document conversion)
Automatic Installation: The utility will attempt to install missing dependencies automatically.
python bates_master.py

Workflow:
- Select Source: Browse or drag & drop a folder/file
- Configure Settings: Adjust prefix, numbering, and stamp appearance
- Preview: Use navigation to preview files and stamp positioning
- Process: Click "🏷️ Start Bates Numbering Process"
- Review: Automatic opening of output folder with results
Navigation Controls:
- Files: arrow keys, Ctrl+arrow keys, or mouse wheel
- PDF Pages: arrow keys, Page Up/Down, or mouse wheel
- Position Adjustment: Real-time X/Y coordinate controls
python bates_master.py input_directory output_directory [options]

Options:
- `--prefix PREFIX`: Set Bates number prefix
- `--zero-pad N`: Number of digits (default: 5)
- `--start N`: Starting number (default: 1)
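For readers scripting around the CLI, an interface with these arguments could be declared with `argparse` roughly as follows. This is a sketch of an equivalent parser, not the utility's own source; the `--prefix` default of `"BDT"` is an assumption borrowed from the GUI default settings, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the positional paths and the three documented options."""
    parser = argparse.ArgumentParser(
        prog="bates_master.py",
        description="Apply Bates numbers to a file or directory tree.",
    )
    parser.add_argument("input", help="source file or directory")
    parser.add_argument("output", help="output directory")
    # Assumed default; the CLI documentation does not state one for --prefix.
    parser.add_argument("--prefix", default="BDT", help="Bates number prefix")
    parser.add_argument("--zero-pad", type=int, default=5, help="number of digits")
    parser.add_argument("--start", type=int, default=1, help="starting number")
    return parser
```

Note that `argparse` exposes `--zero-pad` as the attribute `zero_pad` on the parsed namespace.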
Examples:
# Basic usage
python bates_master.py /path/to/documents /path/to/output
# Custom settings
python bates_master.py /documents /output --prefix "LEGAL" --zero-pad 6 --start 1000
# Single file processing
python bates_master.py /path/to/document.pdf /output --prefix "DOC"

BATES_[PREFIX]_[TIMESTAMP]/
├── bates_report.xlsx                              # Comprehensive Excel report
├── BATES_[PREFIX]_[TIMESTAMP]_processing.log      # Detailed processing log
├── [PREFIX]_combined.pdf                          # Master combined document
├── [original_folder_structure]/                   # Maintained directory structure
│   ├── [PREFIX]000001_document.pdf
│   ├── [PREFIX]000001_document.xlsx               # Original preserved
│   ├── [PREFIX]000002_spreadsheet.pdf
│   └── [PREFIX]000002_spreadsheet.csv             # Original preserved
└── _FILES WITH ISSUES/                            # Problematic files (if any)
    └── problematic_file_ISSUE_reason.pdf

Excel Report Includes:
- Bates Number assignment
- Original filename and path
- File type and page count
- Processing timestamps
- Auto-formatted professional layout
- Live File Browser: Navigate through all processable files
- PDF Page Viewer: Page-by-page preview for multi-page documents
- Stamp Overlay: See exact stamp placement before processing
- Content Preview: Text previews for spreadsheets, emails, and documents
- Coordinate-based Positioning: Precise placement using 0.0-1.0 coordinates
- Multi-layer Rendering: Background, border, and text layers
- Typography Control: Font sizing, color, and opacity management
- Cross-page Consistency: Identical appearance across all document pages
- Secured PDF Processing: Automatic unlocking and re-securing
- Graceful Degradation: Continues processing despite individual file failures
- Comprehensive Logging: Detailed audit trail for compliance
- Automatic Recovery: Smart fallback mechanisms for edge cases
Minimum:
- Python 3.6 or higher
- 4GB RAM (for large document processing)
- Cross-platform: Windows 10+, macOS 10.14+, Linux (Ubuntu 18.04+)
Recommended for Full Features:
- Python 3.8+
- 8GB RAM
- PDF2Image + Poppler (for visual PDF previews)
- SSD storage (for faster processing)
Common Issues:
- No visual PDF preview: Install `pdf2image` and `poppler-utils`
- Drag & drop not working: Install `tkinterdnd2`
- Document conversion fails: Install `unoconv` or LibreOffice
- Permission errors: Run with appropriate file system permissions
Performance Optimization:
- Use SSD storage for large batch processing
- Ensure sufficient RAM for high-resolution image conversion
- Close other applications during large batch operations
This project welcomes contributions! Areas for enhancement:
- Additional file format support
- Advanced stamp templates
- Batch processing optimizations
- Integration with document management systems
[License information to be added]
For issues, feature requests, or questions:
- Open an issue in the repository
- Check the processing logs for detailed error information
- Review the troubleshooting section above