Automatically rename Audible audiobook PDF companions from cryptic codes like bk_adbl_022796.pdf to their actual book titles like Misbehaving.pdf.
When you download PDF companions from Audible, they come with unhelpful filenames:
bk_adbl_022796.pdf
bk_rand_002806.pdf
bk_sans_007977.pdf
bk_harp_004529.pdf
Good luck remembering which book is which!
This tool automatically extracts the book title from each PDF and renames the files:
Misbehaving.pdf
Thinking Fast and Slow.pdf
Leonardo da Vinci.pdf
The Intelligent Investor.pdf
- 3-tier title extraction: Tries multiple methods to find the title
  - PDF metadata
  - Text extraction from content
  - OCR for image-based PDFs (optional)
- Dry run mode: Preview changes before committing
- Safe renaming: Handles filename conflicts and special characters
- Verbose output: See exactly how titles are being extracted
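The safe-renaming and dry-run behaviour listed above could look roughly like the sketch below. It is not taken from the tool's source: the function names, the set of characters treated as unsafe, and the " (2)" suffix scheme for conflicts are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Characters that are invalid in file names on at least one major OS (assumed list).
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_title(title: str, max_length: int = 150) -> str:
    """Turn an extracted title into a filesystem-safe file stem."""
    cleaned = _UNSAFE.sub(" ", title)
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Fall back to a placeholder when nothing usable is left.
    return cleaned[:max_length].rstrip(" .") or "Untitled"


def unique_target(folder: Path, stem: str, suffix: str = ".pdf") -> Path:
    """Append ' (2)', ' (3)', ... until the name does not collide with an existing file."""
    candidate = folder / f"{stem}{suffix}"
    counter = 2
    while candidate.exists():
        candidate = folder / f"{stem} ({counter}){suffix}"
        counter += 1
    return candidate


def rename_pdf(source: Path, title: str, dry_run: bool = True) -> Path:
    """Return the planned target path; only touch the disk when dry_run is False."""
    target = unique_target(source.parent, sanitize_title(title))
    if not dry_run:
        source.rename(target)
    return target
```

Defaulting `dry_run` to `True` in this sketch means a caller has to opt in explicitly before any file is moved.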
pip install pdfplumber pypdf
OCR allows the tool to extract titles from image-based PDFs (like O'Reilly books):
# Install Python packages
pip install pdfplumber pypdf pytesseract pdf2image
# Install Tesseract OCR engine
# macOS
brew install tesseract poppler
# Ubuntu/Debian
sudo apt-get install tesseract-ocr poppler-utils
# Windows
# Download installer from: https://github.com/UB-Mannheim/tesseract/wiki
# Also install poppler: https://github.com/oschwartz10612/poppler-windows/releases
git clone https://github.com/yourusername/audible-pdf-renamer.git
cd audible-pdf-renamer
pip install -r requirements.txt
# Rename PDFs in current directory
python audible_pdf_renamer.py
# Rename PDFs in a specific folder
python audible_pdf_renamer.py ~/Downloads/Audible
# Rename PDFs in a folder with spaces
python audible_pdf_renamer.py "/path/to/Audible Booknotes"
See what would be renamed without actually changing anything:
python audible_pdf_renamer.py --dry-run
python audible_pdf_renamer.py ~/Downloads/Audible -n
See detailed information about how titles are extracted:
python audible_pdf_renamer.py --verbose
python audible_pdf_renamer.py ~/Downloads/Audible -v
Skip OCR extraction for faster processing (may miss some titles):
python audible_pdf_renamer.py --no-ocr
Process all PDFs, not just Audible-named ones:
python audible_pdf_renamer.py --pattern "*.pdf"
python audible_pdf_renamer.py ~/Downloads/Audible --dry-run --verbose
The tool uses a 3-tier fallback approach to extract book titles:
Many PDFs have the title stored in their metadata properties. This is the fastest and most reliable method when available.
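As a rough illustration of the metadata tier, the sketch below reads the title field with pypdf and discards values that are clearly not real titles. The helper names and the list of junk values are assumptions, not the tool's actual code.

```python
from typing import Optional

# Hypothetical placeholder values that should not be trusted as titles.
_JUNK_TITLES = {"", "untitled", "unknown"}


def clean_metadata_title(raw: Optional[str]) -> Optional[str]:
    """Normalise a metadata title, returning None when it looks unusable."""
    if raw is None:
        return None
    title = " ".join(str(raw).split())
    if title.lower() in _JUNK_TITLES:
        return None
    # Some producers store the source file name instead of the book title.
    if title.lower().endswith((".pdf", ".indd", ".doc", ".docx")):
        return None
    return title


def title_from_metadata(pdf_path: str) -> Optional[str]:
    """Read the title from the PDF's document information, if present."""
    from pypdf import PdfReader  # imported lazily; only this function needs pypdf

    metadata = PdfReader(pdf_path).metadata  # None when the PDF has no info dictionary
    return clean_metadata_title(metadata.title if metadata else None)
```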
If metadata isn't available, the tool extracts text from the first few pages and looks for title patterns. It skips boilerplate content such as copyright notices and publisher information.
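One way the text tier might work is sketched below: pull text from the first few pages with pdfplumber and return the first line that does not look like boilerplate. The boilerplate patterns, the length limits, and the three-page default are assumptions chosen for the example; the tool's real heuristics may differ.

```python
import re
from typing import Iterable, Optional

# Hypothetical boilerplate markers; a real list would be tuned on actual PDFs.
_BOILERPLATE = re.compile(
    r"copyright|©|all rights reserved|isbn|published by|www\.|http",
    re.IGNORECASE,
)


def guess_title(lines: Iterable[str], min_length: int = 3, max_length: int = 120) -> Optional[str]:
    """Return the first line that plausibly is a title, or None."""
    for line in lines:
        candidate = " ".join(line.split())
        if not (min_length <= len(candidate) <= max_length):
            continue
        if _BOILERPLATE.search(candidate):
            continue
        if not any(ch.isalpha() for ch in candidate):
            continue  # page numbers, separator rules and similar
        return candidate
    return None


def title_from_text(pdf_path: str, max_pages: int = 3) -> Optional[str]:
    """Scan the first pages of a PDF for a title-like line."""
    import pdfplumber  # imported lazily; only this function needs pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages[:max_pages]:
            text = page.extract_text() or ""
            title = guess_title(text.splitlines())
            if title:
                return title
    return None
```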
Some PDFs (particularly from publishers like O'Reilly) have image-based content where text extraction doesn't work. The tool renders these pages as images and uses Tesseract OCR to read the title.
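The OCR tier and the overall fallback order can be sketched as follows. This is an assumption-laden outline rather than the tool's implementation: `ocr_first_page` simply returns the first non-empty OCR line, and `first_title` is a hypothetical helper that tries each tier in turn and reports which one succeeded.

```python
from typing import Callable, Optional, Sequence, Tuple

Extractor = Callable[[str], Optional[str]]


def ocr_first_page(pdf_path: str, dpi: int = 200) -> Optional[str]:
    """Render page 1 as an image and return the first non-empty line Tesseract reads."""
    from pdf2image import convert_from_path  # needs poppler installed
    import pytesseract  # needs the Tesseract binary on PATH

    images = convert_from_path(pdf_path, dpi=dpi, first_page=1, last_page=1)
    if not images:
        return None
    text = pytesseract.image_to_string(images[0])
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return None


def first_title(pdf_path: str, tiers: Sequence[Tuple[str, Extractor]]) -> Optional[Tuple[str, str]]:
    """Try each (name, extractor) pair in order; return (title, tier name) or None."""
    for name, extract in tiers:
        try:
            title = extract(pdf_path)
        except Exception:
            title = None  # a failing tier (missing dependency, unreadable page) should not stop later ones
        if title:
            return title, name
    return None
```

Because each tier is just a function taking a path, skipping OCR amounts to leaving the last pair out of `tiers`.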
Audible PDF Renamer v1.0.0
Folder: /Users/you/Downloads/Audible Booknotes
OCR: Available
Found 51 PDF(s) to process
======================================================================
bk_rand_002806.pdf
→ Thinking Fast and Slow.pdf
(extracted via text)
✓ Renamed successfully
bk_sans_007977.pdf
→ Leonardo da Vinci.pdf
(extracted via text)
✓ Renamed successfully
bk_upfr_000065.pdf
→ Information Architecture for the Web and Beyond.pdf
(extracted via ocr)
✓ Renamed successfully
======================================================================
Summary:
✓ Renamed: 51
✗ Failed: 0
By default, the tool only processes files starting with bk_ (Audible's naming convention). If your files have different names, use the --pattern flag:
python audible_pdf_renamer.py --pattern "*.pdf"
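File selection by pattern could be implemented along these lines. The default pattern `bk_*.pdf` is inferred from the description above rather than copied from the tool, and `select_pdfs` is a hypothetical name.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import List

DEFAULT_PATTERN = "bk_*.pdf"  # assumption: mirrors Audible's bk_ prefix


def select_pdfs(folder: Path, pattern: str = DEFAULT_PATTERN) -> List[Path]:
    """Return files in folder whose names match the glob pattern, sorted by name."""
    return sorted(
        p for p in folder.iterdir()
        # Lower-casing both sides makes matching case-insensitive on every platform.
        if p.is_file() and fnmatch(p.name.lower(), pattern.lower())
    )
```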
- Make sure Tesseract is installed and in your PATH:
  tesseract --version
- Make sure poppler-utils is installed (required by pdf2image):
  # Check if pdftoppm is available
  which pdftoppm
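The same two checks can be run from Python, which also works in shells where `which` is not available. This is a small convenience sketch; `check_ocr_tools` is a hypothetical helper and not part of the tool.

```python
import shutil
from typing import Dict, Iterable


def check_ocr_tools(tools: Iterable[str] = ("tesseract", "pdftoppm")) -> Dict[str, bool]:
    """Report whether each external program can be found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_ocr_tools().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```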
The tool does its best to extract titles, but some PDFs may have unusual formatting. You can:
- Use --verbose to see what's being extracted
- Manually rename problematic files
- Open an issue with details about the problematic PDF
Contributions are welcome! See CONTRIBUTING.md for guidelines.
MIT License - see LICENSE for details.
- Built with pdfplumber for PDF text extraction
- Uses pypdf for PDF metadata
- OCR powered by Tesseract via pytesseract