A tool for translating PDF documents using the DeepL API while preserving formatting and images.
Features:
- Full PDF document translation with preserved formatting
- Images, tables, and other non-text elements are carried over to the output
- Translation into any language supported by the DeepL API
- Automatic handling of large documents
Requirements:
- Python 3.7+
- DeepL API key (you can get a free one from DeepL Developer)
- Microsoft Word or LibreOffice (for DOCX to PDF conversion)
Installation:
# Clone the repository
git clone https://github.com/KNXKO/deepl-pdf-translator.git
cd deepl-pdf-translator
# Install dependencies
pip install -r requirements.txt
Usage:
python pdf_translator.py path_to_pdf target_language --auth-key YOUR_DEEPL_API_KEY
Example:
python pdf_translator.py document.pdf SK --auth-key xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Arguments:
- path_to_pdf: Path to the PDF file to translate
- target_language: Target language code (e.g., SK, EN, DE, FR)
- --auth-key: DeepL API key
- --output: (optional) Path to the output file
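The argument list above maps naturally onto Python's standard `argparse` module. The following is a minimal sketch of such a parser, not the script's actual code: the helper name `build_parser` is hypothetical, and treating `--auth-key` as required and `--output` as defaulting to `None` are assumptions based on the documented usage.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="pdf_translator.py",
        description="Translate a PDF document with the DeepL API.",
    )
    parser.add_argument("path_to_pdf", help="Path to the PDF file to translate")
    parser.add_argument("target_language", help="Target language code, e.g. SK")
    # Assumption: the key is mandatory, since every documented call passes it.
    parser.add_argument("--auth-key", required=True, help="DeepL API key")
    # Assumption: when omitted, the script picks its own output path.
    parser.add_argument("--output", default=None, help="Path to the output file")
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(
    ["document.pdf", "SK", "--auth-key", "YOUR_DEEPL_API_KEY"]
)
print(args.path_to_pdf, args.target_language, args.output)
```

Note that `argparse` exposes `--auth-key` as the attribute `args.auth_key`, with the hyphen replaced by an underscore.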
The DeepL API supports these language codes:
- BG (Bulgarian)
- CS (Czech)
- DA (Danish)
- DE (German)
- EL (Greek)
- EN (English)
- ES (Spanish)
- ET (Estonian)
- FI (Finnish)
- FR (French)
- HU (Hungarian)
- ID (Indonesian)
- IT (Italian)
- JA (Japanese)
- LT (Lithuanian)
- LV (Latvian)
- NL (Dutch)
- PL (Polish)
- PT (Portuguese)
- RO (Romanian)
- RU (Russian)
- SK (Slovak)
- SL (Slovenian)
- SV (Swedish)
- TR (Turkish)
- UK (Ukrainian)
- ZH (Chinese)
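Checking the target language before any API call can save a failed request. The sketch below validates a code against the list above. The names `SUPPORTED_CODES` and `normalize_language_code` are hypothetical, and the set simply mirrors this README, so it may lag behind what DeepL currently accepts.

```python
# Codes copied from the list above; DeepL may add or change codes over time.
SUPPORTED_CODES = frozenset(
    "BG CS DA DE EL EN ES ET FI FR HU ID IT JA LT LV NL "
    "PL PT RO RU SK SL SV TR UK ZH".split()
)


def normalize_language_code(code: str) -> str:
    """Return the upper-cased code, or raise ValueError if it is not listed."""
    normalized = code.strip().upper()
    if normalized not in SUPPORTED_CODES:
        raise ValueError(f"Unsupported target language: {code!r}")
    return normalized


print(normalize_language_code("sk"))
```

Normalizing case and whitespace first means that `sk`, `SK`, and ` Sk ` are all accepted as the same code.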
How it works:
- The PDF is converted to DOCX format to preserve formatting and images
- All text elements within the DOCX are translated using the DeepL API
- The translated DOCX is converted back to PDF
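The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual implementation: it assumes the `pdf2docx`, `python-docx`, `deepl`, and `docx2pdf` packages, and the function names `translate_texts` and `translate_pdf` are hypothetical. For brevity it translates only top-level paragraphs, so text inside tables is skipped, and assigning `paragraph.text` collapses each paragraph into a single run.

```python
from typing import Callable, Iterable, List


def translate_texts(texts: Iterable[str], translate: Callable[[str], str]) -> List[str]:
    """Translate each non-blank string; blank strings pass through unchanged."""
    return [translate(text) if text.strip() else text for text in texts]


def translate_pdf(pdf_path: str, docx_path: str, out_pdf: str,
                  target_lang: str, auth_key: str) -> None:
    """Run the PDF -> DOCX -> translated DOCX -> PDF pipeline (hypothetical)."""
    # Third-party imports are local so translate_texts works without them.
    import deepl
    from docx import Document
    from docx2pdf import convert
    from pdf2docx import Converter

    # Step 1: convert the PDF to DOCX.
    converter = Converter(pdf_path)
    converter.convert(docx_path)
    converter.close()

    # Step 2: translate the paragraph text with the DeepL API.
    translator = deepl.Translator(auth_key)
    document = Document(docx_path)
    paragraphs = document.paragraphs
    translated = translate_texts(
        (p.text for p in paragraphs),
        lambda text: translator.translate_text(text, target_lang=target_lang).text,
    )
    for paragraph, new_text in zip(paragraphs, translated):
        if paragraph.text != new_text:
            # Simplification: this replaces all runs, losing run-level styling.
            paragraph.text = new_text
    document.save(docx_path)

    # Step 3: convert the translated DOCX back to PDF.
    # Assumption: docx2pdf drives Microsoft Word; LibreOffice needs another tool.
    convert(docx_path, out_pdf)


# The pure helper can be tried without an API key by passing any callable.
print(translate_texts(["Hello", "", "World"], str.upper))
```

Passing the translation step in as a callable keeps the text-handling logic testable without network access or an API key.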
Planned improvements:
- Parallel processing for faster translation
- Translation memory to avoid re-translating repeated phrases
- OCR support for scanned documents
- Better handling of complex nested objects
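As an illustration of the translation-memory idea in the list above, a small in-memory cache can wrap any translate function so that repeated phrases are sent only once. This is a sketch of one possible approach, not code from the project. The name `with_translation_memory` is hypothetical, and `fake_translate` stands in for a real DeepL call.

```python
from typing import Callable, Dict, List


def with_translation_memory(translate: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a translate function so each distinct phrase is translated once."""
    memory: Dict[str, str] = {}

    def cached(text: str) -> str:
        if text not in memory:
            memory[text] = translate(text)
        return memory[text]

    return cached


# Stand-in translator that records every call it receives.
calls: List[str] = []


def fake_translate(text: str) -> str:
    calls.append(text)
    return text.upper()


translate = with_translation_memory(fake_translate)
results = [translate(phrase) for phrase in ["page", "total", "page"]]
print(results, len(calls))
```

In this run the wrapped function is called three times, but the stand-in translator only sees the two distinct phrases. A persistent store, such as a file or database, would be needed to keep the memory across runs.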
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.