A high-fidelity document translation toolkit for large PDFs and text files. It supports a granular "Extract -> Refine -> Translate" workflow, so you can clean up the extracted text by hand before it is sent for AI translation.
- Smart Chunking: Automatically slices PDFs based on Bookmarks (TOC) or Chapter headings.
- Two-Pass Translation: Performs a literal technical draft followed by an eloquent scholarly refinement.
- Dual Outputs: Saves both `_draft.txt` (literal) and `_final.txt` (eloquent) for every section.
- Refinement-First Workflow: Extracts raw text so you can fix OCR errors or remove headers/footers before translating.
- Resumable: Skips already processed segments to save time and API quota.
- Flexible Filters: Process a single section, a range of indices, or the entire book.
- Systematic Organization: Automatically creates dedicated folders for segments and translations based on the input filename.
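
As an illustration of bookmark-based chunking, the sketch below turns table-of-contents entries into page ranges. It is not the toolkit's actual code: `Section` and `toc_to_sections` are hypothetical names, and the input is assumed to be `(level, title, page)` entries with 1-based page numbers, which is the shape PyMuPDF's `Document.get_toc()` returns.

```python
from typing import NamedTuple


class Section(NamedTuple):
    title: str
    first_page: int  # 1-based, inclusive
    last_page: int   # 1-based, inclusive


def toc_to_sections(toc, page_count, level=1):
    """Turn TOC entries of one heading level into contiguous page ranges."""
    # Keep only bookmarks at the requested level that point inside the document.
    starts = [
        (title, page)
        for lvl, title, page in toc
        if lvl == level and 1 <= page <= page_count
    ]
    starts.sort(key=lambda item: item[1])

    sections = []
    for i, (title, first) in enumerate(starts):
        # A section runs until the page before the next bookmark starts;
        # the last section runs to the end of the document.
        next_first = starts[i + 1][1] if i + 1 < len(starts) else page_count + 1
        last = max(first, next_first - 1)
        sections.append(Section(title, first, last))
    return sections


if __name__ == "__main__":
    demo_toc = [
        (1, "Preface", 1),
        (1, "Chapter 1", 5),
        (2, "1.1 Scope", 6),
        (1, "Chapter 2", 20),
    ]
    for section in toc_to_sections(demo_toc, page_count=30):
        print(section)
```

Splitting only on top-level bookmarks is a choice made for this sketch; a deeper `level` would produce smaller chunks.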
- Ensure you have Python 3.10+ installed.
- Create and activate a virtual environment:

      python3 -m venv venv
      source venv/bin/activate

- Install dependencies:

      pip install google-genai pymupdf

- Set your Gemini API Key:

      export GEMINI_API_KEY="your_api_key_here"
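
The key can be supplied through the `GEMINI_API_KEY` environment variable or the `--key` flag. A minimal sketch of how a script might resolve it is shown here; the helper name `resolve_api_key` and the precedence (flag first, then environment) are assumptions, not documented behavior of `main.py`.

```python
import os


def resolve_api_key(cli_key=None, env=None):
    """Return the key from --key if given, otherwise from GEMINI_API_KEY."""
    env = os.environ if env is None else env
    # Assumed precedence: an explicit flag wins over the environment variable.
    key = (cli_key or env.get("GEMINI_API_KEY", "")).strip()
    if not key:
        raise RuntimeError("No API key found: set GEMINI_API_KEY or pass --key.")
    return key
```

Failing early with a clear message avoids slicing a whole book before discovering that the translation step cannot authenticate.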
Break the PDF into manageable chunks and extract the text for refinement.

    python main.py --input MyBook.pdf --mode bookmark --action prepare

- Result: A new folder `sections_MyBook/` is created containing `.pdf` segments and `.txt` extractions.
Open the .txt files in sections_MyBook/ and clean up any "noise" like page numbers, running headers, or bad line breaks.
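
Part of this cleanup can be scripted before the manual pass. The helper below is a hypothetical sketch, not part of the toolkit: it drops lines that contain only a page number, drops running headers you list explicitly, and rejoins lines that were wrapped mid-paragraph.

```python
import re

# A line that is only a page number, optionally prefixed with "Page".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d{1,4}\s*$", re.IGNORECASE)


def clean_extracted_text(text, running_headers=()):
    """Remove page numbers and known running headers, then rejoin wrapped lines."""
    headers = {h.strip().lower() for h in running_headers}
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if PAGE_NUMBER.match(stripped) or stripped.lower() in headers:
            continue
        kept.append(stripped)
    joined = "\n".join(kept)
    # Rejoin words hyphenated across a line break: "trans-\nlation" -> "translation".
    joined = re.sub(r"(\w)-\n(\w)", r"\1\2", joined)
    # Collapse single line breaks inside a paragraph; blank lines stay as paragraph breaks.
    joined = re.sub(r"(?<!\n)\n(?!\n)", " ", joined)
    # Normalise runs of blank lines to a single paragraph break.
    joined = re.sub(r"\n{3,}", "\n\n", joined)
    return joined.strip()


if __name__ == "__main__":
    sample = "The trans-\nlation pipeline\nruns twice.\n\n12\nMY BOOK\nSecond para-\ngraph here."
    print(clean_extracted_text(sample, running_headers=["My Book"]))
```

The rules are deliberately blunt: a genuine hyphen at a line end (as in "two-" / "pass") is also merged, and any line consisting only of a short number is removed, so the result still needs a read-through.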
Submit your refined text to Gemini for a two-pass translation.

    # Translate with a custom glossary (.json or .txt)
    python main.py --input MyBook.pdf --action translate --glossary glossary.json

    # Translate a single section
    python main.py --input MyBook.pdf --action translate --index 7

    # Translate a range of sections
    python main.py --input MyBook.pdf --action translate --start 10 --end 20

- Result: Final translations are saved in `translations_MyBook/`.
  - `..._draft.txt`: The initial technical draft.
  - `..._final.txt`: The polished, natural translation.
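
The two-pass idea can be sketched independently of any particular API. In the example below, `call_model` stands for any function that takes a prompt string and returns the model's text; the prompts, the function names, and the `<section>_draft.txt` / `<section>_final.txt` naming are illustrative assumptions rather than the toolkit's actual implementation.

```python
from pathlib import Path


def two_pass_translate(text, target_lang, call_model):
    """Return (draft, final). `call_model` is any callable mapping str -> str."""
    # Pass 1: a literal draft that stays close to the source.
    draft_prompt = (
        f"Translate the following text into {target_lang}. "
        "Stay literal and preserve technical terms.\n\n" + text
    )
    draft = call_model(draft_prompt)

    # Pass 2: polish the draft, with the source included for reference.
    final_prompt = (
        f"Polish this {target_lang} draft so it reads naturally, "
        "without changing its meaning.\n\n"
        f"Source text:\n{text}\n\nDraft:\n{draft}"
    )
    final = call_model(final_prompt)
    return draft, final


def save_outputs(out_dir, section_name, draft, final):
    """Write both passes next to each other and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    draft_path = out_dir / f"{section_name}_draft.txt"
    final_path = out_dir / f"{section_name}_final.txt"
    draft_path.write_text(draft, encoding="utf-8")
    final_path.write_text(final, encoding="utf-8")
    return draft_path, final_path
```

Passing the model call in as a parameter keeps the sketch testable offline. With the `google-genai` package, `call_model` could plausibly wrap `client.models.generate_content(...)` and return its `.text`, with the model name left to the reader.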
Note: To re-translate a section, you must first remove or rename its existing _final.txt file.
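
The note above implies that the presence of a `_final.txt` file is what marks a section as done. A resumable loop built on that idea might look like the following sketch; `pending_sections` is a hypothetical helper, and the assumption that a translation shares its base name with the extracted `.txt` file is not confirmed by the project.

```python
from pathlib import Path


def pending_sections(sections_dir, translations_dir):
    """Return the extracted .txt files that have no matching _final.txt yet."""
    translations_dir = Path(translations_dir)
    pending = []
    for txt in sorted(Path(sections_dir).glob("*.txt")):
        # Assumed naming: <section stem>_final.txt in the translations folder.
        final_path = translations_dir / f"{txt.stem}_final.txt"
        if not final_path.exists():
            pending.append(txt)
    return pending
```

Under this scheme, deleting or renaming a `_final.txt` file is enough to put its section back in the queue on the next run.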
| Flag | Description | Options |
|---|---|---|
| `--input` | (Required) Path to your PDF or TXT file. | File path |
| `--mode` | How the book should be divided. | `bookmark` (default), `chapter`, `full` |
| `--action` | What step to perform. | `slice`, `extract`, `translate`, `prepare` (slice + extract) |
| `--lang` | The target language for translation. | e.g., Persian, Spanish, French |
| `--glossary` | Path to glossary file. | `.json` or `.txt` |
| `--index` | Process only this specific section number. | Integer |
| `--start` / `--end` | Process a range of sections. | Integers |
| `--key` | Your Google Gemini API Key. | String |
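
For readers who want to see how these flags could fit together, here is an `argparse` sketch that mirrors the table, plus a helper that resolves `--index`, `--start` and `--end` into a list of section numbers. It is a reconstruction from the table only: `build_parser` and `select_sections` are hypothetical names, and 1-based section numbering, the missing defaults for `--action` and `--lang`, and `--index` taking priority over a range are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented flags."""
    p = argparse.ArgumentParser(prog="main.py")
    p.add_argument("--input", required=True, help="Path to a PDF or TXT file")
    p.add_argument("--mode", choices=["bookmark", "chapter", "full"], default="bookmark")
    p.add_argument("--action", choices=["slice", "extract", "translate", "prepare"])
    p.add_argument("--lang", help="Target language, e.g. Persian")
    p.add_argument("--glossary", help="Path to a .json or .txt glossary")
    p.add_argument("--index", type=int, help="Process only this section")
    p.add_argument("--start", type=int, help="First section of a range")
    p.add_argument("--end", type=int, help="Last section of a range")
    p.add_argument("--key", help="Google Gemini API key")
    return p


def select_sections(total, index=None, start=None, end=None):
    """Return the section numbers to process, assuming 1-based numbering."""
    if index is not None:
        # Assumed: a single index takes priority over any range.
        wanted = range(index, index + 1)
    else:
        lo = 1 if start is None else start
        hi = total if end is None else end
        wanted = range(lo, hi + 1)
    # Silently drop numbers that fall outside the book.
    return [n for n in wanted if 1 <= n <= total]


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--input", "MyBook.pdf", "--action", "translate", "--start", "10", "--end", "20"]
    )
    print(select_sections(25, args.index, args.start, args.end))
```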
- `sections_<filename>/`: Contains the sliced PDF parts and extracted text files.
- `translations_<filename>/`: Contains the final dual-pass output from Gemini.
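
The folder names follow from the input filename. A small sketch of that derivation is given below; `output_dirs` is a hypothetical helper, and creating the folders relative to the current working directory is an assumption.

```python
from pathlib import Path


def output_dirs(input_path):
    """Derive the sections_/translations_ folder names from the input file."""
    # The stem drops the directory and the extension: "books/MyBook.pdf" -> "MyBook".
    stem = Path(input_path).stem
    return Path(f"sections_{stem}"), Path(f"translations_{stem}")


if __name__ == "__main__":
    sections, translations = output_dirs("MyBook.pdf")
    print(sections, translations)
```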
This project was entirely authored and architected by Google Gemini.