soheilsolhjoo/TarzabanG


TrazbanG

A high-fidelity document translation toolkit designed for large PDFs and text files. It supports a granular "Extract -> Refine -> Translate" workflow: you clean up the extracted text by hand before it is sent for AI translation, which keeps OCR errors and page noise out of the result.

Features

  • Smart Chunking: Automatically slices PDFs based on Bookmarks (TOC) or Chapter headings.
  • Two-Pass Translation: Performs a literal technical draft followed by an eloquent scholarly refinement.
  • Dual Outputs: Saves both _draft.txt (literal) and _final.txt (eloquent) for every section.
  • Refinement-First Workflow: Extracts raw text so you can fix OCR errors or remove headers/footers before translating.
  • Resumable: Skips already processed segments to save time and API quota.
  • Flexible Filters: Process a single section, a range of indices, or the entire book.
  • Systematic Organization: Automatically creates dedicated folders for segments and translations based on the input filename.
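
As an illustration of bookmark-based chunking, the sketch below turns a table of contents into page ranges. It is not taken from the project's source. It assumes the TOC is a list of [level, title, page] entries with 1-based page numbers, which is the shape PyMuPDF's Document.get_toc() returns; the function name bookmark_ranges and the sample data are hypothetical.

```python
def bookmark_ranges(toc, page_count, level=1):
    """Return (title, first_page, last_page) tuples, 1-based and inclusive."""
    # Keep only entries at the chosen outline level, in document order.
    starts = [(title, page) for lvl, title, page in toc if lvl == level]
    ranges = []
    for i, (title, first) in enumerate(starts):
        # A section ends on the page before the next section starts,
        # or on the last page of the document.
        if i + 1 < len(starts):
            last = starts[i + 1][1] - 1
        else:
            last = page_count
        # Two bookmarks on the same page would give last < first; clamp it.
        ranges.append((title, first, max(first, last)))
    return ranges


# Hypothetical TOC for a 30-page book.
toc = [
    [1, "Preface", 1],
    [1, "Chapter 1", 5],
    [2, "1.1 Scope", 6],
    [1, "Chapter 2", 20],
]
for title, first, last in bookmark_ranges(toc, page_count=30):
    print(f"{title}: pages {first}-{last}")
```

Nested entries (level 2 and deeper) are ignored here, so each top-level bookmark becomes one segment; the actual tool may split differently.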

Installation

  1. Ensure you have Python 3.10+ installed.
  2. Create and activate a virtual environment:
    python3 -m venv venv
    source venv/bin/activate
  3. Install dependencies:
    pip install google-genai pymupdf
  4. Set your Gemini API Key:
    export GEMINI_API_KEY="your_api_key_here"

Usage

1. Preparation (Slicing & Extraction)

Break the PDF into manageable chunks and extract the text for refinement.

python main.py --input MyBook.pdf --mode bookmark --action prepare
  • Result: A new folder sections_MyBook/ is created containing .pdf segments and .txt extractions.

2. Manual Refinement (Recommended)

Open the .txt files in sections_MyBook/ and clean up any "noise" like page numbers, running headers, or bad line breaks.
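
Part of this cleanup can be scripted before a manual pass. The following is a minimal sketch, not part of TrazbanG itself: the helper name clean_extracted_text, the page-number pattern, and the running_headers parameter are all assumptions made for illustration. It drops lines that contain only a page number, drops known running headers, rejoins words hyphenated across a line break, and merges single line breaks inside a paragraph.

```python
import re

# A line that holds only a page number, e.g. "12" or "Page 12" (assumed pattern).
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d{1,4}\s*$", re.IGNORECASE)


def clean_extracted_text(text, running_headers=()):
    """Remove common extraction noise; blank lines are kept as paragraph breaks."""
    headers = {h.strip().lower() for h in running_headers}
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if PAGE_NUMBER.match(stripped):
            continue  # bare page number
        if stripped.lower() in headers:
            continue  # running header or footer supplied by the caller
        kept.append(stripped)
    joined = "\n".join(kept)
    # Rejoin words split by a hyphen at a line end: "transla-\ntion" -> "translation".
    joined = re.sub(r"(\w)-\n(\w)", r"\1\2", joined)
    # A single newline inside a paragraph becomes a space.
    joined = re.sub(r"(?<!\n)\n(?!\n)", " ", joined)
    return joined


sample = "MY BOOK\nThe transla-\ntion of long\ntexts is hard.\n\n12\nNext paragraph."
print(clean_extracted_text(sample, running_headers=["MY BOOK"]))
```

The hyphen rule also merges genuine compounds that happen to break at a line end, so the output still deserves a read-through.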

3. Translation

Submit your refined text to Gemini for a two-pass translation.

# Translate with a custom glossary (.json or .txt)
python main.py --input MyBook.pdf --action translate --glossary glossary.json

# Translate a single section
python main.py --input MyBook.pdf --action translate --index 7

# Translate a range of sections
python main.py --input MyBook.pdf --action translate --start 10 --end 20
  • Result: Final translations are saved in translations_MyBook/.
    • ..._draft.txt: The initial technical draft.
    • ..._final.txt: The polished, natural translation.

Note: To re-translate a section, you must first remove or rename its existing _final.txt file.
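
The resume behaviour described in the note can be pictured with the sketch below. It is a guess at the logic, not the tool's actual code: it assumes each section file NAME.txt maps to NAME_final.txt in the translations folder, and the function name pending_sections is hypothetical.

```python
from pathlib import Path


def pending_sections(section_dir, translation_dir):
    """List section .txt files that have no matching _final.txt yet."""
    section_dir = Path(section_dir)
    translation_dir = Path(translation_dir)
    pending = []
    for txt in sorted(section_dir.glob("*.txt")):
        # Assumed naming scheme: <section stem>_final.txt
        final = translation_dir / f"{txt.stem}_final.txt"
        if not final.exists():
            pending.append(txt)
    return pending
```

Under this scheme, deleting or renaming a _final.txt file puts its section back on the pending list, which matches the re-translation note above.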

Command-Line Arguments

| Flag | Description | Options |
| --- | --- | --- |
| --input | (Required) Path to your PDF or TXT file. | File path |
| --mode | How the book should be divided. | bookmark (default), chapter, full |
| --action | What step to perform. | slice, extract, translate, prepare (slice+extract) |
| --lang | The target language for translation. | e.g., Persian, Spanish, French |
| --glossary | Path to glossary file. | .json or .txt |
| --index | Process only this specific section number. | Integer |
| --start / --end | Process a range of sections. | Integers |
| --key | Your Google Gemini API Key. | String |
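
For readers who want to script around the tool, here is a sketch of how the flags listed above could be declared with argparse. It mirrors the documented flags only and is not copied from main.py. Defaults other than --mode are not documented, so they are left unset here.

```python
import argparse


def build_parser():
    """Declare the documented flags; undocumented defaults are left as None."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--input", required=True, help="Path to a PDF or TXT file")
    parser.add_argument("--mode", choices=["bookmark", "chapter", "full"],
                        default="bookmark", help="How the book is divided")
    parser.add_argument("--action",
                        choices=["slice", "extract", "translate", "prepare"],
                        help="Step to perform (prepare = slice + extract)")
    parser.add_argument("--lang", help="Target language, e.g. Persian")
    parser.add_argument("--glossary", help="Path to a .json or .txt glossary")
    parser.add_argument("--index", type=int, help="Process one section only")
    parser.add_argument("--start", type=int, help="First section of a range")
    parser.add_argument("--end", type=int, help="Last section of a range")
    parser.add_argument("--key", help="Google Gemini API key")
    return parser


args = build_parser().parse_args(
    ["--input", "MyBook.pdf", "--action", "translate", "--start", "10", "--end", "20"]
)
print(args.mode, args.action, args.start, args.end)
```

Restricting --mode and --action with choices makes a mistyped value fail at parse time rather than partway through a long run.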

Folders Created

  • sections_<filename>/: Contains the sliced PDF parts and extracted text files.
  • translations_<filename>/: Contains the final dual-pass output from Gemini.
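
The naming rule can be expressed in a few lines. This sketch assumes <filename> means the input name without its extension, as the sections_MyBook/ example suggests, and that the folders are created relative to the current directory; the helper name output_dirs is hypothetical.

```python
from pathlib import Path


def output_dirs(input_path):
    """Derive the two output folder names from the input file name."""
    stem = Path(input_path).stem  # "books/MyBook.pdf" -> "MyBook"
    return Path(f"sections_{stem}"), Path(f"translations_{stem}")


sections, translations = output_dirs("books/MyBook.pdf")
print(sections, translations)
```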

Acknowledgments

This project was entirely authored and architected by Google Gemini.

About

TrazbanG is an advanced toolkit designed for book and document translation. It offers a granular "Slice → Extract → Human Refine → Translate" workflow. By allowing users to manually clean extracted text before submitting it to Google Gemini, TrazbanG ensures maximum fidelity and stylistic consistency for complex manuscripts and large PDFs.
