A browser-based tool for translating PDF books and markdown documents to any language using OpenAI's GPT models. Upload a PDF, clean up OCR errors, and get professional translations—all from your browser.
Book Translate handles the complete document translation pipeline:
- Convert - Transform PDFs into editable markdown
- Clean Up - Fix OCR errors and PDF conversion artifacts
- Translate - Get natural, context-aware translations in any language
All processing is orchestrated client-side in your browser. Your documents and API key stay on your machine, except for the document text and key sent in API calls made directly to OpenAI.
Visit science.github.io/ai-translator to launch the app in your browser.
Navigate to Settings and enter your OpenAI API key. Need help? Click "How to get an API key" for step-by-step instructions including:
- Creating an OpenAI account
- Setting up billing (required for API access)
- Generating and saving your key
Option A: One-Step Translation (Recommended)
- Go to One Step Translation in the sidebar
- Upload a PDF or markdown file
- Enter your target language (e.g., "Japanese", "formal German")
- Click Start and wait for processing
- Download your translated document
Option B: Step-by-Step
- Upload your document on the Home page
- Convert it on the Convert PDF page
- Clean it up on the Cleanup page
- Translate it on the Translate page
- Download from My Documents
The flagship feature—a unified pipeline that handles everything:
- Upload PDF or markdown → Automatic conversion (if needed), cleanup, and translation
- Configure cleanup and translation settings independently
- Track progress through each phase with visual indicators
- Download as Markdown or Word document
Outputs:
- Converted Markdown (raw PDF extraction, PDF input only)
- Cleaned Markdown (OCR errors fixed)
- Translated (target language only)
- Bilingual (alternating original/translated sections)
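As a rough illustration of the Bilingual output, the sketch below interleaves original and translated sections. The function name `buildBilingual` is hypothetical, and the sketch assumes the two lists are already aligned section by section; the app's actual alignment logic is not described here.

```javascript
// Hypothetical helper: alternate original and translated sections.
// Assumes both arrays are aligned one-to-one.
function buildBilingual(originalSections, translatedSections) {
  if (originalSections.length !== translatedSections.length) {
    throw new Error("Section counts must match");
  }
  const parts = [];
  for (let i = 0; i < originalSections.length; i++) {
    parts.push(originalSections[i].trim(), translatedSections[i].trim());
  }
  // A blank line between sections keeps markdown paragraphs separate.
  return parts.join("\n\n");
}

const bilingualDemo = buildBilingual(
  ["# Contents", "While the rain fell, she read."],
  ["# 目次", "雨が降る間、彼女は本を読んだ。"]
);
console.log(bilingualDemo);
```

Throwing on mismatched counts is a deliberate choice in this sketch: silently pairing misaligned sections would produce a document that looks valid but is wrong.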
Extract text from PDF files while preserving structure:
- Headers, paragraphs, and formatting maintained
- Fast local processing (no API calls required)
- Works with text-based and scanned PDFs
Fix common OCR and PDF conversion errors:
- Missing letters: `ontents` → `Contents`
- Broken words: `Ww hile` → `While`
- Page numbers and footer markers removed
- Paragraph flow restored across page breaks
- PDF gibberish text removed
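To make two of these artifact types concrete, here is a small rule-based sketch. It is not how the Cleanup page works (this README does not describe the implementation). The function names and regular expressions are assumptions chosen for illustration only.

```javascript
// Illustrative heuristics only; not the tool's cleanup method.

// Drop lines that contain nothing but a page number, e.g. "12" or "Page 3".
function stripPageNumbers(markdown) {
  return markdown
    .split("\n")
    .filter((line) => !/^\s*(?:page\s+)?\d{1,4}\s*$/i.test(line))
    .join("\n");
}

// Rejoin a paragraph split by a page break: text that stops without
// terminal punctuation, then a blank gap, then a lowercase continuation.
function joinBrokenParagraphs(markdown) {
  return markdown.replace(/([a-z,;])\n{2,}(?=[a-z])/g, "$1 ");
}

const rawPage = "The cat sat on the\n\n12\n\nmat and slept.\n\nNext paragraph.";
console.log(joinBrokenParagraphs(stripPageNumbers(rawPage)));
```

Rules like these misfire easily (a numeric-only line can be real content, and a heading followed by lowercase text would be merged), which is one reason a model-based pass can be preferable for real documents.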
Natural, publication-quality translations:
- Target language flexibility - Specify language and tone (e.g., "formal Japanese", "conversational Spanish")
- Language history - Quick access to recently used languages
- Context-aware translation - Optional feature that provides surrounding text context for consistent tone
- Dual output - Get target-language-only and bilingual versions
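One way to picture context-aware translation is a prompt builder that passes neighbouring text alongside the chunk being translated. The sketch below is an assumption about how this could be done: `buildTranslationMessages`, the prompt wording, and the one-chunk window on each side are all hypothetical.

```javascript
// Hypothetical prompt builder; wording and window size are assumptions.
function buildTranslationMessages(chunks, index, targetLanguage, useContext = true) {
  const messages = [
    {
      role: "system",
      content:
        `Translate the user's markdown into ${targetLanguage}. ` +
        "Preserve markdown formatting. Return only the translation.",
    },
  ];
  if (useContext) {
    const before = index > 0 ? chunks[index - 1] : "";
    const after = index < chunks.length - 1 ? chunks[index + 1] : "";
    if (before || after) {
      // Neighbouring text is supplied for tone and terminology only.
      messages.push({
        role: "system",
        content:
          "Context for tone and terminology only; do not translate it.\n" +
          `Previous text:\n${before}\nFollowing text:\n${after}`,
      });
    }
  }
  messages.push({ role: "user", content: chunks[index] });
  return messages;
}

const demoMessages = buildTranslationMessages(
  ["Chapter one.", "She opened the door.", "The room was empty."],
  1,
  "formal Japanese"
);
console.log(demoMessages.length);
```

Sending context increases the tokens per request, which is presumably why the feature is optional.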
Full document library with organization features:
- Phase filtering - View by stage: Uploaded, Converted, Cleaned, Translated
- Search - Find documents by name
- Preview - View raw markdown or rendered HTML
- Export - Download as original format or Word document
- Storage tracking - Monitor browser storage usage
Configure defaults and manage your workspace:
- API key configuration with connection testing
- Default model, chunk size, and reasoning effort
- Context-aware translation toggle
- Storage management with one-click cleanup
| Model | Speed | Quality | Best For |
|---|---|---|---|
| gpt-5.2 | Slower | Highest | Final translations |
| gpt-5-mini | Medium | High | General use (default) |
| gpt-4.1 | Fast | Good | Quick drafts |
| gpt-4.1-mini | Fastest | Good | Large documents |
GPT-5 models support configurable reasoning effort (Low/Medium/High) for balancing speed vs. quality.
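As a sketch of how the reasoning effort setting might reach the API, the helper below adds OpenAI's `reasoning_effort` request parameter only for GPT-5 model names. It assumes a Chat Completions style request body; `buildRequestBody` and the name-prefix check are hypothetical and may not match what the app does.

```javascript
// Hypothetical request-body builder, assuming the Chat Completions shape.
const EFFORT_LEVELS = ["low", "medium", "high"];

function buildRequestBody({ model, messages, reasoningEffort }) {
  const body = { model, messages };
  // Assumption: only model names starting with "gpt-5" take the setting.
  if (model.startsWith("gpt-5") && reasoningEffort !== undefined) {
    if (!EFFORT_LEVELS.includes(reasoningEffort)) {
      throw new Error(`Unsupported reasoning effort: ${reasoningEffort}`);
    }
    body.reasoning_effort = reasoningEffort;
  }
  return body;
}

const demoBody = buildRequestBody({
  model: "gpt-5-mini",
  messages: [{ role: "user", content: "Hello" }],
  reasoningEffort: "low",
});
console.log(JSON.stringify(demoBody, null, 2));
```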
Use One Step Translation when:
- You have a PDF or markdown file that needs translation
- You want the simplest experience
- You need all output formats
Use individual pages when:
- You only need PDF conversion (no translation)
- You want to review/edit between steps
- You need more control over each phase
The target language field accepts natural descriptions:
- Japanese - Standard translation
- Formal Japanese - Business/academic tone
- Conversational Brazilian Portuguese - Casual, regional style
- Simplified Chinese for technical readers - Domain-specific adaptation
Your recent languages are saved for quick reuse.
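A recent-languages list like this can be kept with a small most-recent-first helper. The sketch below is an assumption, not the app's code: `rememberLanguage`, the case-insensitive de-duplication, and the cap of five entries are all hypothetical.

```javascript
// Hypothetical helper: keep a short, de-duplicated, most-recent-first list.
function rememberLanguage(history, language, maxEntries = 5) {
  const entry = language.trim();
  if (!entry) return history;
  // Drop any earlier entry that differs only by letter case.
  const rest = history.filter(
    (item) => item.toLowerCase() !== entry.toLowerCase()
  );
  return [entry, ...rest].slice(0, maxEntries);
}

let recentLanguages = [];
recentLanguages = rememberLanguage(recentLanguages, "Japanese");
recentLanguages = rememberLanguage(recentLanguages, "formal German");
recentLanguages = rememberLanguage(recentLanguages, "Japanese");
console.log(recentLanguages); // [ 'Japanese', 'formal German' ]
```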
Documents progress through phases, visible in My Documents:
| Phase | Description |
|---|---|
| Uploaded | Original file, unprocessed |
| Converted | PDF extracted to markdown |
| Cleaned | OCR errors and artifacts fixed |
| Translated | Final translation (with variant tags) |
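The phase table maps naturally onto a simple filter over document records. In this sketch the record shape (`name`, `phase`) and the helpers `filterByPhase` and `countByPhase` are hypothetical; only the four phase names come from the table above.

```javascript
// Phase names from the table above; everything else is illustrative.
const PHASES = ["Uploaded", "Converted", "Cleaned", "Translated"];

function filterByPhase(documents, phase) {
  if (!PHASES.includes(phase)) throw new Error(`Unknown phase: ${phase}`);
  return documents.filter((doc) => doc.phase === phase);
}

function countByPhase(documents) {
  const counts = Object.fromEntries(PHASES.map((phase) => [phase, 0]));
  for (const doc of documents) {
    if (Object.hasOwn(counts, doc.phase)) counts[doc.phase] += 1;
  }
  return counts;
}

const library = [
  { name: "book.pdf", phase: "Uploaded" },
  { name: "book.md", phase: "Converted" },
  { name: "book-rectified.md", phase: "Cleaned" },
];
console.log(filterByPhase(library, "Cleaned").map((doc) => doc.name));
console.log(countByPhase(library));
```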
- Local storage only - Documents stored in your browser's IndexedDB
- API key security - Stored locally in localStorage, never transmitted except to OpenAI
- No server - All processing is client-side
- Clear anytime - Delete all documents from Settings
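To illustrate the "no server" point, the sketch below builds a request addressed directly to OpenAI's Chat Completions endpoint, with the key carried only in the `Authorization` header. `buildOpenAIRequest` is a hypothetical helper and `sk-example` is a placeholder; the app may structure its calls differently.

```javascript
// Hypothetical helper showing where the key and document text would go.
const OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions";

function buildOpenAIRequest(apiKey, body) {
  return {
    url: OPENAI_CHAT_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}

// In a browser this would be sent with: fetch(demoRequest.url, demoRequest.init)
const demoRequest = buildOpenAIRequest("sk-example", {
  model: "gpt-5-mini",
  messages: [{ role: "user", content: "Translate to Japanese: Hello" }],
});
console.log(new URL(demoRequest.url).host); // api.openai.com
```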
For automation or batch processing, a command-line interface is also available:
# Convert PDF to markdown
node src/index.js book.pdf --pdf-to-md --output-dir converted/
# Clean up OCR errors
node src/index.js converted/book.md --rectify --output-dir cleaned/
# Translate to Japanese
node src/index.js cleaned/book-rectified.md --output-dir translated/
See CLI Documentation below for full options.
- Node.js 20 LTS or later
- OpenAI API key with billing enabled
To host the web app locally or deploy your own instance:
cd web-app
npm install
npm run dev # Start development server at http://localhost:5173
npm run build # Build for production (outputs to build/)
npm run preview # Preview production build locally
cd web-app
npm run dev # Start development server
npm run build # Build for production
npm run check # TypeScript type checking
npm run test:unit # Run Vitest unit tests
npm run test:e2e # Run Playwright browser tests
npm run test:e2e:ui # Playwright with interactive UI
npm install
npm test # Run all Jest tests
npm test -- --testPathPattern=chunker # Run specific test file
Web App:
- SvelteKit 2 / Svelte 5
- Tailwind CSS 4
- TypeScript
- IndexedDB (via idb)
- Vitest + Playwright
CLI:
- Node.js ES Modules
- OpenAI SDK
- Jest
book-translate/
├── web-app/ # Browser application
│ ├── src/
│ │ ├── routes/ # SvelteKit pages
│ │ │ ├── +page.svelte # Home/Upload
│ │ │ ├── workflow/ # One Step Translation
│ │ │ ├── convert/ # PDF to Markdown
│ │ │ ├── cleanup/ # Document rectification
│ │ │ ├── translate/ # Translation
│ │ │ ├── documents/ # Document library
│ │ │ └── settings/ # Configuration
│ │ └── lib/
│ │ ├── stores/ # State management
│ │ ├── services/ # API clients, document operations
│ │ └── components/ # Reusable UI components
│ └── tests/ # E2E tests
├── src/ # CLI tool
│ ├── index.js # CLI entry point
│ ├── chunker.js # Document chunking
│ ├── translator.js # OpenAI translation
│ ├── rectifier.js # OCR error correction
│ └── pdfConverter.js # PDF extraction
└── test/ # CLI tests
node src/index.js <input-file> [options]
Modes (mutually exclusive):
- `--pdf-to-md` - Convert PDF to markdown
- `--rectify` - Clean up OCR errors (English to English)
- (default) - Translate to Japanese
Options:
- `--output-dir <path>` - Output directory (default: `output/`)
- `--chunk-size <n>` - Characters per chunk (default: `4000`)
- `--model <name>` - OpenAI model (default: `gpt-5-mini`)
- `--reasoning-effort <level>` - GPT-5 reasoning: `low`, `medium`, `high`
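To show what a character-based chunk size means in practice, here is a minimal chunking sketch. It is not the code in `src/chunker.js`: the function `chunkMarkdown` and its strategy (pack whole paragraphs, hard-split any paragraph longer than the limit) are assumptions for illustration.

```javascript
// Illustrative chunker; the real src/chunker.js may work differently.
function chunkMarkdown(text, chunkSize = 4000) {
  const paragraphs = text.split(/\n{2,}/);
  const chunks = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (candidate.length <= chunkSize) {
      current = candidate; // paragraph still fits in the open chunk
      continue;
    }
    if (current) chunks.push(current);
    // A single paragraph longer than the limit is split mid-text.
    let rest = paragraph;
    while (rest.length > chunkSize) {
      chunks.push(rest.slice(0, chunkSize));
      rest = rest.slice(chunkSize);
    }
    current = rest;
  }
  if (current) chunks.push(current);
  return chunks;
}

const demoChunks = chunkMarkdown("First paragraph.\n\nSecond paragraph.\n\nThird.", 35);
console.log(demoChunks.length); // 2
```

Smaller chunks mean more requests with fewer tokens each; larger chunks give the model more surrounding text per request.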
Examples:
# Full pipeline
node src/index.js book.pdf --pdf-to-md --output-dir step1/
node src/index.js step1/book.md --rectify --output-dir step2/
node src/index.js step2/book-rectified.md --output-dir final/
# Quick translation with faster model
node src/index.js clean-book.md --model gpt-4o --chunk-size 3000
Create .env from the template:
cp .env.example .env
Add your OpenAI API key:
OPENAI_API_KEY=sk-...
"Insufficient quota" error
- Add credits to your OpenAI account at platform.openai.com/settings/organization/billing
"Rate limit exceeded" error
- The app automatically retries with backoff
- New accounts have lower limits that increase over time
- Try reducing chunk size or using a faster model
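Retrying with backoff generally looks like the sketch below. It is a generic pattern, not the app's code: `withRetry`, `backoffDelay`, the doubling schedule, the 30-second cap, and the assumption that a rate-limit error exposes an HTTP `status` of 429 are all illustrative.

```javascript
// Generic exponential backoff sketch; names and limits are assumptions.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay doubles with each attempt and is capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function withRetry(
  operation,
  { maxRetries = 5, baseMs = 1000, sleep = defaultSleep } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Assumption: rate-limit errors carry an HTTP status of 429.
      const retryable = Boolean(error) && error.status === 429;
      if (!retryable || attempt >= maxRetries) throw error;
      await sleep(backoffDelay(attempt, baseMs));
    }
  }
}

console.log([0, 1, 2, 3].map((attempt) => backoffDelay(attempt)));
// [ 1000, 2000, 4000, 8000 ]
```

Errors other than rate limits are rethrown immediately in this sketch, since retrying an invalid key or a quota failure would only delay the error message.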
API key not working
- Ensure billing is set up (required even with free credits)
- Check the key hasn't been revoked
- Verify no extra spaces when pasting
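A quick local check can catch the pasted-whitespace problem before any API call is made. The helper below is a hypothetical sketch; the `sk-` prefix check is an assumption based on the key format shown in the `.env` template, and passing this check does not mean the key is valid or has billing enabled.

```javascript
// Hypothetical pre-flight check; it cannot confirm the key actually works.
function normalizeApiKey(raw) {
  const key = raw.trim(); // remove spaces and newlines picked up when pasting
  if (!key) return { ok: false, reason: "empty" };
  if (/\s/.test(key)) return { ok: false, reason: "contains whitespace" };
  // Assumption: keys start with "sk-", as in the .env template.
  if (!key.startsWith("sk-")) return { ok: false, reason: "unexpected prefix" };
  return { ok: true, key };
}

console.log(normalizeApiKey("  sk-example-key \n"));
// { ok: true, key: 'sk-example-key' }
```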
Poor PDF conversion quality
- Scanned PDFs produce more OCR errors—always run Cleanup
- Some complex layouts may not convert perfectly
Runtime:
- `openai` - OpenAI API client
- `@opendocsg/pdf2md` - PDF to markdown conversion
- `idb` - IndexedDB wrapper
- `marked` - Markdown rendering
- `@mohtasham/md-to-docx` - Word document export
Development:
- `jest` / `vitest` - Testing frameworks
- `playwright` - Browser testing
- `svelte` / `sveltekit` - Web framework
- `tailwindcss` - Styling