We're in the AI era. You want to chat with your favorite technical books using Claude Code, Cursor, or any LLM tool. This gets you there.
Export any O'Reilly book to Markdown, PDF, EPUB, JSON, or plain text. Download by chapters so you don't burn through your context window.
Requires a valid O'Reilly Learning subscription.
For personal and educational use only. Please read the O'Reilly Terms of Service.
Inspired by safaribooks by @lorenzodifuccia.
- Export by chapters - save tokens, focus on what matters
- LLM-ready formats - Markdown, JSON, plain text optimized for AI
- Traditional formats - PDF and EPUB 3
- O'Reilly V2 API - fast and reliable
- Images & styles included - complete book experience
- Web UI - search, preview, download
With Docker:

    git clone https://github.com/mosaibah/oreilly-downloader.git
    cd oreilly-downloader
    docker compose up -d

With Python:

    git clone https://github.com/mosaibah/oreilly-downloader.git
    cd oreilly-downloader
    python3 -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
    python main.py

Then open http://localhost:8000
Click "Set Cookies" in the web interface and follow the steps:
Plugin-based microkernel design:
| Layer | Components |
|---|---|
| Kernel | Plugin registry, shared HTTP client |
| Core | Auth, Book, Chapters, Assets, HtmlProcessor |
| Output | Epub, Markdown, Pdf, PlainText, JsonExport |
| Utility | Chunking, Token, Downloader |
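
To make the layering concrete, here is a minimal sketch of how a kernel with a plugin registry could hand out output plugins. The names `Kernel`, `register`, `get`, and `PlainTextOutput.render` are hypothetical and chosen only for illustration; the project's actual classes and method signatures may differ.

```python
from typing import Any, Dict, List


class Kernel:
    """Hypothetical kernel: a registry mapping plugin names to instances."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Any] = {}

    def register(self, name: str, plugin: Any) -> None:
        # Refuse duplicates so one plugin cannot silently shadow another.
        if name in self._plugins:
            raise ValueError(f"plugin already registered: {name}")
        self._plugins[name] = plugin

    def get(self, name: str) -> Any:
        if name not in self._plugins:
            raise KeyError(f"unknown plugin: {name}")
        return self._plugins[name]


class PlainTextOutput:
    """Hypothetical output plugin: joins chapter texts into one string."""

    def render(self, chapters: List[str]) -> str:
        return "\n\n".join(chapters)


kernel = Kernel()
kernel.register("plaintext", PlainTextOutput())
text = kernel.get("plaintext").render(["Chapter 1", "Chapter 2"])
```

In a design like this, adding a new export format means registering one more plugin; the kernel and the other layers stay unchanged.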
- `GET /api/status` - auth check
- `GET /api/search?q=` - find books
- `GET /api/book/{id}` - metadata
- `POST /api/download` - start export
- `GET /api/progress` - SSE stream
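
As a rough illustration of consuming these endpoints from a script, the sketch below builds a search URL and parses a Server-Sent Events stream such as the one `/api/progress` returns. The helper names `search_url` and `iter_sse_data` are hypothetical, the parser is simplified (it reads only `data:` fields), and the payload shown in the sample is an assumption, since the response formats are not documented here.

```python
from typing import Iterable, Iterator, List
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # default address from the setup steps


def search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build the GET /api/search URL with a URL-encoded q parameter."""
    return f"{base_url}/api/search?{urlencode({'q': query})}"


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in an SSE line stream.

    A blank line ends an event; several data: lines within one event
    are joined with newlines. An unterminated final event is dropped.
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        # Other fields (event:, id:, retry:) and comments are ignored here.


# Sample stream; the JSON payload shape is an assumption for illustration.
sample = ['data: {"percent": 50}\n', "\n", "data: done\n", "\n"]
events = list(iter_sse_data(sample))
```

Against a running server, the lines could come from `urllib.request.urlopen(f"{BASE_URL}/api/progress")`, decoding each line from bytes before passing it to the parser.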
Found a bug or have an idea? PRs and issues are always welcome!
- Chunking: streaming and memory fix. `chunk_book()` now streams chunks directly to disk instead of accumulating them in memory. Replaced the `tiktoken` tokenizer with a word-count heuristic to avoid memory spikes on large books. (@zirkleta)
- System: command injection fix. `_show_macos_picker()` rejects paths containing `"` before interpolating them into osascript, preventing command injection via crafted directory names. (@zirkleta)
- `patch_chunk_titles.py`: new utility script that backfills `book_title` into existing `*_chunks.jsonl` files in the output directory. (@zirkleta)
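
The streaming approach from the chunking entry can be sketched as follows. This is not the project's `chunk_book()`: the helpers `iter_chunks` and `write_chunks_jsonl`, the `max_words` budget, and the `index` and `text` fields are assumptions made for illustration. Only the `book_title` field and the `*_chunks.jsonl` naming come from the notes above.

```python
import json
import os
import tempfile
from typing import Iterable, Iterator, List


def iter_chunks(paragraphs: Iterable[str], max_words: int) -> Iterator[str]:
    """Lazily group paragraphs into chunks using a word-count budget.

    A chunk is closed as soon as adding the next paragraph would exceed
    max_words. A single paragraph longer than the budget becomes its own
    oversized chunk rather than being split.
    """
    current: List[str] = []
    count = 0
    for para in paragraphs:
        words = len(para.split())
        if current and count + words > max_words:
            yield "\n\n".join(current)
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        yield "\n\n".join(current)


def write_chunks_jsonl(paragraphs: Iterable[str], path: str,
                       book_title: str, max_words: int = 200) -> int:
    """Write one JSON record per chunk; only one chunk is held in memory."""
    written = 0
    with open(path, "w", encoding="utf-8") as fh:
        for index, chunk in enumerate(iter_chunks(paragraphs, max_words)):
            record = {"book_title": book_title, "index": index, "text": chunk}
            fh.write(json.dumps(record) + "\n")
            written += 1
    return written


out_path = os.path.join(tempfile.mkdtemp(), "demo_chunks.jsonl")
n = write_chunks_jsonl(["one two three", "four five", "six"],
                       out_path, "Demo Book", max_words=4)
```

Because each record is written as soon as its chunk is complete, peak memory depends on the chunk size rather than the book size, and counting words avoids loading a tokenizer at all.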
MIT

