Mosaibah/oreilly-ingest

O'Reilly Ingest

We're in the AI era. You want to chat with your favorite technical books using Claude Code, Cursor, or any LLM tool. This gets you there.

Export any O'Reilly book to Markdown, PDF, EPUB, JSON, or plain text. Download by chapters so you don't burn through your context window.

Requires a valid O'Reilly Learning subscription.

Disclaimer

For personal and educational use only. Please read the O'Reilly Terms of Service.

Credits

Inspired by safaribooks by @lorenzodifuccia.

Features

  • Export by chapters - save tokens, focus on what matters
  • LLM-ready formats - Markdown, JSON, plain text optimized for AI
  • Traditional formats - PDF and EPUB 3
  • O'Reilly V2 API - fast and reliable
  • Images & styles included - complete book experience
  • Web UI - search, preview, download

Quick Start

Docker

git clone https://github.com/mosaibah/oreilly-downloader.git
cd oreilly-downloader
docker compose up -d

Python

git clone https://github.com/mosaibah/oreilly-downloader.git
cd oreilly-downloader
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python main.py

Then open http://localhost:8000

Setting Up Cookies

Click "Set Cookies" in the web interface and follow the on-screen steps.

Architecture

Plugin-based microkernel design:

Layer    Components
Kernel   Plugin registry, shared HTTP client
Core     Auth, Book, Chapters, Assets, HtmlProcessor
Output   Epub, Markdown, Pdf, PlainText, JsonExport
Utility  Chunking, Token, Downloader
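
To make the microkernel idea concrete, here is a minimal sketch of a kernel that keeps a name-to-plugin registry and shares one HTTP client with every plugin. It is an illustration under assumptions, not this project's actual code: `Kernel`, `Plugin`, `register`, `get`, and `MarkdownPlugin.render` are hypothetical names, and the shared client is a placeholder object.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


class Plugin:
    """Base class for plugins (hypothetical; not the project's real API)."""

    name = "plugin"

    def __init__(self, kernel: "Kernel") -> None:
        # Each plugin keeps a reference to the kernel so it can reach
        # the shared HTTP client and other registered plugins.
        self.kernel = kernel


@dataclass
class Kernel:
    """Minimal plugin registry holding one shared HTTP client."""

    http: Any = None  # placeholder for a shared HTTP session object
    _plugins: Dict[str, Plugin] = field(default_factory=dict)

    def register(self, plugin_cls: type) -> Plugin:
        # Instantiate the plugin with this kernel and store it by name.
        plugin = plugin_cls(self)
        if plugin.name in self._plugins:
            raise ValueError(f"plugin already registered: {plugin.name}")
        self._plugins[plugin.name] = plugin
        return plugin

    def get(self, name: str) -> Plugin:
        return self._plugins[name]


class MarkdownPlugin(Plugin):
    """Illustrative output plugin."""

    name = "markdown"

    def render(self, title: str, body: str) -> str:
        return f"# {title}\n\n{body}\n"


kernel = Kernel(http=object())
kernel.register(MarkdownPlugin)
print(kernel.get("markdown").render("Chapter 1", "Hello"))
```

With this shape, adding an output format means writing one new plugin class and registering it; the kernel itself does not change.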

API

GET  /api/status       - auth check
GET  /api/search?q=    - find books
GET  /api/book/{id}    - metadata
POST /api/download     - start export
GET  /api/progress     - SSE stream
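
The endpoints above can be called from a small script. The sketch below uses only the Python standard library and assumes the server from Quick Start is running on http://localhost:8000. The helper names (`api_url`, `book_url`, `get_json`, `iter_sse_data`, `stream_progress`) are hypothetical, and it is an assumption that the GET endpoints return JSON. The request body of POST /api/download is not documented here, so it is left out.

```python
import json
from typing import Iterable, Iterator
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE = "http://localhost:8000"  # address from Quick Start


def api_url(base: str, path: str, **params: str) -> str:
    """Join base and path, appending URL-encoded query parameters."""
    url = base.rstrip("/") + path
    if params:
        url += "?" + urlencode(params)
    return url


def book_url(base: str, book_id: str) -> str:
    """Build the /api/book/{id} URL with the id safely escaped."""
    return base.rstrip("/") + "/api/book/" + quote(str(book_id), safe="")


def get_json(url: str):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event.

    Consecutive "data:" lines are joined with newlines, and a blank
    line ends the event. Other lines (comments, ids) are ignored.
    """
    buf = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            buf.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buf:
            yield "\n".join(buf)
            buf = []


def stream_progress(base: str = BASE) -> Iterator[str]:
    """Stream payloads from GET /api/progress as they arrive."""
    with urlopen(api_url(base, "/api/progress")) as resp:
        yield from iter_sse_data(line.decode("utf-8") for line in resp)


def main() -> None:
    # Requires a running server; not called automatically.
    print(get_json(api_url(BASE, "/api/status")))
    print(get_json(api_url(BASE, "/api/search", q="python")))
    for payload in stream_progress():
        print(payload)
```

Call `main()` once the server is up. The payload format of the progress stream is not specified in this README, so the sketch prints each payload as raw text.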

Contributing

Found a bug or have an idea? PRs and issues are always welcome!

Recent Changes

  • Chunking: streaming & memory fix. chunk_book() now streams chunks directly to disk instead of accumulating them in memory. Replaced the tiktoken tokenizer with a word-count heuristic to avoid memory spikes on large books. (@zirkleta)
  • System: command injection fix. _show_macos_picker() rejects paths containing a double quote (") before interpolating them into osascript, preventing command injection via crafted directory names. (@zirkleta)
  • patch_chunk_titles.py: new utility script that backfills book_title into existing *_chunks.jsonl files in the output directory. (@zirkleta)
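
The streaming approach described in the chunking entry can be sketched as follows. This is not the project's `chunk_book()` implementation: the function names, the 1.3 tokens-per-word ratio, and the `index` and `text` keys are assumptions made for illustration. Only the `book_title` field and the JSONL output format come from the notes above.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def estimate_tokens(text: str, tokens_per_word: float = 1.3) -> int:
    """Rough token estimate from word count (the ratio is an assumption)."""
    return int(len(text.split()) * tokens_per_word)


def iter_chunks(paragraphs: Iterable[str], max_tokens: int) -> Iterator[str]:
    """Yield chunks lazily, packing paragraphs until the budget is reached."""
    current, used = [], 0
    for para in paragraphs:
        cost = estimate_tokens(para)
        # Close the current chunk if this paragraph would exceed the budget.
        if current and used + cost > max_tokens:
            yield "\n\n".join(current)
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        yield "\n\n".join(current)


def write_chunks(paragraphs: Iterable[str], out: TextIO,
                 book_title: str, max_tokens: int = 500) -> int:
    """Write one JSON object per line as each chunk is produced."""
    count = 0
    for i, chunk in enumerate(iter_chunks(paragraphs, max_tokens)):
        record = {"book_title": book_title, "index": i, "text": chunk}
        out.write(json.dumps(record) + "\n")
        count += 1
    return count


# Demo with an in-memory buffer; a real run would pass an open file.
buffer = io.StringIO()
written = write_chunks(["a b c"] * 4, buffer, "Demo", max_tokens=6)
print(written, "chunks written")
```

Because `iter_chunks` is a generator, only one chunk is held in memory at a time, whatever the size of the book.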

License

MIT
