Split EPUB ebooks into Markdown files, preserving the table of contents hierarchy as a multi-level directory structure.
Reading English-language EPUBs is a core part of language learning for millions of Chinese and other Asian readers. But the workflow is painful: you toggle between the book and a dictionary, copy-paste paragraphs into translation tools, and lose your place, over and over.
epub2md was built to change that. By splitting an EPUB into per-chapter Markdown files, you unlock a fundamentally better reading loop:
- Feed chapters directly into AI: Drop a chapter into Claude, ChatGPT, or any LLM for instant translation, vocabulary annotation, or bilingual interleaving. No more copy-pasting from a clunky e-reader.
- Bilingual interleaved reading: Markdown makes it trivial to produce paragraph-by-paragraph bilingual text (original + translation), which matches how many Asian learners actually read: English paragraph first, Chinese translation below, repeat. This interleaved rhythm keeps you in the flow instead of bouncing between tabs.
- Chapter-level context control: LLMs have context limits. Splitting by chapter keeps each chunk small enough to fit in a typical context window, which reduces truncation and the errors that come from over-stuffing a prompt.
- Markdown as universal format: Notes, highlights, and AI annotations live alongside the text. Edit in Obsidian, VS Code, or any Markdown editor. Version-control your study notes with Git. The format is yours.
In short: epub2md turns a locked-up ebook into a learner-ready, AI-ready, Markdown-native study kit.
- Zero dependencies: uses only the Python standard library (`zipfile`, `html.parser`, `re`, `json`, etc.)
- TOC-aware splitting: reads NCX (EPUB2) and NAV (EPUB3) tables of contents to build the chapter hierarchy
- Directory structure: creates nested folders matching the book's chapter hierarchy
- Image extraction: saves all images to an `images/` directory and fixes Markdown references with correct relative paths
- Smart title enhancement: automatically enriches thin titles (e.g. `"1"`, `"Part II"`) by extracting descriptive text from the HTML content
- Spine fallback: when no TOC is available, falls back to the OPF spine order and extracts real titles from HTML headings
- Manifest tracking: generates `manifest.json` with full chapter metadata for progress tracking and resuming
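To make the TOC-aware splitting idea concrete, here is a minimal sketch of reading an EPUB2 NCX document with the standard library and flattening its nested `navPoint` entries into `(level, title, src)` tuples. The function name `parse_ncx` and the sample XML are hypothetical and chosen for illustration; they are not taken from the epub2md source, which may walk the TOC differently.

```python
import xml.etree.ElementTree as ET

# Namespace used by EPUB2 NCX documents.
NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def parse_ncx(ncx_xml: str) -> list[tuple[int, str, str]]:
    """Flatten an NCX navMap into (level, title, src) tuples in reading order."""
    root = ET.fromstring(ncx_xml)
    entries: list[tuple[int, str, str]] = []

    def walk(parent: ET.Element, level: int) -> None:
        # findall with a bare tag name matches direct children only,
        # so nesting depth maps directly onto the recursion depth.
        for point in parent.findall("ncx:navPoint", NCX_NS):
            title = point.findtext(
                "ncx:navLabel/ncx:text", default="", namespaces=NCX_NS
            ).strip()
            content = point.find("ncx:content", NCX_NS)
            src = content.get("src", "") if content is not None else ""
            entries.append((level, title, src))
            walk(point, level + 1)

    nav_map = root.find("ncx:navMap", NCX_NS)
    if nav_map is not None:
        walk(nav_map, 0)
    return entries


# Hypothetical sample: one chapter containing one section.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="p1" playOrder="1">
      <navLabel><text>Chapter One</text></navLabel>
      <content src="ch1.xhtml"/>
      <navPoint id="p2" playOrder="2">
        <navLabel><text>Section 1.1</text></navLabel>
        <content src="ch1.xhtml#s1"/>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>"""

toc = parse_ncx(SAMPLE_NCX)
```

The `level` value is what a splitter could use to decide whether an entry becomes a top-level file or a file inside a chapter folder.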
python scripts/epub2md.py <input.epub> [--output-dir <dir>]
- `<input.epub>`: path to the EPUB file
- `--output-dir`: output directory (defaults to a directory named after the EPUB file, in the same location)
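The default for `--output-dir` can be pictured with a small sketch. It assumes the script uses `argparse` and derives the default by dropping the `.epub` suffix; `build_parser` and `resolve_output_dir` are hypothetical names, not the script's actual functions.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented command line: one positional path, one option."""
    parser = argparse.ArgumentParser(prog="epub2md.py")
    parser.add_argument("input", metavar="input.epub", help="path to the EPUB file")
    parser.add_argument("--output-dir", default=None, help="output directory")
    return parser


def resolve_output_dir(epub_path: str, output_dir: Optional[str] = None) -> Path:
    """Return the explicit directory, or a sibling directory named after the EPUB."""
    if output_dir:
        return Path(output_dir)
    epub = Path(epub_path)
    # books/MyBook.epub -> books/MyBook
    return epub.parent / epub.stem


args = build_parser().parse_args(["books/MyBook.epub"])
default_dir = resolve_output_dir(args.input, args.output_dir)
```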
This project doubles as a Claude Code skill.
Clone this repository into your Claude Code skills folder:
# macOS / Linux
git clone https://github.com/AdkinsHan/epub2md.git ~/.claude/skills/epub2md
# Windows (PowerShell)
git clone https://github.com/AdkinsHan/epub2md.git "$env:USERPROFILE\.claude\skills\epub2md"
Or if you've already cloned it elsewhere:
# macOS / Linux
cp -r /path/to/epub2md ~/.claude/skills/epub2md
# Windows (PowerShell)
Copy-Item -Recurse "D:\path\to\epub2md" "$env:USERPROFILE\.claude\skills\epub2md"
Restart Claude Code or reload the session. The `/epub2md` command will then be available automatically.
In a Claude Code session, simply type:
/epub2md <path-to-book.epub>
The skill will:
- Validate and resolve the EPUB file path
- Run the Python script to split the EPUB into chapters
- Present the chapter list and output structure for review
- Verify the results
You can also specify a custom output directory:
/epub2md /path/to/book.epub --output-dir /path/to/output
Or simply mention an EPUB file in conversation — the skill auto-triggers on keywords like "EPUB", "epub to markdown", or "split ebook".
Given MyBook.epub, the output looks like:
MyBook/
├── 001_Foreword.md
├── 002_Chapter One/
│ ├── 002_Chapter One.md
│ ├── 003_Section 1.1.md
│ └── 004_Section 1.2.md
├── 005_Chapter Two/
│ ├── 005_Chapter Two.md
│ └── 006_Section 2.1.md
├── images/
│ ├── cover.png
│ ├── fig1.jpg
│ └── fig2.png
└── manifest.json
Image references in Markdown files use correct relative paths (e.g. images/cover.png for root-level chapters, ../images/cover.png for nested chapters).
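One way to compute those relative references is sketched below. It assumes each chapter knows its directory relative to the output root (an empty string for root-level chapters, like `dir_path` in the manifest); the helper name `image_ref` is hypothetical.

```python
import posixpath


def image_ref(chapter_dir: str, image_name: str, images_dir: str = "images") -> str:
    """Return the path from a chapter's directory to an image.

    chapter_dir is relative to the output root; "" means the root itself.
    posixpath keeps forward slashes, which is what Markdown links expect.
    """
    target = posixpath.join(images_dir, image_name)
    return posixpath.relpath(target, start=chapter_dir or ".")


root_ref = image_ref("", "cover.png")
nested_ref = image_ref("002_Chapter One", "cover.png")
```

Here `root_ref` is `images/cover.png` and `nested_ref` is `../images/cover.png`, matching the two cases described above.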
manifest.json contains:
{
"source_file": "MyBook.epub",
"book_title": "My Book",
"output_dir": "/path/to/MyBook",
"total_chapters": 6,
"attachments_dir": "images",
"image_count": 3,
"chapters": [
{
"index": 0,
"title": "Foreword",
"src": "foreword.xhtml",
"dir_path": "",
"filename": "001_Foreword.md",
"full_path": "/path/to/MyBook/001_Foreword.md",
"level": 0,
"status": "done"
}
]
}
Supports common EPUB HTML elements:
| HTML | Markdown |
|---|---|
| `<h1>`–`<h6>` | `#`–`######` |
| `<p>` | Paragraph with blank line |
| `<strong>`, `<b>` | `**bold**` |
| `<em>`, `<i>` | `*italic*` |
| `<code>` | `` `inline` `` |
| `<pre>` | Fenced code block |
| `<ul>`, `<ol>`, `<li>` | `- item` / `1. item` |
| `<blockquote>` | `> quote` |
| `<a>` | `[text](href)` |
| `<img>` | `![alt](src)` |
| `<hr>` | `---` |
| `<dl>`, `<dt>`, `<dd>` | Definition lists |
| `<sup>`, `<sub>` | `^superscript`, `~subscript` |
| HTML entities | Unicode characters |
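For a sense of how a few rows of this mapping can be implemented with `html.parser` alone, here is a minimal sketch covering headings, paragraphs, bold, italic, inline code, links, and entities. The class `MiniMarkdown` and the function `html_to_markdown` are hypothetical names for illustration; the real converter handles more elements and edge cases than this.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MiniMarkdown(HTMLParser):
    """Convert a small subset of HTML to Markdown."""

    # Inline tags that wrap their text in the same marker on both sides.
    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*", "code": "`"}

    def __init__(self) -> None:
        # convert_charrefs=True turns entities such as &amp; into Unicode text.
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.href = ""

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in HEADINGS:
            self.out.append("#" * int(tag[1]) + " ")
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.out.append(f"]({self.href})")
            self.href = ""
        elif tag == "p" or tag in HEADINGS:
            self.out.append("\n\n")  # blank line after each block

    def handle_data(self, data):
        self.out.append(data)


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out).strip()


sample = '<h2>Intro</h2><p>A <strong>bold</strong> &amp; <em>calm</em> <a href="ch2.xhtml">link</a>.</p>'
markdown = html_to_markdown(sample)
```

Elements outside the handled subset simply pass their text through.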
- No table support: HTML `<table>` elements are preserved as-is or stripped; complex tables will not render correctly in Markdown.
- No CSS styling: Font sizes, colors, column layouts, and other CSS-driven formatting are lost during conversion.
- Footnotes & endnotes: EPUB footnote links and back-references are not converted to Markdown footnote syntax (`[^1]`); they become plain links.
- SVG images: Only raster images (PNG, JPG, etc.) are extracted. SVG and embedded base64 images are not handled.
- DRM-protected EPUBs: Files with digital rights management will fail to open. This tool only works with DRM-free EPUBs.
- Encoding edge cases: The script tries UTF-8, GBK, and Latin-1, so EPUBs in other encodings may produce garbled text. Check the output if the source file uses an unusual encoding.
- Merged chapters: Some EPUBs pack multiple logical chapters into a single HTML file. The script splits by TOC entries, not by internal headings, so these will appear as one long chapter.
- Not a full ebook reader: This tool is designed for extraction and conversion, not for reading. It does not preserve reading position, bookmarks, or annotations from the original EPUB.
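The encoding limitation is easier to see with a sketch of the try-in-order approach. The encoding order follows the list above, but the function `decode_with_fallback` is a hypothetical illustration and the script's actual logic may differ.

```python
def decode_with_fallback(
    raw: bytes, encodings: tuple[str, ...] = ("utf-8", "gbk", "latin-1")
) -> tuple[str, str]:
    """Decode bytes with the first encoding that succeeds.

    Returns the decoded text and the name of the encoding that was used.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reachable with a custom list, since latin-1 accepts any byte.
    return raw.decode("utf-8", errors="replace"), "utf-8"


utf8_result = decode_with_fallback("你好".encode("utf-8"))
gbk_result = decode_with_fallback("你好".encode("gbk"))
```

Because Latin-1 maps every byte to some character, this chain never raises an error; a file in an encoding outside the list is decoded "successfully" into the wrong characters, which is why garbled output is possible.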