epub2md

Split EPUB ebooks into Markdown files, preserving the table of contents hierarchy as a multi-level directory structure.

Why This Skill?

Reading English-language EPUBs is a core part of language learning for millions of readers in China and across Asia. But the workflow is painful: you toggle between the book and a dictionary, copy-paste paragraphs into translation tools, and lose your place, over and over.

epub2md was built to change that. By splitting an EPUB into per-chapter Markdown files, you unlock a fundamentally better reading loop:

Feed chapters directly into AI — Drop a chapter into Claude, ChatGPT, or any LLM for instant translation, vocabulary annotation, or bilingual interleaving. No more copy-pasting from a clunky e-reader.
Bilingual interleaved reading — Markdown makes it trivial to produce paragraph-by-paragraph bilingual text (original + translation), which matches how many learners actually read: English paragraph first, Chinese translation below, repeat. This interleaved rhythm keeps you in the flow instead of bouncing between tabs.
Chapter-level context control — LLMs have context limits. Splitting by chapter keeps each chunk small enough to fit in a typical context window, which reduces the risk of truncated input and of errors caused by over-stuffing the prompt.
Markdown as universal format — Notes, highlights, and AI annotations live alongside the text. Edit in Obsidian, VS Code, or any Markdown editor. Version-control your study notes with Git. The format is yours.

In short: epub2md turns a locked-up ebook into a learner-ready, AI-ready, Markdown-native study kit.

Features

Zero dependencies — uses only the Python standard library (zipfile, html.parser, re, json, etc.)
TOC-aware splitting — reads NCX (EPUB2) and NAV (EPUB3) tables of contents to build the chapter hierarchy
Directory structure — creates nested folders matching the book's chapter hierarchy
Image extraction — saves all images to an images/ directory and fixes Markdown references with correct relative paths
Smart title enhancement — automatically enriches thin titles (e.g. "1", "Part II") by extracting descriptive text from the HTML content
Spine fallback — when no TOC is available, falls back to the OPF spine order and extracts real titles from HTML headings
Manifest tracking — generates manifest.json with full chapter metadata for progress tracking and resuming
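The spine fallback above can be sketched with the standard library alone. This is an illustrative sketch, not the script's actual code: the function name `read_spine` is hypothetical, and it only assumes the standard EPUB layout (META-INF/container.xml pointing at an OPF package that holds a manifest and a spine).

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the EPUB container file and the OPF package document.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def read_spine(epub) -> list[str]:
    """Return content document paths in spine (reading) order.

    `epub` may be a file path or a binary file-like object.
    Hypothetical helper, not part of epub2md.
    """
    with zipfile.ZipFile(epub) as zf:
        # container.xml names the OPF package document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).attrib["full-path"]
        package = ET.fromstring(zf.read(opf_path))

    # The manifest maps item ids to hrefs; the spine lists ids in order.
    hrefs = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    base = posixpath.dirname(opf_path)  # hrefs are relative to the OPF file
    return [
        posixpath.normpath(posixpath.join(base, hrefs[ref.attrib["idref"]]))
        for ref in package.findall("opf:spine/opf:itemref", NS)
        if ref.attrib.get("idref") in hrefs
    ]
```

Calling `read_spine("MyBook.epub")` returns archive paths that can be passed to `ZipFile.read`. Titles would still have to come from the headings inside each file.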

Usage

CLI

python scripts/epub2md.py <input.epub> [--output-dir <dir>]

<input.epub> — path to the EPUB file
--output-dir — output directory (defaults to a directory named after the EPUB file in the same location)
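One way to read the default for --output-dir is "strip the .epub extension and use what remains as a directory next to the file". The sketch below shows that interpretation. The helper name `default_output_dir` is hypothetical, and the script may derive the path differently.

```python
from pathlib import Path


def default_output_dir(epub_path: str) -> Path:
    """Directory named after the EPUB, in the same location.

    Hypothetical helper illustrating the documented default.
    """
    # "books/MyBook.epub" becomes "books/MyBook"
    return Path(epub_path).with_suffix("")


print(default_output_dir("books/MyBook.epub"))
```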

Claude Code Skill

This project doubles as a Claude Code skill.

Install the Skill

Clone this repository into your Claude Code skills folder:

# macOS / Linux
git clone https://github.com/AdkinsHan/epub2md.git ~/.claude/skills/epub2md

# Windows (PowerShell)
git clone https://github.com/AdkinsHan/epub2md.git "$env:USERPROFILE\.claude\skills\epub2md"

Or if you've already cloned it elsewhere:

# macOS / Linux
cp -r /path/to/epub2md ~/.claude/skills/epub2md

# Windows (PowerShell)
Copy-Item -Recurse "D:\path\to\epub2md" "$env:USERPROFILE\.claude\skills\epub2md"

Restart Claude Code or reload the session — the /epub2md command will be available automatically.

Use the Skill

In a Claude Code session, simply type:

/epub2md <path-to-book.epub>

The skill will:

Validate and resolve the EPUB file path
Run the Python script to split the EPUB into chapters
Present the chapter list and output structure for review
Verify the results

You can also specify a custom output directory:

/epub2md /path/to/book.epub --output-dir /path/to/output

Or simply mention an EPUB file in conversation — the skill auto-triggers on keywords like "EPUB", "epub to markdown", or "split ebook".

Output Structure

Given MyBook.epub, the output looks like:

MyBook/
├── 001_Foreword.md
├── 002_Chapter One/
│   ├── 002_Chapter One.md
│   ├── 003_Section 1.1.md
│   └── 004_Section 1.2.md
├── 005_Chapter Two/
│   ├── 005_Chapter Two.md
│   └── 006_Section 2.1.md
├── images/
│   ├── cover.png
│   ├── fig1.jpg
│   └── fig2.png
└── manifest.json

Image references in Markdown files use correct relative paths (e.g. images/cover.png for root-level chapters, ../images/cover.png for nested chapters).
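The relative-path rule can be expressed with `posixpath.relpath`, which also covers deeper nesting. This is a sketch under assumptions: `image_ref` and its parameters are illustrative names, and `chapter_dir` stands for a chapter's folder relative to the output root (like the `dir_path` field in manifest.json).

```python
import posixpath


def image_ref(chapter_dir: str, image_name: str, images_dir: str = "images") -> str:
    """Path from a chapter's folder to an image under the output root.

    Hypothetical helper; `chapter_dir` is "" for root-level chapters.
    """
    target = posixpath.join(images_dir, image_name)
    # An empty chapter_dir means the chapter sits in the output root.
    return posixpath.relpath(target, start=chapter_dir or ".")


print(image_ref("", "cover.png"))
print(image_ref("002_Chapter One", "cover.png"))
```

The first call yields `images/cover.png` and the second `../images/cover.png`, matching the two cases described above.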

Manifest Format

manifest.json contains:

{
  "source_file": "MyBook.epub",
  "book_title": "My Book",
  "output_dir": "/path/to/MyBook",
  "total_chapters": 6,
  "attachments_dir": "images",
  "image_count": 3,
  "chapters": [
    {
      "index": 0,
      "title": "Foreword",
      "src": "foreword.xhtml",
      "dir_path": "",
      "filename": "001_Foreword.md",
      "full_path": "/path/to/MyBook/001_Foreword.md",
      "level": 0,
      "status": "done"
    }
  ]
}
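Because every chapter entry carries a `status` field, a downstream script can use the manifest to pick up where it left off. The sketch below filters out finished chapters. The `"pending"` value is an assumption made for illustration (only `"done"` appears in the example above), and `pending_chapters` is a hypothetical helper.

```python
import json


def pending_chapters(manifest: dict) -> list[dict]:
    """Chapters whose status is not "done", in manifest order."""
    return [ch for ch in manifest["chapters"] if ch.get("status") != "done"]


# With a real run you would load the file instead:
#   with open("MyBook/manifest.json", encoding="utf-8") as f:
#       manifest = json.load(f)
manifest = json.loads("""
{
  "total_chapters": 2,
  "chapters": [
    {"index": 0, "title": "Foreword", "filename": "001_Foreword.md", "status": "done"},
    {"index": 1, "title": "Chapter One", "filename": "002_Chapter One.md", "status": "pending"}
  ]
}
""")

for ch in pending_chapters(manifest):
    print(ch["index"], ch["filename"])
```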

HTML to Markdown Conversion

Supports common EPUB HTML elements:

HTML	Markdown
`<h1>`–`<h6>`	`#`–`######`
`<p>`	Paragraph with blank line
`<strong>`, `<b>`	`**bold**`
`<em>`, `<i>`	`*italic*`
`<code>`	`` `inline` ``
`<pre>`	Fenced code block
`<ul>`, `<ol>`, `<li>`	`- item` / `1. item`
`<blockquote>`	`> quote`
`<a>`	`[text](href)`
`<img>`	`![alt](src)`
`<hr>`	`---`
`<dl>`, `<dt>`, `<dd>`	Definition lists
`<sup>`, `<sub>`	`^superscript`, `~subscript`
HTML entities	Unicode characters
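To give a feel for how a mapping like this can be built on `html.parser`, here is a deliberately small sketch covering only headings, paragraphs, bold, italic, and `<hr>`. It is not the converter shipped in scripts/epub2md.py: the class and function names are hypothetical, and the choice of `**` and `*` for emphasis is an assumption based on common Markdown usage.

```python
import re
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Convert a small HTML subset to Markdown (illustrative only)."""

    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*"}
    HEADINGS = {f"h{n}": "#" * n + " " for n in range(1, 7)}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; for us.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.parts.append(self.HEADINGS[tag])
        elif tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "hr":
            self.parts.append("---\n\n")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag == "p":
            self.parts.append("\n\n")  # block elements end with a blank line
        elif tag in self.INLINE:
            self.parts.append(self.INLINE[tag])

    def handle_data(self, data):
        # Collapse whitespace runs the way an HTML renderer would.
        self.parts.append(re.sub(r"\s+", " ", data))


def to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()  # flush any buffered text
    blocks = (b.strip() for b in "".join(parser.parts).split("\n\n"))
    return "\n\n".join(b for b in blocks if b)


print(to_markdown("<h1>Title</h1>\n<p>Hello <em>world</em> &amp; <b>all</b>.</p><hr/>"))
```

Lists, links, images, and code blocks need extra state (nesting depth, the current `href`, whether the parser is inside `<pre>`), which is why a full converter is considerably longer than this.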

Limitations & Caveats

No table support — HTML <table> elements are preserved as-is or stripped; complex tables will not render correctly in Markdown.
No CSS styling — Font sizes, colors, column layouts, and other CSS-driven formatting are lost during conversion.
Footnotes & endnotes — EPUB footnote links and back-references are not converted to Markdown footnote syntax ([^1]); they become plain links.
SVG images — Only raster images (PNG, JPG, etc.) are extracted. SVG and embedded base64 images are not handled.
DRM-protected EPUBs — Files with digital rights management will fail to open. This tool only works with DRM-free EPUBs.
Encoding edge cases — The script tries UTF-8, GBK, and Latin-1 when decoding text. EPUBs that use a less common encoding may still produce garbled text, so check the output if the source file is not in one of these three.
Merged chapters — Some EPUBs pack multiple logical chapters into a single HTML file. The script splits by TOC entries, not by internal headings, so these will appear as one long chapter.
Not a full ebook reader — This tool is designed for extraction and conversion, not for reading. It does not preserve reading position, bookmarks, or annotations from the original EPUB.
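The encoding caveat is easier to see in code. The sketch below tries the three encodings named above in order and reports which one succeeded. It is an assumption about how such a fallback could look, not a copy of the script's logic, and `decode_with_fallback` is a hypothetical name.

```python
def decode_with_fallback(
    raw: bytes, encodings=("utf-8", "gbk", "latin-1")
) -> tuple[str, str]:
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Latin-1 accepts every byte, so this line is only reached when a
    # custom encoding list without such a catch-all is passed in.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


text, used = decode_with_fallback("中文".encode("gbk"))
print(used, text)
```

A successful decode is not proof of a correct decode. GBK and Latin-1 accept many byte sequences that were written in some other encoding, and in that case the result is readable Python text with the wrong characters, which is the garbling described above.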

License

Apache License 2.0
