Audiopub

Turn your EPUBs into high-fidelity audiobooks locally.

Audiopub is a slick, desktop-based power tool that converts ebooks into chapterized .m4b audiobooks using on-device TTS engines. It runs entirely on your machine—no cloud APIs, no per-character fees.

Supported TTS Engines:

Supertonic (default): Supertone's high-quality diffusion-based TTS
NeuTTS Air: Instant voice cloning from 3-15 second audio samples

Features

⚡ Local & Private: Powered by ONNX Runtime. Zero data leaves your rig.
🚀 GPU Acceleration: Optional CUDA support for up to ~10x faster synthesis than CPU on NVIDIA GPUs (see Benchmarking for measured numbers).
💎 Deep Dark UI: A beautiful, responsive glass-morphism interface built with NiceGUI.
🧠 Smart Context: Splits text intelligently by sentence to maintain narrative flow.
⏯️ Resumable: Crash? Quit? No problem. Resume exactly where you left off.
📦 Auto-Muxing: Outputs ready-to-listen .m4b files with proper metadata and chapters.
🎚️ Configurable Quality: Adjust inference steps (2-128) for speed/quality tradeoff.
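
The "Smart Context" item above describes sentence-aware splitting. As a rough illustration only, sentence-based chunking could look like the sketch below. This is not Audiopub's actual splitter: `split_into_chunks` and the `max_chars` limit are hypothetical names chosen for this example.

```python
import re

# Hypothetical helper for illustration; not part of Audiopub's API.
# Split after '.', '!' or '?' when followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_chunks(text: str, max_chars: int = 300) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept intact as its own chunk,
    so no sentence is ever cut in the middle.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Keeping sentences whole means each synthesis call sees a complete thought, which is the property the feature list refers to as narrative flow.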

Quick Start

Install:
```
git clone https://github.com/yourusername/audiopub.git
cd audiopub
git lfs pull  # Essential: downloads the AI models
pip install -r requirements.txt
```

Run:
```
python audiopub/main.py
```
The WebUI launches automatically at http://localhost:8080.

Generate:
- Select your EPUB and voice.
- GPU acceleration is enabled by default, with inference steps set to 16.
- Adjust the GPU toggle and inference steps in the UI as needed.
- Hit Generate. On a supported NVIDIA GPU, synthesis runs roughly 6-10x faster than on CPU. ⚡

Note: GPU is automatically configured on startup. For manual setup or troubleshooting, see GPU_SETUP.md.

Requirements

Python 3.9+
FFmpeg (Must be in your PATH)
Git LFS (For model weights)
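
A quick way to confirm these requirements before launching is a small preflight script. The sketch below is an assumption-laden example, not something shipped with Audiopub: `missing_requirements` is a hypothetical helper, and it assumes the executables are named `ffmpeg` and `git-lfs` on your system.

```python
import shutil
import sys


def missing_requirements(which=shutil.which, version=sys.version_info) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed.

    `which` and `version` are injectable so the check can be tested
    without touching the real environment.
    """
    problems = []
    if tuple(version[:2]) < (3, 9):
        problems.append("Python 3.9+ is required")
    # Assumed executable names; adjust if your platform uses different ones.
    for tool in ("ffmpeg", "git-lfs"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `missing_requirements()` with no arguments checks the current interpreter and PATH.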

Voice Styles

For Supertonic (default):

Drop your custom .json voice style configs into audiopub/assets/. The app will auto-detect them.
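
To see which style files are likely to be picked up, you can list the JSON files in that folder yourself. This is only a sketch of what auto-detection might amount to; `find_voice_styles` is a hypothetical helper, and the app's real detection logic and the expected JSON schema are not shown here.

```python
import json
from pathlib import Path


def find_voice_styles(assets_dir: str = "audiopub/assets") -> list[str]:
    """Return the base names of .json files in assets_dir that parse as JSON."""
    styles = []
    for path in sorted(Path(assets_dir).glob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Unreadable or malformed file: skip it rather than fail.
            continue
        styles.append(path.stem)
    return styles
```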

For NeuTTS Air:

Install additional dependencies:
```
pip install -r requirements-neutts.txt
sudo apt-get install espeak  # or on macOS: brew install espeak
```

Set the TTS engine:
```
export AUDIOPUB_TTS_ENGINE=neutts-air
```
Add voice samples: Place .wav audio files (3-15 seconds), each with a .txt transcript of the same base name, in audiopub/assets/reference_audio/.

Example:
```
reference_audio/
├── narrator1.wav    # 5 seconds of clean speech
└── narrator1.txt    # Transcript of the audio
```
See audiopub/assets/reference_audio/README.md for detailed setup instructions.
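
Before running a long job, it can help to check that every sample has a transcript and falls within the 3-15 second range. The following is a minimal sketch under stated assumptions: `check_reference_samples` is a hypothetical helper, not part of Audiopub, and it relies on Python's standard `wave` module, which only reads uncompressed PCM WAV files.

```python
import wave
from pathlib import Path


def check_reference_samples(ref_dir, min_s: float = 3.0, max_s: float = 15.0) -> dict[str, str]:
    """Map each .wav file name to "ok" or a short problem description."""
    report = {}
    for wav_path in sorted(Path(ref_dir).glob("*.wav")):
        # The transcript must share the sample's base name (narrator1.wav -> narrator1.txt).
        if not wav_path.with_suffix(".txt").is_file():
            report[wav_path.name] = "missing transcript"
            continue
        try:
            with wave.open(str(wav_path), "rb") as wav:
                seconds = wav.getnframes() / wav.getframerate()
        except (wave.Error, EOFError, ZeroDivisionError):
            report[wav_path.name] = "unreadable WAV"
            continue
        if min_s <= seconds <= max_s:
            report[wav_path.name] = "ok"
        else:
            report[wav_path.name] = f"duration {seconds:.1f}s outside {min_s:g}-{max_s:g}s"
    return report
```

For example, `check_reference_samples("audiopub/assets/reference_audio")` returns one entry per sample in the folder described above.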

Switching TTS Engines

Change engines by setting the environment variable:

```
# Use NeuTTS Air (with voice cloning)
export AUDIOPUB_TTS_ENGINE=neutts-air

# Use Supertonic (default)
export AUDIOPUB_TTS_ENGINE=supertonic
```
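
On the Python side, reading this variable with a default could be sketched as follows. The variable name and the two engine values come from this README; everything else (the `selected_engine` helper, case normalization, and raising on unknown values) is an assumption for illustration and may differ from Audiopub's actual TTS factory.

```python
import os

# Engine names documented in this README.
SUPPORTED_ENGINES = ("supertonic", "neutts-air")


def selected_engine(environ=os.environ) -> str:
    """Return the engine named by AUDIOPUB_TTS_ENGINE, defaulting to supertonic."""
    name = environ.get("AUDIOPUB_TTS_ENGINE", "supertonic").strip().lower()
    if name not in SUPPORTED_ENGINES:
        # Assumed behavior: fail loudly instead of silently falling back.
        raise ValueError(
            f"Unknown TTS engine {name!r}; expected one of {', '.join(SUPPORTED_ENGINES)}"
        )
    return name
```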

GPU Acceleration

Default Configuration

✅ GPU acceleration is enabled by default in the WebUI, with 16-step inference as the balance between quality and speed.

In the WebUI:

"GPU ACCELERATION" toggle starts as ON
"INFERENCE STEPS" defaults to 16 (balanced quality)
Adjust steps with the slider (2-128) anytime:
- 2-5 steps: Real-time/streaming (fastest, lower quality)
- 16 steps: Balanced quality/speed (default)
- 32+ steps: High quality (slower, best audio)

Setup

Automatic (Recommended): GPU support is automatically configured on startup if available.

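Manual Setup: If you need to enable GPU support manually:
```
# Enable GPU for the current shell session
source setup_gpu.sh

# Or add the same source command (with the full path to setup_gpu.sh)
# to your ~/.bashrc or ~/.zshrc so it applies to every new shell
```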

Requirements:

NVIDIA GPU with CUDA support
CUDA 11.8+ or 12.x
onnxruntime-gpu (installed by default)
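
To check whether ONNX Runtime can actually see CUDA on your machine, you can query its available execution providers. The sketch below uses the real `onnxruntime.get_available_providers()` call; the helper names `available_providers` and `pick_providers` are hypothetical, and this is not necessarily how Audiopub selects its provider internally.

```python
def available_providers() -> list[str]:
    """Ask ONNX Runtime which execution providers it can use (empty if not installed)."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return ort.get_available_providers()


def pick_providers(available: list[str], use_gpu: bool = True) -> list[str]:
    """Prefer CUDA when requested and available; always keep CPU as a fallback."""
    if use_gpu and "CUDAExecutionProvider" in available:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]
```

If `pick_providers(available_providers())` returns only `CPUExecutionProvider`, the CUDA libraries are probably not visible to ONNX Runtime; GPU_SETUP.md covers troubleshooting.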

Benchmarking

Test GPU performance on your hardware:

```
# CPU-only benchmark
python benchmark_gpu.py

# GPU benchmark (with CUDA setup)
source setup_gpu.sh
python benchmark_gpu.py --gpu --steps 2,5,16,32,64,128

# Save results for comparison
python benchmark_gpu.py --gpu --output gpu_results.json
```

Real-World Performance Examples:

RTX 2070 (Tested):

| Steps | GPU Speed (chars/sec) | CPU Speed (chars/sec) | Speedup   |
|-------|-----------------------|-----------------------|-----------|
| 2     | 1915-3614             | 182-409               | 8.8-10.5x |
| 16    | 534-1091              | 89-163                | 6.0-6.7x  |
| 32    | 285-606               | 56-98                 | 5.1-6.2x  |
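
To translate these throughput figures into wall-clock time, divide the book's character count by the chars/sec rate. The sketch below uses the 16-step row of the RTX 2070 table; the 500,000-character book length is a hypothetical figure chosen for illustration, and `synthesis_minutes` is not part of the project.

```python
def synthesis_minutes(total_chars: int, chars_per_sec: float) -> float:
    """Estimated synthesis time in minutes at a steady throughput."""
    return total_chars / chars_per_sec / 60


BOOK_CHARS = 500_000  # hypothetical book length, not a measured value

# 16-step throughput ranges (chars/sec) from the RTX 2070 table.
GPU_RANGE = (534, 1091)
CPU_RANGE = (89, 163)

for label, (slow, fast) in (("GPU", GPU_RANGE), ("CPU", CPU_RANGE)):
    best = synthesis_minutes(BOOK_CHARS, fast)
    worst = synthesis_minutes(BOOK_CHARS, slow)
    print(f"{label}: {best:.1f}-{worst:.1f} minutes")
```

Under these assumptions the estimate is roughly 8-16 minutes on the GPU versus 51-94 minutes on the CPU, which is consistent with the 6.0-6.7x speedup in the table.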

Expected Performance (RTX 4090):

GPU: ~12,000 chars/sec (2-step) → ~600 chars/sec (16-step)
CPU: ~1,200 chars/sec (2-step) → ~400 chars/sec (16-step)

See GPU_SETUP.md, GPU_DEFAULTS.md, and GPU_BENCHMARKING.md for detailed setup, configuration, performance tuning, and troubleshooting.

For AI Agents (Claude Code, GitHub Copilot, etc.)

This repository is optimized for AI-assisted development. Before working on Audiopub:

Read AGENT_NOTES.md – Quick system prompt (one page)
Review AGENT_GUIDE.md – Operating procedures & testing checklist
Understand ARCHITECTURE.md – Module boundaries & what's locked

Key constraints:

✅ Safe changes: Add voice styles, optimize performance, fix bugs, add tests
❌ Unsafe changes: Remove CPU support, break TTS factory, add cloud APIs, remove channel boundaries
🔒 Locked files: tts_base.py (method signatures), some aspects of audio.py, epub.py, worker.py

Additional context:

STRATEGY.md – Project philosophy & non-negotiable constraints
ENTRYPOINTS.md – Module map & data flow diagrams
ROADMAP.md – Phase tracking & current blockers
repo_manifest.json – Machine-readable metadata

These documents live in the repo permanently and are updated as the project evolves.

Built for audiophiles who code.
