Turn your EPUBs into high-fidelity audiobooks locally.
Audiopub is a slick, desktop-based power tool that converts ebooks into chapterized .m4b audiobooks using on-device TTS engines. It runs entirely on your machine—no cloud APIs, no per-character fees.
- Supertonic (default): Supertone's high-quality diffusion-based TTS
- NeuTTS Air: Instant voice cloning from 3-15 second audio samples
- ⚡ Local & Private: Powered by ONNX Runtime. Zero data leaves your rig.
- 🚀 GPU Acceleration: Optional CUDA support for up to roughly 10x faster synthesis than CPU on NVIDIA GPUs.
- 💎 Deep Dark UI: A beautiful, responsive glass-morphism interface built with NiceGUI.
- 🧠 Smart Context: Splits text intelligently by sentence to maintain narrative flow.
- ⏯️ Resumable: Crash? Quit? No problem. Resume exactly where you left off.
- 📦 Auto-Muxing: Outputs ready-to-listen .m4b files with proper metadata and chapters.
- 🎚️ Configurable Quality: Adjust inference steps (2-128) for speed/quality tradeoff.
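The Smart Context feature above splits text by sentence. A minimal sketch of that idea follows, assuming a naive regex splitter and a hypothetical `max_chars` limit per TTS chunk; the names `split_sentences` and `chunk_sentences` are illustrative, and Audiopub's actual splitter may work differently.

```python
import re

# Split on whitespace that follows ., !, or ?. This rule is naive:
# abbreviations such as "Dr." will also trigger a split.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Return the non-empty sentences in text, whitespace-trimmed."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def chunk_sentences(sentences: list[str], max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own chunk,
    so a sentence is never cut in the middle.
    """
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Packing whole sentences keeps each synthesis call aligned with a natural pause, which is presumably what helps preserve narrative flow.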
- Install:
  git clone https://github.com/yourusername/audiopub.git
  cd audiopub
  git lfs pull  # Essential: Downloads the AI models
  pip install -r requirements.txt
- Run:
  python audiopub/main.py
  The WebUI launches automatically at http://localhost:8080
- Generate:
  - Select your EPUB and Voice
  - GPU acceleration is enabled by default with a 16-step quality setting
  - Adjust the GPU toggle and inference steps as needed in the UI
  - Hit Generate and enjoy high-quality audiobooks, synthesized roughly 6-10x faster than on CPU! ⚡
Note: GPU is automatically configured on startup. For manual setup or troubleshooting, see GPU_SETUP.md.
- Python 3.9+
- FFmpeg (Must be in your PATH)
- Git LFS (For model weights)
Drop your custom .json voice style configs into audiopub/assets/. The app will auto-detect them.
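As a rough illustration of what auto-detection could look like, the sketch below scans a directory for `.json` files and loads each one. The function name `discover_voice_styles` is hypothetical, and nothing is assumed about the keys inside a voice style config; the app's real loader may differ.

```python
import json
from pathlib import Path


def discover_voice_styles(assets_dir: str) -> dict[str, dict]:
    """Map each .json file's stem to its parsed contents.

    Files that cannot be read or are not valid JSON objects are skipped
    instead of raising, so one bad file does not hide the others.
    """
    styles: dict[str, dict] = {}
    for path in sorted(Path(assets_dir).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):  # ValueError covers JSON and decoding errors
            continue
        if isinstance(data, dict):
            styles[path.stem] = data
    return styles


if __name__ == "__main__":
    # Path taken from the README; adjust if your checkout differs.
    for name in discover_voice_styles("audiopub/assets"):
        print(name)
```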
- Install additional dependencies:
  pip install -r requirements-neutts.txt
  sudo apt-get install espeak  # or: brew install espeak on macOS
- Set the TTS engine:
  export AUDIOPUB_TTS_ENGINE=neutts-air
- Add voice samples: Place .wav audio files (3-15 seconds) with matching .txt transcript files in: audiopub/assets/reference_audio/
  Example:
  reference_audio/
  ├── narrator1.wav  # 5 seconds of clean speech
  └── narrator1.txt  # Transcript of the audio
See audiopub/assets/reference_audio/README.md for detailed setup instructions.
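Before switching engines, it can help to confirm that every sample meets the two requirements above: a matching transcript and a duration of 3-15 seconds. The following is a minimal sketch using only the standard library; `check_reference_audio` is a hypothetical helper, not part of Audiopub, and the `wave` module reads only uncompressed PCM files.

```python
import wave
from pathlib import Path

MIN_SECONDS = 3.0   # lower bound stated in the README
MAX_SECONDS = 15.0  # upper bound stated in the README


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_audio(ref_dir: str) -> dict[str, str]:
    """Return a {voice name: problem} map; an empty map means all samples pass."""
    problems: dict[str, str] = {}
    for wav_path in sorted(Path(ref_dir).glob("*.wav")):
        name = wav_path.stem
        # Each narrator1.wav needs a narrator1.txt next to it.
        if not wav_path.with_suffix(".txt").is_file():
            problems[name] = "missing transcript"
            continue
        duration = wav_duration_seconds(wav_path)
        if not MIN_SECONDS <= duration <= MAX_SECONDS:
            problems[name] = f"duration {duration:.1f}s outside 3-15s"
    return problems


if __name__ == "__main__":
    for name, problem in check_reference_audio("audiopub/assets/reference_audio").items():
        print(f"{name}: {problem}")
```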
Change engines by setting the environment variable:
# Use NeuTTS Air (with voice cloning)
export AUDIOPUB_TTS_ENGINE=neutts-air
# Use Supertonic (default)
export AUDIOPUB_TTS_ENGINE=supertonic
✅ GPU acceleration is enabled by default in the WebUI with quality-focused settings (16-step inference for balanced quality/speed).
In the WebUI:
- "GPU ACCELERATION" toggle starts as ON
- "INFERENCE STEPS" defaults to 16 (balanced quality)
- Adjust steps with the slider (2-128) anytime:
  - 2-5 steps: Real-time/streaming (fastest, lower quality)
  - 16 steps: Balanced quality/speed (default)
  - 32+ steps: High quality (slower, best audio)
Automatic (Recommended): GPU support is automatically configured on startup if available.
Manual Setup: If you need to manually enable GPU:
# Enable GPU for current shell session
source setup_gpu.sh
# Or add to your ~/.bashrc or ~/.zshrc
Requirements:
- NVIDIA GPU with CUDA support
- CUDA 11.8+ or 12.x
- onnxruntime-gpu (installed by default)
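One quick way to see whether these requirements are met is to ask ONNX Runtime which execution providers it can load. The sketch below is an assumption about how one might check, not a description of Audiopub's startup logic; `choose_providers` is a hypothetical helper, while `onnxruntime.get_available_providers()` and the provider names are part of ONNX Runtime.

```python
def choose_providers(available: list[str], use_gpu: bool = True) -> list[str]:
    """Prefer CUDA when requested and available; always keep CPU as a fallback."""
    providers = []
    if use_gpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")
    return providers


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed")
    else:
        available = ort.get_available_providers()
        print("Available:", available)
        print("Would use:", choose_providers(available))
```

If `CUDAExecutionProvider` is missing from the available list, the usual suspects are a CPU-only `onnxruntime` package or a CUDA version mismatch.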
Test GPU performance on your hardware:
# CPU-only benchmark
python benchmark_gpu.py
# GPU benchmark (with CUDA setup)
source setup_gpu.sh
python benchmark_gpu.py --gpu --steps 2,5,16,32,64,128
# Save results for comparison
python benchmark_gpu.py --gpu --output gpu_results.json
Real-World Performance Examples:
RTX 2070 (Tested), speeds in characters per second (c/s):
Steps | GPU Speed     | CPU Speed   | Speedup
------|---------------|-------------|----------
2     | 1915-3614 c/s | 182-409 c/s | 8.8-10.5x
16    | 534-1091 c/s  | 89-163 c/s  | 6.0-6.7x
32    | 285-606 c/s   | 56-98 c/s   | 5.1-6.2x
Expected Performance (RTX 4090):
- GPU: ~12,000 chars/sec (2-step) → ~600 chars/sec (16-step)
- CPU: ~1,200 chars/sec (2-step) → ~400 chars/sec (16-step)
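To turn throughput figures like these into something tangible, the sketch below recomputes the speedup column of the RTX 2070 table and estimates synthesis time for a book. The 500,000-character book length is a made-up example, and pairing the low end of the GPU range with the low end of the CPU range is an assumption, though it reproduces the table's speedup values.

```python
def speedup(gpu_cps: float, cpu_cps: float) -> float:
    """Ratio of GPU to CPU throughput, both in characters per second."""
    return gpu_cps / cpu_cps


def synthesis_minutes(total_chars: int, chars_per_sec: float) -> float:
    """Estimated synthesis time in minutes at a steady throughput."""
    return total_chars / chars_per_sec / 60


if __name__ == "__main__":
    book_chars = 500_000  # hypothetical book length, not a measured value

    # Low end of the RTX 2070 ranges at 16 steps, taken from the table.
    gpu_cps, cpu_cps = 534, 89
    print(f"Speedup at 16 steps: {speedup(gpu_cps, cpu_cps):.1f}x")
    print(f"GPU estimate: {synthesis_minutes(book_chars, gpu_cps):.1f} min")
    print(f"CPU estimate: {synthesis_minutes(book_chars, cpu_cps):.1f} min")
```

These estimates ignore EPUB parsing and muxing time, so treat them as a lower bound on total conversion time.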
See GPU_SETUP.md, GPU_DEFAULTS.md, and GPU_BENCHMARKING.md for detailed setup, configuration, performance tuning, and troubleshooting.
This repository is optimized for AI-assisted development. Before working on Audiopub:
- Read AGENT_NOTES.md – Quick system prompt (one page)
- Review AGENT_GUIDE.md – Operating procedures & testing checklist
- Understand ARCHITECTURE.md – Module boundaries & what's locked
Key constraints:
- ✅ Safe changes: Add voice styles, optimize performance, fix bugs, add tests
- ❌ Unsafe changes: Remove CPU support, break TTS factory, add cloud APIs, remove channel boundaries
- 🔒 Locked files: tts_base.py (method signatures), some aspects of audio.py, epub.py, worker.py
Additional context:
- STRATEGY.md – Project philosophy & non-negotiable constraints
- ENTRYPOINTS.md – Module map & data flow diagrams
- ROADMAP.md – Phase tracking & current blockers
- repo_manifest.json – Machine-readable metadata
These documents live in the repo permanently and are updated as the project evolves.
Built for audiophiles who code.
