A restart-safe Go-based pipeline that turns any PDF into a polished, single-file MP3.
- ✅ Code Standards Compliant: Follows Go Coding Standards, Bash Coding Standards, and Design Principles
- ✅ Fully tested on Fedora 42 (stock repos + RPM Fusion)
- ⚠️ Untested on Ubuntu: behavior there is unknown
- 🔧 TTS layer powered by the project-specific fork: https://github.com/nnikolov3/book_expert_f5-tts
- PDF → PNG: High-DPI page conversion (`pdf-to-png`)
- PNG → OCR: Text extraction with Tesseract (`png-to-text-tesseract`)
- OCR → Enhanced Text: LLM-powered narration enhancement (`png-text-augment`)
- Text Organization: Merging and structuring (`merge-text`)
- Text → Audio: TTS synthesis with F5-TTS (`text-to-wav`)
- Audio Processing: WAV → 48 kHz mono → MP3 (`wav-to-mp3`)
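The README does not say which renderer `pdf-to-png` calls; Ghostscript and poppler-utils are both listed as dependencies. As an illustration only, the sketch below assumes Ghostscript and builds (without running) a command that rasterizes every page at the configured `settings.dpi`. The helper name `ghostscriptPNGCmd` and the `page_%04d.png` naming pattern are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// ghostscriptPNGCmd is a hypothetical helper (not part of book_expert).
// It builds, but does not run, a Ghostscript command that renders each page
// of pdfPath to a 24-bit PNG at the given DPI: page_0001.png, page_0002.png, ...
func ghostscriptPNGCmd(pdfPath, pngDir string, dpi int) *exec.Cmd {
	return exec.Command("gs",
		"-dNOPAUSE", "-dBATCH", "-dSAFER", // non-interactive, sandboxed run
		"-sDEVICE=png16m",        // 24-bit RGB PNG output device
		fmt.Sprintf("-r%d", dpi), // render resolution, e.g. settings.dpi
		"-sOutputFile="+filepath.Join(pngDir, "page_%04d.png"),
		pdfPath,
	)
}

func main() {
	cmd := ghostscriptPNGCmd("data/raw/book.pdf", "data/book/png", 300)
	fmt.Println(cmd.Args) // inspect the command; cmd.Run() would execute it
}
```

Zero-padded page numbers keep lexical and numeric order identical, which matters when later stages process pages in directory order.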
Architecture: Each stage is a standalone Go binary, and all paths, models, prompts, and tuning parameters are configured in `project.toml`. The codebase is checked with unit and integration tests, linters, and formatters.
book_expert/
├── bin/ # Compiled Go binaries
├── cmd/ # Go command implementations
│ ├── pdf-to-png/ # PDF → PNG conversion
│ ├── png-to-text-tesseract/ # OCR processing
│ ├── png-text-augment/ # LLM enhancement
│ ├── merge-text/ # Text concatenation
│ ├── text-to-wav/ # TTS synthesis
│ └── wav-to-mp3/ # Audio conversion
├── internal/ # Internal Go packages
│ ├── config/ # Configuration management
│ └── logging/ # Structured logging
├── scripts/ # Development tools
│ ├── test_pipeline.sh # Comprehensive testing
│ └── profile_go.sh # Performance profiling
├── test/ # Integration tests
├── logs/ # Pipeline and test logs
├── data/ # Processing workspace
│ ├── raw/ # Source PDFs (configurable)
│ └── <pdf_name>/ # Per-document processing
│ ├── png/ # Rendered pages
│ ├── text/ # OCR + enhanced text
│ ├── wav/ # TTS audio chunks
│ └── mp3/ # Final audiobook
├── project.toml # ★ Complete pipeline configuration ★
├── Makefile # Build and test automation
├── go.mod # Go module definition
├── DESIGN_PRINCIPLES_GUIDE.md # Development standards
├── GO_CODING_STANDARD.md # Go coding guidelines
└── README.md # This file
Note: All directory paths are configurable through project.toml - nothing is hardcoded.
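Following the tree above and the final output location `<output_dir>/<pdf_name>/mp3/<pdf_name>.mp3`, every per-document path can be derived from the PDF file name alone. This is a minimal sketch; `layoutFor` and `docPaths` are hypothetical names, and in the real binaries the root directory comes from `project.toml` rather than a literal.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// docPaths is a hypothetical struct holding the per-document directories
// shown in the layout (png/, text/, wav/, mp3/) plus the final MP3 file.
type docPaths struct {
	PNG, Text, WAV, MP3 string
	FinalMP3            string // <output_dir>/<pdf_name>/mp3/<pdf_name>.mp3
}

// layoutFor derives all paths for one PDF. outputDir stands in for the
// processing directory configured in project.toml.
func layoutFor(outputDir, pdfPath string) docPaths {
	// "data/raw/my_book.pdf" -> "my_book"
	name := strings.TrimSuffix(filepath.Base(pdfPath), filepath.Ext(pdfPath))
	root := filepath.Join(outputDir, name)
	mp3Dir := filepath.Join(root, "mp3")
	return docPaths{
		PNG:      filepath.Join(root, "png"),
		Text:     filepath.Join(root, "text"),
		WAV:      filepath.Join(root, "wav"),
		MP3:      mp3Dir,
		FinalMP3: filepath.Join(mp3Dir, name+".mp3"),
	}
}

func main() {
	p := layoutFor("data", "data/raw/my_book.pdf")
	fmt.Println(p.PNG)      // data/my_book/png
	fmt.Println(p.FinalMP3) // data/my_book/mp3/my_book.mp3
}
```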
# Install system dependencies (Fedora)
sudo dnf install \
ghostscript tesseract tesseract-langpack-eng \
poppler-utils ImageMagick jq yq rsync ffmpeg \
shellcheck nproc coreutils awk grep curl flock

# Install the F5-TTS fork
git clone https://github.com/nnikolov3/book_expert_f5-tts.git
cd book_expert_f5-tts
python -m venv .venv && source .venv/bin/activate
pip install -e .

# Build book_expert
git clone https://github.com/<your-org>/book_expert.git
cd book_expert
make build

# API keys
export GEMINI_API_KEY="sk-…" # Google Gemini
export CEREBRAS_API_KEY="cb-…" # Cerebras inference endpoint
# export NVIDIA_API_KEY="na-…" # Optional

Open project.toml and adjust:

- `[paths]`, `[directories]`, `[processing_dir]`, `[logs_dir]`: folder layout
- `[settings]`: DPI, force, worker counts
- `[google_api]` & `[cerebras_api]`: model names, temperatures, token limits
- `[f5_tts_settings]`: TTS model, worker threads
- `[prompts.*]`: full system/user prompts used by each LLM call
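The `export` lines above supply API keys through environment variables, and the troubleshooting table notes that a binary reports which variable it needs when a key is missing. A minimal sketch of that lookup, assuming the variable name comes from the API section of `project.toml`; `resolveAPIKey` is hypothetical, and the lookup function is injected so it can be tested without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// resolveAPIKey is a hypothetical helper. It reads the key stored in the
// environment variable named varName (e.g. "GEMINI_API_KEY"); lookup is
// normally os.LookupEnv. On failure the error names the variable to export.
func resolveAPIKey(varName string, lookup func(string) (string, bool)) (string, error) {
	key, ok := lookup(varName)
	if !ok || key == "" {
		return "", fmt.Errorf("missing API key: export %s before running", varName)
	}
	return key, nil
}

func main() {
	key, err := resolveAPIKey("GEMINI_API_KEY", os.LookupEnv)
	if err != nil {
		fmt.Println(err) // tells the user exactly which variable is missing
		return
	}
	fmt.Println("GEMINI_API_KEY loaded,", len(key), "characters")
}
```

Storing only the variable *name* in configuration keeps secrets out of `project.toml` while still letting each binary point at a different key.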
| Binary | Purpose | Configuration |
|---|---|---|
| `pdf-to-png` | PDF → PNG conversion | `settings.dpi`, `settings.*` |
| `png-to-text-tesseract` | OCR text extraction | `tesseract.*` |
| `png-text-augment` | LLM enhancement | `google_api.*`, `prompts.*` |
| `merge-text` | Text concatenation | `text_concatenation.*` |
| `text-to-wav` | TTS synthesis | `f5_tts_settings.*` |
| `wav-to-mp3` | Audio conversion | Audio processing settings |
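The `wav-to-mp3` step (WAV → 48 kHz mono → MP3) maps naturally onto ffmpeg, which is in the dependency list. The exact flags the binary uses are not documented here, so this sketch is an assumption showing one standard way to express that conversion; `mp3EncodeCmd` is a hypothetical helper that builds the command without running it.

```go
package main

import (
	"fmt"
	"os/exec"
)

// mp3EncodeCmd is a hypothetical helper (not part of book_expert). It builds
// an ffmpeg command that resamples a WAV file to 48 kHz mono and encodes MP3.
func mp3EncodeCmd(wavPath, mp3Path string) *exec.Cmd {
	return exec.Command("ffmpeg",
		"-y",          // overwrite output (callers decide beforehand whether to run)
		"-i", wavPath, // input WAV, e.g. the concatenated TTS audio
		"-ar", "48000", // resample to 48 kHz
		"-ac", "1", // downmix to mono
		"-codec:a", "libmp3lame", // LAME MP3 encoder
		mp3Path,
	)
}

func main() {
	cmd := mp3EncodeCmd("data/my_book/wav/my_book.wav", "data/my_book/mp3/my_book.mp3")
	fmt.Println(cmd.Args)
}
```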
# Build all binaries
make build
# Run pipeline stages
./bin/pdf-to-png --input data/raw --output data # PDF → PNG
./bin/png-to-text-tesseract --input data --output data # PNG → OCR + LLM
./bin/merge-text --input data --output data # Text → complete.txt
./bin/text-to-wav --input data --output data # Text → WAV chunks
./bin/wav-to-mp3 --input data --output data # WAV → MP3

Key Features:

- 📋 Reads `project.toml` for all configuration and paths
- ❓ Supports `--help` for detailed usage information
- 🔄 Idempotent operations: safe to rerun; use `--force` to overwrite
- 📊 Comprehensive logging and error reporting
- ⚡ Parallel processing where applicable
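The rerun rule above (skip completed outputs, overwrite only with `--force`) reduces to an existence check before each unit of work. A minimal sketch, assuming outputs are checked by path alone; `shouldProcess` is hypothetical, and the real binaries may also verify that an existing file is complete.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// shouldProcess is a hypothetical helper implementing the rerun rule:
// skip work whose output already exists, unless force is set.
func shouldProcess(outputPath string, force bool) (bool, error) {
	if force {
		return true, nil // --force: always redo the work
	}
	_, err := os.Stat(outputPath)
	switch {
	case err == nil:
		return false, nil // output exists: skip
	case errors.Is(err, fs.ErrNotExist):
		return true, nil // no output yet: process
	default:
		return false, err // e.g. permission denied: surface the error
	}
}

func main() {
	run, err := shouldProcess("data/my_book/mp3/my_book.mp3", false)
	fmt.Println("process:", run, "err:", err)
}
```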
- Drop PDFs into the folder pointed to by `paths.input_dir` (default `data/raw/`).
- Run `make build` to compile all binaries.
- Execute the pipeline binaries in order.
- Find your audiobook at `<output_dir>/<pdf_name>/mp3/<pdf_name>.mp3`.
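Running the binaries in order can itself be scripted. The sketch below is a hypothetical Go driver (`plan` and `runAll` are illustrative names, not part of the repository) that mirrors the five "Run pipeline stages" commands, including their `data/raw` and `data` paths; a real driver would read those paths from `project.toml`. That command list does not include `png-text-augment`, so neither does this sketch.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// stages mirrors the order of the "Run pipeline stages" commands.
var stages = []string{"pdf-to-png", "png-to-text-tesseract", "merge-text", "text-to-wav", "wav-to-mp3"}

// plan builds one command per stage using the --input/--output flags from
// the usage examples; only the first stage reads from data/raw.
func plan(binDir string) []*exec.Cmd {
	cmds := make([]*exec.Cmd, 0, len(stages))
	for i, stage := range stages {
		input := "data"
		if i == 0 {
			input = "data/raw"
		}
		cmd := exec.Command(filepath.Join(binDir, stage), "--input", input, "--output", "data")
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		cmds = append(cmds, cmd)
	}
	return cmds
}

// runAll executes the commands in order and stops at the first failure.
// Because each binary skips finished outputs, rerunning after a crash resumes.
func runAll(cmds []*exec.Cmd) error {
	for _, cmd := range cmds {
		if err := cmd.Run(); err != nil {
			return fmt.Errorf("%s failed: %w", cmd.Args[0], err)
		}
	}
	return nil
}

func main() {
	// Dry run: print the plan. runAll(plan("./bin")) would execute it.
	for _, cmd := range plan("./bin") {
		fmt.Println(cmd.Args)
	}
}
```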
- `[paths]`, `[directories]`, `[processing_dir]`, `[logs_dir]`: all folder locations
- `[settings]`: DPI, worker counts, force-rebuild flag
- `[google_api]`, `[cerebras_api]`: model, temperature, token limits, API-key environment variable names
- `[prompts.*]`: editable multi-paragraph prompts for every LLM stage
- `[f5_tts_settings]`: TTS model name and worker threads
- `[retry]`: global maximum retries and back-off seconds

Dynamic Configuration: All binaries read their configuration at runtime, so directories can be restructured, models switched, and prompts modified without recompiling.
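`[settings]` carries worker counts, and the binaries process in parallel where applicable. One common Go pattern for honoring such a limit is a fixed pool of goroutines fed from a channel. This is a sketch under that assumption (`processAll` is hypothetical), not the project's actual scheduler.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll is a hypothetical helper: it applies work to every item using
// at most `workers` goroutines, as a stage might honor a [settings] worker
// count. Results keep input order, so page N's output stays at index N.
func processAll(items []string, workers int, work func(string) string) []string {
	if workers < 1 {
		workers = 1
	}
	results := make([]string, len(items))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = work(items[i]) // each index is written by exactly one goroutine
			}
		}()
	}
	for i := range items {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	pages := []string{"page_0001.png", "page_0002.png", "page_0003.png"}
	texts := processAll(pages, 2, func(p string) string { return p + ".txt" })
	fmt.Println(texts)
}
```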
| Issue | Solution |
|---|---|
| 🔨 Missing binary | Run make build or install system dependencies |
| 🔑 Missing API key | Binary indicates required environment variable |
| 🚫 HTTP 429 errors | Automatic retry with exponential backoff |
| 💥 Partial runs/crashes | Rerun binary; completed outputs skipped unless --force |
| 🐛 Pipeline issues | Check logs in logs/ directory |
| 🔍 Debug mode | Use --verbose flag for detailed output |
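The HTTP 429 row relies on automatic retry with exponential backoff, tuned by the `[retry]` section (maximum retries and back-off seconds). A minimal sketch of that policy, assuming only 429 responses are retried and the delay doubles per attempt; `classify`, `backoff`, and `withRetry` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// errRateLimited is a hypothetical sentinel for an HTTP 429 response.
var errRateLimited = errors.New("rate limited (HTTP 429)")

// classify maps an HTTP status code to an error; only 429 is retryable.
func classify(status int) error {
	switch {
	case status == http.StatusTooManyRequests:
		return errRateLimited
	case status >= 400:
		return fmt.Errorf("request failed with HTTP %d", status)
	default:
		return nil
	}
}

// backoff returns base * 2^attempt: base, 2*base, 4*base, ...
func backoff(attempt int, base time.Duration) time.Duration {
	return base << attempt
}

// withRetry calls fn up to maxRetries+1 times, sleeping with exponential
// backoff only after a 429; success or any other error returns immediately.
// A nil sleep defaults to time.Sleep (tests can inject a no-op).
func withRetry(maxRetries int, base time.Duration, sleep func(time.Duration), fn func() error) error {
	if sleep == nil {
		sleep = time.Sleep
	}
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = fn(); !errors.Is(err, errRateLimited) {
			return err // nil on success, or a non-retryable error
		}
		if attempt < maxRetries {
			sleep(backoff(attempt, base))
		}
	}
	return fmt.Errorf("giving up after %d retries: %w", maxRetries, err)
}

func main() {
	statuses := []int{429, 429, 200} // simulated API responses
	calls := 0
	err := withRetry(3, 100*time.Millisecond, nil, func() error {
		status := statuses[calls]
		calls++
		return classify(status)
	})
	fmt.Println(calls, err) // 3 <nil>
}
```

Retrying only on 429 keeps genuine failures (bad key, malformed request) from being masked by repeated attempts.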
make help # Show all available targets
make build # Build all binaries
make test # Run full testing pipeline
make test-quick # Run essential checks only
make lint # Run linters on all code
make fmt # Format all code
make clean # Clean build artifacts
make ci # Full CI pipeline (clean, format, lint, test, build)

- ✅ Comprehensive testing: Unit tests, integration tests, and performance benchmarks
- 🔍 Static analysis: `golangci-lint`, `staticcheck`, `go vet`
- 📏 Code formatting: `gofmt`, `goimports`
- 🔨 Shell script validation: `shellcheck` for all Bash scripts
- 📊 Code coverage: Tracked and reported
- ⚡ Performance profiling: CPU and memory profiling available
make dev # Quick development cycle (format + test-quick)
make test-profile # Run tests with profiling enabled
make metrics # Show code quality metrics

- Go Code: Must pass `go fmt`, `go vet`, and `golangci-lint`
- Bash Scripts: Must pass `shellcheck` validation
- Design Principles: Follow the guidelines in `DESIGN_PRINCIPLES_GUIDE.md`
- Documentation: Update the `project.toml` docs for new configuration options
- ✅ All new functionality requires tests in `cmd/*/main_test.go`
- ✅ Code must follow the established patterns and conventions
- ✅ PRs must pass the full CI pipeline (`make ci`)
- ✅ Changes should maintain backward compatibility
- Unit tests for all public functions
- Integration tests for pipeline components
- Performance benchmarks for critical paths
- Error case coverage
All code adheres to the project's coding standards (`GO_CODING_STANDARD.md`) and design principles (`DESIGN_PRINCIPLES_GUIDE.md`) to keep it maintainable and reliable.