- Numeric IDs in
bilbo listoutput; all commands now accept a numeric ID instead of a title (e.g.,bilbo info 1) (f9138b0) - Command aliases:
remove→delete,run→process(f9138b0) bilbo helpcommand (89936d2)- Live VAD progress (
EN VAD: 47%) during segmentation stage using Silero'sprogress_tracking_callback(bb266b8)
bilbo processaudio arguments are now optional to allow re-running stages via--from/--towithout re-specifying audio files (89936d2)- Removed per-stage CLI commands (
bilbo transcribe,bilbo segment,bilbo align,bilbo export) in favour ofbilbo process --from N --to N(89936d2) - Default Whisper model changed from
large-v3-turbotolarge-v3(4ce3820) - Transcription now runs sequentially (L1 then L2) rather than in parallel; a single shared model cannot run concurrently (4ce3820)
refine_timestampsnow uses ownership-based VAD matching: a speech region is assigned to the segment that contains >50% of its duration, and both.startand.endare snapped to the matched regions' extremes with ±50 ms padding (bb266b8)
- Assembly no longer redundantly reopens audio files for each pair (23c6729)
- Ported full Bertalign two-pass alignment algorithm (overlap encoding, anchor DP, m:n second pass with margin scoring) replacing the previous anchor+gap-fill approach (2bd8bc1)
- GitHub Actions release workflow: auto-publish to PyPI on tag, TestPyPI
.devNbuilds on main push (4c79154) - Log skipped sentences during alignment (98aab78)
- Cover merging uses vectorized numpy instead of per-row loop (7118f92)
refine_timestampsreuses a single openSoundFileinstead of re-reading per segment (7118f92)find_problematic_regionsuses numpy cumsum for sliding window (7118f92)- Deduplicated chapter-finishing logic in
map_chapters_to_output(7118f92) - Fixed
Library.renameto remap audio paths by directory component, not string replace (7118f92)
- Language auto-detection (omit
--l1/--l2to detect from audio) (6080564) bilbo renamecommand to rename books (6080564)bilbo transcribe,bilbo segment,bilbo alignper-stage commands (6080564)- Slug-based filesystem paths (titles can now contain spaces and special characters) (6080564)
- CLI now accepts human-readable titles instead of slugs (6080564)
- Library index keyed by auto-generated slugs internally (6080564)
--device autonow correctly resolves to CUDA/CPU (f9f2ec8)
- LLM-powered metadata merging (via local ollama) (86f3ff2)
- Diagonal poster/cover merging (d8e80a0)
- Warning tones around misaligned regions (d3d1ac3)
- Better problem passage detection with sliding-window smoothing (d3d1ac3)
- Energy-based segment timestamp extension (d64d16e)
- PyPI release (
pip install bilbo-audiobook) (8b0dfdf)
- Improved logging and progress reporting (4afcf26)
- Refactored CLI with Click (d64d16e)
- Metadata no longer lost on re-export (ab003a0)
- Initial release (dfce168)
- 4-stage pipeline: transcription, segmentation, alignment, assembly (dfce168)
- faster-whisper transcription with word-level timestamps (dfce168)
- pySBD sentence segmentation (dfce168)
- LaBSE cross-lingual alignment (anchor + gap-fill DP) (7475dbe)
- Audio assembly with LUFS normalization, gaps, and crossfades (38f8bee)
- Text export of aligned pairs (dd051fa)
- Metadata extraction and cover art merging (a6ad026)
- Multi-threaded parallel transcription (38f8bee)
- Library management (
bilbo list,bilbo info,bilbo delete) (dfce168)