|
1 | 1 | # Lyrics Transcriber 🎶 |
2 | 2 |
|
3 | 3 |  |
4 | | - |
| 4 | + |
5 | 5 | [](https://github.com/nomadkaraoke/python-lyrics-transcriber/actions/workflows/test-and-publish.yml) |
6 | 6 | [](https://codecov.io/gh/nomadkaraoke/python-lyrics-transcriber) |
7 | 7 |
|
8 | | -Automatically create synchronised lyrics files in ASS and MidiCo LRC formats with word-level timestamps, using OpenAI Whisper and lyrics from Genius and Spotify, for convenience in use cases such as karaoke video production. |
9 | | - |
10 | | -## Features 🌟 |
11 | | - |
12 | | -- Automatically transcribe lyrics with word-level timestamps. |
13 | | -- Outputs lyrics in ASS and MidiCo LRC formats. |
14 | | -- Can fetch lyrics from with Genius and Spotify. |
15 | | -- Command Line Interface (CLI) for easy usage. |
16 | | -- Can be included and used in other Python projects. |
17 | | - |
18 | | -## Installation 🛠️ |
19 | | - |
20 | | -### Prerequisites |
21 | | - |
22 | | -- Python 3.10 or higher |
23 | | -- [Optional] Genius API token if you want to fetch lyrics from Genius |
24 | | -- [Optional] Spotify cookie value if you want to fetch lyrics from Spotify |
25 | | -- [Optional] OpenAI API token if you want to use LLM correction of the transcribed lyrics |
26 | | -- [Optional] AudioShake API token if you want to use a much higher quality (but paid) API for lyrics transcription |
27 | | - |
| 8 | +Create synchronized karaoke assets from an audio file with word‑level timing: fetch lyrics, transcribe audio, auto‑correct against references, review in a web UI, and export ASS, LRC, CDG, and video. |
| 9 | + |
| 10 | +### What this project is now |
| 11 | +- **Modular pipeline** orchestrated by `LyricsTranscriber` with clear configs |
| 12 | +- **Transcription** via AudioShake (preferred) and Whisper on RunPod (fallback) |
| 13 | +- **Lyrics providers**: Genius, Spotify, Musixmatch, or a local file |
| 14 | +- **Rule‑based correction** with optional **LLM‑assisted** gap fixes |
| 15 | +- **Human review** server + frontend for iterative corrections and previews |
| 16 | +- **Outputs**: original/corrected text, corrections JSON, LRC, ASS, CDG(+MP3/ZIP), and video |
| 17 | + |
| 18 | +## Features |
| 19 | +- **Multi-transcriber orchestration** with caching per audio hash |
| 20 | + - AudioShake API (priority 1) |
| 21 | + - Whisper via RunPod + Dropbox upload (priority 2) |
| 22 | +- **Lyrics fetching** with caching per artist/title |
| 23 | + - Genius (token or RapidAPI) • Spotify (cookie or RapidAPI) • Musixmatch (RapidAPI) • Local file |
| 24 | +- **Correction engine** |
| 25 | + - Anchor/gap detection, multiple rule handlers (word count, syllables, relaxed, punctuation, extend‑anchor) |
| 26 | + - Optional LLM handlers (Ollama local, or OpenRouter with `OPENROUTER_API_KEY`) |
| 27 | +- **Review UI** (FastAPI) at `http://localhost:8000` |
| 28 | + - Edit corrections, toggle handlers, add lyrics sources, generate preview video |
| 29 | +- **Rich outputs** |
| 30 | + - Plain text (original/corrected), corrections `JSON`, `*.lrc` (MidiCo), `*.ass` (karaoke), `*.cdg` with `*.mp3` and ZIP, and MP4/MKV video |
| 31 | + - Subtitle offset, line wrapping, styles via JSON |
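The two caching bullets above ("per audio hash" and "per artist/title") can be pictured with a minimal sketch. The helper names `audio_cache_key` and `lyrics_cache_key` are hypothetical, and SHA-256 is an assumption; the project may use a different hash or key layout.

```python
import hashlib


def audio_cache_key(audio_path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the audio file contents in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()  # assumption: the real project may use another algorithm
    with open(audio_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lyrics_cache_key(artist: str, title: str) -> str:
    """Hash a normalised artist/title pair so casing and stray spaces do not matter."""
    normalised = f"{artist.strip().lower()}_{title.strip().lower()}"
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()
```

With keys like these, re-running the tool on the same audio bytes or the same artist/title could reuse earlier transcription or lyrics results instead of calling the paid APIs again.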
| 32 | + |
| 33 | +## Install |
28 | 34 | ``` |
29 | 35 | pip install lyrics-transcriber |
30 | 36 | ``` |
31 | 37 |
|
32 | | -> **Warning** |
33 | | -> The package published to PyPI was created by manually editing `poetry.lock` to remove [triton](https://github.com/openai/triton), as it is technically a sub-dependency from openai-whisper but is currently only supported on Linux (whisper still works fine without it, and I want this package to be usable on any platform) |
34 | | -
|
35 | | -## Docker |
36 | | - |
37 | | -You can use the pre-built container image `beveradb/lyrics-transcriber:0.16.0` on Docker hub if you want, here's an example: |
38 | | - |
39 | | -```sh |
40 | | -docker run \ |
41 | | - -v `pwd`/input:/input \ |
42 | | - -v `pwd`/output:/output \ |
43 | | -beveradb/lyrics-transcriber:0.16.0 \ |
44 | | - --log_level debug \ |
45 | | - --output_dir /output \ |
46 | | - --render_video \ |
47 | | - --video_background_image /input/your-background-image.png \ |
48 | | - --video_resolution 360p \ |
49 | | - /input/song.flac |
| 38 | +### System requirements |
| 39 | +- Python 3.10–3.13 |
| 40 | +- FFmpeg (required for audio probing and video rendering)
| 41 | +- spaCy English model (phrase analyzer used by correction): |
50 | 42 | ``` |
51 | | - |
52 | | -## Usage 🚀 |
53 | | - |
54 | | -### As a standalone CLI |
55 | | - |
56 | | -1. To transcribe lyrics from an audio file: |
57 | | - |
| 43 | +python -m spacy download en_core_web_sm |
58 | 44 | ``` |
59 | | -lyrics-transcriber /path/to/your/audiofile.mp3 |
60 | | -``` |
61 | | - |
62 | | -2. To specify Genius API token, song artist, and song title for auto-correction: |
63 | 45 |
|
| 46 | +## Quick start (CLI) |
| 47 | +Minimal run (transcribe + LRC/ASS, no video/CDG): |
| 48 | +```bash |
| 49 | +lyrics-transcriber /path/to/song.mp3 --skip_video --skip_cdg |
64 | 50 | ``` |
65 | | -lyrics-transcriber /path/to/your/audiofile.mp3 --genius_api_token YOUR_API_TOKEN --artist "Artist Name" --title "Song Title" |
66 | | -``` |
67 | | - |
68 | | -### As a Python package in your project |
69 | 51 |
|
70 | | -1. Import LyricsTranscriber in your Python script: |
| 52 | +Use AudioShake and auto‑fetch lyrics (Genius + artist/title): |
| 53 | +```bash |
| 54 | +export AUDIOSHAKE_API_TOKEN=... # or pass --audioshake_api_token |
| 55 | +export GENIUS_API_TOKEN=... |
| 56 | +lyrics-transcriber /path/to/song.mp3 --artist "Artist" --title "Song" |
| 57 | +``` |
71 | 58 |
|
| 59 | +Use Whisper on RunPod (fallback or standalone): |
| 60 | +```bash |
| 61 | +export RUNPOD_API_KEY=... |
| 62 | +export WHISPER_RUNPOD_ID=... # your RunPod endpoint ID |
| 63 | +lyrics-transcriber /path/to/song.mp3 --skip_cdg --skip_video |
72 | 64 | ``` |
73 | | -from lyrics_transcriber import LyricsTranscriber |
| 65 | + |
| 66 | +Provide a local lyrics file instead of fetching: |
| 67 | +```bash |
| 68 | +lyrics-transcriber /path/to/song.mp3 --lyrics_file /path/to/lyrics.txt |
74 | 69 | ``` |
75 | 70 |
|
76 | | -1. Create an instance and use it: |
| 71 | +Render video/CDG (requires a styles JSON file): |
| 72 | +```bash |
| 73 | +lyrics-transcriber /path/to/song.mp3 \ |
| 74 | + --output_styles_json /path/to/styles.json \ |
| 75 | + --video_resolution 1080p |
| 76 | +``` |
77 | 77 |
|
| 78 | +### Common flags |
| 79 | +- **Song identification**: `--artist`, `--title`, `--lyrics_file` |
| 80 | +- **APIs**: `--audioshake_api_token`, `--genius_api_token`, `--spotify_cookie`, `--runpod_api_key`, `--whisper_runpod_id` |
| 81 | +- **Output**: `--output_dir`, `--cache_dir`, `--output_styles_json`, `--subtitle_offset` |
| 82 | +- **Feature toggles**: `--skip_lyrics_fetch`, `--skip_transcription`, `--skip_correction`, `--skip_plain_text`, `--skip_lrc`, `--skip_cdg`, `--skip_video`, `--video_resolution {4k,1080p,720p,360p}` |
| 83 | + |
| 84 | +Run `lyrics-transcriber --help` for full usage. |
| 85 | + |
| 86 | +## Environment variables |
| 87 | +These are read automatically; a CLI flag takes precedence when both are set:
| 88 | +- `AUDIOSHAKE_API_TOKEN` |
| 89 | +- `GENIUS_API_TOKEN`, `RAPIDAPI_KEY` |
| 90 | +- `SPOTIFY_COOKIE_SP_DC` |
| 91 | +- `RUNPOD_API_KEY`, `WHISPER_RUNPOD_ID` |
| 92 | +- `WHISPER_DROPBOX_APP_KEY`, `WHISPER_DROPBOX_APP_SECRET`, `WHISPER_DROPBOX_REFRESH_TOKEN` |
| 93 | +- `OPENROUTER_API_KEY` (optional LLM handler) |
| 94 | +- `LYRICS_TRANSCRIBER_CACHE_DIR` (default `~/lyrics-transcriber-cache`) |
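The precedence described above (CLI flag first, then environment variable, then default) can be sketched as follows. `resolve_setting` is a hypothetical helper written for illustration, not a function exported by the package.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    cli_value: Optional[str],
    env_name: str,
    default: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the CLI value if given, else the environment variable, else the default."""
    env = os.environ if env is None else env
    if cli_value is not None:
        return cli_value
    return env.get(env_name, default)


# Example: the cache directory falls back to the documented default.
cache_dir = resolve_setting(
    None,
    "LYRICS_TRANSCRIBER_CACHE_DIR",
    default=os.path.expanduser("~/lyrics-transcriber-cache"),
)
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.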
| 95 | + |
| 96 | +## Outputs |
| 97 | +Generated files are written to `--output_dir` (default: the current working directory):
| 98 | +- `... (Lyrics Corrections).json` — full correction data and audit trail |
| 99 | +- `... (Karaoke).ass` — styled karaoke subtitles (ASS) |
| 100 | +- `... .lrc` — MidiCo-compatible LRC
| 101 | +- `... (original).txt` and `... (corrected).txt` — plain text exports |
| 102 | +- `... .cdg`, `... .mp3`, `... .zip` — CDG package (when enabled) |
| 103 | +- `... (With Vocals).mkv` — video with lyrics overlay (when enabled) |
| 104 | + |
| 105 | +Notes:
| 106 | +- If no `--output_styles_json` is provided, CDG and video are disabled automatically. |
| 107 | +- `--subtitle_offset` shifts all word timings by the given number of milliseconds, to compensate for subtitles that appear late or early.
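As a rough illustration of what a subtitle offset does, the sketch below shifts every word's start and end time by a millisecond offset. The `Word` dataclass, its field names, the use of seconds for timings, and the clamp at zero are all assumptions made for this example; the package's own data model may differ.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Word:
    text: str
    start_time: float  # seconds (assumed unit)
    end_time: float    # seconds (assumed unit)


def apply_offset(words: List[Word], offset_ms: int) -> List[Word]:
    """Return new words shifted by offset_ms; negative values move subtitles earlier."""
    offset_s = offset_ms / 1000.0
    return [
        replace(
            w,
            # Clamping at 0.0 is an assumption so that timings never go negative.
            start_time=max(0.0, w.start_time + offset_s),
            end_time=max(0.0, w.end_time + offset_s),
        )
        for w in words
    ]
```

A positive offset delays the subtitles and a negative one brings them forward; the original list is left unchanged.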
| 108 | + |
| 109 | +## Review server (human‑in‑the‑loop) |
| 110 | +If review is enabled (default), a local server starts during processing and opens the UI at `http://localhost:8000`: |
| 111 | +- Inspect and adjust corrections |
| 112 | +- Toggle correction handlers (rule‑based/LLM) |
| 113 | +- Add another lyrics source (paste plain text) |
| 114 | +- Generate a low‑res preview video on demand |
| 115 | + |
| 116 | +Frontend assets are bundled when installed from PyPI. For local dev, build the frontend once if needed: |
78 | 117 | ``` |
79 | | -transcriber = LyricsTranscriber(audio_filepath='path_to_audio.mp3') |
80 | | -result_metadata = transcriber.generate() |
| 118 | +./scripts/build_frontend.sh |
81 | 119 | ``` |
82 | 120 |
|
83 | | -result_metadata contains values as such: |
84 | | -``` |
85 | | -result_metadata = { |
86 | | - "whisper_json_filepath": str, |
87 | | - "genius_lyrics": str, |
88 | | - "genius_lyrics_filepath": str, |
89 | | - "midico_lrc_filepath": str, |
90 | | - "singing_percentage": int, |
91 | | - "total_singing_duration": int, |
92 | | - "song_duration": int, |
| 121 | +## Styles JSON (for CDG/Video) |
| 122 | +Provide a JSON file with at least a `karaoke` section (for video/ASS) and, if generating CDG, a `cdg` section. Example (minimal):
| 123 | +```json |
| 124 | +{ |
| 125 | + "karaoke": { |
| 126 | + "ass_name": "Karaoke", |
| 127 | + "font": "Oswald SemiBold", |
| 128 | + "font_path": "lyrics_transcriber/output/fonts/Oswald-SemiBold.ttf", |
| 129 | + "font_size": 120, |
| 130 | + "primary_color": "255,165,0", |
| 131 | + "secondary_color": "255,255,255", |
| 132 | + "outline_color": "0,0,0", |
| 133 | + "back_color": "0,0,0", |
| 134 | + "bold": true, |
| 135 | + "italic": false, |
| 136 | + "underline": false, |
| 137 | + "strike_out": false, |
| 138 | + "scale_x": 100, |
| 139 | + "scale_y": 100, |
| 140 | + "spacing": 0, |
| 141 | + "angle": 0, |
| 142 | + "border_style": 1, |
| 143 | + "outline": 3, |
| 144 | + "shadow": 0, |
| 145 | + "margin_l": 0, |
| 146 | + "margin_r": 0, |
| 147 | + "margin_v": 100, |
| 148 | + "encoding": 1, |
| 149 | + "background_color": "black", |
| 150 | + "max_line_length": 36, |
| 151 | + "top_padding": 180 |
| 152 | + }, |
| 153 | + "cdg": { |
| 154 | + "font": "Oswald SemiBold", |
| 155 | + "font_path": "lyrics_transcriber/output/fonts/Oswald-SemiBold.ttf" |
| 156 | + } |
93 | 157 | } |
94 | 158 | ``` |
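Before a long run it can help to check that a styles file has the sections described above. The sketch below is a hypothetical pre-flight check (`check_styles` and `load_styles` are not part of the package); it only verifies that the `karaoke` and, optionally, `cdg` sections exist, not that every key inside them is valid.

```python
import json
from typing import List


def check_styles(styles: dict, want_cdg: bool = False) -> List[str]:
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    if not isinstance(styles.get("karaoke"), dict):
        problems.append("missing 'karaoke' section")
    if want_cdg and not isinstance(styles.get("cdg"), dict):
        problems.append("missing 'cdg' section")
    return problems


def load_styles(path: str) -> dict:
    """Load a styles JSON file from disk."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

For example, `check_styles(load_styles("/path/to/styles.json"), want_cdg=True)` would report a missing `cdg` section before any transcription work starts.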
95 | 159 |
|
96 | | -## Requirements 📋 |
97 | | - |
98 | | - - Python >= 3.9 |
99 | | - - Python Poetry |
100 | | - - Dependencies are listed in pyproject.toml |
101 | | - |
102 | | -## Local Development 💻 |
103 | | - |
104 | | -To work on the Lyrics Transcriber project locally, you need Python 3.9 or higher. It's recommended to create a virtual environment using poetry. |
105 | | - |
106 | | - 1. Clone the repo and cd into it. |
107 | | - 2. Install poetry if you haven’t already. |
108 | | - 3. Run poetry install to install the dependencies. |
109 | | - 4. Run poetry shell to activate the virtual environment. |
110 | | - |
111 | | -## Contributing 🤝 |
112 | | - |
113 | | -Contributions are very much welcome! Please fork the repository and submit a pull request with your changes, and I'll try to review, merge and publish promptly! |
114 | | - |
115 | | -- This project is 100% open-source and free for anyone to use and modify as they wish. |
116 | | -- If the maintenance workload for this repo somehow becomes too much for me I'll ask for volunteers to share maintainership of the repo, though I don't think that is very likely |
117 | | - |
118 | | -## License 📄 |
| 160 | +## Using as a library |
| 161 | +```python |
| 162 | +from lyrics_transcriber import LyricsTranscriber |
| 163 | +from lyrics_transcriber.core.controller import TranscriberConfig, LyricsConfig, OutputConfig |
| 164 | + |
| 165 | +transcriber = LyricsTranscriber( |
| 166 | + audio_filepath="/path/to/song.mp3", |
| 167 | + artist="Artist", # optional |
| 168 | + title="Title", # optional |
| 169 | + transcriber_config=TranscriberConfig( |
| 170 | + audioshake_api_token="...", # or env |
| 171 | + runpod_api_key="...", whisper_runpod_id="..." |
| 172 | + ), |
| 173 | + lyrics_config=LyricsConfig( |
| 174 | + genius_api_token="...", spotify_cookie="...", rapidapi_key="...", |
| 175 | + lyrics_file=None |
| 176 | + ), |
| 177 | + output_config=OutputConfig( |
| 178 | + output_dir="./out", cache_dir="~/lyrics-transcriber-cache", |
| 179 | + output_styles_json="/path/to/styles.json", # required for CDG/video |
| 180 | + video_resolution="1080p", subtitle_offset_ms=0 |
| 181 | + ), |
| 182 | +) |
| 183 | + |
| 184 | +result = transcriber.process() |
| 185 | +print(result.ass_filepath, result.lrc_filepath, result.video_filepath) |
| 186 | +``` |
119 | 187 |
|
120 | | -This project is licensed under the MIT [License](LICENSE). |
| 188 | +## Docker |
| 189 | +Build and run locally (includes FFmpeg and spaCy model): |
| 190 | +```bash |
| 191 | +docker build -t lyrics-transcriber:local . |
| 192 | +docker run --rm -v "$PWD/input":/input -v "$PWD/output":/output \ |
| 193 | + -e AUDIOSHAKE_API_TOKEN -e GENIUS_API_TOKEN -e RUNPOD_API_KEY -e WHISPER_RUNPOD_ID \ |
| 194 | + lyrics-transcriber:local \ |
| 195 | + --output_dir /output --skip_cdg --video_resolution 360p /input/song.mp3 |
| 196 | +``` |
121 | 197 |
|
122 | | -## Credits 🙏 |
| 198 | +## Development |
| 199 | +- Python 3.10–3.13, Poetry |
| 200 | +- Install deps: `poetry install` |
| 201 | +- Run tests: `poetry run pytest` |
| 202 | +- Build frontend (if editing UI): `./scripts/build_frontend.sh` |
123 | 203 |
|
124 | | -- This project uses [OpenAI Whisper](https://github.com/openai/whisper) for transcription, which inspired the entire tool! |
125 | | -- Thanks to @linto-ai for the [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped) project which solved a big chunk for me. |
126 | | -- Thanks to Genius for providing an API which makes fetching lyrics easier! |
| 204 | +## License |
| 205 | +MIT. See `LICENSE`. |
127 | 206 |
|
128 | | -## Contact 💌 |
| 207 | +## Credits |
| 208 | +- Audio transcription by AudioShake and Whisper (RunPod) |
| 209 | +- Lyrics via Genius, Spotify, Musixmatch; layout via `karaoke-lyrics-processor` |
| 210 | +- UI/API: FastAPI, Vite/React frontend |
129 | 211 |
|
130 | | -For questions or feedback, please raise an issue or reach out to @beveradb ([Andrew Beveridge](mailto:andrew@beveridge.uk)) directly. |
| 212 | +## Support |
| 213 | +Please open issues or PRs on the repo, or contact @beveradb. |