This repository was archived by the owner on Jan 19, 2026. It is now read-only.

Commit 960c0e5

Updated readme
1 parent 4174d2c

1 file changed

Lines changed: 182 additions & 99 deletions


README.md

@@ -1,130 +1,213 @@
11
# Lyrics Transcriber 🎶
22

33
![PyPI - Version](https://img.shields.io/pypi/v/lyrics-transcriber)
4-
![Python Version](https://img.shields.io/badge/python-3.10+-blue)
4+
![Python Version](https://img.shields.io/badge/python-3.10%E2%80%933.13-blue)
55
[![Tests](https://github.com/nomadkaraoke/python-lyrics-transcriber/actions/workflows/test-and-publish.yml/badge.svg)](https://github.com/nomadkaraoke/python-lyrics-transcriber/actions/workflows/test-and-publish.yml)
66
[![Coverage](https://codecov.io/gh/nomadkaraoke/python-lyrics-transcriber/graph/badge.svg?token=SMW2TVPVNT)](https://codecov.io/gh/nomadkaraoke/python-lyrics-transcriber)
77

8-
Automatically create synchronised lyrics files in ASS and MidiCo LRC formats with word-level timestamps, using OpenAI Whisper and lyrics from Genius and Spotify, for convenience in use cases such as karaoke video production.
9-
10-
## Features 🌟
11-
12-
- Automatically transcribe lyrics with word-level timestamps.
13-
- Outputs lyrics in ASS and MidiCo LRC formats.
14-
- Can fetch lyrics from Genius and Spotify.
15-
- Command Line Interface (CLI) for easy usage.
16-
- Can be included and used in other Python projects.
17-
18-
## Installation 🛠️
19-
20-
### Prerequisites
21-
22-
- Python 3.10 or higher
23-
- [Optional] Genius API token if you want to fetch lyrics from Genius
24-
- [Optional] Spotify cookie value if you want to fetch lyrics from Spotify
25-
- [Optional] OpenAI API token if you want to use LLM correction of the transcribed lyrics
26-
- [Optional] AudioShake API token if you want to use a much higher quality (but paid) API for lyrics transcription
27-
8+
Create synchronized karaoke assets from an audio file with word‑level timing: fetch lyrics, transcribe audio, auto‑correct against references, review in a web UI, and export ASS, LRC, CDG, and video.
9+
10+
### What this project is now
11+
- **Modular pipeline** orchestrated by `LyricsTranscriber`, configured through separate transcriber, lyrics, and output configs
12+
- **Transcription** via AudioShake (preferred) and Whisper on RunPod (fallback)
13+
- **Lyrics providers**: Genius, Spotify, Musixmatch, or a local file
14+
- **Rule‑based correction** with optional **LLM‑assisted** gap fixes
15+
- **Human review** server + frontend for iterative corrections and previews
16+
- **Outputs**: original/corrected text, corrections JSON, LRC, ASS, CDG(+MP3/ZIP), and video
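The AudioShake-first, Whisper-fallback behavior described here can be pictured as a loop over transcribers in priority order. This is an illustrative sketch only: `transcribe_with_fallback` and the `(priority, name, callable)` tuples are hypothetical names, not the `LyricsTranscriber` API, and it assumes each transcriber is a callable that raises an exception on failure.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical signature: a transcriber takes an audio path and returns a result dict.
Transcriber = Callable[[str], dict]


def transcribe_with_fallback(
    audio_path: str,
    transcribers: Sequence[Tuple[int, str, Transcriber]],
) -> Tuple[str, dict]:
    """Try transcribers in ascending priority order (1 first) and return
    the name and result of the first one that succeeds."""
    errors = {}
    for _priority, name, transcribe in sorted(transcribers, key=lambda t: t[0]):
        try:
            return name, transcribe(audio_path)
        except Exception as exc:  # a real pipeline would catch narrower errors
            errors[name] = exc
    raise RuntimeError(f"all transcribers failed: {errors}")
```

With `[(1, "audioshake", ...), (2, "whisper", ...)]`, the Whisper result is returned only when the AudioShake callable raises.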
17+
18+
## Features
19+
- **Multi-transcriber orchestration** with caching per audio hash
20+
  - AudioShake API (priority 1)
21+
  - Whisper via RunPod + Dropbox upload (priority 2)
22+
- **Lyrics fetching** with caching per artist/title
23+
  - Genius (token or RapidAPI) • Spotify (cookie or RapidAPI) • Musixmatch (RapidAPI) • Local file
24+
- **Correction engine**
25+
  - Anchor/gap detection, multiple rule handlers (word count, syllables, relaxed, punctuation, extend‑anchor)
26+
  - Optional LLM handlers (Ollama local, or OpenRouter with `OPENROUTER_API_KEY`)
27+
- **Review UI** (FastAPI) at `http://localhost:8000`
28+
  - Edit corrections, toggle handlers, add lyrics sources, generate preview video
29+
- **Rich outputs**
30+
  - Plain text (original/corrected), corrections JSON, `*.lrc` (MidiCo), `*.ass` (karaoke), `*.cdg` with `*.mp3` and ZIP, and MP4/MKV video
31+
  - Subtitle offset, line wrapping, and styles via JSON
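To illustrate the anchor/gap idea in the correction engine: anchors are runs of transcribed words that agree with the reference lyrics, and gaps are the runs in between that handlers try to fix. This sketch uses Python's `difflib.SequenceMatcher` as a stand-in matcher; it is not the project's own anchor-detection algorithm. `find_anchors_and_gaps` and `word_count_match` are hypothetical names, and `word_count_match` assumes, as a simplified word-count rule, that a gap with equal word counts on both sides can be replaced one-for-one.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Gap = Tuple[List[str], List[str]]  # (transcribed words, reference words)


def _normalize(word: str) -> str:
    # Compare case-insensitively and ignore surrounding punctuation.
    return word.lower().strip(".,!?;:'\"")


def find_anchors_and_gaps(
    transcribed: List[str], reference: List[str]
) -> Tuple[List[List[str]], List[Gap]]:
    """Split transcribed words into anchors (runs matching the reference)
    and gaps (runs that differ, paired with the reference words)."""
    matcher = SequenceMatcher(
        a=[_normalize(w) for w in transcribed],
        b=[_normalize(w) for w in reference],
        autojunk=False,
    )
    anchors: List[List[str]] = []
    gaps: List[Gap] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            anchors.append(transcribed[i1:i2])
        else:
            gaps.append((transcribed[i1:i2], reference[j1:j2]))
    return anchors, gaps


def word_count_match(gap: Gap) -> Optional[List[str]]:
    """Hypothetical rule: if both sides of a gap have the same number of
    words, replace the transcribed words one-for-one with the reference."""
    transcribed, reference = gap
    if transcribed and len(transcribed) == len(reference):
        return list(reference)
    return None
```

In this sketch, a gap whose word counts differ returns `None` and would be left for another handler.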
32+
33+
## Install
2834
```
2935
pip install lyrics-transcriber
3036
```
3137

32-
> **Warning**
33-
> The package published to PyPI was created by manually editing `poetry.lock` to remove [triton](https://github.com/openai/triton), as it is technically a sub-dependency of openai-whisper but is currently only supported on Linux (Whisper still works fine without it, and I want this package to be usable on any platform).
34-
35-
## Docker
36-
37-
You can use the pre-built container image `beveradb/lyrics-transcriber:0.16.0` from Docker Hub if you want. Here's an example:
38-
39-
```sh
40-
docker run \
41-
-v `pwd`/input:/input \
42-
-v `pwd`/output:/output \
43-
beveradb/lyrics-transcriber:0.16.0 \
44-
--log_level debug \
45-
--output_dir /output \
46-
--render_video \
47-
--video_background_image /input/your-background-image.png \
48-
--video_resolution 360p \
49-
/input/song.flac
38+
### System requirements
39+
- Python 3.10–3.13
40+
- FFmpeg (required for audio probing and video rendering)
41+
- spaCy English model (used by the phrase analyzer in the correction step):
5042
```
51-
52-
## Usage 🚀
53-
54-
### As a standalone CLI
55-
56-
1. To transcribe lyrics from an audio file:
57-
43+
python -m spacy download en_core_web_sm
5844
```
59-
lyrics-transcriber /path/to/your/audiofile.mp3
60-
```
61-
62-
2. To specify Genius API token, song artist, and song title for auto-correction:
6345

46+
## Quick start (CLI)
47+
Minimal run (transcribe + LRC/ASS, no video/CDG):
48+
```bash
49+
lyrics-transcriber /path/to/song.mp3 --skip_video --skip_cdg
6450
```
65-
lyrics-transcriber /path/to/your/audiofile.mp3 --genius_api_token YOUR_API_TOKEN --artist "Artist Name" --title "Song Title"
66-
```
67-
68-
### As a Python package in your project
6951

70-
1. Import LyricsTranscriber in your Python script:
52+
Use AudioShake and auto‑fetch lyrics (Genius + artist/title):
53+
```bash
54+
export AUDIOSHAKE_API_TOKEN=... # or pass --audioshake_api_token
55+
export GENIUS_API_TOKEN=...
56+
lyrics-transcriber /path/to/song.mp3 --artist "Artist" --title "Song"
57+
```
7158

59+
Use Whisper on RunPod (fallback or standalone):
60+
```bash
61+
export RUNPOD_API_KEY=...
62+
export WHISPER_RUNPOD_ID=... # your RunPod endpoint ID
63+
lyrics-transcriber /path/to/song.mp3 --skip_cdg --skip_video
7264
```
73-
from lyrics_transcriber import LyricsTranscriber
65+
66+
Provide a local lyrics file instead of fetching:
67+
```bash
68+
lyrics-transcriber /path/to/song.mp3 --lyrics_file /path/to/lyrics.txt
7469
```
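If you prepare a `--lyrics_file` by hand, it is presumably plain text with one lyric line per line. The sketch below shows one way to pre-clean such a file; it assumes you want to drop blank lines and Genius-style section labels such as `[Chorus]`. `clean_lyrics_text` and `load_lyrics_lines` are hypothetical helpers, not part of lyrics-transcriber.

```python
import re
from pathlib import Path
from typing import List

# Lines made up entirely of a bracketed label, e.g. "[Verse 1]" or "[Chorus]".
SECTION_LABEL = re.compile(r"^\[[^\]]*\]$")


def clean_lyrics_text(text: str) -> List[str]:
    """Return non-empty lyric lines, skipping bracketed section labels."""
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or SECTION_LABEL.match(line):
            continue
        lines.append(line)
    return lines


def load_lyrics_lines(path: str) -> List[str]:
    """Read a UTF-8 lyrics file and clean it with clean_lyrics_text."""
    return clean_lyrics_text(Path(path).read_text(encoding="utf-8"))
```

Writing `"\n".join(load_lyrics_lines("raw.txt"))` to a new file gives a cleaned file you could pass as `--lyrics_file`.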
7570

76-
1. Create an instance and use it:
71+
Render video/CDG (requires a styles JSON file):
72+
```bash
73+
lyrics-transcriber /path/to/song.mp3 \
74+
--output_styles_json /path/to/styles.json \
75+
--video_resolution 1080p
76+
```
7777

78+
### Common flags
79+
- **Song identification**: `--artist`, `--title`, `--lyrics_file`
80+
- **APIs**: `--audioshake_api_token`, `--genius_api_token`, `--spotify_cookie`, `--runpod_api_key`, `--whisper_runpod_id`
81+
- **Output**: `--output_dir`, `--cache_dir`, `--output_styles_json`, `--subtitle_offset`
82+
- **Feature toggles**: `--skip_lyrics_fetch`, `--skip_transcription`, `--skip_correction`, `--skip_plain_text`, `--skip_lrc`, `--skip_cdg`, `--skip_video`, `--video_resolution {4k,1080p,720p,360p}`
83+
84+
Run `lyrics-transcriber --help` for full usage.
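The interaction between the `--skip_*` toggles and the styles JSON (CDG and video are disabled automatically when no `--output_styles_json` is given, as noted under Outputs) can be summarized as a small decision function. This is a hypothetical model of that documented behavior, covering only the plain text, LRC, CDG, and video toggles; `OutputToggles` and `planned_outputs` are illustrative names, not the CLI's internals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OutputToggles:
    """Hypothetical mirror of a few CLI flags."""
    skip_plain_text: bool = False
    skip_lrc: bool = False
    skip_cdg: bool = False
    skip_video: bool = False
    output_styles_json: Optional[str] = None


def planned_outputs(toggles: OutputToggles) -> List[str]:
    """List which of these outputs would be generated."""
    outputs = []
    if not toggles.skip_plain_text:
        outputs.append("plain_text")
    if not toggles.skip_lrc:
        outputs.append("lrc")
    # CDG and video both need a styles JSON, so they are dropped without one.
    has_styles = toggles.output_styles_json is not None
    if has_styles and not toggles.skip_cdg:
        outputs.append("cdg")
    if has_styles and not toggles.skip_video:
        outputs.append("video")
    return outputs
```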
85+
86+
## Environment variables
87+
These are read automatically (CLI flags override):
88+
- `AUDIOSHAKE_API_TOKEN`
89+
- `GENIUS_API_TOKEN`, `RAPIDAPI_KEY`
90+
- `SPOTIFY_COOKIE_SP_DC`
91+
- `RUNPOD_API_KEY`, `WHISPER_RUNPOD_ID`
92+
- `WHISPER_DROPBOX_APP_KEY`, `WHISPER_DROPBOX_APP_SECRET`, `WHISPER_DROPBOX_REFRESH_TOKEN`
93+
- `OPENROUTER_API_KEY` (optional LLM handler)
94+
- `LYRICS_TRANSCRIBER_CACHE_DIR` (default `~/lyrics-transcriber-cache`)
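A minimal sketch of the precedence rule stated here (a CLI flag wins over its environment variable, which wins over a default), plus a content-hash cache key of the kind implied by "caching per audio hash". The helper names are hypothetical and SHA-256 is an assumed choice of hash; the tool's actual cache layout may differ.

```python
import hashlib
import os
from pathlib import Path
from typing import Optional, Union


def resolve_setting(
    cli_value: Optional[str], env_var: str, default: Optional[str] = None
) -> Optional[str]:
    """Return the CLI value if given, else the environment variable, else the default."""
    if cli_value:
        return cli_value
    return os.environ.get(env_var, default)


def resolve_cache_dir(cli_value: Optional[str] = None) -> Path:
    """Resolve --cache_dir / LYRICS_TRANSCRIBER_CACHE_DIR, expanding "~"."""
    value = resolve_setting(
        cli_value, "LYRICS_TRANSCRIBER_CACHE_DIR", "~/lyrics-transcriber-cache"
    )
    return Path(value).expanduser()


def audio_cache_key(audio_path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash the audio file's bytes in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(audio_path, "rb") as audio_file:
        for chunk in iter(lambda: audio_file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because this key depends only on file contents, renaming or moving a song keeps the same cache entry.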
95+
96+
## Outputs
97+
Generated files are written to `--output_dir` (default: CWD):
98+
- `... (Lyrics Corrections).json` — full correction data and audit trail
99+
- `... (Karaoke).ass` — styled karaoke subtitles (ASS)
100+
- `... .lrc` — MidiCo compatible LRC
101+
- `... (original).txt` and `... (corrected).txt` — plain text exports
102+
- `... .cdg`, `... .mp3`, `... .zip` — CDG package (when enabled)
103+
- `... (With Vocals).mkv` — video with lyrics overlay (when enabled)
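For orientation, the two text-based karaoke formats encode timing differently: LRC uses `[mm:ss.xx]` timestamps, while ASS karaoke lines carry per-word `{\k<centiseconds>}` duration tags. The sketch below shows those generic encodings only; it does not reproduce the MidiCo-specific LRC layout or this tool's ASS styling, and the word-dict shape (`text`, `start`, `end` in seconds) is an assumption.

```python
from typing import List


def lrc_timestamp(seconds: float) -> str:
    """Format seconds as an LRC timestamp [mm:ss.xx] (centisecond precision)."""
    total_cs = int(round(seconds * 100))
    minutes, rem = divmod(total_cs, 6000)
    secs, centis = divmod(rem, 100)
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"


def ass_karaoke_text(words: List[dict]) -> str:
    """Build ASS karaoke text with one {\\k<centiseconds>} tag per word.

    Each word is assumed to be {"text": str, "start": float, "end": float}.
    Silence between words is ignored here; a full implementation would
    need to account for it so highlighting stays in sync.
    """
    parts = []
    for word in words:
        duration_cs = int(round((word["end"] - word["start"]) * 100))
        parts.append(f"{{\\k{duration_cs}}}{word['text']}")
    return " ".join(parts)
```

For example, `lrc_timestamp(61.5)` returns `[01:01.50]`.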
104+
105+
Notes
106+
- If no `--output_styles_json` is provided, CDG and video are disabled automatically.
107+
- `--subtitle_offset` shifts all word timings by the given number of milliseconds, to fix subtitles that appear too late or too early.
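The offset can be pictured as adding a constant to every word's start and end time. This sketch assumes words carry `start`/`end` in seconds, that a positive offset delays the lyrics, and that shifted times are clamped at zero; the sign convention and the clamping are assumptions, not documented behavior of `--subtitle_offset`, and `apply_subtitle_offset` is a hypothetical helper.

```python
from typing import Dict, List


def apply_subtitle_offset(words: List[Dict], offset_ms: int) -> List[Dict]:
    """Return copies of the words with start/end shifted by offset_ms milliseconds.

    Assumes a positive offset delays the lyrics; times are clamped at 0.0 so
    a large negative offset cannot produce negative timestamps.
    """
    offset_s = offset_ms / 1000.0
    return [
        {
            **word,
            "start": max(0.0, word["start"] + offset_s),
            "end": max(0.0, word["end"] + offset_s),
        }
        for word in words
    ]
```

The input list is left unchanged, so the same words can be re-shifted with a different offset during review.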
108+
109+
## Review server (human‑in‑the‑loop)
110+
If review is enabled (default), a local server starts during processing and opens the UI at `http://localhost:8000`:
111+
- Inspect and adjust corrections
112+
- Toggle correction handlers (rule‑based/LLM)
113+
- Add another lyrics source (paste plain text)
114+
- Generate a low‑res preview video on demand
115+
116+
Frontend assets are bundled when installed from PyPI. For local dev, build the frontend once if needed:
78117
```
79-
transcriber = LyricsTranscriber(audio_filepath='path_to_audio.mp3')
80-
result_metadata = transcriber.generate()
118+
./scripts/build_frontend.sh
81119
```
82120

83-
`result_metadata` contains the following values:
84-
```
85-
result_metadata = {
86-
"whisper_json_filepath": str,
87-
"genius_lyrics": str,
88-
"genius_lyrics_filepath": str,
89-
"midico_lrc_filepath": str,
90-
"singing_percentage": int,
91-
"total_singing_duration": int,
92-
"song_duration": int,
121+
## Styles JSON (for CDG/Video)
122+
Provide a JSON file with at least a `karaoke` section (used for video and ASS) and, if generating CDG, a `cdg` section. Minimal example:
123+
```json
124+
{
125+
"karaoke": {
126+
"ass_name": "Karaoke",
127+
"font": "Oswald SemiBold",
128+
"font_path": "lyrics_transcriber/output/fonts/Oswald-SemiBold.ttf",
129+
"font_size": 120,
130+
"primary_color": "255,165,0",
131+
"secondary_color": "255,255,255",
132+
"outline_color": "0,0,0",
133+
"back_color": "0,0,0",
134+
"bold": true,
135+
"italic": false,
136+
"underline": false,
137+
"strike_out": false,
138+
"scale_x": 100,
139+
"scale_y": 100,
140+
"spacing": 0,
141+
"angle": 0,
142+
"border_style": 1,
143+
"outline": 3,
144+
"shadow": 0,
145+
"margin_l": 0,
146+
"margin_r": 0,
147+
"margin_v": 100,
148+
"encoding": 1,
149+
"background_color": "black",
150+
"max_line_length": 36,
151+
"top_padding": 180
152+
},
153+
"cdg": {
154+
"font": "Oswald SemiBold",
155+
"font_path": "lyrics_transcriber/output/fonts/Oswald-SemiBold.ttf"
156+
}
93157
}
94158
```
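The color fields in this example are `"R,G,B"` strings, whereas ASS styles store colors as `&HAABBGGRR` (alpha, then blue, green, red, with alpha `00` meaning opaque). The sketch below shows that conversion and a check for the sections this section says are required. How lyrics-transcriber itself parses the file is not shown; `rgb_to_ass_color` and `check_styles` are hypothetical helpers.

```python
from typing import Any, Dict


def rgb_to_ass_color(rgb: str, alpha: int = 0) -> str:
    """Convert an "R,G,B" string to ASS &HAABBGGRR notation (alpha 0 = opaque)."""
    r, g, b = (int(part) for part in rgb.split(","))
    for value in (r, g, b, alpha):
        if not 0 <= value <= 255:
            raise ValueError(f"color component out of range: {value}")
    return f"&H{alpha:02X}{b:02X}{g:02X}{r:02X}"


def check_styles(styles: Dict[str, Any], want_cdg: bool) -> None:
    """Raise ValueError if a section needed for the requested outputs is missing."""
    if "karaoke" not in styles:
        raise ValueError("styles JSON needs a 'karaoke' section")
    if want_cdg and "cdg" not in styles:
        raise ValueError("styles JSON needs a 'cdg' section to generate CDG")
```

With the example's `"primary_color": "255,165,0"`, `rgb_to_ass_color` returns `&H0000A5FF`.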
95159

96-
## Requirements 📋
97-
98-
- Python >= 3.9
99-
- Python Poetry
100-
- Dependencies are listed in pyproject.toml
101-
102-
## Local Development 💻
103-
104-
To work on the Lyrics Transcriber project locally, you need Python 3.9 or higher. It's recommended to create a virtual environment using poetry.
105-
106-
1. Clone the repo and cd into it.
107-
2. Install poetry if you haven’t already.
108-
3. Run poetry install to install the dependencies.
109-
4. Run poetry shell to activate the virtual environment.
110-
111-
## Contributing 🤝
112-
113-
Contributions are very much welcome! Please fork the repository and submit a pull request with your changes, and I'll try to review, merge and publish promptly!
114-
115-
- This project is 100% open-source and free for anyone to use and modify as they wish.
116-
- If the maintenance workload for this repo somehow becomes too much for me I'll ask for volunteers to share maintainership of the repo, though I don't think that is very likely
117-
118-
## License 📄
160+
## Using as a library
161+
```python
162+
from lyrics_transcriber import LyricsTranscriber
163+
from lyrics_transcriber.core.controller import TranscriberConfig, LyricsConfig, OutputConfig
164+
165+
transcriber = LyricsTranscriber(
166+
audio_filepath="/path/to/song.mp3",
167+
artist="Artist", # optional
168+
title="Title", # optional
169+
transcriber_config=TranscriberConfig(
170+
audioshake_api_token="...", # or env
171+
runpod_api_key="...", whisper_runpod_id="..."
172+
),
173+
lyrics_config=LyricsConfig(
174+
genius_api_token="...", spotify_cookie="...", rapidapi_key="...",
175+
lyrics_file=None
176+
),
177+
output_config=OutputConfig(
178+
output_dir="./out", cache_dir="~/lyrics-transcriber-cache",
179+
output_styles_json="/path/to/styles.json", # required for CDG/video
180+
video_resolution="1080p", subtitle_offset_ms=0
181+
),
182+
)
183+
184+
result = transcriber.process()
185+
print(result.ass_filepath, result.lrc_filepath, result.video_filepath)
186+
```
119187

120-
This project is licensed under the MIT [License](LICENSE).
188+
## Docker
189+
Build and run locally (the image includes FFmpeg and the spaCy model):
190+
```bash
191+
docker build -t lyrics-transcriber:local .
192+
docker run --rm -v "$PWD/input":/input -v "$PWD/output":/output \
193+
-e AUDIOSHAKE_API_TOKEN -e GENIUS_API_TOKEN -e RUNPOD_API_KEY -e WHISPER_RUNPOD_ID \
194+
lyrics-transcriber:local \
195+
--output_dir /output --skip_cdg --video_resolution 360p /input/song.mp3
196+
```
121197

122-
## Credits 🙏
198+
## Development
199+
- Python 3.10–3.13, Poetry
200+
- Install deps: `poetry install`
201+
- Run tests: `poetry run pytest`
202+
- Build frontend (if editing UI): `./scripts/build_frontend.sh`
123203

124-
- This project uses [OpenAI Whisper](https://github.com/openai/whisper) for transcription, which inspired the entire tool!
125-
- Thanks to @linto-ai for the [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped) project which solved a big chunk for me.
126-
- Thanks to Genius for providing an API which makes fetching lyrics easier!
204+
## License
205+
MIT. See `LICENSE`.
127206

128-
## Contact 💌
207+
## Credits
208+
- Audio transcription by AudioShake and Whisper (RunPod)
209+
- Lyrics via Genius, Spotify, Musixmatch; layout via `karaoke-lyrics-processor`
210+
- UI/API: FastAPI, Vite/React frontend
129211

130-
For questions or feedback, please raise an issue or reach out to @beveradb ([Andrew Beveridge](mailto:andrew@beveridge.uk)) directly.
212+
## Support
213+
Please open issues or PRs on the repo, or contact @beveradb.
