This directory contains a Python script (songSplitter.py) to automatically transcribe an audio file, split it into individual word segments, and generate a JSON mapping file. This is useful for creating datasets for projects that need audio corresponding to specific words.
It was used for an April Fools' Day project in which Twitch chat could play a song, word by word, by typing messages.
- Python 3.x
- OpenAI Whisper: For audio transcription.
- pydub: For audio manipulation (splitting).
- ffmpeg: Required by pydub for handling various audio formats (like MP3). Ensure ffmpeg is installed and accessible in your system's PATH.
You can install the Python libraries using pip:
pip install -U openai-whisper pydub
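Since pydub depends on ffmpeg being reachable through PATH, a quick check before running the script can save a confusing failure later. The snippet below is a minimal sketch using only the standard library; the helper name find_ffmpeg is hypothetical and is not part of songSplitter.py.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    # shutil.which searches the directories listed in the PATH environment variable.
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH; pydub may fail to read or write MP3 files.")
    else:
        print(f"ffmpeg found at: {path}")
```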
The songSplitter.py script takes the path to an audio file as input and performs the transcription and splitting process.
python extractor/songSplitter.py <audio_file_path> [options]
- <audio_file_path>: (Required) Path to the input audio file (e.g., rickroll.mp3).
- -o, --output_dir: Directory to save the segmented audio files (defaults to output).
- -j, --json_path: Path to save the output JSON mapping file (defaults to splitsong.json).
- -m, --model: Whisper model name to use for transcription (e.g., tiny, base, small, medium, large). Defaults to medium. Larger models are more accurate but require more resources (VRAM/RAM) and time.
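For readers who want to see how these options fit together, here is a rough argparse sketch that mirrors the flags and defaults listed above. It is a reconstruction from this README, not the script's actual parser; the function name build_parser and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented options; the real script may differ.
    parser = argparse.ArgumentParser(
        description="Transcribe an audio file and split it into per-word segments."
    )
    parser.add_argument("audio_file_path", help="Path to the input audio file")
    parser.add_argument("-o", "--output_dir", default="output",
                        help="Directory to save the segmented audio files")
    parser.add_argument("-j", "--json_path", default="splitsong.json",
                        help="Path to save the output JSON mapping file")
    parser.add_argument("-m", "--model", default="medium",
                        help="Whisper model name (tiny, base, small, medium, large)")
    return parser


if __name__ == "__main__":
    # Options that are not given fall back to the documented defaults.
    args = build_parser().parse_args(["rickroll.mp3", "-m", "base"])
    print(args.audio_file_path, args.output_dir, args.json_path, args.model)
```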
Let's say you have the Rick Roll song saved as rickroll.mp3 in the parent directory of the extractor directory. To process it using the base model, save the segments in a directory named rickroll_words, and write the mapping to rickroll_map.json:
python extractor/songSplitter.py ../rickroll.mp3 -m base -o rickroll_words -j rickroll_map.json
The script will generate:
- Segmented Audio Files: Inside the specified output directory (output or --output_dir), you will find numerous small MP3 files (e.g., 000.mp3, 001.mp3, 002.mp3, ...), each corresponding to a word detected in the original audio.
- JSON Mapping File: A JSON file (splitsong.json or --json_path) containing a list of objects, where each object maps a detected word (lowercase) to its corresponding audio segment file path.
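The two outputs above could be produced roughly as in the sketch below, which uses Whisper's word-level timestamps and pydub's millisecond slicing. This is an illustration under assumptions, not the actual contents of songSplitter.py: the function names (clean_word, plan_segments, split_song) are hypothetical, and stripping punctuation from each word is a guess, since this README only states that words are lowercased.

```python
import json
import os
import re


def clean_word(raw: str) -> str:
    """Lowercase a transcribed token and strip surrounding whitespace and punctuation."""
    # Assumption: apostrophes are kept so that contractions like "we're" survive.
    return re.sub(r"^[^\w']+|[^\w']+$", "", raw.strip().lower())


def plan_segments(words, output_dir="output"):
    """Turn word timings (in seconds) into (start_ms, end_ms, mapping_entry) tuples."""
    plan = []
    for item in words:
        word = clean_word(item["word"])
        if not word:
            continue  # skip tokens that were only punctuation
        sound = f"{output_dir}/{len(plan):03d}.mp3"  # 000.mp3, 001.mp3, ...
        plan.append((int(item["start"] * 1000), int(item["end"] * 1000),
                     {"word": word, "sound": sound}))
    return plan


def split_song(audio_path, output_dir="output", json_path="splitsong.json",
               model_name="medium"):
    # Third-party imports stay local so the helpers above work without them installed.
    import whisper
    from pydub import AudioSegment

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, word_timestamps=True)
    words = [w for segment in result["segments"] for w in segment.get("words", [])]

    audio = AudioSegment.from_file(audio_path)
    os.makedirs(output_dir, exist_ok=True)
    entries = []
    for start_ms, end_ms, entry in plan_segments(words, output_dir):
        # pydub slices AudioSegment objects by milliseconds.
        audio[start_ms:end_ms].export(entry["sound"], format="mp3")
        entries.append(entry)

    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(entries, handle, indent=2)
    return entries
```

Keeping the timing-to-filename logic in plan_segments, separate from the Whisper and pydub calls, makes it possible to check the naming scheme without a GPU or an audio file.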
Example splitsong.json structure:
[
{
"word": "we're",
"sound": "output/000.mp3"
},
{
"word": "no",
"sound": "output/001.mp3"
},
{
"word": "strangers",
"sound": "output/002.mp3"
},
{
"word": "to",
"sound": "output/003.mp3"
},
// ... more words
]
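A consumer of this mapping, such as a chat integration, would likely want to look up clips by word. The sketch below shows one way to do that with the standard library. The function names are hypothetical, and picking the first clip when a word occurs several times is an arbitrary choice for illustration. It also assumes the generated file is plain JSON, without the comment line used in the abbreviated example above.

```python
import json
from collections import defaultdict


def load_mapping(json_path="splitsong.json"):
    """Read the list of {"word": ..., "sound": ...} objects from disk."""
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)


def index_by_word(entries):
    """Group sound paths by word, keeping the order in which they occur in the song."""
    index = defaultdict(list)
    for entry in entries:
        index[entry["word"]].append(entry["sound"])
    return dict(index)


def sounds_for_message(message, index):
    """Return one clip path per known word in the message; unknown words are skipped."""
    return [index[word][0] for word in message.lower().split() if word in index]


if __name__ == "__main__":
    sample = [
        {"word": "we're", "sound": "output/000.mp3"},
        {"word": "no", "sound": "output/001.mp3"},
    ]
    print(sounds_for_message("No we're", index_by_word(sample)))
```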