To use the scripts in this directory, first download the podcast episodes by following the instructions in the Episode tools readme. You should also have:
- Python 3.9+ installed
- Packages listed in requirements.txt installed
After downloading episodes, you can create episode transcriptions by running ./transcribe_episodes.sh.
Prerequisite: OpenAI's Whisper command line interface must be installed.
Example usage:
./transcribe_episodes.sh -e 1 # start transcribing from episode 1
transcription_corrections.tsv holds a list of regex patterns to find on every line of every transcript and replace with the target replace string.
Fix transcriptions per this file's find/replace criteria by running python3 correct_transcripts.py.
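The find/replace pass described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of correct_transcripts.py: it assumes each row of the TSV holds a regex pattern and its replacement separated by a single tab, and the function names (`load_corrections`, `correct_line`, `correct_transcript`) and the sample patterns are hypothetical.

```python
import re


def load_corrections(tsv_text):
    """Parse rows of 'pattern<TAB>replacement' into compiled regexes.

    Assumed layout: two tab-separated columns per row. Rows that are
    blank or contain no tab are skipped.
    """
    corrections = []
    for row in tsv_text.splitlines():
        if "\t" not in row:
            continue
        pattern, _, replacement = row.partition("\t")
        corrections.append((re.compile(pattern), replacement))
    return corrections


def correct_line(line, corrections):
    """Apply every correction to a single transcript line, in file order."""
    for pattern, replacement in corrections:
        line = pattern.sub(replacement, line)
    return line


def correct_transcript(text, corrections):
    """Apply the corrections to each line of a transcript independently."""
    return "\n".join(correct_line(line, corrections) for line in text.split("\n"))


# Hypothetical sample rows, for illustration only.
sample_tsv = "\\bwisper\\b\tWhisper\ntype ?sense\tTypesense\n"
corrections = load_corrections(sample_tsv)
print(correct_transcript("we use wisper and type sense\nno match here", corrections))
```

Because corrections are applied in file order, a later pattern sees the output of earlier ones, so the order of rows in the TSV can matter.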
Prerequisite: This project uses Typesense as its search indexing provider. To start the Typesense server, follow the instructions in the server readme.
Upload transcripts for Typesense to index by running python3 index_transcripts.py.
Example usage:
python3 index_transcripts.py -k '<Typesense API key>' -e dev # upload transcriptions to the local development Typesense server
For initial transcription, Whisper v20230314 was used through episode 281. Later episodes were transcribed using v20230918.