Transcription tools

Prerequisites

To use the scripts in this directory, first download the podcast episodes by following the instructions in the Episode tools readme. You also need:

  • Python 3.9+ installed
  • Packages listed in requirements.txt installed

Transcribe episodes

After downloading episodes, you can create episode transcriptions by running ./transcribe_episodes.sh.

Prerequisite: OpenAI's Whisper command line interface must be installed.

Example usage:

./transcribe_episodes.sh -e 1 # start transcribing from episode 1
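
To illustrate what "start transcribing from episode N" could look like, here is a minimal Python sketch that builds one Whisper command per episode. It is not the actual contents of transcribe_episodes.sh. The function name, the episode-number-to-file mapping, the `medium` model, and the `transcripts` output directory are all hypothetical. Only the `--model`, `--output_dir`, and `--output_format` options are taken from the Whisper command line interface.

```python
def build_whisper_commands(episode_files, start_episode,
                           output_dir="transcripts", model="medium"):
    """Return one whisper argv list per episode at or after start_episode.

    episode_files maps an episode number to its audio file path
    (a hypothetical layout, assumed for this sketch).
    """
    commands = []
    for number in sorted(episode_files):
        if number < start_episode:
            continue  # skip episodes before the requested starting point
        commands.append([
            "whisper", str(episode_files[number]),
            "--model", model,
            "--output_dir", output_dir,
            "--output_format", "txt",
        ])
    return commands


# Hypothetical file names, for illustration only.
episodes = {1: "episode_1.mp3", 2: "episode_2.mp3", 3: "episode_3.mp3"}
commands = build_whisper_commands(episodes, start_episode=2)
```

Each returned list could be passed to `subprocess.run(command, check=True)` once Whisper is installed.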

Fix transcription issues

transcription_corrections.tsv lists regex patterns and their replacement strings. Each pattern is searched for on every line of every transcript, and every match is replaced with the corresponding replacement string.

Apply the corrections in this file to the transcripts by running python3 correct_transcripts.py.
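
The find/replace pass described above can be sketched as follows. This is a minimal illustration, not the code in correct_transcripts.py. It assumes the TSV has two columns (pattern, then replacement) and no header row; the function names and the sample patterns are hypothetical.

```python
import csv
import io
import re


def load_corrections(tsv_file):
    """Read (compiled pattern, replacement) pairs from a tab-separated file object.

    Assumes column 1 is the regex and column 2 is the replacement string.
    QUOTE_NONE keeps quote characters in the patterns literal.
    """
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(re.compile(row[0]), row[1]) for row in reader if len(row) >= 2]


def correct_line(line, corrections):
    """Apply every correction to a single transcript line, in file order."""
    for pattern, replacement in corrections:
        line = pattern.sub(replacement, line)
    return line


# Hypothetical corrections, standing in for transcription_corrections.tsv.
sample_tsv = io.StringIO("\\bchat ?gpt\\b\tChatGPT\nopen ai\tOpenAI\n")
corrections = load_corrections(sample_tsv)
fixed = correct_line("we asked chatgpt about open ai", corrections)
```

Because corrections run in file order, a later pattern sees the output of earlier ones, so the order of rows in the TSV can matter.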

Upload transcript search index

Prerequisite: This project uses Typesense as its search indexing provider. To start the Typesense server, follow the instructions in the server readme.

Upload transcripts for Typesense to index by running python3 index_transcripts.py.

Example usage:

python3 index_transcripts.py -k '<Typesense API key>' -e dev # upload transcriptions to the local development Typesense server

Note

Episodes up to and including episode 281 were transcribed with Whisper v20230314. Later episodes were transcribed with v20230918.