A set of tools that work together to turn any long YouTube video or podcast into polished short-form clips ready for YouTube Shorts, Instagram Reels, or TikTok.
Each step is a small app you run on its own. The output of one step becomes the input for the next.
Step 1 Download the transcript from a YouTube video
|
v
Step 2 Use AI to find the most viral and engaging moments
|
v
Step 3 Cut the video at those timestamps and crop to portrait
|
v
Step 4 Generate B-roll search queries for each clip
|
v
Step 5 Burn captions into the final clips
Folder: step1-transcript-downloader
Paste a YouTube URL and the app downloads the captions for that video in SRT, VTT, TXT, or JSON format. The rest of the pipeline uses the JSON format.
Run it:
cd step1-transcript-downloader
pip install -r requirements.txt
streamlit run app.py
Output: A JSON file with all the captions and their timestamps.
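To make the JSON output concrete, here is a minimal sketch of reading a transcript and adding an end time to each caption. The field names `text`, `start`, and `duration` are assumptions for illustration; check a real file from Step 1 for the actual keys.

```python
import json
from typing import Dict, List

# Hypothetical sample of a Step 1 JSON file; the keys are assumed, not confirmed.
sample = """[
  {"text": "Welcome back to the show.", "start": 0.0, "duration": 2.5},
  {"text": "Today we talk about editing.", "start": 2.5, "duration": 3.0}
]"""


def load_captions(raw: str) -> List[Dict]:
    """Parse caption JSON and add an explicit end time to each entry."""
    captions = json.loads(raw)
    for cap in captions:
        cap["end"] = cap["start"] + cap["duration"]
    return captions


captions = load_captions(sample)
for cap in captions:
    print(f'{cap["start"]:.1f}-{cap["end"]:.1f}: {cap["text"]}')
```

To read a real file, pass the contents of the downloaded JSON to `load_captions` instead of `sample`.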
Folder: step2-viral-clip-finder
Upload the JSON transcript from Step 1. The app uses Cerebras AI to read through the transcript and score every segment based on how viral it could be. It looks at things like how strong the opening line is, whether the clip makes sense on its own, and whether it would make someone feel something.
You get a CSV with the top clip timestamps ranked by score.
What you need: A Cerebras API key (enter it in the sidebar when you open the app).
Run it:
cd step2-viral-clip-finder
pip install -r requirements.txt
streamlit run app.py
Input: JSON transcript from Step 1
Output: CSV with start and end timestamps for each top clip.
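The ranking part of this step can be sketched without the AI call. The example below assumes each segment has already been scored, keeps the highest-scoring ones, and writes them as CSV. The column names `start`, `end`, and `score` and the sample scores are hypothetical; the real app may use different names and a different scale.

```python
import csv
import io


def top_clips(scored, n=3):
    """Return the n highest-scoring segments, best first."""
    return sorted(scored, key=lambda seg: seg["score"], reverse=True)[:n]


# Hypothetical scored segments (times in seconds); in the app the scores come from the model.
scored = [
    {"start": 12.0, "end": 41.5, "score": 7.2},
    {"start": 95.0, "end": 140.0, "score": 9.1},
    {"start": 300.0, "end": 330.0, "score": 5.4},
]

# Write the ranked clips to an in-memory CSV so the sketch needs no files.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["start", "end", "score"])
writer.writeheader()
writer.writerows(top_clips(scored, n=2))
csv_text = buf.getvalue()
print(csv_text)
```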
Folder: step3-video-cutter
Upload your original video file and the CSV from Step 2. This tool cuts the video into clips at those timestamps, then automatically crops each clip from landscape to portrait (9:16 ratio for Shorts). It uses face detection to keep the speaker in frame as they move.
What you need: FFmpeg installed on your machine. Enter the path to it in the sidebar.
Run it:
cd step3-video-cutter
pip install -r requirements.txt
streamlit run app.py
Input: Original video file + CSV from Step 2
Output: A ZIP file with all the cropped short clips.
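The landscape-to-portrait crop comes down to a little arithmetic. This sketch assumes a landscape frame and a face center x-coordinate supplied by some face detector (not shown), and builds the argument string for FFmpeg's `crop` filter. The function names are hypothetical and the tool's own cropping logic may differ.

```python
def portrait_crop(frame_w, frame_h, face_cx):
    """Return (w, h, x, y) for a full-height 9:16 crop centered on face_cx."""
    crop_h = frame_h
    crop_w = int(frame_h * 9 / 16)
    crop_w -= crop_w % 2  # round down to an even width, which many encoders require
    x = int(face_cx - crop_w / 2)
    x = max(0, min(x, frame_w - crop_w))  # clamp so the window stays inside the frame
    return crop_w, crop_h, x, 0


def ffmpeg_crop_filter(frame_w, frame_h, face_cx):
    """Format the crop window as an FFmpeg crop filter: crop=w:h:x:y."""
    w, h, x, y = portrait_crop(frame_w, frame_h, face_cx)
    return f"crop={w}:{h}:{x}:{y}"


# A 1920x1080 frame with the face at the horizontal center.
print(ffmpeg_crop_filter(1920, 1080, 960))
```

The clamp matters when the speaker stands near the edge of the frame: the window stops at the border instead of running past it.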
Folder: step4-broll-queries
Upload the CSV from Step 2 and the JSON from Step 1. For each clip, the app reads what is being said and uses an AI model to suggest search terms for finding matching B-roll footage on stock video sites.
What you need: A Groq API key (enter it in the sidebar when you open the app).
Run it:
cd step4-broll-queries
pip install -r requirements.txt
streamlit run app.py
Input: CSV from Step 2 + JSON from Step 1
Output: CSV with B-roll search queries for each clip segment.
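Before a model can suggest B-roll terms, the spoken text for each clip has to be pulled out of the transcript. One way to do that, sketched here, is to join every caption that overlaps the clip's time range. The caption keys `text`, `start`, and `duration` are assumptions, and the call to the AI model is left out.

```python
def clip_text(captions, clip_start, clip_end):
    """Join the text of every caption that overlaps [clip_start, clip_end)."""
    parts = []
    for cap in captions:
        cap_end = cap["start"] + cap["duration"]
        # Two intervals overlap when each one starts before the other ends.
        if cap["start"] < clip_end and cap_end > clip_start:
            parts.append(cap["text"].strip())
    return " ".join(parts)


# Hypothetical captions and clip range, in seconds.
captions = [
    {"text": "Intro line.", "start": 0.0, "duration": 2.0},
    {"text": "The key idea is simple.", "start": 2.0, "duration": 2.0},
    {"text": "Thanks for watching.", "start": 4.0, "duration": 2.0},
]

print(clip_text(captions, 2.0, 4.0))
```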
Folder: step5-caption-burner
Upload the ZIP of clips from Step 3, the CSV from Step 2, and the JSON from Step 1. This tool burns captions directly onto each video, word by word, synced to the timing of the speech.
What you need: ImageMagick installed. Update the path in caption_generator.py line 8 if you installed it somewhere other than the default location.
Run it:
cd step5-caption-burner
pip install -r requirements.txt
streamlit run app.py
Input: ZIP from Step 3 + CSV from Step 2 + JSON from Step 1
Output: A ZIP of the final captioned short-form videos.
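Word-by-word captions need a start and end time for every word. If the transcript only has timing per caption line, one simple approximation is to split each line's duration evenly across its words, as sketched below. This is an assumed approach for illustration; the tool may time words differently.

```python
def word_timings(text, start, duration):
    """Split a caption's duration evenly across its words.

    Returns a list of (word, word_start, word_end) tuples in seconds.
    """
    words = text.split()
    if not words:
        return []
    step = duration / len(words)
    return [
        (word, start + i * step, start + (i + 1) * step)
        for i, word in enumerate(words)
    ]


# Hypothetical caption: four words spoken over two seconds, starting at 10s.
for word, t0, t1 in word_timings("one two three four", 10.0, 2.0):
    print(f"{t0:.2f}-{t1:.2f} {word}")
```

An even split ignores word length and pauses, so longer words may appear slightly early or late on screen.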
- Python 3.8 or above
- FFmpeg (for Step 3) - download from https://ffmpeg.org
- ImageMagick (for Step 5) - download from https://imagemagick.org
- A Cerebras API key for Step 2 - get one at https://cloud.cerebras.ai
- A Groq API key for Step 4 - get one at https://console.groq.com
Steps 1, 3, and 5 are free to run once you have the dependencies installed.
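A quick way to see which of these are already set up is a short check script. The sketch below only looks for executables on your `PATH`; since Step 3 lets you enter the FFmpeg path by hand, a missing entry here is not necessarily a problem. The executable names `magick` and `convert` are the usual ImageMagick commands, though on Windows `convert` can also match an unrelated system utility.

```python
import shutil
import sys


def check_prerequisites():
    """Report which prerequisites are visible to this Python process."""
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "imagemagick": any(
            shutil.which(name) is not None for name in ("magick", "convert")
        ),
    }


for name, found in check_prerequisites().items():
    print(f"{name}: {'found' if found else 'not found'}")
```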
| File | Created by | Used by |
|---|---|---|
| captions.json | Step 1 | Steps 2, 4, 5 |
| timestamps.csv | Step 2 | Steps 3, 4, 5 |
| clips.zip | Step 3 | Step 5 |
| broll_queries.csv | Step 4 | (end product) |
| captioned_clips.zip | Step 5 | (end product) |