
Audio Transcription Demo (ClipABit Demo 1)

A simple Streamlit app that demonstrates the audio-transcription-to-search portion of the ClipABit pipeline -- a semantic search engine for video editors.


Audio Transcription Embeddings & Search Algorithms Demo

Project Purpose

This project demonstrates how to generate audio transcription embeddings using OpenAI Whisper, and explores different search algorithms for retrieval tasks. The goal is to experiment with various fusion and selection strategies for searching among multimodal embeddings.

Workflow

  1. Audio Transcription & Embedding

    • Upload audio files via the Streamlit app.
    • Transcribe each audio file using Whisper.
    • Generate embeddings for each transcription (see the sketch after this list).
  2. Search Algorithms Tested

    • Average Fusion: Combine all modality vectors into a single index vector (or average similarity scores) and retrieve the most relevant result.
    • Tiny LLM Select: Use a small language model (or classifier) to select the primary modality based on the prompt, then retrieve from that modality's index only.
    • Tiny LLM Weighted: Use a small language model to assign weights to each modality and fuse retrieval results using those weights.
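
The transcribe-and-embed step (1 above) might look like the sketch below. This is a minimal illustration, assuming the open-source whisper and sentence-transformers packages; the model names are placeholders, not necessarily what this app uses.

import whisper
from sentence_transformers import SentenceTransformer

def transcribe_and_embed(audio_path: str):
    # Transcribe the uploaded clip with Whisper.
    asr_model = whisper.load_model("base")
    transcript = asr_model.transcribe(audio_path)["text"]

    # Embed the transcript text for semantic search.
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    embedding = embedder.encode(transcript)
    return transcript, embedding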

Search Algorithm Selection

Users can choose any of the three search algorithms above (Average Fusion, Tiny LLM Select, or Tiny LLM Weighted) for semantic retrieval; a rough sketch of the two fusion variants follows.
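
As an illustration only (not the app's exact implementation): Average Fusion can average per-modality cosine similarities, and Tiny LLM Weighted replaces that uniform average with weights produced by a small LLM. All names below are placeholders.

import numpy as np

def cosine_scores(query_vec, index):
    # index: (num_clips, dim) matrix of clip embeddings for one modality.
    q = query_vec / np.linalg.norm(query_vec)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    return m @ q

def fused_search(query_vecs, indexes, weights=None, top_k=5):
    # query_vecs / indexes: dicts keyed by modality (e.g. "transcript").
    modalities = list(indexes)
    if weights is None:
        # Average Fusion: uniform weights across modalities.
        weights = {m: 1.0 / len(modalities) for m in modalities}
    # Tiny LLM Weighted would populate `weights` from a small LLM instead.
    scores = sum(
        weights[m] * cosine_scores(query_vecs[m], indexes[m])
        for m in modalities
    )
    return np.argsort(scores)[::-1][:top_k]

This sketch assumes every modality indexes the same set of clips in the same order, so per-modality scores can be summed element-wise.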

Metrics Displayed in the UI Sidebar

The sidebar provides live data on the following metrics for each search and embedding operation:

  • Quality: Subjective measure; users judge the relevance of returned results by eye.
  • Latency: Time to embed a query, search, and merge results (see the sidebar sketch after this list).
  • Embedding Time: Per-clip embedding time (batch and single).
  • Storage: Database size per clip and total.
  • Computational Intensity: Resource usage for each operation.
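
A minimal sketch of how a latency readout could be surfaced with Streamlit's sidebar API; embed_query below is a hypothetical stand-in for the app's real embedding call.

import time
import streamlit as st

def embed_query(text: str):
    # Hypothetical stand-in for the app's real embedding call.
    time.sleep(0.05)
    return [0.0] * 384

query = st.text_input("Search query")
if query:
    start = time.perf_counter()
    _ = embed_query(query)
    latency_ms = (time.perf_counter() - start) * 1000
    st.sidebar.metric("Query embedding latency", f"{latency_ms:.1f} ms")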

Live Demo

🚀 Currently Running

Local Development: http://localhost:8501 (available while the app is running on your machine)

🌐 Cloud Deployment

Open in Streamlit (Previous demo)

Status: ✅ Always Active - Monitored by UptimeRobot to prevent hibernation

New Deployment: Coming soon - deploy this repository to get your own dedicated URL

How to Run

Option 1: Using Management Script (Recommended)

# Start the app in background
./run_streamlit.sh start

# Check status
./run_streamlit.sh status

# View logs
./run_streamlit.sh logs

# Stop the app
./run_streamlit.sh stop

Option 2: Manual Start

  1. Install dependencies:
    pip install -r requirements.txt
  2. Start the app:
    streamlit run streamlit_app.py

Option 3: Deploy to Streamlit Cloud

  1. Push to GitHub:
    git add .
    git commit -m "Deploy to Streamlit Cloud"
    git push origin main
  2. Go to share.streamlit.io and deploy!

🔄 Keeping Apps Active

Prevent Streamlit Cloud Hibernation

Streamlit Cloud hibernates apps after 12 hours of inactivity. To keep apps always running:

Option 1: UptimeRobot (Recommended)

  1. Go to uptimerobot.com
  2. Add monitor for your app URL
  3. Set 5-minute intervals
  4. App stays active forever! ✅

Option 2: GitHub Actions

Use the included workflow: .github/workflows/keep-alive.yml

Option 3: Manual Ping

Run the included script: ./ping_app.sh

See UPTIMEROBOT_SETUP.md for detailed instructions.

Next Steps

  • Implement and compare the search algorithms listed above.
  • Experiment with different fusion strategies and LLMs for selection/weighting.
