🎥 Video Retriever - AI-Powered YouTube Search

🌐 Live Web Application: Deploy your own AI-powered video search engine!

🚀 Quick Deploy (Web Application)

One-Click Deploy

Get your app running in 2 minutes:

Click the deploy button above
Connect your GitHub account
Your app will be live at https://your-app-name.onrender.com

Alternative Platforms

Railway: Deploy to Railway - Connect GitHub repo
Heroku: Use the included Procfile for one-click deploy
Docker: docker-compose up for local deployment

See DEPLOYMENT.md for detailed instructions.

✨ Web Application Features

🎥 Multi-Video Search: Search across multiple YouTube videos simultaneously
⏰ Timestamped Results: Get exact timestamps for relevant moments
🎤 Speaker Detection: Identify different speakers in conversations
📱 Modern UI: Beautiful, responsive interface with real-time updates
🔍 Search History: Keep track of your previous searches
🎯 Similarity Scoring: Results ranked by AI-powered relevance scores
⚙️ Configurable: Customize search parameters and models

💻 Local Development

# Clone and start the web interface
git clone https://github.com/vats98754/video-retriever.git
cd video-retriever

# Start web application (includes setup)
./start_web.sh

# Or start with custom options
./start_web.sh --port 8080 --model small --similarity-threshold 0.2

📱 Command Line Usage

One command to get timestamped YouTube URLs:

# Search any YouTube video - everything is handled automatically!
python video_retriever.py "https://youtu.be/VIDEO_ID" "your search query"

# Or use just the video ID
python video_retriever.py "VIDEO_ID" "your search query"
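
Because the command accepts either a full URL or a bare video ID, the input has to be normalized to an ID at some point. The sketch below shows one way that could be done with the standard library. The function name extract_video_id and the 11-character ID pattern are assumptions for illustration, not taken from video_retriever.py.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: YouTube video IDs are 11 characters from [A-Za-z0-9_-].
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url_or_id: str) -> str:
    """Return the video ID from a youtu.be / youtube.com URL or a bare ID.

    Hypothetical helper; the project's own parsing may differ.
    """
    candidate = url_or_id.strip()
    if _ID_RE.fullmatch(candidate):
        return candidate  # already a bare ID

    parsed = urlparse(candidate)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host.endswith("youtube.com"):
        # Watch URLs carry the ID in the "v" query parameter.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    else:
        video_id = ""

    if not _ID_RE.fullmatch(video_id):
        raise ValueError(f"Could not find a video ID in {url_or_id!r}")
    return video_id
```

Both extract_video_id("https://youtu.be/0siE31sqz0Q") and extract_video_id("0siE31sqz0Q") would return the same ID, so later steps only need to handle one form.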

✨ Features

🎯 Complete End-to-End Pipeline: URL/ID + Query → Timestamped YouTube URLs
📋 Smart Transcript Selection: Automatically uses YouTube captions when available, falls back to Whisper
🌍 Multi-Language Support: Supports transcripts in multiple languages with automatic fallback
⚡ Smart File Management: Automatically reuses existing audio/transcripts
🧠 TF-IDF Search: Fast, local term-weighted search (via scikit-learn) that needs no external API or service
📁 Organized Storage: All files stored in data/VIDEO_ID/ structure
🔗 Direct YouTube Links: Results include clickable https://youtu.be/ID?t=123s URLs
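
To make the TF-IDF feature above more concrete, here is a minimal, standard-library-only sketch of TF-IDF ranking with cosine similarity. The project itself lists scikit-learn for this step. The code below is a swapped-in illustration of the same idea, and the names tokenize and tfidf_rank are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_rank(chunks, query, top_k=5):
    """Rank text chunks against a query; returns [(chunk_index, score), ...].

    Illustrative only: a real implementation would likely use
    scikit-learn rather than this hand-rolled version.
    """
    docs = [Counter(tokenize(c)) for c in chunks]
    n_docs = len(docs)

    # Document frequency: in how many chunks each term appears.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())

    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    def vectorize(counts):
        # Weight known terms by tf * idf, then L2-normalize.
        vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    doc_vecs = [vectorize(counts) for counts in docs]
    query_vec = vectorize(Counter(tokenize(query)))

    # Cosine similarity reduces to a dot product on normalized vectors.
    scored = []
    for index, doc_vec in enumerate(doc_vecs):
        score = sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())
        if score > 0:
            scored.append((index, score))

    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]
```

Note that matching here is purely lexical: a chunk is only scored if it shares at least one token with the query, which is one reason descriptive multi-word queries tend to work better than single keywords.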

🎬 How It Works

Input: YouTube URL or video ID + search query
Auto-Check: Uses existing files if available (no re-downloading)
Smart Transcript: First tries YouTube captions, then falls back to Whisper if needed
Download: Audio via yt-dlp (only when no YouTube captions are available and Whisper is needed)
Transcribe: Audio to text via Whisper (only when YouTube captions are unavailable)
Search: TF-IDF ranking over chunked transcript text
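
The reuse-then-captions-then-Whisper order in the steps above can be sketched as follows. This is a sketch under stated assumptions: load_or_create_transcript, the injected fetch_captions and transcribe_audio callables, and the VIDEO_ID.json file name are all hypothetical and stand in for whatever the project actually does with youtube-transcript-api, yt-dlp, and Whisper.

```python
import json
from pathlib import Path


def load_or_create_transcript(video_id, data_dir, fetch_captions, transcribe_audio):
    """Return a transcript, doing the cheapest thing that works.

    fetch_captions(video_id) and transcribe_audio(video_id) are
    hypothetical callables that return a list of segments
    (or None / an empty list when nothing is available).
    """
    # Assumed file name inside data/VIDEO_ID/transcripts/.
    path = Path(data_dir) / video_id / "transcripts" / f"{video_id}.json"

    # 1. Auto-check: reuse an existing transcript, no network or model needed.
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))

    # 2. Smart transcript: try YouTube captions first.
    segments = fetch_captions(video_id)
    source = "captions"

    # 3. Fall back to download + Whisper only when captions are missing.
    if not segments:
        segments = transcribe_audio(video_id)
        source = "whisper"

    payload = {"source": source, "segments": segments}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload), encoding="utf-8")
    return payload
```

Passing the two fetchers in as arguments keeps the ordering logic testable without touching the network or loading a speech model.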

📋 Examples

# Search for interview tips
python video_retriever.py "https://youtu.be/0siE31sqz0Q" "interview preparation"

# List available transcripts for a video
python video_retriever.py "0siE31sqz0Q" --list-transcripts

# Search with language preference
python video_retriever.py "VIDEO_ID" "search terms" --language es

# Get more results
python video_retriever.py "0siE31sqz0Q" "storytelling" --top-k 10

# Use different chunk size for different granularity
python video_retriever.py "VIDEO_ID" "search terms" --chunk-size 4
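
To illustrate what a chunk size might control, the sketch below groups consecutive transcript segments into chunks and keeps the start time of the first segment in each group. It assumes that --chunk-size counts transcript segments and that chunks do not overlap. Neither is confirmed by this README, and chunk_segments is a hypothetical name.

```python
def chunk_segments(segments, chunk_size=4):
    """Group consecutive transcript segments into searchable chunks.

    Each segment is assumed to be a dict with "start" (seconds) and "text".
    Assumption: chunk_size is a number of segments, with no overlap.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    chunks = []
    for i in range(0, len(segments), chunk_size):
        group = segments[i:i + chunk_size]
        chunks.append({
            # The chunk's timestamp is where its first segment begins.
            "start": group[0]["start"],
            "text": " ".join(seg["text"].strip() for seg in group),
        })
    return chunks
```

Under these assumptions a smaller chunk size yields more chunks with tighter timestamps, while a larger one gives each result more surrounding context.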

📁 File Organization

Everything is organized under data/VIDEO_ID/:

data/
├── README.md            # Structure documentation
├── .gitkeep            # Preserves directory in git
└── VIDEO_ID/           # Created automatically per video
    ├── audio/          # Downloaded MP3 files (git ignored)
    ├── transcripts/    # JSON, TXT, SRT transcripts (git ignored)
    ├── vectors/        # Processed chunks for search (git ignored)
    └── searches/       # Search results with timestamps (git ignored)

Note: Only the directory structure is tracked in git. All data files are automatically ignored to keep the repository clean.
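
As a rough illustration of the layout above, the following sketch creates the four per-video subdirectories with pathlib. The helper name ensure_video_dirs and its root parameter are assumptions, not part of the project's documented API.

```python
from pathlib import Path

# Subdirectory names taken from the data/VIDEO_ID/ tree shown above.
SUBDIRS = ("audio", "transcripts", "vectors", "searches")


def ensure_video_dirs(video_id, root="data"):
    """Create data/VIDEO_ID/{audio,transcripts,vectors,searches} if missing.

    Hypothetical helper; returns a dict mapping each name to its Path.
    """
    base = Path(root) / video_id
    paths = {}
    for name in SUBDIRS:
        directory = base / name
        directory.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        paths[name] = directory
    return paths
```

Calling it a second time for the same video is a no-op, which fits the reuse-existing-files behavior described earlier.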

🔧 Advanced Usage

from video_retriever import VideoRetriever

# Initialize
retriever = VideoRetriever(model="base")

# End-to-end search
results = retriever.search_video(
    "https://youtu.be/VIDEO_ID", 
    "your query", 
    top_k=5
)

# Results include timestamped URLs
for result in results:
    print(f"🔗 {result['youtube_url']}")
    print(f"📝 {result['text']}")
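
The youtube_url field in each result follows the https://youtu.be/ID?t=123s pattern listed under Features. A minimal sketch of how such a result could be assembled is shown below. build_timestamp_url and format_result are hypothetical names, and the "start" key is an assumption about how chunk timing is stored.

```python
def build_timestamp_url(video_id, start_seconds):
    """Build a youtu.be link that starts playback at the given second."""
    seconds = max(0, int(start_seconds))  # whole, non-negative seconds
    return f"https://youtu.be/{video_id}?t={seconds}s"


def format_result(video_id, chunk):
    """Turn a chunk dict ({"start": ..., "text": ...}) into a result dict.

    Only the "youtube_url" and "text" keys appear in the README example;
    "start" is kept here as an illustrative extra.
    """
    return {
        "youtube_url": build_timestamp_url(video_id, chunk["start"]),
        "text": chunk["text"],
        "start": chunk["start"],
    }
```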

🛠️ Requirements

Python 3.8+
yt-dlp (audio download)
youtube-transcript-api (YouTube captions)
OpenAI Whisper (fallback transcription)
scikit-learn (TF-IDF search)

💡 Tips

YouTube Captions First: System automatically uses YouTube's captions when available (much faster!)
Language Support: Use --language es for Spanish, --language fr for French, etc.
Check Available Languages: Use --list-transcripts to see what languages are available
Reuse Files: The system automatically detects and reuses existing audio/transcripts
Chunk Size: Use smaller chunks (3-4) for precise search, larger (8-10) for context
Model Size: Use base for speed, large for accuracy (the model is only used when the Whisper fallback is needed)
Query Tips: Use descriptive phrases rather than single keywords

Python dependencies:

yt-dlp - YouTube downloading
youtube-transcript-api - YouTube captions extraction
openai-whisper - Speech transcription (fallback)
scikit-learn - TF-IDF vectorization
numpy - Numerical operations
