A powerful video transcription tool that not only transcribes videos but also generates AI personas that can engage in conversations about the content. Built with Streamlit and powered by Ollama for local AI processing.
1. Start the app (choose one):

   # Option A: Docker (recommended)
   docker compose up -d && open http://localhost:8501

   # Option B: Local
   ollama serve && streamlit run main.py
2. Upload a sample video - use any of these public domain sources:
- Internet Archive - Free public domain videos
- Pexels - Free stock videos
- Pixabay - Free video clips
- Or use any .mp4, .mov, .mkv, or .avi file you have
3. Watch the magic happen:
- Video upload → audio extraction → Whisper transcription
- Click "Generate Persona" → AI analyzes speaking patterns
- Chat with your new AI persona about the content!
Upload Video → Extract Audio → Transcribe with Whisper
                                        ↓
Chat with Persona ← Generate AI Persona ← Translate (optional)
        ↓
Export (SRT/VTT/MD/JSON/TXT)
| Step | Time | Description |
|---|---|---|
| Upload | ~5s | Drag & drop any video file |
| Transcription | ~30s-2min | Whisper processes audio |
| Persona Generation | ~10s | AI analyzes speech patterns |
| Chat | Instant | Converse with your new persona |
- Video-to-text transcription using Whisper
- Interactive video player with clickable timestamps - click any timestamp to jump to that position in the video
- Search within transcripts - find and highlight specific text with instant filtering
- Multi-language translation support
- AI persona generation from transcripts
- Interactive chat with generated personas
- Dynamic model selection from local Ollama installation
- Client and transcription management
- Local AI processing with Ollama
- Multiple export formats (SRT, VTT, Markdown, JSON, TXT)
- Docker Compose for one-command deployment
- NEW: Interactive video player with click-to-seek timestamps
- NEW: Transcript search with text highlighting
- NEW: Auto-scroll to current playback position
- NEW: Docker Compose packaging for one-command deployment
- NEW: Export to SRT, VTT, Markdown, JSON, and TXT formats
- NEW: Quick demo section with 2-minute workflow
- Added dynamic Ollama model selection
- Improved persona generation and chat interface
- Added ability to regenerate personas
- Enhanced error handling and feedback
- Improved database management with migrations
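The SRT/VTT export listed above comes down to formatting segment timestamps. A minimal sketch of that formatting (the function names are illustrative, not taken from the project's code):

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm style that SRT uses."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """One numbered SRT cue; VTT is the same idea with '.' instead of ','."""
    return f"{index}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n{text}\n"
```

For example, `srt_cue(1, 0.0, 2.5, "Hello")` yields a cue spanning `00:00:00,000 --> 00:00:02,500`.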
- Python 3.11 or higher
- FFmpeg installed on your system
- Ollama installed and running
- At least one Ollama model pulled (e.g., ollama pull mistral:instruct)
- Python 3.10+
- pip or conda
- FFmpeg
# Create a virtual environment
python3 -m venv venv
# Activate the virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Create a new conda environment
conda create -n video-transcription python=3.11
# Activate the environment
conda activate video-transcription
# Install dependencies
pip install -r requirements.txt
# Optional: Install additional conda-specific packages if needed
conda install ffmpeg

- FFmpeg: Required for audio/video processing
  - macOS: brew install ffmpeg
  - Ubuntu/Debian: sudo apt-get install ffmpeg
  - Windows: download from the FFmpeg official site
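Once installed, you can verify FFmpeg from Python and see the kind of extraction command such a tool runs. This is a sketch with standard FFmpeg flags; the exact invocation in utils.py may differ:

```python
import shutil

def ffmpeg_available() -> bool:
    """True if an ffmpeg binary is on PATH."""
    return shutil.which("ffmpeg") is not None

def extract_audio_cmd(video_path: str, wav_path: str) -> list:
    """Build an FFmpeg command producing 16 kHz mono WAV (what Whisper expects)."""
    return ["ffmpeg", "-y", "-i", video_path,
            "-vn",           # drop the video stream
            "-ac", "1",      # downmix to mono
            "-ar", "16000",  # 16 kHz sample rate
            wav_path]
```

Run the list with subprocess.run(cmd, check=True) to perform the extraction.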
The easiest way to get started is with Docker Compose, which handles all dependencies automatically:
# Clone the repository
git clone https://github.com/marc-shade/VIdeo-Transcription.git
cd VIdeo-Transcription
# Start everything with one command
docker compose up -d
# Wait for services to be ready (first run downloads models)
docker compose logs -f ollama-init
# Access the application
open http://localhost:8501

What's included:
- Streamlit application with all Python dependencies
- FFmpeg for audio/video processing
- Whisper model pre-downloaded
- Ollama with mistral:instruct model
- Persistent volumes for data and models
Docker Commands:
# Start services
docker compose up -d
# View logs
docker compose logs -f app
# Stop services
docker compose down
# Rebuild after code changes
docker compose up -d --build
# Reset everything (removes volumes)
docker compose down -v

Environment Variables:
Create a .env file to customize:
OLLAMA_API_BASE=http://ollama:11434
DEFAULT_MODEL=mistral:instruct

1. Start the Ollama server:
   ollama serve
2. Run the application:
   streamlit run main.py
3. Access the web interface at http://localhost:8501
- Upload video files
- Automatic transcription using Whisper
- Optional timestamp inclusion
- Support for multiple video formats
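Transcription is handled by the openai-whisper package; a minimal sketch of how a Whisper call looks (the wrapper function and default model name are assumptions, not the project's actual code):

```python
def transcribe(audio_path: str, model_name: str = "base"):
    """Run Whisper locally; returns full text plus timestamped segments."""
    import whisper  # pip install openai-whisper; imported lazily so the file loads without it
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    # result["segments"] carries per-segment "start"/"end"/"text",
    # which is what powers clickable timestamps and SRT/VTT export
    return result["text"], result["segments"]
```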
- Translate transcriptions to multiple languages
- Powered by deep-translator
- Maintains formatting and structure
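With deep-translator, keeping structure intact mostly means translating paragraph by paragraph. A sketch under that assumption (helper names are illustrative):

```python
def translate_text(text: str, target: str = "es") -> str:
    """Translate one chunk; deep-translator wraps the Google endpoint."""
    from deep_translator import GoogleTranslator  # pip install deep-translator
    return GoogleTranslator(source="auto", target=target).translate(text)

def translate_paragraphs(text: str, target: str = "es", translate=translate_text) -> str:
    """Translate paragraph by paragraph so blank-line structure survives."""
    return "\n\n".join(
        translate(p, target) if p.strip() else p
        for p in text.split("\n\n")
    )
```

Injecting `translate` as a parameter also makes the paragraph logic easy to test without network access.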
- Analyzes speaking patterns and content
- Creates context-aware personas
- Generates detailed system prompts
- Supports multiple Ollama models
- Regenerate personas as needed
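Persona generation boils down to prompting a local model over Ollama's REST API. A sketch using the real /api/generate endpoint; the prompt wording and function names are assumptions:

```python
import json
import urllib.request

def build_persona_prompt(transcript: str) -> str:
    """Ask the model to distill speaking style into a reusable system prompt."""
    return ("Analyze the speaker's tone, vocabulary, and recurring themes in the "
            "transcript below, then write a system prompt for an AI persona that "
            "talks the same way.\n\n" + transcript)

def ollama_generate(prompt: str, model: str = "mistral:instruct",
                    base: str = "http://localhost:11434") -> str:
    """One-shot, non-streaming call to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    req = urllib.request.Request(
        base + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```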
- Chat with generated personas
- Context-aware responses
- Maintains chat history
- Real-time response generation
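Context-aware chat with history maps naturally onto Ollama's /api/chat endpoint, which accepts the full message list on every turn. A hedged sketch (the helpers are illustrative, not the project's code):

```python
import json
import urllib.request

def new_history(system_prompt: str) -> list:
    """Seed the conversation with the generated persona as the system message."""
    return [{"role": "system", "content": system_prompt}]

def chat_turn(history: list, user_text: str, model: str = "mistral:instruct",
              base: str = "http://localhost:11434") -> str:
    """Send the whole history to /api/chat so replies stay context-aware."""
    history.append({"role": "user", "content": user_text})
    payload = {"model": model, "messages": history, "stream": False}
    req = urllib.request.Request(
        base + "/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply
```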
The application uses several environment variables that can be set in a .env file:
OLLAMA_API_BASE=http://localhost:11434
DEFAULT_MODEL=mistral:instruct

The application uses SQLite with the following main tables:
- clients: Store client information
- transcriptions: Store video transcriptions
- persona_prompts: Store generated AI personas
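A minimal sketch of how those three tables could relate in SQLite. The column names here are illustrative assumptions; the real schema lives in database.py:

```python
import sqlite3

# Illustrative columns only; the project's actual schema may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS clients (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS transcriptions (
    id         INTEGER PRIMARY KEY,
    client_id  INTEGER REFERENCES clients(id),
    video_name TEXT,
    transcript TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS persona_prompts (
    id               INTEGER PRIMARY KEY,
    transcription_id INTEGER REFERENCES transcriptions(id),
    system_prompt    TEXT
);
"""

conn = sqlite3.connect(":memory:")  # the app would use a file path instead
conn.executescript(SCHEMA)
```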
video_transcription/
├── main.py             # Primary Streamlit application
├── database.py         # Database management
├── utils.py            # Audio/video processing utilities
├── ai_persona.py       # AI persona generation
├── requirements.txt    # Project dependencies
├── Dockerfile          # Docker image definition
├── docker-compose.yml  # Multi-container orchestration
├── .dockerignore       # Docker build exclusions
└── README.md           # Project documentation
- main.py: Central Streamlit interface for video transcription
- database.py: SQLite database operations for clients and transcripts
- utils.py: Core utility functions for audio extraction and transcription
- ai_persona.py: AI-powered persona analysis and generation
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper for transcription
- Ollama for local AI processing
- Streamlit for the web interface
- All other open-source contributors
If you encounter problems installing PyArrow (a Streamlit dependency), try the following:
- Use pre-built wheels:
pip install --only-binary=:all: pyarrow

- If you're on an older system or experiencing build errors, you can:
- Upgrade pip and setuptools
- Install build dependencies
- Try specifying a specific version
Example:
pip install --upgrade pip setuptools wheel
pip install "pyarrow[build]"
# Or specify an exact version
pip install pyarrow==14.0.2

Note: PyArrow can be sensitive to system configurations and Python versions. The pre-built wheel method is often the most reliable.
