Open Testimony is a powerful video search and indexing tool for searching incident videos and organizing them with tags. It combines multiple state-of-the-art AI models for comprehensive video content search:
- Visual Search: Meta's Perception Model (Core) and OpenCLIP for understanding visual content and cross-modal search
- Transcript Search: Qwen3-Embedding-8B for high-quality semantic search over spoken content (transcribed with the Whisper large-v3 model)
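For context, cross-modal visual search works by embedding video frames and text queries into the same vector space and ranking frames by cosine similarity. Below is a minimal sketch of that idea using OpenCLIP; it is illustrative only (the file name and query text are placeholders), and Open Testimony's actual pipeline in `frame_indexer.py` may differ in detail:

```python
# Illustrative cross-modal similarity with OpenCLIP (not the project's actual code).
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-H-14")

# Embed one extracted frame and one text query into the shared space.
frame = preprocess(Image.open("frame_000123.jpg")).unsqueeze(0)   # placeholder frame
query = tokenizer(["a person running across a parking lot at night"])

with torch.no_grad():
    frame_emb = model.encode_image(frame)
    query_emb = model.encode_text(query)

# Cosine similarity after L2 normalization; a higher score means a better match.
frame_emb = frame_emb / frame_emb.norm(dim=-1, keepdim=True)
query_emb = query_emb / query_emb.norm(dim=-1, keepdim=True)
print(float(frame_emb @ query_emb.T))
```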
Watch a quick demo of Open Testimony in action:
demo.mp4
- Text-to-video search using Meta's Perception Model or OpenCLIP
- Image-to-video search for finding similar visual content
- Frame extraction and intelligent indexing
- Cross-modal understanding between text and visual content
- Instant visual results with thumbnails
- Semantic search using Qwen3-Embedding-8B embeddings
- Exact text matching for precise queries
- Filename search: Find videos by their file names
- Multi-language transcript support
- Automatic video transcription (optional)
- Time-aligned transcript results
- Open Testimony branding with incident-focused UI
- Tagging System: Comprehensive video tagging for organizing evidence
  - Add tags directly from search results
  - Persistent tag storage (automatically saved and loaded)
  - Tag dropdown with existing tags for quick selection
  - Tags visible in all search results for easy identification
- Modern, responsive UI with dark mode support
- Instant search results with visual previews
- Video playback starting at matched frames/segments
- M3U playlist generation for search results
- FAISS vector similarity search
- FP16 support for efficient memory usage
- Automatic video transcoding when needed
- Intelligent frame filtering (skips dark/black frames; see the sketch after this list)
- Configurable model parameters
- Multi-threaded processing
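As a rough illustration of the frame-filtering idea mentioned above, a near-black frame can be detected from its mean brightness. This is a conceptual sketch with an arbitrary threshold, not the heuristic the project actually uses:

```python
# Conceptual dark-frame check (the threshold is illustrative, not the project's value).
import numpy as np
from PIL import Image

def is_mostly_dark(frame_path: str, threshold: float = 16.0) -> bool:
    """Return True when the frame's mean grayscale brightness falls below the threshold."""
    gray = np.asarray(Image.open(frame_path).convert("L"), dtype=np.float32)
    return float(gray.mean()) < threshold

# Frames flagged this way can be skipped before embedding and indexing.
```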
The quickest way to try it out is to download OpenTestimony-mac.zip from the Releases page and run that. You might need to jump through macOS security hoops to launch it even though it is signed and notarized, but hopefully that isn't too much of a pain.
Pick the directory tree of videos you want to index, click 'Start', and let it run for a while. If you have less than 64GB of RAM, check the 'fp16' box; accuracy should be about the same while using roughly half the RAM. When indexing is done, hit the local web server at http://127.0.0.1:8002 and search away!
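If you would rather script queries than use the web UI, the server can also be hit directly over HTTP. The endpoint and parameter names below are assumptions for illustration only; check `video_search_server_transcode.py` or the browser's network tab for the real API:

```python
# Hypothetical query against the local search server -- the endpoint, parameter
# names, and response shape are assumptions, not the documented API.
import requests

resp = requests.get(
    "http://127.0.0.1:8002/search",                                    # assumed endpoint
    params={"q": "white pickup truck", "mode": "visual", "limit": 20}, # assumed params
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json():   # assumed response shape: a list of matches
    print(hit)
```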
- Clone the repository:
```bash
git clone https://github.com/yourusername/VideoIndexer-project.git
cd VideoIndexer-project
```
- Create and activate conda environment:
```bash
conda env create -f environment.yml
conda activate video-indexer
```
- Install additional dependencies:
```bash
pip install -r requirements.txt
```
- Process videos to extract frames and build the visual search index:
```bash
cd src

# Extract frames (add --fp16 to reduce memory usage)
python frame_extractor.py /path/to/your/videos --output-dir video_index_output

# Build visual search index (add --fp16 for reduced memory)
python frame_indexer.py --input-dir video_index_output

# Optional: use OpenCLIP (LAION/WebLI) for visual indexing
python frame_indexer.py --input-dir video_index_output \
    --model-family open_clip \
    --model-name ViT-H-14 \
    --openclip-pretrained laion2b_s32b_b79k
```
- Generate and index transcripts:
```bash
# Generate transcripts (add --fp16 for reduced memory)
python transcript_extractor_pywhisper.py /path/to/your/videos --output-dir video_index_output

# Build transcript search index (add --fp16 for reduced memory)
python transcript_indexer.py --input-dir video_index_output
```
- Start the search server:
```bash
cd src

# Add --fp16 flag to reduce memory usage during search
python video_search_server_transcode.py --fp16
```
- Open http://localhost:8002 in your browser
- The `--fp16` flag can be used with most components to reduce memory usage by about 50%
- For large video collections, using FP16 is recommended
- Memory usage is highest during initial indexing and drops for search operations
- If you encounter memory issues:
  - Use the `--fp16` flag
  - Process videos in smaller batches
  - Close other memory-intensive applications
- Copy the credentials template and fill in your Apple Developer details:
```bash
cp store_credentials_template.sh store_credentials.sh
chmod +x store_credentials.sh
# Edit store_credentials.sh with your details
./store_credentials.sh
```
- Build the app:
```bash
# Full build (slow)
./build_and_sign.sh build

# Quick code sync (for developers)
./build_and_sign.sh sync

# Sign the app
./build_and_sign.sh sign
```
The built app will be in `./build/OpenTestimony.app`
## Configuration
### Model Configuration
- `index_config.json`: Configure visual search model and embedding settings
- Visual models:
- Meta Perception (default): `--model-family pe` with `PE-Core-*` models
- OpenCLIP (LAION/WebLI): `--model-family open_clip` with `--model-name` and `--openclip-pretrained`
- `transcript_index_config.json`: Configure transcript search settings
- Command line arguments for frame extraction and indexing (see --help)
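For reference, transcript search boils down to embedding both the query and the transcript segments with Qwen3-Embedding-8B and ranking by cosine similarity. Here is a minimal sketch using the `sentence-transformers` loading path from the model card; the example texts are placeholders, and Open Testimony's own `transcript_indexer.py` may load or prompt the model differently:

```python
# Minimal semantic-similarity sketch with Qwen3-Embedding-8B (illustrative only).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-8B")

query = ["the officer told him to put his hands up"]
segments = [
    "put your hands up, put your hands up now",
    "we were just standing near the corner store",
]

# Normalized embeddings make the dot product a cosine similarity.
q_emb = model.encode(query, normalize_embeddings=True)
s_emb = model.encode(segments, normalize_embeddings=True)
print(q_emb @ s_emb.T)  # higher score = closer semantic match
```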
### Performance Tuning
- Use `--fp16` flag for reduced memory usage
- Adjust frame extraction rate for storage/accuracy tradeoff
- Configure FAISS index type for speed/accuracy tradeoff (see the sketch below)
- Set maximum result thresholds for faster searches
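On the FAISS point above: a flat index gives exact nearest-neighbor search, while an IVF index trades a little recall for much faster queries on large collections. A small sketch of the tradeoff, with illustrative parameters rather than the project's defaults:

```python
# Flat (exact) vs. IVF (approximate) FAISS indexes -- parameters are illustrative.
import numpy as np
import faiss

d = 1024                                   # embedding dimension (model-dependent)
embeddings = np.random.rand(10000, d).astype("float32")
faiss.normalize_L2(embeddings)             # so inner product == cosine similarity

# Exact search: simplest, slowest at scale.
flat = faiss.IndexFlatIP(d)
flat.add(embeddings)

# Approximate search: cluster the vectors, then probe a few clusters per query.
nlist = 100
ivf = faiss.IndexIVFFlat(faiss.IndexFlatIP(d), d, nlist, faiss.METRIC_INNER_PRODUCT)
ivf.train(embeddings)
ivf.add(embeddings)
ivf.nprobe = 8                             # more probes = better recall, slower queries

query = embeddings[:1]
print(flat.search(query, 5))               # exact top-5
print(ivf.search(query, 5))                # approximate top-5
```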
## License
This project is licensed under the Apache License - see the LICENSE file for details.
## Acknowledgments
- [Meta's Perception Model (Core)](https://github.com/facebookresearch/perception_models) for visual understanding
- [OpenCLIP](https://github.com/mlfoundations/open_clip) for additional visual model support
- [Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B) for semantic text understanding
- [FAISS](https://github.com/facebookresearch/faiss) for efficient similarity search
- [FFmpeg](https://ffmpeg.org/) for video processing
- [whisper.cpp](https://github.com/ggml-org/whisper.cpp) for multi-lingual transcription