
# Open Testimony: Incident Video Search & Indexing

Open Testimony is a video search and indexing tool for finding moments in incident footage and organizing videos with tags. It combines multiple state-of-the-art AI models for comprehensive video content search:

- **Visual Search**: Meta's Perception Model (Core) and OpenCLIP for understanding visual content and cross-modal search
- **Transcript Search**: Qwen3-Embedding-8B for high-quality semantic search over spoken content, transcribed with Whisper's large-v3 model

## Demo

Watch a quick demo of Open Testimony in action:

demo.mp4

## Features

### Visual Content Search

- Text-to-video search using Meta's Perception Model or OpenCLIP
- Image-to-video search for finding similar visual content
- Frame extraction and intelligent indexing
- Cross-modal understanding between text and visual content
- Instant visual results with thumbnails

### Transcript Search

- Semantic search using Qwen3-Embedding-8B embeddings (see the sketch after this list)
- Exact text matching for precise queries
- Filename search: find videos by their file names
- Multi-language transcript support
- Automatic video transcription (optional)
- Time-aligned transcript results
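
Under the hood, semantic transcript search amounts to embedding the query and the transcript segments with the same model and comparing vectors. A minimal sketch of that pattern, assuming Qwen3-Embedding-8B is loaded via `sentence-transformers` (the repo's `transcript_indexer.py` may wire this up differently):

```python
# Sketch of semantic transcript matching with Qwen3-Embedding-8B.
# Note: the 8B model needs substantial RAM/VRAM; this is illustrative only.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-8B")

segments = [
    "The officer approached the vehicle at 10:32 pm.",
    "Witnesses described hearing three loud bangs.",
]
query = "gunshot sounds reported by bystanders"

# Qwen3-Embedding applies an instruction-style prompt to queries only.
query_emb = model.encode([query], prompt_name="query")
segment_embs = model.encode(segments)

scores = model.similarity(query_emb, segment_embs)  # cosine similarities
print(scores)  # the second segment should score higher
```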

### User Interface & Tagging

- Open Testimony branding with incident-focused UI
- **Tagging System**: comprehensive video tagging for organizing evidence
  - Add tags directly from search results
  - Persistent tag storage (automatically saved and loaded; see the sketch after this list)
  - Tag dropdown with existing tags for quick selection
  - Tags visible in all search results for easy identification
- Modern, responsive UI with dark mode support
- Instant search results with visual previews
- Video playback starting at matched frames/segments
- M3U playlist generation for search results
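
The README doesn't spell out the tag store's on-disk format; here is a hypothetical sketch of JSON-backed persistent tags (the file name and schema below are illustrative, not the app's actual format):

```python
# Hypothetical sketch of persistent video tagging: a JSON file mapping
# video paths to tag lists. The app's real storage format may differ.
import json
from pathlib import Path

TAG_FILE = Path("video_tags.json")  # hypothetical location

def load_tags() -> dict[str, list[str]]:
    return json.loads(TAG_FILE.read_text()) if TAG_FILE.exists() else {}

def add_tag(video: str, tag: str) -> None:
    tags = load_tags()
    tags.setdefault(video, [])
    if tag not in tags[video]:
        tags[video].append(tag)
    TAG_FILE.write_text(json.dumps(tags, indent=2))  # saved immediately

add_tag("incidents/2024-06-01/cam2.mp4", "traffic-stop")
print(load_tags())
```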

### Technical Features

- FAISS vector similarity search (see the sketch after this list)
- FP16 support for efficient memory usage
- Automatic video transcoding when needed
- Intelligent frame filtering (skips dark/black frames)
- Configurable model parameters
- Multi-threaded processing
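
For the curious, the FAISS pattern behind visual similarity search looks roughly like this: L2-normalized embeddings in an inner-product index, so scores are cosine similarities. Dimensions and data here are illustrative:

```python
# Illustrative FAISS usage: exact cosine-similarity search over frame
# embeddings. Real embedding dims depend on the chosen visual model.
import faiss
import numpy as np

dim = 512                                  # illustrative embedding size
frames = np.random.rand(10_000, dim).astype("float32")
faiss.normalize_L2(frames)                 # cosine sim == inner product

index = faiss.IndexFlatIP(dim)             # exact inner-product search
index.add(frames)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)       # top-5 most similar frames
print(ids[0], scores[0])
```

An exact flat index like this is the accuracy baseline; the Performance Tuning section below notes that other index types trade accuracy for speed.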

## Quick Start (macOS)

The quickest way to try it out is to download `OpenTestimony-mac.zip` from the Releases page and run that. You might need to hop through Gatekeeper hoops to launch it despite the fact that I signed and notarized it, but hopefully that isn't too much of a pain.

Pick the directory tree of videos you want to index, click 'Start', and let it run for a while. If you have less than 64 GB of RAM, check the 'fp16' box; accuracy should be about the same while using less memory. When indexing is done, you can hit the local web server at http://127.0.0.1:8002 and search away!

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/VideoIndexer-project.git
   cd VideoIndexer-project
   ```

2. Create and activate the conda environment:

   ```bash
   conda env create -f environment.yml
   conda activate video-indexer
   ```

3. Install additional dependencies:

   ```bash
   pip install -r requirements.txt
   ```
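
Optionally, a quick sanity check that the key libraries imported correctly (a hypothetical snippet, not a script in the repo; it assumes `faiss` and `torch` are among the installed dependencies):

```python
# Hypothetical post-install check: confirm core dependencies load.
import faiss
import torch

print("faiss:", getattr(faiss, "__version__", "ok"))
print("torch:", torch.__version__,
      "| MPS available:", torch.backends.mps.is_available())  # Apple GPU
```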

## Usage

1. Process videos to extract frames and build the visual search index:

   ```bash
   cd src
   # Extract frames (add --fp16 to reduce memory usage)
   python frame_extractor.py /path/to/your/videos --output-dir video_index_output

   # Build visual search index (add --fp16 for reduced memory)
   python frame_indexer.py --input-dir video_index_output

   # Optional: use OpenCLIP (LAION/WebLI) for visual indexing
   python frame_indexer.py --input-dir video_index_output \
     --model-family open_clip \
     --model-name ViT-H-14 \
     --openclip-pretrained laion2b_s32b_b79k
   ```

2. Generate and index transcripts:

   ```bash
   # Generate transcripts (add --fp16 for reduced memory)
   python transcript_extractor_pywhisper.py /path/to/your/videos --output-dir video_index_output

   # Build transcript search index (add --fp16 for reduced memory)
   python transcript_indexer.py --input-dir video_index_output
   ```

3. Start the search server:

   ```bash
   cd src
   # Add --fp16 flag to reduce memory usage during search
   python video_search_server_transcode.py --fp16
   ```

4. Open http://localhost:8002 in your browser.
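
Under the hood, a text query is embedded with the same model used for the frames and then matched against the FAISS index. A rough sketch of that flow for the OpenCLIP configuration (the index path and metadata mapping here are illustrative, not the repo's exact files):

```python
# Sketch of text-to-video search: embed the query, search the frame index.
import faiss
import open_clip
import torch

model, _, _ = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-H-14")

index = faiss.read_index("video_index_output/frames.faiss")  # illustrative path

with torch.no_grad():
    q = model.encode_text(tokenizer(["person running across a street"]))
    q = q / q.norm(dim=-1, keepdim=True)     # match normalized frame vectors

scores, ids = index.search(q.cpu().numpy().astype("float32"), 10)
print(ids[0])  # map ids back to (video, timestamp) via the index metadata
```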

## Memory Usage Tips

- The `--fp16` flag can be used with most components to reduce memory usage by about 50% (see the check after this list)
- For large video collections, using FP16 is recommended
- Memory usage peaks during initial indexing and drops for search operations
- If you encounter memory issues:
  1. Use the `--fp16` flag
  2. Process videos in smaller batches
  3. Close other memory-intensive applications
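
Why `--fp16` roughly halves embedding memory: each vector element shrinks from 4 bytes (float32) to 2 bytes (float16). A quick back-of-the-envelope check with illustrative numbers:

```python
# float32 vs float16 storage for a large frame-embedding matrix.
import numpy as np

n_frames, dim = 1_000_000, 512
fp32 = np.zeros((n_frames, dim), dtype=np.float32)
fp16 = fp32.astype(np.float16)

print(f"fp32: {fp32.nbytes / 2**30:.2f} GiB")   # ~1.91 GiB
print(f"fp16: {fp16.nbytes / 2**30:.2f} GiB")   # ~0.95 GiB
```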

## Building macOS App

### Prerequisites

1. Copy the credentials template and fill in your Apple Developer details:

   ```bash
   cp store_credentials_template.sh store_credentials.sh
   chmod +x store_credentials.sh
   # Edit store_credentials.sh with your details
   ./store_credentials.sh
   ```

2. Build the app:

   ```bash
   # Full build (slow)
   ./build_and_sign.sh build

   # Quick code sync (for developers)
   ./build_and_sign.sh sync

   # Sign the app
   ./build_and_sign.sh sign
   ```

The built app will be in `./build/OpenTestimony.app`.

## Configuration

### Model Configuration
- `index_config.json`: Configure visual search model and embedding settings (a hypothetical example follows this list)
- Visual models:
  - Meta Perception (default): `--model-family pe` with `PE-Core-*` models
  - OpenCLIP (LAION/WebLI): `--model-family open_clip` with `--model-name` and `--openclip-pretrained`
- `transcript_index_config.json`: Configure transcript search settings
- Command line arguments for frame extraction and indexing (see --help)
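
The exact schema of `index_config.json` lives in the repo; the snippet below is a hypothetical illustration of the kind of settings such a config carries (every key shown is an assumption, not the real schema):

```python
# Hypothetical illustration of index_config.json-style settings.
import json

example = {
    "model_family": "pe",              # "pe" (Meta Perception) or "open_clip"
    "model_name": "PE-Core-L14-336",   # hypothetical default model id
    "openclip_pretrained": None,       # e.g. "laion2b_s32b_b79k" for open_clip
    "fp16": True,                      # store embeddings in half precision
}
print(json.dumps(example, indent=2))
```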

### Performance Tuning
- Use `--fp16` flag for reduced memory usage
- Adjust frame extraction rate for storage/accuracy tradeoff
- Configure FAISS index type for speed/accuracy tradeoff (see the sketch after this list)
- Set maximum result thresholds for faster searches
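
To illustrate the FAISS index-type tradeoff: an IVF index probes only a few clusters per query, trading a little recall for much faster searches on large collections. A minimal sketch with illustrative parameters:

```python
# IVF index: partition vectors into clusters, probe only a few per query.
import faiss
import numpy as np

dim, n = 512, 100_000
xb = np.random.rand(n, dim).astype("float32")
faiss.normalize_L2(xb)

quantizer = faiss.IndexFlatIP(dim)
index = faiss.IndexIVFFlat(quantizer, dim, 1024, faiss.METRIC_INNER_PRODUCT)
index.train(xb)          # IVF indexes must be trained before adding vectors
index.add(xb)
index.nprobe = 8         # more probes -> better recall, slower queries

scores, ids = index.search(xb[:1], 5)
print(ids[0], scores[0])
```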


## License

This project is licensed under the Apache License - see the LICENSE file for details.

## Acknowledgments

- [Meta's Perception Model (Core)](https://github.com/facebookresearch/perception_models) for visual understanding
- [OpenCLIP](https://github.com/mlfoundations/open_clip) for additional visual model support
- [Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B) for semantic text understanding
- [FAISS](https://github.com/facebookresearch/faiss) for efficient similarity search
- [FFmpeg](https://ffmpeg.org/) for video processing 
- [whisper.cpp](https://github.com/ggml-org/whisper.cpp) for multi-lingual transcription