Egocentric-10K Mini Explorer

A lightweight Python app that streams random factory worker clips from the Egocentric-10K dataset, extracts frames, and uses the GPT-4o-mini vision API to analyze worker actions, tools, and safety gear.

What This Repo Does

  1. Download video samples from the Egocentric-10K dataset - a collection of first-person factory worker videos
  2. Run a Streamlit app that analyzes these videos using GPT-4o-mini vision to detect worker actions, tools, objects, and safety gear

Quick Start

# 1. Clone and install dependencies
git clone git@github.com:carsonmulligan/factory-vision.git
cd factory-vision
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# 2. Create .env with your API keys (see Setup below)

# 3. Run the Streamlit app (it will help you download clip 00 if needed)
cd analysis && streamlit run app.py

Setup

Configure API keys - Create a .env file with:

OPENAI_API_KEY=your_openai_key_here
HF_TOKEN=your_hf_token_here
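
For reference, here is a minimal sketch of how the app is expected to pick these keys up, assuming it uses python-dotenv (the exact loading code in src/config.py may differ):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory into the environment
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # used for GPT-4o-mini calls
HF_TOKEN = os.environ["HF_TOKEN"]              # used for Hugging Face downloads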

Usage

Interactive Web App - Run the Streamlit app, which guides you through downloading and analyzing clips:

cd analysis
streamlit run app.py

The app checks whether clip 00 exists and offers to download it if needed. You can then choose to analyze either the first 30 seconds or the full 7-minute video.
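
Frame extraction itself lives in src/stream_sampler.py; the sketch below shows one common OpenCV approach, consistent with the 3-frames-per-30-seconds figure in the Cost Breakdown below. The function name and defaults are illustrative, not the repo's actual code:

import cv2

def extract_frames(video_path, interval_s=10.0, max_frames=3):
    """Grab one frame every interval_s seconds, up to max_frames."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unreported
    step = int(fps * interval_s)
    frames, idx = [], 0
    while len(frames) < max_frames:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the next sample point
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        idx += step
    cap.release()
    return frames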

Batch Processing - Process multiple clips by streaming them from Hugging Face rather than downloading everything up front:

cd analysis
python main.py
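
A minimal sketch of what streaming can look like with the datasets library; the repo ID and sample schema below are assumptions, and main.py may do this differently:

from datasets import load_dataset

# Repo ID is an assumption; substitute the actual Egocentric-10K dataset ID.
ds = load_dataset("builddotai/Egocentric-10K", split="train", streaming=True)

for sample in ds.take(3):  # inspect a few samples without a full download
    print(sample.keys())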

Manual Download - Download clips directly:

# Just clip 00 (7 min, ~200MB)
python preload_clips_direct.py

# All clips 0-5 (~3GB)
python preload_clips_direct.py --all
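
preload_clips_direct.py presumably fetches individual files from the dataset repo; here is a sketch using huggingface_hub, where the repo ID and file path are illustrative assumptions:

import os
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="builddotai/Egocentric-10K",  # assumption: actual dataset repo ID
    filename="clips/clip_00.mp4",         # assumption: actual file layout
    repo_type="dataset",
    token=os.environ.get("HF_TOKEN"),
)
print(local_path)  # local cache path of the downloaded clip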

Configuration

Edit src/config.py to adjust the frame extraction interval, token limits, and the number of clips to process.
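
The file is expected to look roughly like this; the names and values below are illustrative, so check the actual src/config.py:

MODEL = "gpt-4o-mini"
FRAME_INTERVAL_SECONDS = 10  # extract one frame every N seconds
MAX_OUTPUT_TOKENS = 300      # cap on tokens per vision response
NUM_CLIPS = 1                # clips to process in batch mode
INPUT_COST_PER_MTOK = 0.15   # USD per 1M input tokens
OUTPUT_COST_PER_MTOK = 0.60  # USD per 1M output tokens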

Cost Breakdown

  • Model: GPT-4o-mini ($0.15/1M input tokens, $0.60/1M output tokens)
  • Per frame: ~$0.0002-0.0005
  • First 30 seconds (3 frames): <$0.01
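
The per-frame figure follows from the token prices; a back-of-the-envelope check, assuming roughly 1,000 input tokens per frame (image plus prompt) and 200 output tokens:

def frame_cost(input_tokens=1_000, output_tokens=200):
    # GPT-4o-mini pricing: $0.15 per 1M input tokens, $0.60 per 1M output
    return input_tokens * 0.15 / 1e6 + output_tokens * 0.60 / 1e6

print(round(frame_cost(), 5))      # ~0.00027 USD per frame
print(round(3 * frame_cost(), 5))  # ~0.00081 USD for a 3-frame, 30s test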

Project Structure

.
├── preload_clips_direct.py # MAIN: Download clips (10-15 sec)
├── src/                    # Core library modules
│   ├── config.py          # Configuration and cost tracking
│   ├── stream_sampler.py  # Video frame extraction
│   └── vision_analyzer.py # GPT-4o-mini vision API
├── analysis/              # Analysis tools
│   ├── app.py            # Streamlit web app (tests first 30s)
│   └── main.py           # Batch processing pipeline
├── sample_clips/          # Downloaded videos (gitignored)
├── requirements.txt       # Dependencies
└── .env                   # API credentials (not committed)
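
For orientation, src/vision_analyzer.py makes calls of roughly this shape against the OpenAI Chat Completions API; the prompt text, frame path, and token limit here are illustrative:

import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("sample_clips/frame_000.jpg", "rb") as f:  # illustrative path
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe the worker's action, visible tools, and safety gear."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)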

License

Apache 2.0

Author

@us_east_3


