video_to_frames/
│
├── extract_frames.py ← Main script (run this)
├── requirements.txt ← Python dependencies
├── README.md ← You are here
│
├── input/ ← DROP YOUR VIDEOS HERE
│   └── (your_video.mp4)
│
├── output/ ← FRAMES SAVED HERE (auto-created)
│   └── your_video/
│       ├── frame_000000.jpg
│       ├── frame_000001.jpg
│       ├── frame_000002.jpg
│       └── metadata.json ← frame index + timestamps (for annotation tools)
│
└── logs/ ← 📋 Per-run log files (auto-created)
    └── your_video_20240428_143200.log

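
The layout above implies a simple naming rule: frames from `input/<name>.<ext>` land in `output/<name>/` as `<prefix>_<6-digit index>.<format>`. Here is a minimal sketch of that rule. `frame_path` is a hypothetical helper written for illustration, not a function taken from extract_frames.py.

```python
from pathlib import Path


def frame_path(video: str, index: int, prefix: str = "frame",
               fmt: str = "jpg", output_root: str = "output") -> Path:
    """Build the output path for one saved frame (hypothetical helper)."""
    # output/<video stem>/<prefix>_<zero-padded index>.<fmt>
    return Path(output_root) / Path(video).stem / f"{prefix}_{index:06d}.{fmt}"


print(frame_path("input/your_video.mp4", 2).as_posix())
# output/your_video/frame_000002.jpg
```
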
# 1. Install dependencies
pip install -r requirements.txt
# 2. Drop your video into input/
cp /path/to/your_video.mp4 input/
# 3. Run with defaults (2 fps — good general-purpose)
python extract_frames.py
# 4. Your frames appear in output/your_video/

| Flag | Default | Description |
|---|---|---|
| --video | all videos | Specific video file inside input/ |
| --fps | 2 | Frames per second to extract |
| --every_n_sec | (none) | 1 frame every N seconds (overrides --fps) |
| --format | jpg | Output format: jpg or png |
| --quality | 95 | JPEG quality 1–100 (ignored for PNG) |
| --resize W H | original | Resize to width × height |
| --prefix | frame | Output filename prefix |

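
For readers who want to see how the flags in the table fit together, the following is one possible `argparse` definition. It is a sketch only: the flag names and defaults are copied from the table, but `build_parser`, the types, and the help strings are assumptions and may differ from what extract_frames.py actually does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flag table; defaults copied from it."""
    p = argparse.ArgumentParser(description="Extract frames from videos in input/")
    p.add_argument("--video", default=None,
                   help="video file inside input/ (default: all videos)")
    p.add_argument("--fps", type=float, default=2,
                   help="frames per second to extract")
    p.add_argument("--every_n_sec", type=float, default=None,
                   help="1 frame every N seconds (overrides --fps)")
    p.add_argument("--format", choices=["jpg", "png"], default="jpg",
                   help="output image format")
    p.add_argument("--quality", type=int, default=95,
                   help="JPEG quality 1-100 (ignored for PNG)")
    p.add_argument("--resize", type=int, nargs=2, metavar=("W", "H"),
                   default=None, help="resize to width x height")
    p.add_argument("--prefix", default="frame", help="output filename prefix")
    return p


# Parse the "full custom run" flags without touching sys.argv.
args = build_parser().parse_args(
    ["--fps", "10", "--format", "png", "--resize", "1920", "1080",
     "--prefix", "cam_front"]
)
print(args.fps, args.format, args.resize, args.prefix)
```
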
# Extract 1 frame per second
python extract_frames.py --fps 1
# Waymo-grade: 10 fps (1 frame every 0.1 sec)
python extract_frames.py --fps 10
# 1 frame every 2 seconds — sparse sampling
python extract_frames.py --every_n_sec 2
# Lossless PNG output for medical / satellite annotation
python extract_frames.py --format png
# Resize all frames to 1280×720 (faster annotation tools)
python extract_frames.py --resize 1280 720
# Process a specific video only
python extract_frames.py --video dashcam.mp4 --fps 5
# Full custom run
python extract_frames.py --fps 10 --format png --resize 1920 1080 --prefix cam_front

Waymo:
- Captures sensor data at 10 Hz (10 frames/sec = 1 frame every 0.1 s)
- Each 20-second scene → 200 annotated frames per camera
- Uses 5 cameras + 5 LiDAR sensors simultaneously
- Provides 12 million 3D box annotations and 1.2 million 2D annotations
- Reference: Waymo Open Dataset

Tesla:
- Cameras operate at 36 fps (raw capture)
- FSD neural network processes at ~20–36 fps
- For training data labelling, frames are sampled at key intervals
- Uses vision-only (no LiDAR), so camera images are the primary annotation target
- FSD v12 ingested billions of video frames for end-to-end neural net training

| Use Case | Recommended FPS | Frames / 1-min video |
|---|---|---|
| Static object detection (products, signs) | 1 fps | ~60 |
| Pedestrian / general scene understanding | 2 fps ✅ default | ~120 |
| Sports, gesture, action recognition | 5 fps | ~300 |
| Vehicle / dashcam tracking (Tesla-style) | 5–10 fps | 300–600 |
| Full autonomous driving (Waymo-grade) | 10 fps | ~600 |
| Medical imaging (endoscopy, surgery) | 1–2 fps | ~60–120 |
| Drone / aerial surveillance | 2–5 fps | 120–300 |
**Rule of thumb:**
- Slow-moving or static scenes → 1–2 fps
- Normal human activity → 2–5 fps
- Fast-moving vehicles / sports → 5–10 fps
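
To make the frame counts above concrete, the sketch below converts a target rate into a native-frame step and estimates how many frames a run would save. It assumes frames are picked by keeping every `step`-th native frame; the real script may select by timestamp instead, and both function names are hypothetical.

```python
from typing import Optional


def sampling_step(native_fps: float, target_fps: float = 2.0,
                  every_n_sec: Optional[float] = None) -> int:
    """Number of native frames to advance between saved frames."""
    if every_n_sec is not None:      # mirrors "--every_n_sec overrides --fps"
        target_fps = 1.0 / every_n_sec
    # Never sample faster than the video itself provides frames.
    return max(1, round(native_fps / target_fps))


def expected_frames(duration_sec: float, native_fps: float, step: int) -> int:
    """Frames saved when keeping native frames 0, step, 2*step, ..."""
    total_native = int(duration_sec * native_fps)
    return -(-total_native // step)  # ceiling division


step = sampling_step(native_fps=30, target_fps=2)
print(step, expected_frames(60, 30, step))   # 15 120
print(sampling_step(30, every_n_sec=2))      # 60
```

For a 30 fps source at the default 2 fps, this keeps every 15th native frame, which gives about 120 frames per minute as listed in the table.
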
Each output folder contains a metadata.json that maps every saved frame to its native frame number and timestamp, for use alongside annotation tools (CVAT, Label Studio, Roboflow, etc.):
{
  "source_video": "dashcam.mp4",
  "target_fps": 2,
  "total_saved": 240,
  "video_info": {
    "native_fps": 30,
    "width": 1920,
    "height": 1080,
    "duration_sec": 120.0
  },
  "frames": [
    { "frame_index": 0, "native_frame": 0, "timestamp_sec": 0.0, "filename": "frame_000000.jpg" },
    { "frame_index": 1, "native_frame": 15, "timestamp_sec": 0.5, "filename": "frame_000001.jpg" },
    ...
  ]
}

Requirements:
opencv-python-headless
tqdm
Install:
pip install -r requirements.txt

Commands to start this package:
unzip video_to_frames.zip
cd video_to_frames
python3 setup.py # creates venv + installs deps
bash run.sh # run with defaults
cd video_to_frames
source venv/bin/activate # activate the virtual environment before running
python3 extract_frames.py
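
Once a run has finished, the metadata.json format described earlier can be read with standard-library Python. This sketch assumes the keys shown in the sample (`frames`, `filename`, `timestamp_sec`) and that the images sit next to metadata.json. `load_frames` is a hypothetical helper, and the path in the usage lines is only an example.

```python
import json
from pathlib import Path


def load_frames(metadata_path):
    """Return (image_path, timestamp_sec) pairs listed in a metadata.json file."""
    metadata_path = Path(metadata_path)
    meta = json.loads(metadata_path.read_text(encoding="utf-8"))
    folder = metadata_path.parent  # frames are assumed to sit next to metadata.json
    return [(folder / f["filename"], f["timestamp_sec"]) for f in meta["frames"]]


# Example path only; adjust to the folder created for your own video.
meta_file = Path("output/dashcam/metadata.json")
if meta_file.exists():
    for path, ts in load_frames(meta_file):
        print(f"{ts:8.3f}s  {path}")
```
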