# Pedestrian Detection and Tracking with YOLO

A comprehensive system for detecting pedestrians in video footage and mapping their movement patterns using YOLO object detection and homography transformation.
This project combines computer vision techniques to:
- Extract frames from video footage
- Create spatial transformations using homography mapping
- Detect pedestrians using YOLO (You Only Look Once)
- Transform detection coordinates to top-down view
- Generate heatmaps showing pedestrian movement patterns
## Features

- Modular Architecture: Clean separation of concerns, with a dedicated module for each task
- Configurable Parameters: All settings managed through a YAML configuration file
- YOLO Integration: Uses state-of-the-art YOLOv8 for accurate pedestrian detection
- Homography Transformation: Maps the camera perspective to a satellite/top-down view
- Outlier Detection: Statistical filtering of erroneous detections
- Rich Visualizations: Scatter plots, heatmaps, and transformed views
## Project Structure

```
pedestrian-detection-tracking-YOLO/
├── config/
│   └── config.yaml            # Configuration parameters
├── src/
│   ├── __init__.py
│   ├── utils.py               # Utility functions
│   ├── video_processor.py     # Video frame extraction
│   ├── homography_mapper.py   # Homography transformation setup
│   └── detector.py            # YOLO detection and tracking
├── data/
│   ├── input/
│   │   ├── videos/            # Input video files
│   │   └── images/            # Reference images (video frame, satellite)
│   ├── output/
│   │   └── images/            # Output visualizations
│   ├── frames/                # Extracted video frames
│   └── detections/            # Detection results (CSV)
├── models/                    # YOLO model weights
├── tests/                     # Unit tests
├── docs/                      # Additional documentation
├── README.md
└── requirements.txt
```
## Requirements

- Python 3.8 or higher
- CUDA-compatible GPU (optional, for faster processing)
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/pedestrian-detection-tracking-YOLO.git
   cd pedestrian-detection-tracking-YOLO
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Download YOLO weights: YOLOv8 models are downloaded automatically on first run. Alternatively, download them manually and place them in the models/ directory.

## Usage

### Step 1: Extract Frames

Extract frames from your video file at specified intervals:
```python
from src.video_processor import VideoFrameExtractor

extractor = VideoFrameExtractor()
extractor.extract_frames(
    video_path="data/input/videos/your_video.mp4",
    output_folder="data/frames",
    interval_sec=1  # Extract one frame per second
)
```

Or run directly:

```bash
python -m src.video_processor
```

### Step 2: Create the Homography Mapping

Set up the transformation between the camera view and the top-down view:
```python
from src.homography_mapper import HomographyMapper

mapper = HomographyMapper()
matrix = mapper.create_homography_matrix(
    source_image_path="data/input/images/video_frame.jpg",
    dest_image_path="data/input/images/satellite_view.jpg"
)
mapper.save_matrix("data/homography_matrix.npy")
```

Or run interactively:

```bash
python -m src.homography_mapper
```

Instructions:
- Click on 4 corresponding points in the source image (video frame)
- Click on 4 matching points in the destination image (satellite/top-down view)
- The transformation matrix will be calculated and saved
### Step 3: Run Detection and Tracking

Run the complete detection and tracking pipeline:
```python
import numpy as np
from src.detector import PedestrianTrackingPipeline

# Load homography matrix
matrix = np.load("data/homography_matrix.npy")

# Run pipeline
pipeline = PedestrianTrackingPipeline(matrix)
results = pipeline.run(images_folder="data/frames")
```

Or run directly:

```bash
python -m src.detector
```

To run the full pipeline from the command line:

```bash
# 1. Extract frames
python -m src.video_processor

# 2. Create homography transformation (interactive)
python -m src.homography_mapper

# 3. Run detection and tracking
python -m src.detector
```

## Configuration

All parameters can be configured in config/config.yaml:
```yaml
# Example configuration
paths:
  input_video: "data/input/videos/sample_video.mp4"
  yolo_weights: "models/yolov8l.pt"

video_processing:
  frame_interval_sec: 1

detection:
  model_size: "yolov8l"
  confidence_threshold: 0.5

visualization:
  heatmap:
    bins_x: 20
    bins_y: 35
```

## Output

The pipeline generates several outputs:
- Extracted frames: `data/frames/frame_XXXX.png`
- Homography visualizations:
  - `data/output/images/source_image_with_points.jpg`
  - `data/output/images/destination_image_with_points.jpg`
  - `data/output/images/transformed_image.jpg`
- Detection data: `data/detections/detections_data.csv`
- Visualizations:
  - `data/output/images/center_points_original.jpg`
  - `data/output/images/transformed_points_scatter.jpg`
  - `data/output/images/pedestrian_heatmap.jpg`
## Data Preparation

To use this project, prepare your data as follows:

```
data/
├── input/
│   ├── videos/
│   │   └── your_video.mp4       # Your video file
│   └── images/
│       ├── video_frame.jpg      # A sample frame from your video
│       └── satellite_view.jpg   # Top-down/satellite view of the area
```
## How It Works

### Homography Transformation

Homography is a projective transformation that maps points from one plane to another. In this project:

- Source: Camera perspective view (angled ~15 degrees)
- Destination: Top-down satellite view
- Method: Point correspondence (4+ points)
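As a sketch of the underlying math (plain NumPy, not the project's actual code, which relies on OpenCV), the four-point case reduces to solving an 8×8 linear system for the homography entries, with the bottom-right entry fixed to 1:

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve for the 3x3 homography mapping four source points to four
    destination points, via the direct linear system with h33 = 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), similarly for v
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def transform_point(H, pt):
    """Apply H to a 2D point, with the perspective division."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

# Map a unit square onto a skewed quadrilateral
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (2.5, 1.5), (0.5, 1.0)]
H = homography_from_points(src, dst)
print(transform_point(H, (1, 1)))  # ≈ (2.5, 1.5)
```

With more than four correspondences the system becomes over-determined and is solved in a least-squares sense, which is what OpenCV's `cv2.findHomography` does.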
### Pedestrian Detection

- Model: YOLOv8 (various sizes available: n, s, m, l, x)
- Target Class: Person (class 0 in the COCO dataset)
- Detection Point: Bottom center of the bounding box (foot position)
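The foot-position convention can be sketched as follows (a hypothetical helper, not the project's API): given an `(x1, y1, x2, y2)` bounding box, the point mapped to the top-down view is the bottom center, since that is approximately where the pedestrian touches the ground plane.

```python
def foot_point(box):
    """Bottom center of an (x1, y1, x2, y2) bounding box --
    the approximate ground-contact point used for mapping."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)

print(foot_point((100, 50, 140, 210)))  # -> (120.0, 210)
```

Using the box center instead would project a point roughly at torso height, which lands in the wrong place once transformed onto the ground plane.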
### Outlier Filtering

The pipeline uses the Z-score method to remove erroneous detections:

- Calculate Z-scores for the transformed coordinates
- Remove points with a Z-score above the threshold (default: 3)
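As a plain-NumPy sketch (the project's actual implementation may differ), the filter amounts to standardizing each coordinate and keeping only points whose x and y Z-scores are both below the threshold:

```python
import numpy as np

def filter_outliers(points, threshold=3.0):
    """Drop points whose x or y Z-score exceeds the threshold."""
    pts = np.asarray(points, dtype=float)
    z = np.abs((pts - pts.mean(axis=0)) / pts.std(axis=0))
    return pts[(z < threshold).all(axis=1)]

# 20 clustered points plus one far-away false detection
points = [(10 + i % 3, 10 + i % 5) for i in range(20)] + [(500, 500)]
print(len(filter_outliers(points)))  # -> 20 (the outlier is removed)
```

Note that the Z-score method needs a reasonable number of points to work: with very few samples, a single outlier inflates the standard deviation enough that its own Z-score stays below 3.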
## Advanced Usage

Use a custom configuration file:

```python
from src.utils import load_config
from src.detector import PedestrianTrackingPipeline

config = load_config("path/to/custom_config.yaml")
pipeline = PedestrianTrackingPipeline(matrix, config=config)
```

Use components separately:
```python
from src.detector import PedestrianDetector, HomographyTransformer, PedestrianVisualizer

# Detect only
detector = PedestrianDetector()
detections = detector.detect_batch("data/frames")

# Transform only
transformer = HomographyTransformer(matrix)
transformed = transformer.transform_points(detections)

# Visualize only
visualizer = PedestrianVisualizer()
visualizer.plot_heatmap(transformed)
```

## Troubleshooting
- "Cannot load YOLO model"
  - Ensure ultralytics is installed: `pip install ultralytics`
  - Model weights will download automatically on first run
- "CUDA out of memory"
  - Use a smaller YOLO model (yolov8n or yolov8s)
  - Set `device: "cpu"` in config.yaml
- "No homography matrix found"
  - Run Step 2 (homography_mapper.py) before Step 3
- Poor transformation quality
  - Select more accurate correspondence points
  - Ensure points are spread across the image
  - Use distinctive landmarks for point selection
## Performance Tips

- Use GPU acceleration (CUDA) for faster inference
- Adjust the frame extraction interval based on video content
- Choose an appropriate YOLO model size (larger models are more accurate but slower)
- Process frames in batches for better efficiency
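The batching tip can be as simple as chunking the frame list before handing it to the detector (a hypothetical helper, not part of the project's API):

```python
def batched(items, batch_size):
    """Yield successive fixed-size chunks from a list of frame paths."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

frames = [f"frame_{i:04d}.png" for i in range(10)]
for batch in batched(frames, 4):
    print(len(batch))  # -> 4, 4, 2
```

Feeding the model several frames at a time amortizes per-call overhead and keeps the GPU busier than one-frame-at-a-time inference.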
## Citation

If you use this code in your research, please cite:

```bibtex
@software{pedestrian_detection_tracking,
  title={Pedestrian Detection and Tracking with YOLO},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/pedestrian-detection-tracking-YOLO}
}
```

## License

This project is provided as-is for archival purposes.
## Acknowledgments

- Ultralytics YOLO - YOLOv8 implementation
- OpenCV - Computer vision operations
- COCO Dataset - Pre-trained detection models
For questions or issues, please open an issue on GitHub.
Note: This is an archived repository. The code has been refactored and organized for clarity and maintainability.