Edge Vision is an offline-first computer-vision pipeline built for Raspberry Pi 5–class hardware. It captures frames from a live camera or video file, runs Ultralytics detectors (YOLOv8 or RT-DETR), and emits structured JSON to stdout. Optional GPIO alerts, frame capture, Docker packaging, CI, and regression tests are included so you can treat it like a production-ready service.
- Dual detection backends: Ultralytics YOLOv8 (default) and RT-DETR for transformer-based accuracy
- Live camera or file-based sources with automatic fallback demo clip
- JSON logs enriched with confidence scores, per-frame video timestamp, and first-seen timing
- Optional annotated frame export and GPIO LED alerts
- Makefile automation, Docker image, GitHub Actions CI/CD, and pytest suite
- Sample-video workflow for laptop development (5s/10s/20s/30s curated clips)
- `src/edge_vision/` – detection pipelines, CLI, configuration, and utilities
- `config/default.yaml` – all runtime settings (camera, model paths, logging, alerts)
- `requirements*.txt` – runtime and dev dependencies
- `scripts/` – helper utilities (`load_weights.py`, `generate_sample_clips.py`)
- `data/` – generated samples (`data/samples/`) and annotated frames
- `models/` – expected location for Ultralytics weight files (`yolov8n.pt`, `rtdetr-l.pt`)
- `tests/` – pytest coverage for config, CLI, logging, and inference utilities
- Python 3.11+ with `venv`
- `curl` (or equivalent) for downloading sample videos/weights
- `make` (optional but recommended)
- GPU not required; CPU inference is fine for the provided clips
- (Optional) `gpiozero` and a physical LED if you intend to exercise GPIO alerts on Raspberry Pi
make setup        # create .venv/ and install runtime + dev dependencies
make demo-videos  # download raw sources (~180 MB) and build curated clips

This populates:

- `.venv/` – isolated Python environment
- `data/source_videos/` – Intel and Big Buck Bunny source footage
- `data/samples/` – 5s/10s/20s/30s trimmed clips tailored to the detection tasks
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements-dev.txt
mkdir -p models data/annotated
curl -L -o models/yolov8n.pt https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8n.pt
curl -L -o models/rtdetr-l.pt https://github.com/ultralytics/assets/releases/download/v8.1.0/rtdetr-l.pt
make demo-videos # optional but handy on laptops/WSL

All commands below assume the virtualenv is active (`source .venv/bin/activate`). When launching from a cloned repo, remember to expose the source package:
export PYTHONPATH=src

# laptop/WSL with a demo clip
python -m edge_vision --headless --no-alerts --video data/samples/humans-5s.mp4
# machine with a real webcam (GUI window will open)
python -m edge_vision --no-alerts

# use a clip
python -m edge_vision --backend rtdetr --headless --no-alerts --video data/samples/humans-5s.mp4
# use the camera with on-screen display
python -m edge_vision --backend rtdetr --no-alerts

Useful flags:

- `--config PATH` – use an alternate YAML config
- `--video PATH` – override the video/camera source
- `--headless` – disable any GUI windows (typical for servers or WSL)
- `--no-alerts` – suppress GPIO outputs even if enabled in config
- `--log-level DEBUG` – bump verbosity
- `--backend {yolov8,rtdetr}` – choose model family at runtime
- `--camera-backend {opencv,pi}` – select OpenCV capture or Picamera2 (Pi only)
- `--save-annotated` – persist annotated frames to `system.annotated_output_dir`
Each detection frame prints a line similar to:
{
"timestamp": "2025-10-30T06:55:37.562858+00:00",
"objects_detected": ["person"],
"count": 1,
"confidence": [0.902],
"video_time_sec": 4.0,
"first_seen_video_sec": [2.5]
}

- `video_time_sec` – wall-clock position inside the video/camera stream (seconds)
- `first_seen_video_sec` – when each class was first observed in the clip
- Fields are omitted when timestamps are unavailable (e.g., headless camera drivers without `POS_MSEC` support)
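If you want to drive other tooling from these logs, the stream can be consumed as JSON Lines. A minimal sketch, assuming the package is reachable via `PYTHONPATH=src` and that `logging.json_indent` is set so each record fits on a single line:

```python
import json
import os
import subprocess

# Launch the pipeline headless and read its stdout line by line.
proc = subprocess.Popen(
    ["python", "-m", "edge_vision", "--headless", "--no-alerts",
     "--video", "data/samples/humans-5s.mp4"],
    stdout=subprocess.PIPE,
    text=True,
    env={**os.environ, "PYTHONPATH": "src"},
)
for line in proc.stdout:
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip any non-JSON startup output
    if "person" in event.get("objects_detected", []):
        print(f"person at {event.get('video_time_sec', '?')}s "
              f"(confidence {event.get('confidence')})")
```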
Running on a Pi 5 with the official camera uses the dedicated Picamera2 pipeline:
- Install Picamera2 (Raspberry Pi OS Bullseye or newer):

      sudo apt update
      sudo apt install -y python3-picamera2

- Either update `config/default.yaml`:

      system:
        camera_backend: pi

  or launch with `--camera-backend pi`.

- Run YOLOv8 on the Pi camera:

      python -m edge_vision --camera-backend pi --no-alerts

  RT-DETR works the same way:

      python -m edge_vision --backend rtdetr --camera-backend pi --no-alerts
The Pi pipeline streams frames via Picamera2, still prints JSON logs with timings, and honours display/save/alert options. When the Pi camera is unavailable the application raises a clear error.
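For orientation, a minimal sketch of how frames can be pulled from Picamera2 (illustrative only; the project's actual capture code lives under `src/edge_vision/` and may differ):

```python
from picamera2 import Picamera2  # provided by the python3-picamera2 package

picam2 = Picamera2()
# Request RGB frames roughly matching system.frame_width/frame_height.
picam2.configure(picam2.create_video_configuration(main={"size": (640, 480), "format": "RGB888"}))
picam2.start()
try:
    frame = picam2.capture_array()  # numpy array ready for the detector
    print(frame.shape)
finally:
    picam2.stop()
```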
Key sections:
- `system` – `camera_index`, `video_source`, `demo_video`, `camera_backend`, `frame_width`, `frame_height`, `display`, `headless`, `save_annotated`, `annotated_output_dir`
- `detection` – `backend` (`yolov8` or `rtdetr`), `model_path`, `yolov8_model_path`, `rtdetr_model_path`, `targets` (labels to keep), `confidence_threshold`
- `logging` – `level`, `json_indent`, `utc`
- `alerts` – `enabled`, `gpio_pin`, `active_high`, `flash_duration_ms`
- `profiling` – flag placeholder for future FPS logging
All relative paths in the config are resolved relative to the config file's location. Override any value via CLI flags, an alternate YAML file, or by editing this file directly.
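To illustrate that path handling, a sketch (not the project's actual loader) of resolving a relative `detection.model_path` against the config file's directory, assuming PyYAML is available:

```python
from pathlib import Path
import yaml

def load_config(config_path: str) -> dict:
    config_file = Path(config_path).resolve()
    cfg = yaml.safe_load(config_file.read_text())
    # Relative paths are anchored at the directory containing the YAML file.
    model_path = Path(cfg["detection"]["model_path"])
    if not model_path.is_absolute():
        cfg["detection"]["model_path"] = str((config_file.parent / model_path).resolve())
    return cfg
```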
`make demo-videos` downloads:

- `face-demographics-walking.mp4` (people)
- `person-bicycle-car-detection.mp4` (mixed street)
- `BigBuckBunny.mp4` (animals)

`scripts/generate_sample_clips.py` trims them to quick-to-run clips:

- `data/samples/humans-5s.mp4`
- `data/samples/humans-10s.mp4`
- `data/samples/humans-vehicles-20s.mp4`
- `data/samples/animals-30s.mp4`
These clips are ideal for headless testing on laptops/WSL where /dev/video0 might not exist. Update or extend the script if you want other durations or scenarios.
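If you need a clip the script doesn't produce, a trim is only a few lines; a sketch using `ffmpeg` via `subprocess` (the bundled script may use a different tool, and the output name and offsets below are made up):

```python
import subprocess

def trim_clip(src: str, dst: str, start_sec: float, duration_sec: float) -> None:
    """Copy duration_sec seconds of src, starting at start_sec, into dst without re-encoding."""
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(start_sec), "-i", src,
         "-t", str(duration_sec), "-c", "copy", dst],
        check=True,
    )

trim_clip("data/source_videos/BigBuckBunny.mp4", "data/samples/animals-15s.mp4", 10, 15)
```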
- Tests: `PYTHONPATH=src python -m pytest -q`
- Linting: `make lint` (Black + Flake8)
- Full automation: `make test`
- Docker image: `make build` then `make docker-run` (mounts `/dev/video0`, uses `--headless` by default)
- GitHub Actions: CI runs formatting and tests; CD builds/pushes a Docker image to GHCR on `main`
Scripts worth noting:
- `scripts/load_weights.py` – quick sanity check that the configured backend weights can be loaded (reads `detection.backend`)
- `scripts/generate_sample_clips.py` – reproducible sample clip creation
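The kind of check `load_weights.py` performs can also be done interactively; a sketch (not the script's actual code) that mirrors reading `detection.backend` and loading the matching weights:

```python
from ultralytics import YOLO, RTDETR

backend = "yolov8"  # the real script reads this from config/default.yaml -> detection.backend
weights = "models/yolov8n.pt" if backend == "yolov8" else "models/rtdetr-l.pt"
model = YOLO(weights) if backend == "yolov8" else RTDETR(weights)
print(f"Loaded {backend} weights exposing {len(model.names)} class names")
```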
- “No module named edge_vision”: ensure `PYTHONPATH=src` or install the project as a package (`pip install -e .`)
- Camera index errors: adjust `system.camera_index`, or use `--video path/to/file.mp4`
- Missing Ultralytics modules: re-run `pip install -r requirements.txt` inside the virtualenv
- GPIO warnings: expected on laptops; disable via `--no-alerts`
- Timestamp missing: some drivers don’t expose `CAP_PROP_POS_MSEC`; logs will omit `video_time_sec` in that case (see the check below)
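A quick way to check whether your capture source reports timestamps (a standalone OpenCV snippet, not part of the project):

```python
import cv2

cap = cv2.VideoCapture(0)  # or a file path such as "data/samples/humans-5s.mp4"
cap.read()  # grab one frame so the position counter advances
print("CAP_PROP_POS_MSEC:", cap.get(cv2.CAP_PROP_POS_MSEC))  # 0.0 or negative usually means unsupported
cap.release()
```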
- Swap `targets` in the config to include additional classes
- Point `model_path` to specialized YOLO weights (e.g., custom-trained models)
- Extend the JSON payload (e.g., bounding boxes) within `run_inference` and `emit_json_log` (see the sketch after this list)
- Add new backends by following the RT-DETR pattern (`src/edge_vision/rtdetr_app.py`)
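For the bounding-box idea, a sketch of pulling boxes from an Ultralytics result so they can be merged into the per-frame JSON (the field names here are just suggestions):

```python
from ultralytics import YOLO

model = YOLO("models/yolov8n.pt")
for result in model("data/samples/humans-5s.mp4", stream=True):
    boxes = [
        {
            "label": model.names[int(box.cls[0])],
            "confidence": float(box.conf[0]),
            "xyxy": [round(v, 1) for v in box.xyxy[0].tolist()],
        }
        for box in result.boxes
    ]
    # Merge `boxes` into the dict built in run_inference before emit_json_log serializes it.
```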
Happy hacking! If you notice missing assets or want to automate more, the Makefile and scripts directory are the best places to extend.