This project implements an intelligent, depth-aware robotic system for autonomous firefighting. The system leverages a dual-YOLOv8 architecture for simultaneous fire and obstacle detection, a Vision Transformer (ViT) model for real-time depth perception, and a priority-based navigation algorithm to safely and effectively respond to fire incidents.
Model demonstration:
- How It Works
- System Architecture
- Technology Stack
- Directory Structure
- Prerequisites
- Setup Instructions
- Running the Project
- Acknowledgments
The robot's operation is a continuous loop of Perception, Decision-making, and Action. It processes video frames in real-time to understand its environment and make intelligent navigation choices. A minimal code sketch of the full loop follows the steps below.
- **Perception - Seeing the World:**
  - The system captures a video frame.
  - A Depth Estimation Model (`Intel/dpt-swinv2-tiny-256`) analyzes the frame to create a detailed depth map, calculating how far away every point in the scene is.
  - Simultaneously, two YOLOv8 models run in parallel:
    - Fire/Smoke Model (`yolov8n-200e-v0.2.pt`): A custom-trained model that specifically identifies fire and smoke.
    - Obstacle Model (`yolov8n.pt`): A standard YOLOv8 model that detects general obstacles such as people and furniture.
- **Analysis - Understanding the Dangers:**
  - The system combines the results from all three models. For every object detected (whether fire or obstacle), it uses the depth map to calculate its real-world distance from the robot.
  - All detected objects are compiled into a single list, sorted by distance (closest first).
- **Decision - Choosing the Next Move:**
  - A priority-based navigation algorithm analyzes the sorted list of objects to issue a command:
    - Priority 1 (Avoidance): If the closest object is an obstacle and is within the `safe_distance_threshold`, the robot turns left or right to avoid it.
    - Priority 2 (Targeting): If there are no immediate obstacle threats, the robot navigates towards the closest detected fire, moving forward, left, or right to keep the fire centered in its view.
    - Priority 3 (Exploration): If no fire is detected and the path ahead is clear of obstacles, the robot moves forward to search for threats.
    - Priority 4 (Halt): If the path is blocked or a fire is too close, the robot stops to ensure safety.
- **Action & Monitoring - Executing the Command:**
  - The chosen command (e.g., `forward`, `left`, `stop`) is sent to the Firebase Realtime Database.
  - The physical robot (controlled by an Arduino or similar microcontroller) listens to this database path and executes the corresponding motor command.
  - This entire process repeats, allowing for continuous, autonomous operation.
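The sketch below illustrates this loop end to end. It is not the project's actual implementation (`navigate2depth.py`): the Firebase URL and `robot/command` path are placeholders, the 0.5 confidence and `NEAR` threshold are assumed values, and it uses the depth pipeline's normalized relative depth (0-255, higher = closer) directly rather than calibrated metres.

```python
# Illustrative sketch of the perception -> analysis -> decision -> action loop.
import cv2
import numpy as np
import torch
from PIL import Image
from ultralytics import YOLO
from transformers import pipeline
import firebase_admin
from firebase_admin import credentials, db

STOP, FORWARD, LEFT, RIGHT = 0, 1, 2, 3  # command codes (see Navigation & Control Module)
NEAR = 180  # assumed "too close" cutoff on the 0-255 relative depth map

# Perception: one depth estimator plus two YOLOv8 detectors per frame
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-swinv2-tiny-256",
                           device=0 if torch.cuda.is_available() else -1)
obstacle_model = YOLO("checkpoints/yolov8n.pt")
fire_model = YOLO("checkpoints/yolov8n-200e-v0.2.pt")

# Action channel: Firebase Realtime Database (URL and path are placeholders)
cred = credentials.Certificate("serviceAccountKey.json")
firebase_admin.initialize_app(cred, {"databaseURL": "https://<your-project>.firebaseio.com/"})
command_ref = db.reference("robot/command")

def detect(model, frame, label, depth):
    """Run one YOLO model; attach each box's median relative depth (higher = closer)."""
    objects = []
    for box in model(frame, conf=0.5, verbose=False)[0].boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        region = depth[y1:y2, x1:x2]
        if region.size:
            objects.append({"label": label, "box": (x1, y1, x2, y2),
                            "proximity": float(np.median(region))})
    return objects

def decide(objects, width):
    """Priority order: obstacle avoidance > fire targeting > exploration > halt."""
    if objects and objects[0]["label"] == "obstacle" and objects[0]["proximity"] > NEAR:
        cx = (objects[0]["box"][0] + objects[0]["box"][2]) / 2
        return RIGHT if cx < width / 2 else LEFT      # Priority 1: steer away
    fires = [o for o in objects if o["label"] == "fire"]
    if fires:
        if fires[0]["proximity"] > NEAR:
            return STOP                               # Priority 4: fire too close
        cx = (fires[0]["box"][0] + fires[0]["box"][2]) / 2
        if cx < width * 0.4:
            return LEFT                               # Priority 2: keep fire centred
        if cx > width * 0.6:
            return RIGHT
        return FORWARD
    return FORWARD                                    # Priority 3: explore

cap = cv2.VideoCapture(0)  # video_source: 0 = default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    depth = np.asarray(depth_estimator(rgb)["depth"], dtype=np.float32)
    objects = (detect(obstacle_model, frame, "obstacle", depth) +
               detect(fire_model, frame, "fire", depth))
    objects.sort(key=lambda o: -o["proximity"])  # closest first
    command_ref.set(decide(objects, frame.shape[1]))
    cv2.imshow("FireBot", frame)
    if cv2.waitKey(1) == 27:  # ESC exits, as in the real script
        break
cap.release()
cv2.destroyAllWindows()
```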
The system is composed of three main modules:
- **Vision & Perception Module**
  - Dual YOLOv8 Engine: For simultaneous fire, smoke, and obstacle detection.
  - Depth Estimation Engine: Uses `Intel/dpt-swinv2-tiny-256` for dense depth mapping.
  - Data Fusion: Combines detection boxes with depth data to locate objects in 3D space.
- **Navigation & Control Module**
  - Priority-Based Algorithm: Determines the robot's next move based on a clear set of safety and mission rules.
  - Command Generation: Translates the decision into a simple command code (0: stop, 1: forward, 2: left, 3: right).
- **Backend & Communication Module**
  - Firebase Realtime Database: Acts as the communication bridge between the Python brain and the robot's hardware.
  - Remote Monitoring: The system provides a real-time visual feed including the main camera view, a depth map visualization, and a top-down tactical map.
- Object Detection: Ultralytics YOLOv8
  - Used for real-time detection of fire, smoke, and obstacles using two parallel models.
- Depth Estimation: Hugging Face Transformers
  - Employs the `Intel/dpt-swinv2-tiny-256` model, a state-of-the-art Dense Prediction Transformer (DPT).
- Backend & Control: Firebase Realtime Database
  - Provides a simple, low-latency channel for sending navigation commands to the robot hardware.
robotic-firebot/
├── python/
│ ├── navigate2depth.py # Main application script
│ ├── config.yaml # All configuration settings
│ ├── requirements.txt # Python dependencies
│ ├── check_gpu.py # Utility to verify CUDA setup
│ └── run_robot.bat # Windows script for easy execution
│
├── checkpoints/ # Directory for model weights
│ ├── yolov8n.pt # Standard obstacle detection model
│ └── yolov8n-200e-v0.2.pt # Custom fire/smoke detection model
│
├── arduino/ # Arduino firmware for motor control
│ └── ...
│
├── ets-eas_document/ # Project proposal and presentation files
│ └── ...
│
├── ipynb/ # Jupyter notebooks for experimentation
│ └── ...
│
├── serviceAccountKey.json # Firebase service account key (place in root)
│
└── README.md # This documentation file
- Python 3.8+
- Git
- A Firebase project with Realtime Database enabled
- (Recommended) An NVIDIA GPU with CUDA and cuDNN installed for real-time performance
git clone https://github.com/kyrozepto/robotic-firebot
cd robotic-firebot
It is highly recommended to use a virtual environment.
# Navigate into the python script directory
cd python
# Create and activate a virtual environment
python -m venv venv
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install the required packages
pip install -r requirements.txt
For real-time performance, a CUDA-enabled GPU is essential.
- Install the NVIDIA CUDA Toolkit that matches your driver version.
- Install cuDNN.
- Install PyTorch with CUDA support. Check the PyTorch website for the correct command for your specific CUDA version. For CUDA 11.8, the command is:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
- Verify the setup by running:
python check_gpu.py
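The bundled script may print more detail, but a minimal CUDA check in the same spirit looks like this:

```python
import torch

# Reports whether PyTorch can see a CUDA device
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)
    print("Device:", torch.cuda.get_device_name(0))
```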
Download the required YOLOv8 model weights and place them in the `checkpoints/` directory at the root of the project.
- `yolov8n.pt` (Standard model for obstacles)
- `yolov8n-200e-v0.2.pt` (Your custom model for fire/smoke)
Your checkpoints folder should look like this:
robotic-firebot/
└── checkpoints/
├── yolov8n.pt
└── yolov8n-200e-v0.2.pt
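If the dependencies are already installed, the standard weights can also be fetched automatically by Ultralytics on first use:

```python
from ultralytics import YOLO

# Downloads yolov8n.pt into the working directory if it is not already present;
# move it into checkpoints/ afterwards. The custom fire/smoke weights are not
# hosted by Ultralytics and must be obtained separately.
YOLO("yolov8n.pt")
```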
- Go to the Firebase Console and create a new project.
- In your project, go to Build > Realtime Database, click "Create database", and start in test mode (you can secure the rules later).
- In Project Settings (⚙️) > Service Accounts, click "Generate new private key".
- A JSON file will be downloaded. Rename it to `serviceAccountKey.json` and place it in the root `robotic-firebot/` directory.
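To confirm the key and database are wired up, a quick smoke test along these lines can help; the database URL and command path below are placeholders, so substitute your own values from `config.yaml`:

```python
import firebase_admin
from firebase_admin import credentials, db

cred = credentials.Certificate("serviceAccountKey.json")
firebase_admin.initialize_app(cred, {"databaseURL": "https://<your-project>.firebaseio.com/"})

ref = db.reference("robot/command")   # illustrative command_path
ref.set(0)                            # write a test command (0 = stop)
print("Read back:", ref.get())        # should print 0
```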
All settings are managed in the `python/config.yaml` file. Review it before running the script.
| Setting | Description |
|---|---|
| `gpu.enabled` | `1` to use GPU, `0` for CPU. |
| `yolo.obstacle_model_path` | Path to the standard YOLOv8 model for obstacle detection. |
| `yolo.fire_smoke_model_path` | Path to your custom YOLOv8 model for fire/smoke detection. |
| `yolo.model_confidence` | The minimum confidence score (0.0 to 1.0) to consider a detection valid. |
| `depth_model.name` | The Hugging Face name of the depth estimation model. |
| `depth_model.safe_distance_threshold` | The distance (in meters) at which the robot will prioritize obstacle avoidance. |
| `firebase.cred_path` | Path to your Firebase service account key. The default `../serviceAccountKey.json` is correct if you follow the setup steps. |
| `firebase.db_url` | The URL of your Firebase Realtime Database. |
| `firebase.command_path` | The specific path within the database where commands will be written. |
| `class_ids.fire` / `class_ids.smoke` | The class IDs for fire and smoke from your custom dataset. |
| `video_source` | `0` for the default webcam, or a path to a video file (e.g., `"path/to/video.mp4"`). |
| `map_view.enabled` | `1` to show the top-down map visualization, `0` to hide it. |
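For orientation, here is a sketch of what a `config.yaml` matching the table above might look like. All values are illustrative defaults, not the shipped file; in particular the class IDs and the distance threshold depend on your custom dataset and calibration:

```yaml
gpu:
  enabled: 1                      # 1 = GPU, 0 = CPU
yolo:
  obstacle_model_path: ../checkpoints/yolov8n.pt
  fire_smoke_model_path: ../checkpoints/yolov8n-200e-v0.2.pt
  model_confidence: 0.5
depth_model:
  name: Intel/dpt-swinv2-tiny-256
  safe_distance_threshold: 1.0    # meters
firebase:
  cred_path: ../serviceAccountKey.json
  db_url: https://<your-project>.firebaseio.com/
  command_path: robot/command
class_ids:
  fire: 0
  smoke: 1
video_source: 0                   # 0 = default webcam
map_view:
  enabled: 1
```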
Ensure you are in the `python/` directory and your virtual environment is activated.
On Windows, simply double-click or run the batch file:
run_robot.bat
Alternatively, run the main script directly:
python navigate2depth.py
Press the ESC key in the display window to exit the program.
- Vision Transformers for Dense Prediction by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun, available at https://arxiv.org/abs/2103.13413
- Ultralytics
- Hugging Face
- Google Firebase
- PyTorch
