Background
Algal blooms (e.g., Red Tide) pose a threat to the health of humans, marine life, and aquatic ecosystems. These blooms, often fueled by nutrient runoff and warmer temperatures, are increasing in prevalence and can degrade water quality and deplete oxygen levels, hence the need to track harmful algae and algal blooms (i.e., to collect natural water samples for analysis).
Viewing samples/slides in high resolution often requires an expensive and cumbersome microscope. While such microscopes offer very high visual fidelity, they are impractical to use in the field. Conversely, affordable, lightweight microscopes come with their own limitations, such as subpar resolution and focus. The manual nature of detection, quantification, and classification further compounds these drawbacks, resulting in time-consuming and labor-intensive procedures.
Although it certainly isn't a 1:1 comparison, I like to think of the camera(s) as the system's eyes and the detection model as its brain:
- This project applies computer vision (a subfield of AI) techniques to fetch visual data from the camera(s)
- The type of model being used (i.e., a CNN, which is a subset of DNN) is loosely inspired by the human brain
- In both cases, the eyes/camera(s) capture the input and send it to the brain/model for processing
The following boards are compatible with this project.

Important: Refer to the boards section of ESP32CAM-RTSP's README.md (credit: rzeldent) for the most up-to-date information.
Board | MCU | SRAM | Flash | PSRAM | Camera | Microphone |
---|---|---|---|---|---|---|
Espressif ESP32-Wrover CAM | ESP32 | 520 KB | 4 MB | 4 MB | OV2640 | No |
AI Thinker ESP32-CAM | ESP32-S | 520 KB | 4 MB | 4 MB | OV2640 | No |
Espressif ESP-EYE | ESP32 | 520 KB | 4 MB | 4 MB | OV2640 | No |
Espressif ESP-S3-EYE | ESP32-S3 | 520 KB | 4 MB | 4 MB | OV2640 | No |
LilyGo camera module | ESP32 Wrover | 520 KB | 4 MB | 4 MB | OV2640 / OV5640 | No |
LilyGo Simcam | ESP32-S3R8 | | | | OV2640 | No |
LilyGo TTGO-T Camera | | | | | OV2640 | No |
M5Stack ESP32CAM | ESP32 | 520 KB | 4 MB | | OV2640 | Yes |
M5Stack UnitCam | ESP32-WROOM-32E | 520 KB | 4 MB | | OV2640 | No |
M5Stack Camera | ESP32 | 520 KB | 4 MB | | OV2640 | No |
M5Stack Camera PSRAM | ESP32 | 520 KB | 4 MB | 4 MB | OV2640 | No |
M5Stack UnitCamS3 | ESP32-S3-WROOM-1-N16R8 | 520 KB | 16 MB | 8 MB | OV2640 | No |
M5Stack M5PoECAM-W | ESP32-D0WDQ6-V3 | 520 KB | 16 MB | 8 MB | OV2640 | No |
Seeed Studio XIAO ESP32S3 Sense | ESP32-S3R8 | 520 KB | 8 MB | 8 MB | OV2640 | Yes |
Model Performance
[Pre-Trained] Model | Confusion Matrix (Normalized) | Precision-Confidence Curve | Precision-Recall Curve | Recall-Confidence Curve | F1-Confidence Curve | Training Results | Validation Output | Example Prediction |
---|---|---|---|---|---|---|---|---|
YOLOv8 Nano | ![Normalized confusion matrix](assets/models/custom_yolov8n/confusion_matrix_normalized.png) | ![Precision-confidence curve](assets/models/custom_yolov8n/P_curve.png) | ![Precision-recall curve](assets/models/custom_yolov8n/PR_curve.png) | ![Recall-confidence curve](assets/models/custom_yolov8n/R_curve.png) | ![F1-confidence curve](assets/models/custom_yolov8n/F1_curve.png) | ![Training results](assets/models/custom_yolov8n/results.png) | ![Validation output](assets/models/custom_yolov8n/validation.png) | ![Example prediction](assets/models/custom_yolov8n/example.jpg) |
YOLOv8 Extra-Large | ![Normalized confusion matrix](assets/models/custom_yolov8x/confusion_matrix_normalized.png) | ![Precision-confidence curve](assets/models/custom_yolov8x/P_curve.png) | ![Precision-recall curve](assets/models/custom_yolov8x/PR_curve.png) | ![Recall-confidence curve](assets/models/custom_yolov8x/R_curve.png) | ![F1-confidence curve](assets/models/custom_yolov8x/F1_curve.png) | ![Training results](assets/models/custom_yolov8x/results.png) | ![Validation output](assets/models/custom_yolov8x/validation.png) | ![Example prediction](assets/models/custom_yolov8x/example.jpg) |
YOLOv8 Nano with SAHI | ![Normalized confusion matrix](assets/models/sahi_yolov8n/confusion_matrix_normalized.png) | ![Precision-confidence curve](assets/models/sahi_yolov8n/P_curve.png) | ![Precision-recall curve](assets/models/sahi_yolov8n/PR_curve.png) | ![Recall-confidence curve](assets/models/sahi_yolov8n/R_curve.png) | ![F1-confidence curve](assets/models/sahi_yolov8n/F1_curve.png) | ![Training results](assets/models/sahi_yolov8n/results.png) | ![Validation output](assets/models/sahi_yolov8n/validation.png) | ![Example prediction](assets/models/sahi_yolov8n/example.jpg) |
Dataset
Repository Structure
```
.
├── assets/
│   ├── algae/
│   │   ├── closterium.jpg
│   │   ├── microcystis.jpg
│   │   ├── nitzschia.jpg
│   │   ├── non-algae.jpg
│   │   └── oscillatoria.jpg
│   ├── diagrams/
│   │   ├── drawio/
│   │   │   ├── Camera_uml.drawio
│   │   │   ├── dataset_flowchart.drawio
│   │   │   ├── esp32_sys_design.drawio
│   │   │   └── streaming_uml.drawio
│   │   ├── dataset_flowchart.png
│   │   ├── detection_uml.png
│   │   ├── esp32_sys_des.png
│   │   ├── iphone_sys_des.png
│   │   ├── saft_framework.png
│   │   ├── sahi_framework.png
│   │   ├── streaming_uml.png
│   │   └── yolov8_architecture.jpg
│   ├── esp32/
│   │   ├── ai_thinker.jpg
│   │   ├── ap_popup.png
│   │   ├── board_port.png
│   │   ├── build_upload_monitor.png
│   │   ├── choose_ap.png
│   │   ├── config.png
│   │   ├── disconnect.png
│   │   ├── esp32_ip.png
│   │   ├── index.png
│   │   ├── init_config.png
│   │   ├── open_streaming.png
│   │   └── platformio_folder.png
│   ├── misc/
│   │   ├── demo.gif
│   │   ├── iphone_ui_connect.png
│   │   ├── microscope.jpg
│   │   └── user_interface.png
│   └── models/
│       ├── custom_yolov8n/
│       │   ├── confusion_matrix_normalized.png
│       │   ├── confusion_matrix.png
│       │   ├── example.jpg
│       │   ├── F1_curve.png
│       │   ├── P_curve.png
│       │   ├── PR_curve.png
│       │   ├── R_curve.png
│       │   ├── results.png
│       │   ├── val_label.jpg
│       │   ├── val_pred.jpg
│       │   └── validation.png
│       ├── custom_yolov8x/
│       │   ├── confusion_matrix_normalized.png
│       │   ├── confusion_matrix.png
│       │   ├── example.jpg
│       │   ├── F1_curve.png
│       │   ├── P_curve.png
│       │   ├── PR_curve.png
│       │   ├── R_curve.png
│       │   ├── results.png
│       │   └── validation.png
│       └── sahi_yolov8n/
│           ├── confusion_matrix_normalized.png
│           ├── confusion_matrix.png
│           ├── example.jpg
│           ├── F1_curve.png
│           ├── P_curve.png
│           ├── PR_curve.png
│           ├── R_curve.png
│           ├── results.png
│           └── validation.png
├── docs/
│   ├── appendix.md
│   ├── CONTRIBUTING.md
│   ├── manual.md
│   ├── README.md
│   └── test_samples.pdf
├── src/
│   ├── detection/
│   │   ├── camera.py
│   │   └── esp32.py
│   └── streaming/
│       ├── boards/
│       │   ├── esp32cam_ai_thinker.json
│       │   ├── esp32cam_espressif_esp_eye.json
│       │   ├── esp32cam_espressif_esp32s2_cam_board.json
│       │   ├── esp32cam_espressif_esp32s2_cam_header.json
│       │   ├── esp32cam_espressif_esp32s3_cam_lcd.json
│       │   ├── esp32cam_espressif_esp32s3_eye.json
│       │   ├── esp32cam_freenove_s3_wroom_n8r8.json
│       │   ├── esp32cam_freenove_wrover_kit.json
│       │   ├── esp32cam_m5stack_camera_psram.json
│       │   ├── esp32cam_m5stack_camera.json
│       │   ├── esp32cam_m5stack_esp32cam.json
│       │   ├── esp32cam_m5stack_unitcam.json
│       │   ├── esp32cam_m5stack_unitcams3.json
│       │   ├── esp32cam_m5stack_wide.json
│       │   ├── esp32cam_seeed_xiao_esp32s3_sense.json
│       │   ├── esp32cam_ttgo_t_camera.json
│       │   └── esp32cam_ttgo_t_journal.json
│       ├── html/
│       │   └── index.min.html
│       ├── include/
│       │   ├── format_duration.h
│       │   ├── format_number.h
│       │   ├── lookup_camera_effect.h
│       │   ├── lookup_camera_frame_size.h
│       │   ├── lookup_camera_gainceiling.h
│       │   ├── lookup_camera_wb_mode.h
│       │   └── settings.h
│       ├── lib/
│       │   └── rtsp_server/
│       │       ├── library.json
│       │       ├── rtsp_server.cpp
│       │       └── rtsp_server.h
│       ├── src/
│       │   └── main.cpp
│       └── platformio.ini
├── weights/
│   └── custom_yolov8n.pt
├── .gitattributes
├── .gitignore
├── environment.yml
└── LICENSE.md
```
Due to its modular, generalizable design, this project can easily be adapted to detect any objects of your choosing, and as many as you like (i.e., it is not limited to harmful algae).
Depending on how you want to use this program, you may forgo any or all of the unchecked requirements in README.md (remember, Conda is the only hard requirement). Additional information can be found in manual.md.
If you want to use your own dataset (i.e., images of the object(s) you want your custom model to detect):
- Follow the instructions for creating new, custom object detection models (a minimal training sketch follows this list)
- Save/download the resulting model once training finishes
- Follow the instructions to use the model with camera(s) for real-time detection
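If you prefer to train outside the Colab notebook, the following is a minimal sketch of fine-tuning with the Ultralytics API; `data.yaml` is an assumed placeholder for the dataset config that a Roboflow export provides, and the epoch count and image size are illustrative defaults rather than this project's actual settings.

```python
from ultralytics import YOLO

# Start from pre-trained YOLOv8 Nano weights and fine-tune on the custom dataset;
# "data.yaml" is a placeholder for the dataset config (class names + image paths)
model = YOLO("yolov8n.pt")
model.train(data = "data.yaml", epochs = 100, imgsz = 640)

# Evaluate on the validation split; best weights land in runs/detect/train*/weights/best.pt
metrics = model.val()
print(metrics.box.map50)  # mAP@0.5
```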
Refer to this section if you want to use an ESP32-CAM as the camera.
Requirements (alongside Conda)
- Any of these boards
- Micro-USB cable (to connect board to computer)
- PlatformIO plugin for Visual Studio Code
Instructions
Sample usage of the inference library in camera.py:
```python
from cv2 import imshow
from cv2.typing import MatLike
from inference import get_model
from supervision import Detections, BoundingBoxAnnotator, LabelAnnotator

API_KEY = "ROBOFLOW_API_KEY"
PROJECT_NAME = "algae-detection-1opyx"
VERSION = 22

def _process_frame(self, frame: MatLike) -> None:
    """
    1. Run inference (Roboflow detection model) on frame.
    2. Annotate frame with result(s).
    3. Show annotated frame in new window.

    Args:
        frame (MatLike): Frame to run inference on.
    """
    # Annotators
    label, bbox = LabelAnnotator(), BoundingBoxAnnotator()

    # Load model via Roboflow
    model = get_model(model_id = f"{PROJECT_NAME}/{VERSION}", api_key = API_KEY)

    # Run inference on (i.e., process) frame
    for result in model.infer(frame):
        # Get detected object(s)
        detection = Detections.from_inference(result)

        # Annotate the frame with its result, then show in window
        imshow(self._args.title, label.annotate(scene = bbox.annotate(frame, detection), detections = detection))
```
See 'Deploy custom model' section in the Colab notebook used to train the model for further details.
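To show where `_process_frame` fits, here is a hypothetical capture loop; it assumes a standalone variant of the method (no `self`, window title handled elsewhere) and an assumed stream URL, so it is a sketch rather than the project's actual CLI wiring.

```python
from cv2 import VideoCapture, destroyAllWindows, waitKey

STREAM_URL = "http://192.168.1.50/"  # hypothetical ESP32-CAM stream endpoint

capture = VideoCapture(STREAM_URL)
try:
    while capture.isOpened():
        ok, frame = capture.read()
        if not ok:
            break                          # stream dropped or ended
        process_frame(frame)               # hypothetical standalone variant of _process_frame
        if waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
            break
finally:
    capture.release()
    destroyAllWindows()
```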
Future Improvements
- Increase the dataset and improve model versatility by taking quality images of various types of algae
  - At least 1,000 images per class
  - All classes balanced (i.e., roughly the same number of images each)
  - Dr. Schonna R. Manning and/or Mr. Q may [or may not] be able to help with categorizing any algae in new images
- Increase model accuracy
- Connect to the ESP32 without a server (e.g., via USB) OR use RTSP instead of HTTP
  - Attempted, but was unable, to use RTSP
    - See this GitHub Issue for further details
  - Use Roboflow Inference with a video, webcam, or RTSP stream (see the pipeline sketch after this list)
- Improve and optimize the model for inference on the ESP32 (i.e., an edge device instead of a computer); TFLite Edge TPU? (see the export sketch after this list)
  - Convert the model to a C binary
  - Use standard tools to store it in read-only program memory on the device for TF Lite
  - Use the DeepSea library for PyTorch
  - Use the TF Lite Micro API's C++ library to run inference
- Add a heatsink to the ESP32 to prevent overheating
- Update the microscope's 3D-printed lens attachment by making it adjustable AND/OR create multiple attachments for different devices (e.g., iPhone, Android, etc.)
- Add camera settings to the UI (C++ instead of Python for OpenCV?)
- Add Android compatibility (if applicable and/or necessary)
- Write a cross-platform script to automate ESP32 setup
- Use roboflow.js to integrate the project with streaming (which has its own web UI)?
  - Real-time on-device inference is available via roboflow.js, which loads your model and runs real-time inference directly in your users' web browser using WebGL instead of passing images to the server side
- Save the streaming URL after entering it once in the CLI?
- If calling the model via the Roboflow API, incorporate auth (API key or login credentials/token?)
- Add options/args for running the model locally (i.e., without internet; the default) vs. the hosted API (i.e., with internet)
- Active learning to improve model performance?
- Add a tutorial for using PlatformIO's CLI (i.e., `pio`) instead of just the PlatformIO VS Code extension (shell script?)
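For the Roboflow Inference stream item above, something along these lines should work as a starting point; the video URL is a hypothetical ESP32-CAM endpoint, and render_boxes is the library's built-in visualization sink.

```python
from inference import InferencePipeline
from inference.core.interfaces.stream.sinks import render_boxes

# Feed a video file, webcam index, or HTTP/RTSP URL to the Roboflow model
# and draw each batch of detections as it arrives
pipeline = InferencePipeline.init(
    model_id = "algae-detection-1opyx/22",     # PROJECT_NAME/VERSION from camera.py
    video_reference = "http://192.168.1.50/",  # hypothetical ESP32-CAM stream URL
    on_prediction = render_boxes,
    api_key = "ROBOFLOW_API_KEY",
)
pipeline.start()
pipeline.join()
```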
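For the edge-inference item, Ultralytics can export the custom weights to TFLite as a first step; the int8 flag is an assumption about what an ESP32-class device would need, and converting the result to a C array for TF Lite Micro is a separate step not shown here.

```python
from ultralytics import YOLO

# Export the trained detector to TFLite with int8 quantization to shrink it
# toward something an edge device could plausibly run
model = YOLO("weights/custom_yolov8n.pt")
model.export(format = "tflite", int8 = True)
```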
Additional Resources
- Cyanobacteria (Blue-Green Algae)
- Cyanobacteria of the 2016 Lake Okeechobee and Okeechobee Waterway Harmful Algal Bloom
- Computer Vision Based Deep Learning Approach for the Detection and Classification of Algae Species Using Microscopic Images
- Research "toxic cyanobacteria"
Glossary
- Access Point (AP): Networking device that allows wireless-capable devices to connect to a WLAN; in this case, it provides WiFi to the ESP32
- Algae: Group of mostly aquatic, photosynthetic, and nucleus-bearing organisms that lack many features of larger multicellular plants
- Anaconda: Open-source platform for managing and installing various Python packages
- Artificial Intelligence (AI): Simulation of human intelligence in machines that can perform tasks like problem-solving, decision-making, learning, etc.
- Closterium: Type of algae identified by their elongated or crescent shape
- Computer Vision (CV): Field of computer science that focuses on enabling computers to identify and understand objects and people in images and videos
- Confusion Matrix: Visualizes model performance (i.e., the number of correct and incorrect predictions per class), where the x-axis is the true label and the y-axis is the predicted label; diagonal elements count the points where the predicted label equals the true label (higher is better, indicating many correct predictions), while off-diagonal elements count the points the model mislabeled (lower is better, indicating few incorrect predictions)
- Convolutional Neural Network (CNN): Type of DNN specifically designed for image recognition and processing
- Deep Neural Network (DNN): ML method inspired by the human brain's neural structure that can recognize complex patterns in data (e.g., pictures, text, sounds, etc.) to produce accurate insights and predictions
- Epoch: One complete iteration of the entire training dataset through the ML algorithm
- ESP32: Series of low-cost, low-power system-on-chip microcontrollers with integrated WiFi and Bluetooth capabilities
- Espressif: Manufacturer of ESP32 microcontrollers
- Fine-Tuning: Process that takes a model (architecture + weights) already trained for one given task and tunes/tweaks the model to make it perform a second similar task
- Google Colab: Hosted Jupyter Notebook service that provides free and paid access to computing resources, including GPUs and TPUs, and requires no setup to use
- Graphics Processing Unit (GPU): Specialized electronic circuit that can perform mathematical calculations at high speed; useful for training AI and DNNs
- Inference: Process of using a trained ML model to make predictions, classifications, and/or detections on new data
- Local Area Network (LAN): Group of connected computing devices within a limited area (usually sharing a centralized Internet connection) that can communicate and share resources with each other
- Machine Learning (ML): Subfield of AI that involves training computer systems to learn from data and make decisions or predictions without being explicitly programmed
- Microcystis: Highly toxic genus of cyanobacteria that looks like clusters of small dots and is known for forming harmful algal blooms in bodies of water
- Motion JPEG (MJPEG): Video compression format where each frame of a digital video sequence is compressed separately as a JPEG image
- Nitzschia: Type of thin, elongated algae that can cause harmful algal blooms
- Normalize: Within the context of confusion matrices, it means the matrix elements are displayed as a percentage
- Oscillatoria: Genus of filamentous cyanobacteria that forms blue-green algal blooms
- PlatformIO: Cross-platform, cross-architecture, multi-framework tool for embedded system engineers and software engineers who write embedded applications
- Python: High-level programming language widely used for data analysis and ML
- PyTorch: ML library used for various applications, including CV
- Red Tide: Event occurring on Florida’s coastline in which algae grow uncontrollably
- Roboflow: CV developer framework for better data collection, dataset preprocessing, dataset augmentation, model training techniques, model deployment, and more
- Slicing Aided Fine Tuning (SAFT): Novel approach that augments the fine-tuning dataset by dividing images into overlapping patches, thus providing a more balanced representation of small objects and overcoming the bias towards larger objects in the original pre-training datasets
- Slicing Aided Hyper Inference (SAHI): Common method of improving the detection accuracy of small objects, which involves running inference over portions of an image and then accumulating the results (see the sketch after this glossary)
- System-on-Chip (SoC): Integrated circuit that compresses all of an electronic system's required components onto one piece of silicon
- Tensor Processing Unit (TPU): Google’s application-specific integrated circuit (ASIC) used to accelerate ML workloads; useful for training AI and DNNs
- Ultralytics: Company that aims to make AI model development accessible, efficient to train, and easy to deploy
- Weights: Numbers associated with the connections between neurons/nodes across different layers of a DNN
- Wireless Local Area Network (WLAN): Computer network that links two or more devices using wireless communication to form a LAN
- You Only Look Once (YOLO): High performance, real-time object detection and image segmentation model developed by Ultralytics
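To make the SAHI definition above concrete, here is a minimal sketch using the sahi package with this repository's weights; the slice sizes, overlap ratios, and confidence threshold are assumed values, not the settings used to produce the results reported earlier.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Wrap the custom YOLOv8 weights so sahi can call them on each slice
detection_model = AutoDetectionModel.from_pretrained(
    model_type = "yolov8",
    model_path = "weights/custom_yolov8n.pt",
    confidence_threshold = 0.4,   # assumed threshold
    device = "cpu",
)

# Slice the image into overlapping patches, run inference on each patch,
# then merge the per-slice detections back into full-image coordinates
result = get_sliced_prediction(
    "assets/algae/microcystis.jpg",
    detection_model,
    slice_height = 512,           # assumed slice size
    slice_width = 512,
    overlap_height_ratio = 0.2,
    overlap_width_ratio = 0.2,
)
print(result.object_prediction_list)
```

Slicing trades extra forward passes per image for better recall on small objects such as individual algae cells.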