This is a MicroPython binding for ESP-DL (Deep Learning) models that enables face detection, face recognition, human detection, cat detection, and image classification on ESP32 devices.
I spent a lot of time and effort to make this. If you find this project useful, please consider donating to support my work.
FaceDetector: Detects faces in images and provides bounding boxes and facial featuresFaceRecognizer: Recognizes enrolled faces and manages a face databaseHumanDetector: Detects people in images and provides bounding boxesCatDetector: Detects cats in images and provides bounding boxesImageNet: Classifies images into predefined categoriesCocoDetector: Detects objects in images using COCO dataset categories
- ESP-IDF:
- Version 5.4.2 with MicroPython >=1.26.0
- Make sure you have the complete ESP32 build environment set up
You can find precompiled images in two ways:
- In the Actions section for passed workflows under artifacts
- By forking the repo and manually starting the action
- Clone the required repositories:
git clone --recursive https://github.com/cnadler86/mp_esp_dl_models.git
git clone https://github.com/cnadler86/micropython-camera-API.git
git clone https://github.com/cnadler86/mp_jpeg.git- Build the firmware: There are two ways to enable the different models:
a) Using mpconfigvariant files (recommended): The models can be enabled in the board's mpconfigvariant files (e.g., mpconfigvariant_FLASH_16M.cmake). The following flags are available:
- MP_DL_FACE_DETECTOR_ENABLED
- MP_DL_FACE_RECOGNITION_ENABLED
- MP_DL_PEDESTRIAN_DETECTOR_ENABLED
- MP_DL_IMAGENET_CLS_ENABLED
- MP_DL_COCO_DETECTOR_ENABLED
- MP_DL_CAT_DETECTOR_ENABLED
b) Using command line flags: You can enable models directly through the idf.py command using -D flags:
idf.py -D MP_DL_FACE_RECOGNITION_ENABLED=1 -D MP_DL_CAT_DETECTOR_ENABLED=1 [other flags...]Basic build command:
cd mp_esp_dl_models/boards/
idf.py -D MICROPY_DIR=<micropython-dir> -D MICROPY_BOARD=<BOARD_NAME> -D MICROPY_BOARD_VARIANT=<BOARD_VARIANT> -B build-<your-build-name> build
cd build-<your-build-name>
python ~/micropython/ports/esp32/makeimg.py sdkconfig bootloader/bootloader.bin partition_table/partition-table.bin micropython.bin firmware.bin micropython.uf2All models support various input pixel formats including RGB888 (default), RGB565, and others supported by ESP-DL. You can use mp_jpeg to decode camera images to the correct format.
The pixel format can be set through the constructor's pixel_format parameter. This value matches the ESP-DL image format definitions.
espdl.RGB888(default)espdl.RGB565espdl.GRAYSCALE
The FaceDetector module detects faces in images and can optionally provide facial feature points.
FaceDetector(width=320, height=240, pixel_format=espdl.RGB888, features=True)Parameters:
width(int, optional): Input image width. Default: 320height(int, optional): Input image height. Default: 240pixel_format(int, optional): Input image pixel format. Default: espdl.RGB888features(bool, optional): Whether to return facial feature points. Default: True
-
run(framebuffer)
Detects faces in the provided image.
Parameters:
framebuffer: image data (required)
Returns: List of dictionaries with detection results, each containing:
score: Detection confidence (float)box: Bounding box coordinates [x1, y1, x2, y2]features: Facial feature points [(x,y) coordinates for: left eye, right eye, nose, left mouth, right mouth] if enabled, None otherwise
The FaceRecognizer module manages a database of faces and can recognize previously enrolled faces.
FaceRecognizer(width=320, height=240, pixel_format=espdl.RGB888, features=True, db_path="face.db", model=None)Parameters:
width(int, optional): Input image width. Default: 320height(int, optional): Input image height. Default: 240pixel_format(int, optional): Input image pixel format. Default: espdl.RGB888features(bool, optional): Whether to return facial feature points. Default: Truedb_path(str, optional): Path to the face database file. Default: "face.db"model(str, optional): Feature extraction model to use ("MBF" or "MFN"). Default: None (uses default model)
-
run(framebuffer)
Detects and recognizes faces in the provided image.
Parameters:
framebuffer: image data (required)
Returns: List of dictionaries with recognition results, each containing:
score: Detection confidencebox: Bounding box coordinates [x1, y1, x2, y2]features: Facial feature points (if enabled)person: Recognition result containing:id: Face IDsimilarity: Match confidence (0-1)name: Person name (if provided during enrollment)
-
enroll(framebuffer, validate=False, name=None)
Enrolls a new face in the database.
Parameters:
framebuffer: image datavalidate(bool, optional): Check if face is already enrolled. Default: Falsename(str, optional): Name to associate with the face. Default: None
Returns:
- ID of the enrolled face
-
delete_face(id)
Deletes a face from the database.
Parameters:
id(int): ID of the face to delete
-
print_database()
Prints the contents of the face database.
The HumanDetector module detects people in images. The CatDetector does it for cats. Both modules provide bounding boxes for detected objects.
HumanDetector(width=320, height=240, pixel_format=espdl.RGB888) #For cats use CatDetectorParameters:
width(int, optional): Input image width. Default: 320height(int, optional): Input image height. Default: 240pixel_format(int, optional): Input image pixel format. Default: espdl.RGB888
-
run(framebuffer)
Detects people in the provided image.
Parameters:
framebuffer: image data
Returns: List of dictionaries with detection results, each containing:
score: Detection confidencebox: Bounding box coordinates [x1, y1, x2, y2]
The ImageNet module classifies images into predefined categories.
ImageNet(width=320, height=240, pixel_format=espdl.RGB888)Parameters:
width(int, optional): Input image width. Default: 320height(int, optional): Input image height. Default: 240pixel_format(int, optional): Input image pixel format. Default: espdl.RGB888
-
run(framebuffer)
Classifies the provided image.
Parameters:
framebuffer: image data
Returns: List alternating between class names and confidence scores:
[class1, score1, class2, score2, ...]
The COCO detect module detects objects in images using the COCO dataset.
COCODetector(width=320, height=240, pixel_format=espdl.RGB888, model=CONFIG_DEFAULT_COCO_DETECT_MODEL)Parameters:
width(int, optional): Input image width. Default: 320height(int, optional): Input image height. Default: 240pixel_format(int, optional): Input image pixel format. Default: espdl.RGB888model(int, optional): COCO detection model to use. Default: CONFIG_DEFAULT_COCO_DETECT_MODEL
-
run(framebuffer)
Detects objects in the provided image.
Parameters:
framebuffer: image data
Returns: List of dictionaries with detection results, each containing:
score: Detection confidencebox: Bounding box coordinates [x1, y1, x2, y2]category: Detected object class id
from espdl import FaceDetector
import camera
from jpeg import Decoder
# Initialize components
cam = camera.Camera()
decoder = Decoder(pixel_format="RGB888")
face_detector = FaceDetector()
# Capture and process image
img = cam.capture()
framebuffer = decoder.decode(img) # Convert to RGB888
results = face_detector.run(framebuffer)
if results:
for face in results:
print(f"Face detected with confidence: {face['score']}")
print(f"Bounding box: {face['box']}")
if face['features']:
print(f"Facial features: {face['features']}")from espdl import FaceRecognizer
import camera
from jpeg import Decoder
# Initialize components
cam = camera.Camera()
decoder = Decoder(pixel_format="RGB888")
recognizer = FaceRecognizer(db_path="/faces.db")
# Enroll a face
img = cam.capture()
framebuffer = decoder.decode(img)
face_id = recognizer.enroll(framebuffer, name="John")
print(f"Enrolled face with ID: {face_id}")
# Later, recognize faces
img = cam.capture()
framebuffer = decoder.decode(img)
results = recognizer.run(framebuffer)
if results:
for face in results:
if face['person']:
print(f"Recognized {face['person']['name']} (ID: {face['person']['id']})")
print(f"Similarity: {face['person']['similarity']}")The following table shows the frames per second (fps) for different image sizes and models. The results are based on a test with a 2MP camera and a ESP32S3.
| Frame Size | FaceDetector | HumanDetector |
|---|---|---|
| QQVGA | 14.5 | 6.6 |
| R128x128 | 21 | 6.6 |
| QCIF | 19.7 | 6.5 |
| HQVGA | 18 | 6.3 |
| R240X240 | 16.7 | 6.1 |
| QVGA | 15.2 | 6.6 |
| CIF | 13 | 5.5 |
| HVGA | 11.9 | 5.3 |
| VGA | 8.2 | 4.4 |
| SVGA | 6.2 | 3.8 |
| XGA | 4.1 | 2.8 |
| HD | 3.6 | 2.6 |
-
Image Format: Always ensure input images are in the right format. Use mp_jpeg for JPEG decoding from camera.
-
Memory Management:
- Close/delete detector objects when no longer needed
- Consider memory constraints when choosing image dimensions
-
Face Recognition:
- Enroll faces in good lighting conditions
- Multiple enrollments of the same person can improve recognition
- Use
validate=Trueduring enrollment to avoid duplicates
-
Storage:
- Face database is persistent across reboots
- Consider backing up the face database file