This repository demonstrates real-time facial landmark detection using MediaPipe Face Mesh. It can process:
- Static images
- Video files
- Live webcam feed
The project uses MediaPipe, a Google library for cross-platform, real-time face, hand, and body tracking.
Link: Official MediaPipe Face Mesh documentation
├── README.md
├── requirements.txt
├── facemesh_processor.py # Shared processor class for face landmarks
├── image_process.py # Script to process a single image
├── video_process.py # Script to process a video file
├── webcam_process.py # Script to process webcam feed
├── extractLandmarks.py # Landmarks Extractor from face
├── landmarks2txt.py # Landmarks → txt
├── exemple_image.jpg # Example input image
└── exemple_video.mp4 # Example input video
Ensure Python 3.7+ is installed. Install the dependencies:
pip install opencv-python mediapipe
The FaceMeshProcessor class (in facemesh_processor.py) contains the main logic (see the sketch after this list):
- Converts images to RGB format (required by MediaPipe).
- Detects facial landmarks using mp.solutions.face_mesh.
- Draws landmarks (tessellation, contours, and irises) on images.
- Returns processed images for display or saving.
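A minimal sketch of what facemesh_processor.py could look like. The class name FaceMeshProcessor comes from this README; the constructor arguments and the process_image method name are illustrative assumptions, not necessarily the exact API of the script:

```python
import cv2
import mediapipe as mp


class FaceMeshProcessor:
    """Detects face landmarks and draws them on BGR images (method name is illustrative)."""

    def __init__(self, static_image_mode=False, max_num_faces=1):
        self.face_mesh = mp.solutions.face_mesh.FaceMesh(
            static_image_mode=static_image_mode,
            max_num_faces=max_num_faces,
            refine_landmarks=True,  # enables iris landmarks
        )
        self.drawer = mp.solutions.drawing_utils
        self.styles = mp.solutions.drawing_styles
        self.mesh = mp.solutions.face_mesh

    def process_image(self, image_bgr):
        # MediaPipe expects RGB input; OpenCV delivers BGR
        rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
        results = self.face_mesh.process(rgb)
        if results.multi_face_landmarks:
            for landmarks in results.multi_face_landmarks:
                # Draw tessellation, contours, and irises on the original image
                self.drawer.draw_landmarks(
                    image_bgr, landmarks, self.mesh.FACEMESH_TESSELATION,
                    landmark_drawing_spec=None,
                    connection_drawing_spec=self.styles.get_default_face_mesh_tesselation_style())
                self.drawer.draw_landmarks(
                    image_bgr, landmarks, self.mesh.FACEMESH_CONTOURS,
                    landmark_drawing_spec=None,
                    connection_drawing_spec=self.styles.get_default_face_mesh_contours_style())
                self.drawer.draw_landmarks(
                    image_bgr, landmarks, self.mesh.FACEMESH_IRISES,
                    landmark_drawing_spec=None,
                    connection_drawing_spec=self.styles.get_default_face_mesh_iris_connections_style())
        return image_bgr
```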
image_process.py:
- Reads a single image.
- Uses FaceMeshProcessor to detect and draw landmarks.
- Saves the annotated image.
python image_process.py
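A minimal sketch of the image script, using exemple_image.jpg from the repository tree; the output filename and the process_image method follow the hypothetical sketch above, so the real script may differ:

```python
import cv2
from facemesh_processor import FaceMeshProcessor

# static_image_mode=True is appropriate for a single photo (parameter name from the sketch above)
processor = FaceMeshProcessor(static_image_mode=True)
image = cv2.imread("exemple_image.jpg")
annotated = processor.process_image(image)
cv2.imwrite("exemple_image_annotated.jpg", annotated)  # hypothetical output name
```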
video_process.py:
- Reads the video frame by frame.
- Uses FaceMeshProcessor on each frame.
- Writes the processed frames to a new video file.
python video_process.py
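A minimal sketch of the video script, assuming the process_image method from the sketch above; the output filename and codec are illustrative choices:

```python
import cv2
from facemesh_processor import FaceMeshProcessor

processor = FaceMeshProcessor()
cap = cv2.VideoCapture("exemple_video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 30
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
# Hypothetical output name; mp4v keeps the example dependency-free
writer = cv2.VideoWriter("exemple_video_annotated.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
while True:
    ok, frame = cap.read()
    if not ok:
        break
    writer.write(processor.process_image(frame))
cap.release()
writer.release()
```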
webcam_process.py:
- Captures the live feed from the webcam.
- Processes each frame in real time.
- Displays the annotated frames.
python webcam_process.py
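A minimal sketch of the webcam script, again assuming the process_image method from the sketch above; the window title and quit key are illustrative:

```python
import cv2
from facemesh_processor import FaceMeshProcessor

processor = FaceMeshProcessor()
cap = cv2.VideoCapture(0)  # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("Face Mesh", processor.process_image(frame))
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()
```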
extractLandmarks.py
Purpose:
- Extracts 3D facial landmarks from images using MediaPipe Face Mesh.
- Provides a convenient class, FaceLandmarkExtractor, that returns the landmarks as a flattened array.
- Can be used on single images or video frames.
Key Functions:
class FaceLandmarkExtractor:
    def __init__(...):
        # Initialize MediaPipe Face Mesh
        pass

    def extract(self, image, normalize=True):
        # Extract facial landmarks from an image
        # Returns a flat 1D list: [x0, y0, z0, x1, y1, z1, ...]
        # If normalize=True, coordinates are shifted to start from 0
        pass
How to use:
from extractLandmarks import FaceLandmarkExtractor
import cv2
extractor = FaceLandmarkExtractor()
image = cv2.imread("happy_face.jpg")
landmarks = extractor.extract(image)  # landmarks is a list of 1404 floats (468 points x 3)
→ Landmarks can now be used as input features for an ML model.
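For reference, a possible implementation of extract(), assuming MediaPipe's normalized landmark output and the shift-to-zero normalization described in the comments above; the actual code in extractLandmarks.py may differ in details:

```python
import cv2
import mediapipe as mp


class FaceLandmarkExtractor:
    def __init__(self, static_image_mode=True, max_num_faces=1):
        # Face Mesh returns 468 landmarks per detected face, each with x, y, z
        self.face_mesh = mp.solutions.face_mesh.FaceMesh(
            static_image_mode=static_image_mode, max_num_faces=max_num_faces)

    def extract(self, image, normalize=True):
        # MediaPipe expects RGB input; OpenCV loads BGR
        results = self.face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        if not results.multi_face_landmarks:
            return None  # no face detected
        points = results.multi_face_landmarks[0].landmark
        xs = [p.x for p in points]
        ys = [p.y for p in points]
        zs = [p.z for p in points]
        if normalize:
            # Shift each axis so coordinates start from 0
            min_x, min_y, min_z = min(xs), min(ys), min(zs)
            xs = [x - min_x for x in xs]
            ys = [y - min_y for y in ys]
            zs = [z - min_z for z in zs]
        flat = []
        for x, y, z in zip(xs, ys, zs):
            flat.extend([x, y, z])
        return flat  # 468 * 3 = 1404 floats
```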
landmarks2txt.py
Purpose:
- Loops through a dataset of images organized in per-emotion folders.
- Uses FaceLandmarkExtractor to extract landmarks from every image.
- Appends the emotion label to the end of each landmark vector.
- Saves the result as a text file, data.txt, for training ML models (see the sketch after the folder structure below).
Expected folder structure:
data/
    happy/
        img1.jpg
        img2.jpg
    sad/
        img1.jpg
        img2.jpg
- Each row in data.txt has 1405 values → 1404 landmarks + 1 label.
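A minimal sketch of what landmarks2txt.py might do, assuming the folder structure above, that labels are the folder names, and space-separated rows; the actual delimiter and error handling in the script may differ:

```python
import os
import cv2
from extractLandmarks import FaceLandmarkExtractor

DATA_DIR = "data"          # folder structure shown above
OUTPUT_FILE = "data.txt"   # one row per image: 1404 landmark values + 1 label

extractor = FaceLandmarkExtractor()
with open(OUTPUT_FILE, "w") as out:
    for emotion in sorted(os.listdir(DATA_DIR)):
        emotion_dir = os.path.join(DATA_DIR, emotion)
        if not os.path.isdir(emotion_dir):
            continue
        for filename in os.listdir(emotion_dir):
            image = cv2.imread(os.path.join(emotion_dir, filename))
            if image is None:
                continue
            landmarks = extractor.extract(image)
            if landmarks is None:  # skip images with no detected face
                continue
            row = [str(v) for v in landmarks] + [emotion]
            out.write(" ".join(row) + "\n")
```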
Typical workflow:
- Collect a dataset of faces organized by emotion.
- Run landmarks2txt.py → generates data.txt.
- Load data.txt → features = landmarks, labels = emotion.
- Train an ML model (dense NN or other) on this data (see the sketch below).
For live prediction:
Capture frame → extract landmarks → feed to model → get emotion.
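A minimal sketch of the training and live-prediction steps using TensorFlow/Keras. The framework choice, layer sizes, and the space-separated data.txt format are assumptions for illustration, not part of the repository:

```python
import numpy as np
import cv2
import tensorflow as tf
from extractLandmarks import FaceLandmarkExtractor

# Load data.txt: 1404 landmark values followed by one emotion label per row
rows = [line.split() for line in open("data.txt")]
X = np.array([[float(v) for v in r[:-1]] for r in rows], dtype="float32")
labels = sorted({r[-1] for r in rows})
y = np.array([labels.index(r[-1]) for r in rows])

# Small dense network on the 1404-dimensional landmark vectors
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1404,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(len(labels), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2)

# Live prediction: capture frame → extract landmarks → feed to model → get emotion
extractor = FaceLandmarkExtractor()
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    landmarks = extractor.extract(frame)
    if landmarks is not None:
        probs = model.predict(np.array([landmarks], dtype="float32"))
        print("Predicted emotion:", labels[int(np.argmax(probs))])
cap.release()
```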
Notes:
- Supported image formats: .jpg, .png
- Supported video formats: .mp4, .avi
Make sure you replace input.jpg and input.mp4 with your own files.
