- STEP 1 — Load & Display an Image
  - Libraries Used
  - Key Concepts Explanation
  - Recap & Next Steps
- STEP 2 — Detect Faces with MediaPipe
- STEP 3 — Load a Pre-trained Emotion Recognition Model
- STEP 4 — Predict Emotions from Images
- STEP 5 — Use OpenCV + haarcascade_frontalface to Draw a Rectangle on the Face
## STEP 1 — Load & Display an Image

This script demonstrates how to load an image using OpenCV and display it using Matplotlib.

```python
import cv2                       # OpenCV (reads images as BGR)
import matplotlib.pyplot as plt  # Matplotlib (displays images as RGB)

# Load an image (replace with your photo path)
image_path = "/home/im_ane/AI_emotion_recognition/data/test_images/ana.jpg"
image = cv2.imread(image_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Convert to RGB for display

# Display the image
plt.imshow(image)
plt.axis('off')
plt.show()
```
### Libraries Used

OpenCV:

- What it is: OpenCV (Open Source Computer Vision Library) — a powerful library for real-time computer vision and image processing.
- Why we use it: Load, manipulate, and process images (e.g., reading an image file, converting color spaces, detecting faces).
- Key functions:
  - `cv2.imread()`: Reads an image from a file.
  - `cv2.cvtColor()`: Converts an image from one color space to another (e.g., BGR → RGB).
- `cv2` is the Python module name that provides access to OpenCV. Historically the module was named `cv`, which became `cv2` as OpenCV evolved; today `cv2` is standard. Use `import cv2` to access OpenCV functions.
Matplotlib:

- What it is: Matplotlib is a plotting library; `pyplot` provides a MATLAB-like plotting interface.
- Why we use it: To display images and visualize results during development.
- Key functions:
  - `plt.imshow()`: Display an image.
  - `plt.axis('off')`: Hide axis ticks and labels for a clean view.
  - `plt.show()`: Render the image window.
### Key Concepts Explanation

`cv2.imread(image_path)`:

- Purpose: Load an image from the specified path.
- Color space: OpenCV reads images in BGR by default.

`cv2.cvtColor(image, cv2.COLOR_BGR2RGB)`:

- Purpose: Convert the image from BGR → RGB.
- Why: Matplotlib and most visualization tools expect RGB; without conversion colors appear swapped (blue/red inverted).
- `cv2.COLOR_BGR2RGB` is a constant (color conversion code) used by `cv2.cvtColor()` to specify the conversion type.
- `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` takes two arguments: the input image (`image`) and the conversion code (`cv2.COLOR_BGR2RGB`).

Example:

```python
image = cv2.imread("path/to/image.jpg")             # BGR
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # RGB
```

`plt.imshow()`, `plt.axis('off')`, `plt.show()`:

- `plt.imshow()`: Displays the image using Matplotlib.
- `plt.axis('off')`: Hides axis labels and ticks for a cleaner display.
- `plt.show()`: Renders the image window — without this the image may not appear.
### Recap & Next Steps

What the code does:

- Loads an image from a specified path using OpenCV.
- Converts the image from BGR → RGB for correct display.
- Displays the image with Matplotlib.

Next steps:

- Test this script with your own images.
- Move on to face detection using MediaPipe.
- Implement emotion recognition with a pre-trained model.
## STEP 2 — Detect Faces with MediaPipe

```python
import cv2
import matplotlib.pyplot as plt
import mediapipe as mp

# Load the image
image_path = "/home/im_ane/AI_emotion_recognition/data/test_images/ana.jpg"  # Replace with your image path
image = cv2.imread(image_path)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Initialize MediaPipe Face Detection
mp_face_detection = mp.solutions.face_detection  # select the face detection solution (MediaPipe bundles many tools)
face_detection = mp_face_detection.FaceDetection(min_detection_confidence=0.5)  # create an object from the FaceDetection class

# Detect faces
results = face_detection.process(image_rgb)  # process the image with the object we created

# Draw face detections
if results.detections:  # the list of detected faces
    for detection in results.detections:
        mp.solutions.drawing_utils.draw_detection(image_rgb, detection)  # draw each detected face
        # If you are sure that you only have one face in your image, you can skip the loop:
        # mp.solutions.drawing_utils.draw_detection(image_rgb, results.detections[0])

# Display the image with face detections
plt.imshow(image_rgb)
plt.axis('off')
plt.title("Face Detection")
plt.show()
```
- `mp_face_detection = mp.solutions.face_detection`
  - Access the face detection module from MediaPipe. `mp.solutions` organizes MediaPipe tools (face detection, hand tracking, etc.).
- `face_detection = mp_face_detection.FaceDetection(min_detection_confidence=0.5)`
  - Create a face detection object. `min_detection_confidence=0.5` sets the minimum confidence threshold (50%) to consider a detection valid.
- `results = face_detection.process(image_rgb)`
  - Run face detection on the RGB image. `process()` returns a results object containing the detected faces (or nothing if none were found).
- `if results.detections:`
  - Check whether any faces were detected. `results.detections` is a list of detection objects.
- `for detection in results.detections:`
  - Loop through each detected face (there may be multiple).
- `mp.solutions.drawing_utils.draw_detection(image_rgb, detection)`
  - Use MediaPipe's `drawing_utils` to draw a bounding box on the image for visualization.
Why each step matters:

- If we don't create the `face_detection` object, we can't run face detection.
- If we don't call `.process()`, we won't get any results.
- If we don't check `if results.detections:`, we might try to draw boxes when there are no faces, causing an error.
- If we don't loop through `results.detections`, we'll only draw a box around the first face (when there are multiple faces).
- If we don't use `drawing_utils`, we won't see the boxes around the faces.
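Beyond drawing boxes, you will often want the raw coordinates (for example, to crop the face for the emotion model in later steps). A minimal sketch, assuming the `image_rgb` and `results` variables from the script above; MediaPipe stores each box in `relative_bounding_box` as fractions of the image size:

```python
# Sketch: convert a MediaPipe detection into pixel coordinates and crop the face.
h, w, _ = image_rgb.shape

for detection in results.detections or []:  # `or []` guards against no detections
    box = detection.location_data.relative_bounding_box
    x, y = int(box.xmin * w), int(box.ymin * h)
    bw, bh = int(box.width * w), int(box.height * h)
    face_crop = image_rgb[max(y, 0):y + bh, max(x, 0):x + bw]  # clamp to stay inside the image
    print(f"Face at ({x}, {y}), size {bw}x{bh}, crop shape: {face_crop.shape}")
```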
## STEP 3 — Load a Pre-trained Emotion Recognition Model

```python
from tensorflow.keras.models import load_model

# Load the pre-trained model
model = load_model("../models/emotion_model.h5")
print("Model loaded successfully!")
```

- TensorFlow includes the Keras API as `tf.keras`. Importing `load_model` from `tensorflow.keras.models` ensures you are using TensorFlow's integrated, optimized Keras implementation.
- If you `import tensorflow as tf`, you would use `tf.keras.models.load_model()`. The explicit import keeps code cleaner and focuses on what you need.
- `load_model()` loads a complete model (architecture + weights + training configuration) from an HDF5 (`.h5`) or SavedModel file.
- It reconstructs the model exactly as it was when saved, enabling inference with `model.predict()`.
- TensorFlow is a comprehensive machine learning framework for building and training neural networks.
- Keras was originally a standalone high-level neural networks API, but was later integrated into TensorFlow as `tf.keras`.
- When you use `tensorflow.keras`, you're using TensorFlow's optimized Keras API.
```python
from tensorflow.keras.models import load_model
```

- This is the recommended way to import Keras when using TensorFlow 2.x.
- It ensures you're using TensorFlow's integrated and optimized implementation of Keras.
- The hierarchy is: TensorFlow → Keras API → `models` module → `load_model` function.

If you only did:

```python
import tensorflow as tf
```

then you would need to use `tf.keras.models.load_model()`. Importing only `load_model`:

- Makes code cleaner and easier to read.
- Follows Python's best practice: import only what you need.
- Avoids long nested namespaces everywhere.
Keras is a high-level neural networks API that:

- Provides a user-friendly interface for building and training models
- Acts as a front-end for TensorFlow
- Simplifies deep learning with intuitive APIs

Its design principles:

- Modular: Build models by stacking configurable blocks
- User-friendly: Designed for fast experimentation
- Extensible: Easy to add custom layers, losses, etc.

The `models` module in Keras contains:

- `Sequential` class: For linear stacks of layers
- `Model` class: For complex architectures using the Functional API
- `load_model` function: Load a saved model
- `save_model` function: Save a model

Essentially, this module provides all the tools related to creating, saving, and loading models.
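To make those pieces concrete, here is a minimal sketch (the tiny architecture and the `tiny_model.h5` file name are illustrative, not this project's model) that builds, saves, and reloads a `Sequential` model:

```python
import numpy as np
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense

# Build a tiny linear stack of layers (illustrative only).
model = Sequential([
    Dense(16, activation="relu", input_shape=(8,)),
    Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# Save the complete model (architecture + weights + training config) ...
model.save("tiny_model.h5")

# ... and restore it exactly as it was.
restored = load_model("tiny_model.h5")

# The restored model predicts identically to the original.
x = np.random.rand(1, 8).astype("float32")
print(np.allclose(model.predict(x), restored.predict(x)))  # True
```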
When you run `model = load_model("../models/emotion_model.h5")`, the file supplies:

- The model architecture (layers and connections)
- The trained weights
- The training configuration (optimizer, loss, metrics)
- The state of the optimizer (if training was interrupted)

This lets you:

- Avoid retraining from scratch
- Use pre-trained models for inference
- Restore models exactly as they were when saved
An HDF5 model file contains:

- The entire model architecture
- All trained weights
- Training configuration
- Model state for resuming training

This makes `.h5` files extremely useful for saving complete Keras models.
When you run:

```python
model = load_model("../models/emotion_model.h5")
print("Model loaded successfully!")
```

the following happens:

- The `.h5` file is read
- The computational graph is reconstructed
- All weights are loaded into memory
- Optimizer state is restored (if included)

You get back:

- A complete Keras model object
- All layers and trained weights
- Ready-to-use inference functions like `model.predict()`

The print statement:

- Confirms the model was loaded without errors
- Helps with debugging
- Signals that inference can start
Keras itself does not include many pre-trained models, but it does provide:

- Sequential and Functional APIs
- Common layers like `Dense`, `Conv2D`, `LSTM`
- Loss functions, optimizers, and metrics

Where to find pre-trained models:

- TensorFlow Hub
- Keras Applications (e.g., VGG, ResNet, MobileNet)
- Research paper implementations
- Your own trained models (like `emotion_model.h5`)
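For instance, Keras Applications ships well-known architectures with ImageNet weights; a quick sketch (the weights are downloaded on first use and cached afterwards):

```python
from tensorflow.keras.applications import MobileNetV2

# Load MobileNetV2 with pre-trained ImageNet weights.
model = MobileNetV2(weights="imagenet")
model.summary()  # inspect the architecture and expected input shape (224x224 RGB)
```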
Keras supports three ways to build models:

- Sequential models: Simple linear stacks
- Functional API models: Complex, multi-branch architectures
- Subclassed models: Fully custom models using Python class inheritance
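A sketch of the first two styles defining the same small network (layer sizes are illustrative):

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Sequential: a simple linear stack.
seq = Sequential([
    Dense(32, activation="relu", input_shape=(10,)),
    Dense(1, activation="sigmoid"),
])

# Functional API: the same network built by wiring tensors explicitly,
# which also allows branches, multiple inputs, and multiple outputs.
inputs = Input(shape=(10,))
x = Dense(32, activation="relu")(inputs)
outputs = Dense(1, activation="sigmoid")(x)
func = Model(inputs=inputs, outputs=outputs)
```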
### Why Use TensorFlow and Keras?

TensorFlow:

- What it is: An open-source machine learning framework developed by Google for building and training neural networks.
- Why we use it: TensorFlow provides a comprehensive ecosystem for machine learning, including tools for building models, training them, and deploying them in production.
- What it provides: TensorFlow is the backbone — the low-level infrastructure for building, training, and running machine learning models. This includes:
  - Core libraries for defining and executing computational graphs.
  - GPU/CPU optimization and hardware acceleration.
  - Tools for distributed training and deployment.
- Analogy: Think of TensorFlow as the engine of a car. It handles all the complex mechanics under the hood.
Keras:

- What it is: A high-level neural networks API, written in Python and capable of running on top of TensorFlow.
- Why we use it: Keras simplifies the process of building and training deep learning models. It provides a user-friendly interface that makes it easier to define models, add layers, and compile models for training.
- What it provides: High-level tools built on top of TensorFlow to:
  - Define, train, and evaluate neural networks with minimal code.
  - Load and save models (like your `emotion_model.h5`).
  - Preprocess data and make predictions easily.
- Analogy: Keras is like the steering wheel and dashboard of the car. It makes driving (using machine learning) much easier without needing to understand the engine's inner workings.
Pre-trained Models:

- Does Keras contain pre-trained models? Keras itself does not come with pre-trained models for specific tasks like emotion recognition. However, it provides tools to load and use pre-trained models.
- Where to find pre-trained models? On platforms like GitHub, Kaggle, or TensorFlow Hub. For emotion recognition, you might need to download a pre-trained model file (like `emotion_model.h5`) from one of these sources or train your own model.
After loading the model (STEP 3), the typical steps are:

- Preprocess images (resize, normalize, convert to grayscale if required).
- Pass images through the model with `model.predict()`.
- Interpret the output probabilities and map them to emotion labels (e.g., happy, sad, etc.).

The model is the "brain"; MediaPipe is the "eyes" that find faces.
## STEP 4 — Predict Emotions from Images

A generic template (many FER2013-style models expect 48×48 grayscale input; adjust the size and add normalization to match your model):

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Load the model
model = load_model("../models/emotion_model.h5")

# Load and preprocess the image
image_path = "../data/test_images/your_photo.jpg"  # Replace with your image path
image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)  # Load as grayscale
image = cv2.resize(image, (48, 48))  # Resize to 48x48 (common input size for emotion models)
image = np.expand_dims(image, axis=0)   # Add batch dimension
image = np.expand_dims(image, axis=-1)  # Add channel dimension

# Predict emotion
emotion_prediction = model.predict(image)
emotion_label = np.argmax(emotion_prediction)
emotion_labels = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]
print(f"Predicted Emotion: {emotion_labels[emotion_label]}")
```

And the full script used in this project (the model here expects 64×64 input):

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model
# Load the model
model = load_model("/home/im_ane/AI_emotion_recognition/models/emotion_model.h5")
print("✅ Model loaded successfully!")
# Load and preprocess the image
image_path = "/home/im_ane/AI_emotion_recognition/data/test_images/ana.jpg"
image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
if image is None:
    print("❌ Error: Could not load image!")
    exit()
print(f"📐 Original image size: {image.shape}")
# Resize to 64x64 (what the model expects)
image = cv2.resize(image, (64, 64))
print(f"📐 Resized image size: {image.shape}")
# Normalize pixel values
image = image.astype('float32') / 255.0
# Add dimensions for model input
image = np.expand_dims(image, axis=0) # Add batch dimension
image = np.expand_dims(image, axis=-1) # Add channel dimension
print(f"🎯 Final input shape: {image.shape}")
# Predict emotion
print("🧠 Making prediction...")
emotion_prediction = model.predict(image)
emotion_label = np.argmax(emotion_prediction)
confidence = np.max(emotion_prediction)
emotion_labels = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]
print("\n🎭 EMOTION RECOGNITION RESULT:")
print("=" * 40)
print(f"✅ Predicted Emotion: {emotion_labels[emotion_label]}")
print(f"📊 Confidence: {confidence:.2%}")
print("\n📈 All probabilities:")
for i, emotion in enumerate(emotion_labels):
    prob = emotion_prediction[0][i]
    print(f"  {emotion:9}: {prob:.4f} ({prob:.1%})")
```
What does `.h5` mean?

`.h5` files are Hierarchical Data Format (HDF5) files. For Keras they store:

- Model architecture (layer configurations)
- Model weights (learned parameters)
- Training configuration (optimizer, loss function)
- Optimizer state (for continuing training)

Think of it as a complete saved model package that you can load and use immediately without retraining.
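If you're curious, you can peek inside such a file with `h5py` (a sketch; the exact keys depend on the Keras version that saved the model):

```python
import h5py

# Open the saved model read-only and list its top-level contents.
with h5py.File("../models/emotion_model.h5", "r") as f:
    print(list(f.keys()))        # groups, typically including 'model_weights'
    print(list(f.attrs.keys()))  # attributes, typically including 'model_config'
```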
---
## Notes & Tips
* Verify your model input shape (use `model.summary()` to check). Common formats:
* `48×48 grayscale` with shape `(1, 48, 48, 1)` (FER2013-style models).
* `224×224 RGB` for transfer-learning models (e.g., MobileNet).
* Normalize pixel values if the model expects `0–1` inputs: `image = image / 255.0`.
* Use MediaPipe to crop face regions first, then pass the face patch to the emotion model (a minimal sketch follows below).
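A sketch of that crop-then-classify flow, assuming the 64×64 grayscale model and label list from STEP 4:

```python
import cv2
import mediapipe as mp
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("/home/im_ane/AI_emotion_recognition/models/emotion_model.h5")
emotion_labels = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

image = cv2.imread("/home/im_ane/AI_emotion_recognition/data/test_images/ana.jpg")
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Detect faces with MediaPipe.
with mp.solutions.face_detection.FaceDetection(min_detection_confidence=0.5) as fd:
    results = fd.process(image_rgb)

h, w, _ = image.shape
for detection in results.detections or []:
    box = detection.location_data.relative_bounding_box
    x, y = max(int(box.xmin * w), 0), max(int(box.ymin * h), 0)
    bw, bh = int(box.width * w), int(box.height * h)

    # Crop the face patch and preprocess it for the emotion model.
    face = cv2.cvtColor(image[y:y + bh, x:x + bw], cv2.COLOR_BGR2GRAY)
    face = cv2.resize(face, (64, 64)).astype("float32") / 255.0
    face = face[np.newaxis, ..., np.newaxis]  # shape (1, 64, 64, 1)

    prediction = model.predict(face)
    print(emotion_labels[int(np.argmax(prediction))], float(np.max(prediction)))
```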
---
## Final Recap
This README covers:
* Loading and displaying images (OpenCV + Matplotlib).
* Using MediaPipe to detect faces.
* Loading a saved Keras model.
* An end-to-end predict example for emotion classification.
## STEP 5 — Use OpenCV + haarcascade_frontalface to Draw a Rectangle on the Face

```python
import cv2
import os
# Load cascade with full path
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
# Check if cascade loaded
if face_cascade.empty():
    print("Error loading cascade classifier")
    exit()

# Check if image exists
if not os.path.exists('/home/im_ane/AI_emotion_recognition/data/test_images/ana.jpg'):
    print("Image file not found!")
    exit()

# Read image
img = cv2.imread('/home/im_ane/AI_emotion_recognition/data/test_images/ana.jpg')
if img is None:
    print("Error reading image")
    exit()

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces (scaleFactor=1.1, minNeighbors=4)
faces = face_cascade.detectMultiScale(gray, 1.1, 4)
print(f"Found {len(faces)} face(s)")

# Draw rectangles
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

# Display
cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

Reading frames from the webcam with OpenCV:

```python
import cv2
# Initialize webcam (index 0)
cap = cv2.VideoCapture(0)
# Check if camera opened successfully
if not cap.isOpened():
    print("Error: Could not open camera.")
    exit()
while True:
    ret, frame = cap.read()
    if not ret:
        print("Error: Failed to grab frame.")
        break

    cv2.imshow("Webcam Feed", frame)

    # Exit on 'q' key press
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Cleanup
cap.release()
cv2.destroyAllWindows()
```

Detailed Explanation:
- `import cv2`: Imports the OpenCV (Open Source Computer Vision Library) module, which provides tools for real-time computer vision.
- `cap = cv2.VideoCapture(0)`: Creates a video capture object to access the webcam. `0` is the default camera index. If you have multiple cameras, you can try `1`, `2`, etc.
- `if not cap.isOpened():`: Checks if the camera opened successfully. If not, it prints an error message and exits.
- `while True:`: Starts an infinite loop to continuously capture and display frames.
- `ret, frame = cap.read()`: Reads a frame from the video capture. `ret` is a boolean indicating if the frame was read successfully; `frame` contains the actual image data (a NumPy array).
- `if not ret:`: Checks if the frame was read successfully. If not, it prints an error message and breaks the loop.
- `cv2.imshow("Webcam Feed", frame)`: Displays the frame in a window titled "Webcam Feed".
- `if cv2.waitKey(1) & 0xFF == ord('q'):`: `cv2.waitKey(1)` waits 1 millisecond for a keyboard event; `& 0xFF` is a bitwise mask to handle different OS key representations; `ord('q')` gets the ASCII value of `q`. If `q` is pressed, the loop breaks and the program exits.
- `cap.release()`: Releases the video capture object and frees resources.
- `cv2.destroyAllWindows()`: Closes all OpenCV windows.
To run MediaPipe face detection on webcam frames, initialize the detector once, then process each RGB-converted frame inside the loop:

```python
# Initialize MediaPipe Face Detection
mp_face_detection = mp.solutions.face_detection
face_detection = mp_face_detection.FaceDetection(min_detection_confidence=0.5)

# Inside the loop: process the frame to detect faces
results = face_detection.process(rgb_frame)
```
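A minimal sketch tying the webcam loop and MediaPipe detection together (self-contained; press `q` to quit):

```python
import cv2
import mediapipe as mp

mp_face_detection = mp.solutions.face_detection
face_detection = mp_face_detection.FaceDetection(min_detection_confidence=0.5)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # MediaPipe expects RGB; OpenCV captures BGR.
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = face_detection.process(rgb_frame)

    # Draw detections back onto the original BGR frame.
    if results.detections:
        for detection in results.detections:
            mp.solutions.drawing_utils.draw_detection(frame, detection)

    cv2.imshow("MediaPipe Face Detection", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```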
DeepFace is a lightweight face recognition and facial attribute analysis framework for Python. It's built on top of popular deep learning frameworks like TensorFlow and Keras, and it uses OpenCV for image processing.

Key features:

- Face detection: Uses MTCNN, OpenCV's Haar cascades, Dlib, or SSD to detect faces in images.
- Face recognition: Uses FaceNet, VGGFace, OpenFace, or DeepID to recognize faces.
- Facial attribute analysis: Can detect emotions, age, gender, and racial ethnicity.
- Easy-to-use API: Provides simple functions for complex tasks.

How DeepFace works internally:

1. Face detection: Uses OpenCV's Haar cascades by default (but can use others); detects faces in an image and extracts face regions.
2. Face alignment: Aligns detected faces to a standard format.
3. Feature extraction: Uses deep learning models to extract facial features.
4. Analysis: For emotion analysis, it uses a pre-trained CNN model that outputs probabilities for different emotions.

Does DeepFace contain OpenCV and TensorFlow? Yes, DeepFace is built on top of:

- OpenCV: For image processing and face detection.
- TensorFlow/Keras: For deep learning models.
- Other libraries: Such as NumPy, Pandas, etc.

When you install DeepFace, it automatically installs these dependencies.
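A quick sketch of DeepFace's emotion analysis (the return shape varies across versions; recent releases return a list with one dict per detected face):

```python
from deepface import DeepFace

# One call handles detection, alignment, and emotion classification.
result = DeepFace.analyze(
    img_path="/home/im_ane/AI_emotion_recognition/data/test_images/ana.jpg",
    actions=["emotion"],
)

# Normalize the return shape, then read the per-face result.
face = result[0] if isinstance(result, list) else result
print(face["dominant_emotion"], face["emotion"])
```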
Use DeepFace if:

- You want a quick and easy solution.
- You don't need real-time performance.
- You want multiple features (emotion, age, gender) in one package.

Use MediaPipe + TensorFlow if:

- You need real-time performance.
- You want more control over the process.
- You're building a custom solution or need to integrate with other systems.
TensorFlow adds emotion recognition capabilities to the face detection pipeline:

1. Model loading: Loads a pre-trained neural network model for emotion classification.

   ```python
   model = load_model("/home/im_ane/AI_emotion_recognition/models/emotion_model.h5")
   ```

2. Emotion labels: Defines the possible emotion categories.

   ```python
   emotion_labels = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]
   ```

3. Face preprocessing: Converts the face region to grayscale, resizes it to match the model's expected input (64x64), and normalizes pixel values to the [0, 1] range.

   ```python
   face_roi_gray = cv2.cvtColor(face_roi, cv2.COLOR_BGR2GRAY)
   face_roi_processed = cv2.resize(face_roi_gray, (64, 64))
   face_roi_processed = np.expand_dims(face_roi_processed, axis=0)
   face_roi_processed = np.expand_dims(face_roi_processed, axis=-1)
   face_roi_processed = face_roi_processed.astype('float32') / 255.0
   ```

4. Emotion prediction: Uses the model to predict emotion probabilities, then takes the most likely emotion and its confidence score.

   ```python
   emotion_prediction = model.predict(face_roi_processed)
   emotion_label = np.argmax(emotion_prediction)
   confidence = emotion_prediction[0][emotion_label]
   ```

5. Visualization: Displays the predicted emotion and confidence on the video frame.

   ```python
   cv2.putText(frame, f"{emotion_text} ({confidence:.2f})", (x, y - 10),
               cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
   ```

Haar Cascades are object detection algorithms used to identify faces in images or video. They work by:

1. Training: Using positive (face) and negative (non-face) images to create a cascade of classifiers.
2. Detection: Sliding a window across the image and applying the cascade of classifiers to detect faces.
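In code, the detection step is `detectMultiScale`; here it is with its main parameters named (`face_cascade` and `gray` come from the STEP 5 script; the values mirror it, and `minSize` is an illustrative addition):

```python
faces = face_cascade.detectMultiScale(
    gray,              # grayscale input image
    scaleFactor=1.1,   # shrink the image by 10% at each step of the multi-scale search
    minNeighbors=4,    # overlapping detections required before a face is accepted
    minSize=(30, 30),  # ignore candidate windows smaller than this
)
```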

Full pipeline (a minimal end-to-end sketch follows below):

1. Capture a video frame.
2. Convert it to grayscale.
3. Haar Cascade detects faces. For each face:
   1. Extract the face region.
   2. Preprocess it (resize, normalize).
   3. The H5 model predicts the emotion.
   4. Draw a rectangle and the emotion label.
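A minimal end-to-end sketch of that pipeline, assuming the model path, 64×64 grayscale input, and label list used throughout this README (press `q` to quit):

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("/home/im_ane/AI_emotion_recognition/models/emotion_model.h5")
emotion_labels = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Steps 1-3: grayscale conversion and Haar cascade face detection.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 4)

    for (x, y, w, h) in faces:
        # Extract and preprocess the face region.
        face_roi = gray[y:y + h, x:x + w]
        face_roi = cv2.resize(face_roi, (64, 64)).astype('float32') / 255.0
        face_roi = face_roi[np.newaxis, ..., np.newaxis]  # shape (1, 64, 64, 1)

        # Predict the emotion and its confidence.
        prediction = model.predict(face_roi)
        label = emotion_labels[int(np.argmax(prediction))]
        confidence = float(np.max(prediction))

        # Draw the rectangle and the emotion label.
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
        cv2.putText(frame, f"{label} ({confidence:.2f})", (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)

    cv2.imshow("Emotion Recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```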




