Status: Ongoing Project
LipBuddy is an advanced lip-reading application built upon the LipNet deep learning model. The primary objective of this project is to translate visual lip movements into text, enhancing accessibility for individuals with hearing impairments and offering potential applications in areas such as security and silent communication.
This project is still in progress, and the current implementation is a full-stack application that takes video input, processes it using deep learning, and outputs the predicted text.
- Video Input Processing: Converts input videos to grayscale frames, focusing on lip movements.
- Deep Learning Integration: Uses a custom-built deep neural network to model lip movements.
- Real-time Predictions: Predicts the spoken words from those movements in real time.
- Streamlit Integration: The user interface is built with Streamlit, offering a smooth and interactive experience.
- Data Loading: The application loads and processes video files, extracting frames and applying transformations to prepare the data for the neural network (see the preprocessing sketch after this list).
- Neural Network Architecture (a model sketch follows this list):
- Convolutional Layers: Three 3D convolutional layers to capture spatio-temporal features.
- Bidirectional LSTM: Two Bidirectional LSTM layers to capture the temporal dependencies in the lip movements.
- Dense Layer: The output layer generates predictions for each time step, corresponding to the characters in the vocabulary.
- Training Process (a training sketch follows this list):
- Custom loss function using Connectionist Temporal Classification (CTC) to handle variable-length sequences.
- Learning rate scheduler and checkpoints to ensure efficient training.
- Inference and Prediction: The trained model predicts the spoken text from new video inputs; its outputs are decoded and displayed in the application interface (see the decoding sketch after this list).
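A minimal sketch of the preprocessing step: load a clip with OpenCV, grayscale each frame, crop a fixed mouth region, and standardize the values. The crop box, frame layout, and normalization scheme here are illustrative assumptions, not the project's exact values.

```python
import cv2
import numpy as np

def load_video(path: str, mouth_box=(190, 236, 80, 220)) -> np.ndarray:
    """Read a clip, grayscale each frame, crop the mouth region, and standardize.

    The fixed crop box is a placeholder; the real pipeline could instead use
    face/landmark detection to locate the lips.
    """
    top, bottom, left, right = mouth_box
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(gray[top:bottom, left:right])
    cap.release()

    frames = np.array(frames, dtype=np.float32)
    # Zero-mean, unit-variance inputs tend to train more stably.
    frames = (frames - frames.mean()) / (frames.std() + 1e-8)
    # Shape (time, height, width, 1): grayscale frames with a channel axis.
    return frames[..., np.newaxis]
```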
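A Keras sketch of the architecture outlined above: three 3D convolutional blocks, two Bidirectional LSTMs, and a per-time-step Dense output. The input shape, filter counts, kernel sizes, and character vocabulary are assumptions for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB = "abcdefghijklmnopqrstuvwxyz'?!123456789 "  # assumed character set
NUM_CLASSES = len(VOCAB) + 1                        # +1 for the CTC blank

def build_model(frames=75, height=46, width=140):
    """Conv3D x3 -> Bidirectional LSTM x2 -> per-time-step Dense softmax."""
    return models.Sequential([
        layers.Input(shape=(frames, height, width, 1)),
        # Three 3D convolutional blocks capture spatio-temporal lip features.
        layers.Conv3D(128, 3, padding="same", activation="relu"),
        layers.MaxPool3D((1, 2, 2)),
        layers.Conv3D(256, 3, padding="same", activation="relu"),
        layers.MaxPool3D((1, 2, 2)),
        layers.Conv3D(75, 3, padding="same", activation="relu"),
        layers.MaxPool3D((1, 2, 2)),
        # Collapse the spatial dimensions while keeping the time axis.
        layers.TimeDistributed(layers.Flatten()),
        # Two bidirectional LSTMs model temporal dependencies across frames.
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Dropout(0.5),
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Dropout(0.5),
        # One character-probability distribution per time step, for CTC.
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
```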
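The CTC loss and training callbacks might look roughly like the following; the fixed input/label lengths, scheduler policy, and checkpoint path are placeholder assumptions.

```python
import tensorflow as tf

def ctc_loss(y_true, y_pred):
    """CTC loss, so variable-length transcripts can be aligned to the
    frame-level predictions without per-frame labels."""
    batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
    input_len = tf.cast(tf.shape(y_pred)[1], dtype="int64") * tf.ones(
        shape=(batch_len, 1), dtype="int64")
    label_len = tf.cast(tf.shape(y_true)[1], dtype="int64") * tf.ones(
        shape=(batch_len, 1), dtype="int64")
    return tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_len, label_len)

def lr_schedule(epoch, lr):
    # Hold the learning rate for a warm-up period, then decay it gently.
    return lr if epoch < 30 else lr * tf.math.exp(-0.1)

callbacks = [
    tf.keras.callbacks.LearningRateScheduler(lr_schedule),
    tf.keras.callbacks.ModelCheckpoint("checkpoints/lipbuddy.weights.h5",
                                       save_weights_only=True),
]

# model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss=ctc_loss)
# model.fit(train_data, validation_data=val_data, epochs=100, callbacks=callbacks)
```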
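Decoding at inference time could be sketched as below, using greedy CTC decoding and the assumed VOCAB from the architecture sketch; the real pipeline's decoding strategy and character mapping may differ.

```python
import numpy as np
import tensorflow as tf

# Inverse lookup from class indices back to characters, mirroring how the
# transcripts would be encoded during training (VOCAB from the sketch above).
num_to_char = tf.keras.layers.StringLookup(
    vocabulary=list(VOCAB), oov_token="", invert=True
)

def predict_text(model, clip: np.ndarray) -> str:
    """Run one preprocessed clip through the model and greedily decode it."""
    y_pred = model.predict(clip[np.newaxis, ...])           # add a batch axis
    decoded, _ = tf.keras.backend.ctc_decode(
        y_pred, input_length=[y_pred.shape[1]], greedy=True
    )
    indices = decoded[0][0].numpy()
    chars = num_to_char(indices[indices >= 0])              # drop -1 padding
    return tf.strings.reduce_join(chars).numpy().decode("utf-8")
```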
The user interface allows users to:
- Select a video file for processing.
- View the raw video and the processed frames seen by the model.
- Obtain the predicted text from the lip movements in the video (a minimal Streamlit sketch of this flow follows).
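A minimal Streamlit sketch of that interface flow, assuming a hypothetical lipbuddy module that collects the helper sketches above and an assumed data/videos directory of sample clips:

```python
import os

import streamlit as st

# Hypothetical module gathering the sketches above; adjust to the real layout.
from lipbuddy import build_model, load_video, predict_text

VIDEO_DIR = "data/videos"   # assumed location of the sample clips

st.title("LipBuddy")

model = build_model()
model.load_weights("checkpoints/lipbuddy.weights.h5")   # assumed checkpoint path

options = sorted(f for f in os.listdir(VIDEO_DIR)
                 if f.endswith((".mpg", ".mp4")))
selected = st.selectbox("Choose a video", options)

if selected:
    path = os.path.join(VIDEO_DIR, selected)
    col1, col2 = st.columns(2)

    with col1:
        st.info("Raw video")
        st.video(path)  # browsers may need an mp4 copy of .mpg clips

    with col2:
        st.info("What the model sees and predicts")
        frames = load_video(path)
        mid = frames[len(frames) // 2, ..., 0]             # one sample frame
        mid = (mid - mid.min()) / (mid.max() - mid.min() + 1e-8)
        st.image(mid, clamp=True)
        st.text(predict_text(model, frames))               # decoded transcript
```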