Automated Lip Reading System Using a Deep Learning-Based Transformer Model

Project Code: TCMAPY2255

Objective

The objective of this project is to develop an automated lip-reading system that uses deep learning to improve the accuracy of visual speech recognition (VSR). The system targets current challenges in lip reading, particularly in noisy environments and for individuals with hearing impairments. It uses Convolutional Neural Networks (CNNs) to extract features from lip movements and Long Short-Term Memory (LSTM) networks to learn how those features evolve over time, so that the model can predict spoken words from lip movements. The goal is a robust, efficient, real-time solution for automated visual speech recognition across a range of applications.

Abstract

Lip reading classification has gained considerable attention in recent years because of its potential to enhance communication, particularly in noisy environments and for individuals with hearing impairments. It is central to visual speech recognition (VSR), which bridges the gap between verbal communication and visual input. Despite significant progress, challenges such as efficient feature extraction and limited model capability persist and constrain the performance of existing systems. This paper proposes an automated lip reading system using a deep learning-based transformer model to address these issues. The system uses Convolutional Neural Networks (CNNs) to extract spatial features from lip movements and Long Short-Term Memory (LSTM) networks to recognize sequential patterns, enabling it to predict spoken words from visual cues. Combining CNN and LSTM components is intended to improve feature extraction and the model’s ability to recognize dynamic visual speech patterns, with the aim of advancing visual speech recognition systems.

Keywords: Lip Reading, Deep Learning, Transformer Model, Visual Speech Recognition, Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Feature Extraction, Hearing Impairments, Communication Enhancement.
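
The CNN-plus-LSTM pipeline described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: it assumes PyTorch, grayscale 32x32 mouth-region crops, 8 frames per clip, and a 10-word vocabulary, none of which are specified above. The class name `LipReadingCNNLSTM` and all layer sizes are hypothetical, and the sketch covers only the CNN and LSTM stages named in the abstract.

```python
import torch
import torch.nn as nn


class LipReadingCNNLSTM(nn.Module):
    """Hypothetical CNN + LSTM word classifier over mouth-region clips."""

    def __init__(self, num_classes: int = 10, hidden_size: int = 64):
        super().__init__()
        # Per-frame spatial feature extractor (CNN). Layer sizes are assumptions.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # collapse each frame to a 32-dim vector
        )
        # Temporal model over the sequence of per-frame features (LSTM).
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden_size, batch_first=True)
        # Maps the final LSTM state to one score per word class.
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, frames, channels, height, width)
        b, t, c, h, w = clips.shape
        # Fold time into the batch axis so the CNN sees one frame at a time.
        feats = self.cnn(clips.reshape(b * t, c, h, w))  # (b*t, 32, 1, 1)
        feats = feats.reshape(b, t, -1)                  # (b, t, 32)
        _, (h_n, _) = self.lstm(feats)                   # h_n: (1, b, hidden_size)
        return self.classifier(h_n[-1])                  # (b, num_classes)


model = LipReadingCNNLSTM(num_classes=10)
dummy_clips = torch.randn(2, 8, 1, 32, 32)  # random stand-in for real video clips
logits = model(dummy_clips)
print(logits.shape)
```

Folding the frame axis into the batch axis lets a single 2D CNN process every frame, after which the LSTM consumes the per-frame feature vectors in order. The final hidden state summarizes the clip and is mapped to one score per candidate word.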

NOTE: Please do not submit this to your college without our team's consent. This abstract varies based on student requirements.