The objective of this project is to design a robust Speech Emotion Recognition (SER) system that accurately detects emotions in speech using raw audio data while incorporating speaker gender information to enhance recognition accuracy. This system aims to overcome the limitations of traditional SER models, which depend heavily on pre-selected acoustic features and often overlook subtle emotional cues. By utilizing a Residual Convolutional Neural Network (R-CNN), the model will directly extract meaningful emotional patterns from the raw speech signal, reducing the need for manual feature selection and capturing nuanced emotional expressions.
ABSTRACT
Recent advancements in speech emotion recognition (SER) have primarily centered on effective feature selection from acoustic data. This study introduces a novel SER algorithm that leverages raw speech data combined with gender information to enhance recognition accuracy, eliminating the need for manually selected acoustic features. Our approach integrates a Residual Convolutional Neural Network (R-CNN) model to detect emotions directly from raw speech signals and a Random Forest classifier to determine speaker gender. The R-CNN model processes the raw audio, extracting emotional cues for accurate classification without relying on pre-selected acoustic features, thus capturing subtle emotion-driven nuances that traditional methods may overlook. Simultaneously, the Random Forest classifier processes speech data to identify the speaker's gender, providing contextual information that strengthens the emotion recognition process. Evaluated across three public datasets in multiple languages, the proposed model demonstrates a notable improvement in accuracy and interpretability by leveraging both emotion and gender information. This approach highlights the benefits of a dual-model framework that combines deep learning and ensemble methods, pushing the boundaries of affective computing through a more holistic understanding of speech data.
Keywords: Affective Computing, Speech Emotion Recognition, Gender Classifier, Deep Learning, Interpretability, Random Forest, Residual CNN.
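To make the residual idea in the abstract concrete, the pure-Python sketch below shows a single residual block applied to a raw waveform, followed by one possible way of attaching gender information to the resulting features. It is an illustration only: the function names, the hand-picked kernel values, the global-average pooling, and the one-hot gender encoding are all assumptions and are not taken from the study. A real model would learn its kernels during training and may fuse the gender prediction differently.

```python
def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd-length kernel so the padding is symmetric.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def residual_block(signal, kernel_a, kernel_b):
    """Compute relu(x + F(x)) with F = conv -> relu -> conv.

    The skip connection adds the input back to the transformed signal,
    so the block only has to learn a correction to the identity mapping.
    """
    hidden = relu(conv1d_same(signal, kernel_a))
    transformed = conv1d_same(hidden, kernel_b)
    return relu([x + f for x, f in zip(signal, transformed)])


def fuse_with_gender(features, gender):
    """Append a one-hot gender code to a feature vector.

    Hypothetical fusion scheme: the source does not specify how the
    gender prediction is combined with the emotion features.
    """
    one_hot = [1.0, 0.0] if gender == "female" else [0.0, 1.0]
    return list(features) + one_hot


if __name__ == "__main__":
    # Toy waveform and illustrative (not learned) kernels.
    waveform = [0.0, 0.5, -0.3, 0.8, -0.1]
    features = residual_block(waveform, [0.2, 0.5, 0.2], [-0.1, 0.3, -0.1])
    pooled = [sum(features) / len(features)]  # global average pooling
    print(fuse_with_gender(pooled, "female"))
```

With all-zero kernels the block reduces to a ReLU of its input, which is the property that makes stacked residual blocks easier to train than plain convolutional stacks.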
