Malayer University Research System | Enhancing speech emotion recognition: a deep learning approach with self‐attention and acoustic features

Title	Enhancing speech emotion recognition: a deep learning approach with self‐attention and acoustic features
Research type	Published article
Keywords	Speech emotion recognition · MFCC · Mel-spectrogram · Deep learning · Self-attention mechanism
Abstract	Speech emotion recognition (SER), which involves detecting and classifying emotions from speech signals, plays a crucial role in human–computer interaction. However, challenges such as variability in emotional expression and limited labeled data have hindered progress in this area. To address these issues, we propose a novel deep learning framework that combines multiple acoustic features, including MFCCs, Mel-spectrograms, and time-frequency domain features. Our model uses three parallel CNN-LSTM branches for sequential feature extraction, followed by a self-attention mechanism that integrates the extracted representations. A final LSTM layer followed by dense layers performs the classification. Experimental evaluations show that this fusion of features and attention significantly improves emotion classification accuracy.
Researchers	مه بانو زهره وندی (second author), خدیجه آقاجانی (first author)
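The abstract says a self-attention mechanism integrates the outputs of the three parallel CNN-LSTM branches but does not specify its exact form. The following is a minimal sketch, assuming standard scaled dot-product self-attention, softmax(QK^T / sqrt(d_k))V, applied to the three branch outputs treated as a length-3 sequence. The function names, the projection matrices `w_q`, `w_k`, `w_v`, and the toy embedding values are hypothetical illustrations, not the authors' implementation. Plain Python is used so the sketch runs without a deep learning framework.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    """Softmax over one row; subtracting the max keeps exp() numerically stable."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention over the rows of x.

    Returns (outputs, weights): outputs is (n x d_v), weights is (n x n),
    where weights[i][j] is how much row i attends to row j.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    # Hypothetical 4-dimensional embeddings standing in for the final outputs
    # of the three CNN-LSTM branches (values are made up for illustration).
    branches = [
        [1.0, 0.0, 0.5, 0.2],  # MFCC branch
        [0.0, 1.0, 0.3, 0.1],  # Mel-spectrogram branch
        [0.5, 0.5, 0.0, 1.0],  # time-frequency branch
    ]
    # Identity projections keep the example readable; a real model learns these.
    eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    fused, weights = self_attention(branches, eye, eye, eye)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1, so each fused vector is a convex combination of the (projected) branch outputs. In a trained model, `w_q`, `w_k`, and `w_v` would be learned parameters, and the fused sequence would then feed the final LSTM and dense layers described in the abstract.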