Enhancing speech emotion recognition: a deep learning approach with self‐attention and acoustic features

Title	Enhancing speech emotion recognition: a deep learning approach with self‐attention and acoustic features
Type	JournalPaper
Keywords	Speech emotion recognition · MFCC · Mel-spectrogram · Deep learning · Self-attention mechanism
Year	2025
Journal	The Journal of Supercomputing
DOI
Researchers	Khadijeh Aghajani, Mahbanoo Zohrevandi

Abstract

Speech emotion recognition (SER), which involves detecting and classifying emotions from speech signals, plays a crucial role in human–computer interaction. However, challenges such as variability in emotional expression and limited labeled data have hindered progress in this area. To address these issues, we propose a novel deep learning framework that combines multiple acoustic features, including MFCCs, Mel-spectrograms, and temporal-frequency domain features. Our model leverages three parallel CNN-LSTM branches for sequential feature extraction, followed by a self-attention mechanism to integrate the extracted representations. A final LSTM layer, along with dense layers, refines the classification process. This innovative fusion of features and attention mechanisms significantly enhances emotion recognition performance. Experimental evaluations demonstrate the effectiveness of our approach in improving classification accuracy.
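
The pipeline described in the abstract (three parallel CNN-LSTM branches, self-attention fusion, a final LSTM, then dense layers) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation. The abstract gives no layer sizes, so the feature dimensions (40 MFCCs, 128 Mel bands, 12 temporal-frequency features), the hidden width, the number of attention heads, the seven emotion classes, and the choice to fuse branches by concatenating them along the time axis are all assumptions made for illustration, as are the class names `CNNLSTMBranch` and `SERSketch`.

```python
import torch
import torch.nn as nn


class CNNLSTMBranch(nn.Module):
    """One branch: 1-D convolution over time, followed by an LSTM."""

    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),  # halves the number of time steps
        )
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features); Conv1d expects (batch, channels, time)
        h = self.conv(x.transpose(1, 2))
        out, _ = self.lstm(h.transpose(1, 2))
        return out  # (batch, time // 2, hidden)


class SERSketch(nn.Module):
    """Hypothetical model: 3 branches -> self-attention -> LSTM -> dense."""

    def __init__(self, feature_dims=(40, 128, 12), hidden=64, n_classes=7):
        super().__init__()
        # Assumed input sizes: MFCC, Mel-spectrogram, temporal-frequency features
        self.branches = nn.ModuleList(
            [CNNLSTMBranch(d, hidden) for d in feature_dims]
        )
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden, 32), nn.ReLU(), nn.Linear(32, n_classes)
        )

    def forward(self, inputs):
        # inputs: list of three tensors, one per acoustic feature type
        seqs = [branch(x) for branch, x in zip(self.branches, inputs)]
        fused = torch.cat(seqs, dim=1)  # assumption: concatenate along time
        attended, _ = self.attn(fused, fused, fused)  # self-attention
        _, (h_n, _) = self.lstm(attended)  # keep the final hidden state
        return self.head(h_n[-1])  # (batch, n_classes) logits


if __name__ == "__main__":
    model = SERSketch()
    batch = [torch.randn(2, 50, d) for d in (40, 128, 12)]
    print(model(batch).shape)
```

Because self-attention runs over the concatenated sequences, every time step from one feature type can attend to time steps from the other two, which is one plausible reading of "integrate the extracted representations".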
