Sentiment analysis in non-fixed length audios using a Fully Convolutional Neural Network
- a) SECOMUCI Research Group, Escuela de Ingenierías Industrial e Informática, Universidad de León, Campus de Vegazana s/n, C.P. 24071 León, Spain
- b) SALBIS Research Group, Department of Electric, Systems and Automatics Engineering, Universidad de León, Campus de Vegazana s/n, 24071 León, Spain
- c) Artificial Intelligence Department, Xeridia S.L., Av. Padre Isla 16, 24002 León, Spain
Received 2 April 2021, Revised 14 May 2021, Accepted 28 June 2021, Available online 8 July 2021, Version of Record 8 July 2021.
Highlights
- A new method based on a fully convolutional neural network is proposed for speech emotion recognition.
- Our proposal can process inputs of variable length, enabling near-real-time sentiment analysis.
- Mel-frequency cepstral coefficients make it easier to identify emotions in audio signals.
- Fully convolutional neural networks outperform other machine learning models on the EMODB, RAVDESS and TESS data sets.
Abstract
In this work, a sentiment analysis method that accepts audio of any length, without the length being fixed a priori, is proposed. The Mel spectrogram and Mel Frequency Cepstral Coefficients are used as audio descriptors, and a Fully Convolutional Neural Network architecture is proposed as the classifier. The results have been validated on three well-known datasets: EMODB, RAVDESS and TESS. The results obtained are promising, outperforming state-of-the-art methods. Moreover, because the proposed method admits audio of any length, sentiment analysis can be performed in near real time, which is of great interest in a wide range of fields such as call centers, medical consultations or financial brokerage.
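The key property claimed above, that a fully convolutional classifier needs no fixed input length, can be illustrated with a minimal NumPy sketch: convolutions slide over however many time frames the MFCC matrix contains, and a global average pool over time collapses the result to a fixed-size score vector. The filter counts and layer sizes below are illustrative only, not the architecture from the paper.

```python
import numpy as np

def conv1d_relu(x, w):
    """Valid 1-D convolution over time with ReLU.
    x: (channels, T) feature matrix; w: (filters, channels, width)."""
    n_filters, _, width = w.shape
    T_out = x.shape[1] - width + 1
    out = np.zeros((n_filters, T_out))
    for f in range(n_filters):
        for t in range(T_out):
            out[f, t] = np.sum(x[:, t:t + width] * w[f])
    return np.maximum(out, 0.0)

def fcn_scores(mfcc, w1, w2):
    """Two conv layers, then global average pooling over the time axis.
    The pooled output shape depends only on the number of filters in w2,
    never on the number of input frames."""
    h = conv1d_relu(mfcc, w1)
    h = conv1d_relu(h, w2)
    return h.mean(axis=1)  # global average pooling -> fixed-length scores

rng = np.random.default_rng(0)
w1 = rng.standard_normal((8, 13, 5))  # 8 filters over 13 MFCC coefficients
w2 = rng.standard_normal((4, 8, 3))   # 4 hypothetical emotion classes

short_clip = rng.standard_normal((13, 50))   # 50 MFCC frames
long_clip = rng.standard_normal((13, 300))   # 300 MFCC frames
scores_short = fcn_scores(short_clip, w1, w2)
scores_long = fcn_scores(long_clip, w1, w2)
```

Both clips, despite their different durations, yield a score vector of the same fixed length, which is what allows the network to run on unsegmented audio as it arrives.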