Identification of feeling in voice by means of combination of intermediate classifications of the signals in excitation, valence and quadrant

  • Guilherme B. S. Gering (UFES)
  • Patrick M. Ciarelli (UFES)
  • Evandro O. T. Salles (UFES)

Abstract


Speech emotion recognition is commonly performed over categorical classes, such as “sadness” or “joy”. According to Russell’s circumplex map of affect, emotions can also be classified by arousal (excitation), valence, and quadrant. In this work, a method is proposed to increase the performance of speech emotion recognition in categorical classes, using classifiers that perform intermediate classification into the valence, excitation, and quadrant classes through a multi-view approach. To combine these intermediate results and obtain the final classification, a decision tree is proposed, which increases the F1 score on a public database from 0.73, obtained with an ensemble of three kinds of classifiers, to 0.87.
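The fusion step described above can be illustrated with a minimal sketch. This is not the authors' exact decision tree: the binary arousal/valence labels, the one-emotion-per-quadrant mapping, and the agreement rule below are illustrative assumptions based only on Russell's circumplex model.

```python
# Hypothetical sketch: fusing intermediate arousal/valence/quadrant
# decisions into a categorical emotion label with a hand-built rule tree.
# Labels and fusion rule are illustrative, not the paper's method.

def quadrant_from_dims(arousal, valence):
    """Map binary arousal/valence decisions to a Russell quadrant (1-4)."""
    if arousal == "high":
        return 1 if valence == "positive" else 2
    return 3 if valence == "negative" else 4

QUADRANT_TO_EMOTION = {  # one illustrative emotion per quadrant
    1: "joy",      # high arousal, positive valence
    2: "anger",    # high arousal, negative valence
    3: "sadness",  # low arousal, negative valence
    4: "calm",     # low arousal, positive valence
}

def combine(arousal_pred, valence_pred, quadrant_pred):
    """Decision-tree-style fusion: keep the quadrant classifier's output
    when it agrees with the quadrant implied by the arousal and valence
    classifiers; otherwise fall back to the implied quadrant."""
    implied = quadrant_from_dims(arousal_pred, valence_pred)
    final_quadrant = quadrant_pred if quadrant_pred == implied else implied
    return QUADRANT_TO_EMOTION[final_quadrant]

print(combine("high", "negative", 2))  # → anger
```

In practice the fused tree could itself be learned from the intermediate classifiers' outputs rather than hand-coded, which is closer in spirit to the hierarchical approaches cited in the paper.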

Published
2019-06-11
GERING, Guilherme B. S.; CIARELLI, Patrick M.; SALLES, Evandro O. T. Identification of feeling in voice by means of combination of intermediate classifications of the signals in excitation, valence and quadrant. In: BRAZILIAN SYMPOSIUM ON COMPUTING APPLIED TO HEALTH (SBCAS), 19., 2019, Niterói. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 152-163. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2019.6250.