Emotion Identification in Speech by Combining Intermediate Arousal, Valence, and Quadrant Classifications
Abstract
Speech emotion recognition is commonly performed over categorical classes, such as "sadness" or "joy". According to Russell's circumplex model of affect, emotions can also be classified by arousal (excitation), valence, and quadrant. This work proposes a method to improve categorical speech emotion recognition by using classifiers that perform intermediate classification over the valence, arousal, and quadrant classes in a multi-view approach. To combine these intermediate results into the final classification, a decision tree is proposed that increases the F1 score from 0.73, obtained by an ensemble of three classifier types, to 0.87 on a public database.
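The combination step described above can be sketched as follows. This is a minimal illustrative example, not the paper's actual decision tree: the quadrant-to-emotion mapping and the tie-breaking rule (prefer the quadrant implied by the arousal and valence classifiers when the quadrant classifier disagrees) are assumptions for illustration, loosely following Russell's circumplex layout.

```python
# Hypothetical sketch: combine intermediate arousal, valence, and quadrant
# predictions into a categorical emotion. The emotion labels per quadrant and
# the disagreement rule are illustrative assumptions, not the paper's method.

QUADRANT_EMOTIONS = {
    1: "joy",      # high arousal, positive valence
    2: "anger",    # high arousal, negative valence
    3: "sadness",  # low arousal, negative valence
    4: "calm",     # low arousal, positive valence
}

def quadrant_from_axes(arousal_high: bool, valence_positive: bool) -> int:
    """Quadrant implied by the two binary axis classifiers."""
    if arousal_high:
        return 1 if valence_positive else 2
    return 4 if valence_positive else 3

def combine(arousal_high: bool, valence_positive: bool,
            quadrant_pred: int) -> str:
    """Decision rule: keep the quadrant classifier's output when it agrees
    with the axis-implied quadrant; otherwise fall back to the axis-implied
    quadrant (assumed tie-break)."""
    implied = quadrant_from_axes(arousal_high, valence_positive)
    final_quadrant = quadrant_pred if quadrant_pred == implied else implied
    return QUADRANT_EMOTIONS[final_quadrant]
```

In a multi-view setting, each intermediate classifier would be trained on its own feature view of the speech signal; the decision rule above only handles the late-fusion step.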
References
Bestelmeyer, P. E., Kotz, S. A., and Belin, P. (2017). Effects of emotional valence and arousal on the voice perception network. Social Cognitive and Affective Neuroscience, 12(8):1351–1358. http://dx.doi.org/10.1093/scan/nsx059
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F., and Weiss, B. (2005). A database of German emotional speech. In Proceedings of Interspeech, pages 1517–1520.
Fayek, H. M., Lech, M., and Cavedon, L. (2017). Evaluating deep learning architectures for Speech Emotion Recognition. Neural Networks, 92:60–68. http://dx.doi.org/10.1016/j.neunet.2017.02.013
Gadhe, R. P., Nilofer, S., Waghmare, V. B., Shrishrimal, P. P., and Deshmukh, R. R. (2015). Emotion Recognition from Speech: A Survey. International Journal of Scientific & Engineering Research, 6(4):632–635.
Lee, C. C., Mower, E., Busso, C., Lee, S., and Narayanan, S. (2011). Emotion recognition using a hierarchical binary decision tree approach. Speech Communication, 53(9-10):1162–1171.
Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., and Alsaadi, F. E. (2017). A survey of deep neural network architectures and their applications. Neurocomputing, 234:11–26.
Livingstone, S. R. and Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE, 13(5):e0196391. http://dx.doi.org/10.1371/journal.pone.0196391
Mao, Q., Wang, X., and Zhan, Y. (2010). Speech emotion recognition method based on improved decision tree and layered feature selection. International Journal of Humanoid Robotics, 07(02):245–261. http://dx.doi.org/10.1142/S0219843610002088
Parthasarathy, S. and Busso, C. (2017). Jointly predicting arousal, valence and dominance with multi-task learning. In Proceedings of Interspeech, pages 1103–1107. http://dx.doi.org/10.21437/Interspeech.2017-1494
Pathak, S. and Kolhe, V. (2016). A Survey on Emotion Recognition from Speech Signal. International Journal of Advanced Research in Computer and Communication Engineering, 5(7):447–450.
Reddy, A. P. and Vijayarajan, V. (2017). Extraction of emotions from speech - a survey. International Journal of Applied Engineering Research, 12(16).
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6):1161–1178. http://dx.doi.org/10.1037/h0077714
Shen, P., Changjun, Z., and Chen, X. (2011). Automatic speech emotion recognition using support vector machine. In Proceedings of the 2011 International Conference on Electronic and Mechanical Engineering and Information Technology (EMEIT), volume 2, pages 621–625. http://dx.doi.org/10.1109/EMEIT.2011.6023178
Shih, P.-Y., Chen, C.-P., and Wu, C.-H. (2017). Speech emotion recognition with ensemble learning methods. In Proceedings of ICASSP 2017, pages 2756–2760. http://dx.doi.org/10.1109/ICASSP.2017.7952658
Tuarob, S., Tucker, C. S., Salathe, M., and Ram, N. (2014). An ensemble heterogeneous classification methodology for discovering health-related knowledge in social media messages. Journal of Biomedical Informatics, 49:255–268. http://dx.doi.org/10.1016/j.jbi.2014.03.005
Xia, R. and Liu, Y. (2017). A Multi-Task Learning Framework for Emotion Recognition Using 2D Continuous Space. IEEE Transactions on Affective Computing, 8(1):3–14. http://dx.doi.org/10.1109/TAFFC.2015.2512598
Zhang, S., Zhang, S., Huang, T., and Gao, W. (2018). Speech emotion recognition using deep convolutional neural network and discriminant temporal pyramid matching. IEEE Transactions on Multimedia, 20(6):1576–1590.
Zhao, J., Mao, X., and Chen, L. (2019). Speech emotion recognition using deep 1D-2D CNN LSTM networks. Biomedical Signal Processing and Control, 47:312–323. http://dx.doi.org/10.1016/j.bspc.2018.08.035
