Speech emotion identification by combining intermediate classifications of the signals into arousal, valence, and quadrant
Abstract
Speech emotion identification is commonly performed over categorical classes such as "sadness" or "happiness". According to Russell's circumplex model of affect, emotions can also be classified by arousal, valence, and quadrant. This work proposes a method to improve the performance of categorical emotion identification by using classifiers that perform intermediate classification into arousal, valence, and quadrant classes in a multi-view approach. To combine these intermediate results into a final classification, a decision tree is proposed, which raised the F1 score from 0.73, obtained by an ensemble of three classifier types, to 0.87 on a public database.
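The combination step described above can be illustrated with a minimal sketch. The quadrant numbering follows Russell's circumplex (arousal on one axis, valence on the other); the emotion labels, the two-level class names, and the agreement-based fusion rule below are illustrative assumptions, not the paper's actual decision tree:

```python
# Illustrative sketch (assumed, not the authors' implementation): fusing
# intermediate arousal, valence, and quadrant predictions into one
# categorical emotion label, in the spirit of Russell's circumplex model.

# Circumplex quadrants implied by (arousal, valence) predictions.
QUADRANT = {
    ("high", "negative"): 1,  # e.g. anger, fear
    ("high", "positive"): 2,  # e.g. happiness
    ("low",  "positive"): 3,  # e.g. calm
    ("low",  "negative"): 4,  # e.g. sadness
}

# Hypothetical mapping from quadrant to a representative categorical class.
EMOTION = {1: "anger", 2: "happiness", 3: "calm", 4: "sadness"}

def combine(arousal_pred: str, valence_pred: str, quadrant_pred: int) -> str:
    """Decision-rule fusion: keep the quadrant classifier's output when it
    agrees with the quadrant implied by the arousal and valence classifiers;
    otherwise fall back to the implied quadrant."""
    implied = QUADRANT[(arousal_pred, valence_pred)]
    quadrant = quadrant_pred if quadrant_pred == implied else implied
    return EMOTION[quadrant]

print(combine("high", "positive", 2))  # agreement -> "happiness"
print(combine("low", "negative", 1))   # disagreement -> implied quadrant 4 -> "sadness"
```

In practice the fusion would be a decision tree learned from the intermediate classifiers' outputs rather than a fixed rule; the sketch only shows the shape of the multi-view combination.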
References
Bestelmeyer, P. E., Kotz, S. A., and Belin, P. (2017). Effects of emotional valence and arousal on the voice perception network. Social Cognitive and Affective Neuroscience, 12(8):1351–1358. http://dx.doi.org/10.1093/scan/nsx059
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F., and Weiss, B. (2005). A Database of German Emotional Speech. In Proceedings of Interspeech 2005, pages 1517–1520.
Fayek, H. M., Lech, M., and Cavedon, L. (2017). Evaluating deep learning architectures for Speech Emotion Recognition. Neural Networks, 92:60–68. http://dx.doi.org/10.1016/j.neunet.2017.02.013
Gadhe, R. P., Nilofer, S., Waghmare, V. B., Shrishrimal, P. P., and Deshmukh, R. R. (2015). Emotion Recognition from Speech: A Survey. International Journal of Scientific & Engineering Research, 6(4):632–635.
Lee, C. C., Mower, E., Busso, C., Lee, S., and Narayanan, S. (2011). Emotion recognition using a hierarchical binary decision tree approach. Speech Communication, 53(9-10):1162–1171.
Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., and Alsaadi, F. E. (2017). A survey of deep neural network architectures and their applications. Neurocomputing, 234:11–26.
Livingstone, S. R. and Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE, 13(5):e0196391. http://dx.doi.org/10.1371/journal.pone.0196391
Mao, Q., Wang, X., and Zhan, Y. (2010). Speech Emotion Recognition Method Based on Improved Decision Tree and Layered Feature Selection. International Journal of Humanoid Robotics, 07(02):245–261. http://dx.doi.org/10.1142/S0219843610002088
Parthasarathy, S. and Busso, C. (2017). Jointly predicting arousal, valence and dominance with multi-task learning. In Proceedings of Interspeech 2017, pages 1103–1107. http://dx.doi.org/10.21437/Interspeech.2017-1494
Pathak, S. and Kolhe, V. (2016). A Survey on Emotion Recognition from Speech Signal. International Journal of Advanced Research in Computer and Communication Engineering, 5(7):447–450.
Reddy, A. P. and Vijayarajan, V. (2017). Extraction of Emotions from Speech - A Survey. International Journal of Applied Engineering Research, 12(16).
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6):1161–1178. http://dx.doi.org/10.1037/h0077714
Shen, P., Changjun, Z., and Chen, X. (2011). Automatic Speech Emotion Recognition using Support Vector Machine. In Proceedings of the 2011 International Conference on Electronic and Mechanical Engineering and Information Technology (EMEIT), volume 2, pages 621–625. http://dx.doi.org/10.1109/EMEIT.2011.6023178
Shih, P.-Y., Chen, C.-P., and Wu, C.-H. (2017). Speech Emotion Recognition with Ensemble Learning Methods. In Proceedings of ICASSP 2017, pages 2756–2760. http://dx.doi.org/10.1109/ICASSP.2017.7952658
Tuarob, S., Tucker, C. S., Salathe, M., and Ram, N. (2014). An ensemble heterogeneous classification methodology for discovering health-related knowledge in social media messages. Journal of Biomedical Informatics, 49:255–268. http://dx.doi.org/10.1016/j.jbi.2014.03.005
Xia, R. and Liu, Y. (2017). A Multi-Task Learning Framework for Emotion Recognition Using 2D Continuous Space. IEEE Transactions on Affective Computing, 8(1):3–14. http://dx.doi.org/10.1109/TAFFC.2015.2512598
Zhang, S., Zhang, S., Huang, T., and Gao, W. (2018). Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching. IEEE Transactions on Multimedia, 20(6):1576–1590.
Zhao, J., Mao, X., and Chen, L. (2019). Speech emotion recognition using deep 1D-2D CNN LSTM networks. Biomedical Signal Processing and Control, 47:312–323. http://dx.doi.org/10.1016/j.bspc.2018.08.035