Speech emotion identification by combining intermediate classifications of the signals into arousal, valence, and quadrant
Abstract
Speech emotion identification is commonly performed over categorical classes such as "sadness" or "happiness". According to Russell's circumplex model of affect, emotions can also be classified by arousal, valence, and quadrant. This work proposes a method to improve the performance of categorical emotion identification by using classifiers that perform intermediate classification into arousal, valence, and quadrant classes in a multi-view approach. To combine these intermediate results into a final classification, a decision tree is proposed, which raised the F1 score from 0.73, obtained by an ensemble of three classifier types, to 0.87 on a public database.
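The combination step described above can be illustrated with a minimal sketch. The quadrant numbering follows Russell's circumplex (arousal on one axis, valence on the other); the emotion labels, the two-level class names, and the agreement-based fusion rule below are illustrative assumptions, not the paper's actual decision tree:

```python
# Illustrative sketch (assumed, not the authors' implementation): fusing
# intermediate arousal, valence, and quadrant predictions into one
# categorical emotion label, in the spirit of Russell's circumplex model.

# Circumplex quadrants implied by (arousal, valence) predictions.
QUADRANT = {
    ("high", "negative"): 1,  # e.g. anger, fear
    ("high", "positive"): 2,  # e.g. happiness
    ("low",  "positive"): 3,  # e.g. calm
    ("low",  "negative"): 4,  # e.g. sadness
}

# Hypothetical mapping from quadrant to a representative categorical class.
EMOTION = {1: "anger", 2: "happiness", 3: "calm", 4: "sadness"}

def combine(arousal_pred: str, valence_pred: str, quadrant_pred: int) -> str:
    """Decision-rule fusion: keep the quadrant classifier's output when it
    agrees with the quadrant implied by the arousal and valence classifiers;
    otherwise fall back to the implied quadrant."""
    implied = QUADRANT[(arousal_pred, valence_pred)]
    quadrant = quadrant_pred if quadrant_pred == implied else implied
    return EMOTION[quadrant]

print(combine("high", "positive", 2))  # agreement -> "happiness"
print(combine("low", "negative", 1))   # disagreement -> implied quadrant 4 -> "sadness"
```

In practice the fusion would be a decision tree learned from the intermediate classifiers' outputs rather than a fixed rule; the sketch only shows the shape of the multi-view combination.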
References
Bestelmeyer, P. E., Kotz, S. A., and Belin, P. (2017). Effects of emotional valence and arousal on the voice perception network. Social Cognitive and Affective Neuroscience, 12(8):1351–1358. http://dx.doi.org/10.1093/scan/nsx059
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F., and Weiss, B. (2005). A Database of German Emotional Speech. In Proceedings of Interspeech 2005, pages 1517–1520.
Fayek, H. M., Lech, M., and Cavedon, L. (2017). Evaluating deep learning architectures for Speech Emotion Recognition. Neural Networks, 92:60–68. http://dx.doi.org/10.1016/j.neunet.2017.02.013
Gadhe, R. P., Nilofer, S., Waghmare, V. B., Shrishrimal, P. P., and Deshmukh, R. R. (2015). Emotion Recognition from Speech: A Survey. International Journal of Scientific & Engineering Research, 6(4):632–635.
Lee, C. C., Mower, E., Busso, C., Lee, S., and Narayanan, S. (2011). Emotion recognition using a hierarchical binary decision tree approach. Speech Communication, 53(9-10):1162–1171.
Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., and Alsaadi, F. E. (2017). A survey of deep neural network architectures and their applications. Neurocomputing, 234:11–26.
Livingstone, S. R. and Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE, 13(5):e0196391. http://dx.doi.org/10.1371/journal.pone.0196391
Mao, Q., Wang, X., and Zhan, Y. (2010). Speech Emotion Recognition Method Based on Improved Decision Tree and Layered Feature Selection. International Journal of Humanoid Robotics, 07(02):245–261. http://dx.doi.org/10.1142/S0219843610002088
Parthasarathy, S. and Busso, C. (2017). Jointly predicting arousal, valence and dominance with multi-task learning. In Proceedings of Interspeech 2017, pages 1103–1107. http://dx.doi.org/10.21437/Interspeech.2017-1494
Pathak, S. and Kolhe, V. (2016). A Survey on Emotion Recognition from Speech Signal. International Journal of Advanced Research in Computer and Communication Engineering, 5(7):447–450.
Reddy, A. P. and Vijayarajan, V. (2017). Extraction of Emotions from Speech - A Survey. International Journal of Applied Engineering Research, 12(16).
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6):1161–1178. http://dx.doi.org/10.1037/h0077714
Shen, P., Changjun, Z., and Chen, X. (2011). Automatic Speech Emotion Recognition using Support Vector Machine. In Proceedings of the 2011 International Conference on Electronic and Mechanical Engineering and Information Technology (EMEIT), volume 2, pages 621–625. http://dx.doi.org/10.1109/EMEIT.2011.6023178
Shih, P.-Y., Chen, C.-P., and Wu, C.-H. (2017). Speech Emotion Recognition with Ensemble Learning Methods. In Proceedings of ICASSP 2017, pages 2756–2760. http://dx.doi.org/10.1109/ICASSP.2017.7952658
Tuarob, S., Tucker, C. S., Salathe, M., and Ram, N. (2014). An ensemble heterogeneous classification methodology for discovering health-related knowledge in social media messages. Journal of Biomedical Informatics, 49:255–268. http://dx.doi.org/10.1016/j.jbi.2014.03.005
Xia, R. and Liu, Y. (2017). A Multi-Task Learning Framework for Emotion Recognition Using 2D Continuous Space. IEEE Transactions on Affective Computing, 8(1):3–14. http://dx.doi.org/10.1109/TAFFC.2015.2512598
Zhang, S., Zhang, S., Huang, T., and Gao, W. (2018). Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching. IEEE Transactions on Multimedia, 20(6):1576–1590.
Zhao, J., Mao, X., and Chen, L. (2019). Speech emotion recognition using deep 1D-2D CNN LSTM networks. Biomedical Signal Processing and Control, 47:312–323. http://dx.doi.org/10.1016/j.bspc.2018.08.035