Impact of Fine-Tuning on Dimensionality Reduction for Multimodal Speech Emotion Recognition
Abstract
This work evaluates the impact of fine-tuning on dimensionality reduction of the MiniLM L3 sentence embedding for dimensional speech emotion recognition, using a bimodal approach that combines acoustic and textual information. Fine-tuning yielded a threefold (3x) increase in the Concordance Correlation Coefficient (CCC) for the valence dimension.
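The result above is reported as a Concordance Correlation Coefficient. As a minimal sketch, assuming the standard definition of the metric (twice the covariance divided by the sum of both variances and the squared difference of means, with population statistics), it could be computed as follows. The function name and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def concordance_correlation(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Concordance correlation coefficient between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and have the same length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    # Population (divide-by-n) variances and covariance.
    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred) / n
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred)) / n
    denom = var_true + var_pred + (mean_true - mean_pred) ** 2
    if denom == 0.0:
        # Both sequences are the same constant: treated here as perfect agreement.
        return 1.0
    return 2.0 * cov / denom


# Hypothetical valence labels and predictions, for illustration only.
labels = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = [2.0, 3.0, 4.0, 5.0, 6.0]  # same trend as labels, offset by +1

print(concordance_correlation(labels, labels))
print(concordance_correlation(labels, shifted))
```

Unlike Pearson correlation, this metric penalizes a systematic offset between predictions and labels: the shifted sequence is perfectly correlated with the labels, yet its score falls below 1.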
Published
12/11/2025
How to Cite
GUDER, Larissa; GRIEBLER, Dalvan; MENEGUZZI, Felipe. Impacto do Ajuste Fino na Redução de Dimensionalidade para Reconhecimento Multimodal de Emoções na Fala. In: ESCOLA REGIONAL DE APRENDIZADO DE MÁQUINA E INTELIGÊNCIA ARTIFICIAL DA REGIÃO SUL (ERAMIA-RS), 1., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 61-64. DOI: https://doi.org/10.5753/eramiars.2025.16644.