Multimodal Audio Emotion Recognition with Graph-based Consensus Pseudolabeling

  • Gabriel Natal Coutinho, Universidade de São Paulo
  • Artur de Vlieger Lima, Universidade de São Paulo
  • Juliano Yugoshi, Universidade de São Paulo
  • Marcelo Isaias de Moraes Junior, Universidade de São Paulo
  • Marcos Paulo Silva Gôlo, Universidade de São Paulo
  • Ricardo Marcondes Marcacini, Universidade de São Paulo


This paper presents Multimodal Graph-based Consensus Pseudolabeling (MGCP), a novel method for unsupervised emotion recognition in audio. The goal is to determine the emotion of each audio segment as a position in the circumplex model of emotions. The method combines pre-trained unimodal models for audio and text and follows a three-step process. First, audio segments are represented using embeddings from the unimodal models. Second, a modality-specific graph is constructed for each modality based on embedding similarity, and these graphs are integrated into a single multimodal graph. Third, pseudolabels are generated by measuring consensus between the modalities, and a graph regularization framework is introduced to estimate the final emotion coordinates. Experimental evaluation shows that MGCP surpasses both unimodal and traditional multimodal models, enabling audio emotion recognition without labeled data specific to the target domain.
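
The three-step process can be illustrated with a minimal sketch. The abstract does not specify the construction details, so everything below is an assumption made for illustration and not the authors' implementation: cosine k-nearest-neighbour graphs, a union of the two modality graphs, a Euclidean consensus threshold `tau` on (valence, arousal) predictions, and a standard Laplacian-regularized least-squares solve. The function names and the parameters `k`, `tau`, `mu`, and `ridge` are hypothetical.

```python
import numpy as np

def knn_graph(emb, k):
    """Symmetric k-NN adjacency from cosine similarity (assumed graph type)."""
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)            # exclude self-loops
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # k most similar segments per node
    adj = np.zeros(sim.shape)
    rows = np.repeat(np.arange(len(emb)), k)
    adj[rows, nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)             # make the graph undirected

def consensus_pseudolabels(pred_audio, pred_text, tau):
    """Keep segments whose unimodal (valence, arousal) predictions agree.

    Agreement is assumed to mean Euclidean distance <= tau; the pseudolabel
    is assumed to be the mean of the two predictions.
    """
    dist = np.linalg.norm(pred_audio - pred_text, axis=1)
    mask = dist <= tau
    labels = 0.5 * (pred_audio + pred_text)
    return labels, mask

def propagate(adj, labels, mask, mu=1.0, ridge=1e-8):
    """Graph regularization: smooth over edges while fitting the pseudolabels.

    Solves (L + mu*M + ridge*I) F = mu*M*Y, where L is the graph Laplacian and
    M is the diagonal indicator of pseudolabeled nodes. The small ridge term
    is an assumption that keeps the system solvable when a connected
    component has no pseudolabeled node.
    """
    lap = np.diag(adj.sum(axis=1)) - adj
    m = np.diag(mask.astype(float))
    a = lap + mu * m + ridge * np.eye(len(adj))
    return np.linalg.solve(a, (mu * m) @ labels)

def mgcp_sketch(emb_audio, emb_text, pred_audio, pred_text, k=2, tau=0.5, mu=1.0):
    """Chain the three steps: graphs, consensus pseudolabels, regularization."""
    # Multimodal graph assumed to be the union of the modality-specific graphs.
    adj = np.maximum(knn_graph(emb_audio, k), knn_graph(emb_text, k))
    labels, mask = consensus_pseudolabels(pred_audio, pred_text, tau)
    return propagate(adj, labels, mask, mu)
```

Under these assumptions, segments on which the audio and text models disagree receive no pseudolabel of their own and instead inherit coordinates from their neighbours in the multimodal graph.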

Keywords: Audio Emotion Recognition, Pseudolabeling, Graph Learning


COUTINHO, Gabriel Natal; LIMA, Artur de Vlieger; YUGOSHI, Juliano; MORAES JUNIOR, Marcelo Isaias de; GÔLO, Marcos Paulo Silva; MARCACINI, Ricardo Marcondes. Multimodal Audio Emotion Recognition with Graph-based Consensus Pseudolabeling. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 20., 2023, Belo Horizonte/MG.