Song Emotion Recognition: A Study of the State of the Art

  • Arthur Nicholas dos Santos, Universidade Estadual de Campinas
  • Karen Gissell Rosero Jácome, Universidade Estadual de Campinas
  • Bruno Sanches Masiero, Universidade Estadual de Campinas


Music is art, and art is a form of expression. When a song is composed or performed, the singer or songwriter often intends to express some feeling or emotion through it, and once the music reaches an audience it can provoke a spectrum of emotional reactions. For humans, matching the emotion intended in a composition or performance with the subjective perception of different listeners can be quite challenging, given that this process is closely intertwined with each person's life experiences and cognitive capacities. Fortunately, the machine learning approach to this problem is simpler. Usually, it takes a dataset, extracts features from it, and presents them to a model, which is trained to predict the target with the highest probability for a given input. In this paper, we study the most common features and models used in recent publications to tackle music emotion recognition, and identify which ones are best suited to songs (particularly a cappella).
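The dataset, feature extraction, and model steps described above can be illustrated with a minimal sketch. It is not the method of any surveyed paper: the sine-tone "clips", the two features (RMS energy and zero-crossing rate), the nearest-centroid model, and the labels "calm" and "excited" are all assumptions chosen only to keep the example small and dependency-free.

```python
import math

def synth_clip(freq_hz, amplitude, sr=8000, seconds=0.25):
    """Hypothetical stand-in for a recording: a plain sine tone."""
    n = int(sr * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sr) for t in range(n)]

def extract_features(signal):
    """Two simple descriptors: RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(signal) - 1)
    return [rms, zcr]

def train_centroids(features, labels):
    """'Training' here just averages the feature vectors of each class."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_proba(centroids, vec):
    """Softmax over negative squared distances to each class centroid."""
    scores = {
        label: -sum((a - b) ** 2 for a, b in zip(vec, centroid))
        for label, centroid in centroids.items()
    }
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def predict(centroids, vec):
    """Return the label with the highest predicted probability."""
    probs = predict_proba(centroids, vec)
    return max(probs, key=probs.get)

# Toy dataset (assumed): quiet low tones are "calm", loud high tones are "excited".
train_clips = [synth_clip(f, 0.2) for f in (110, 130, 150)] + \
              [synth_clip(f, 0.8) for f in (880, 990, 1100)]
train_labels = ["calm"] * 3 + ["excited"] * 3

train_features = [extract_features(clip) for clip in train_clips]
centroids = train_centroids(train_features, train_labels)

new_clip = synth_clip(1000, 0.7)
print(predict(centroids, extract_features(new_clip)))
```

In practice the toy tones would be replaced by labeled recordings and the hand-written descriptors by richer audio features, but the flow from data to features to a probability-ranked prediction stays the same.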

Keywords: Artificial Intelligence, A-Life and Evolutionary Music Systems; Digital Sound Processing; Music Analysis and Synthesis; Music Expressiveness; Music Information Retrieval; Music Perception, Psychoacoustics, and Cognition; Music, Society, and Technology


SANTOS, Arthur Nicholas dos; ROSERO JÁCOME, Karen Gissell; MASIERO, Bruno Sanches. Song Emotion Recognition: A Study of the State of the Art. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO MUSICAL (SBCM), 18., 2021, Recife. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 209-212. DOI: