Investigating the Emergent Reasoning Abilities of Large Language Models in Sentiment Analysis of Codified Music
Abstract
The perception of positive or negative sentiment in a piece of music is shaped by musical features such as tempo, notes, chords, and rhythm, which together determine its emotional character. This paper examines the ability of Large Language Models (LLMs) to account for these features and infer sentiment from symbolically encoded music, exploring both zero-shot and fine-tuned approaches. Our results show that while LLMs exhibit some capability in processing symbolic representations of musical elements, their ability to reliably associate music with sentiment remains limited. The models struggle to align their predictions with human-assigned labels, with accuracy hovering around 0.6. These results suggest that current text-based approaches may not fully capture the complex interplay between musical structure and emotional expression.
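The zero-shot setting described above can be illustrated with a minimal sketch: a symbolically encoded piece (here, ABC notation, used as an assumed encoding) is wrapped in a classification prompt, and the model's free-text reply is mapped to a binary sentiment label. All names, the prompt wording, and the excerpt are illustrative, not taken from the paper.

```python
# Illustrative excerpt in ABC notation (not from the paper's dataset).
ABC_EXCERPT = """X:1
T:Example
M:4/4
K:Am
A2 c2 e2 a2 | g2 e2 c2 A2 |"""


def build_prompt(symbolic_music: str) -> str:
    """Wrap a symbolic encoding in a zero-shot classification prompt."""
    return (
        "The following piece is encoded in ABC notation:\n"
        f"{symbolic_music}\n\n"
        "Considering tempo, notes, chords, and rhythm, is the overall "
        "sentiment of this piece positive or negative? "
        "Answer with a single word."
    )


def parse_label(reply: str):
    """Map a free-text model reply to a binary sentiment label."""
    text = reply.strip().lower()
    if "positive" in text:
        return "positive"
    if "negative" in text:
        return "negative"
    return None  # unparseable reply; counted as an error in evaluation


prompt = build_prompt(ABC_EXCERPT)
print(parse_label("Negative. The minor key and descending line feel sad."))
# → negative
```

In practice the prompt would be sent to an LLM and the parsed labels compared against human annotations to compute accuracy; this sketch only shows the prompt construction and label-mapping steps.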
Published
29/09/2025
How to Cite
ASSIS, Gabriel; ALVARENGA, Laura; MORAES, João Vitor de; AZEVEDO, Lívia de; PAES, Aline. Investigating the Emergent Reasoning Abilities of Large Language Models in Sentiment Analysis of Codified Music. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 415-426. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.12548.
