LLMusic: Topic Modeling for lyrics Combining LLM, Prompt Engineering, and BERTopic

Abstract


Song lyrics pose additional challenges for topic modeling: their discourse is often implicit, must be understood in context, and relies on figurative and poetic language and slang. This paper proposes LLMusic, a new topic modeling approach that leverages Large Language Models (LLMs) to analyze lyrics. We use LLMs and prompting to summarize song excerpts into central themes in an iterative, unsupervised process applied to a corpus representative of the genre. These themes are then grouped into a lean, coherent set of topics using BERTopic. New lyrics can be classified against these topics through zero-shot prompts. In a case study, LLMusic captures the social phenomena underlying Brazilian funk, demonstrating its potential for large-scale analysis.
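
The zero-shot classification step can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt wording, the function names (build_zero_shot_prompt, classify_lyrics), the placeholder topic labels, and the llm callable (any function that takes a prompt string and returns the model's text reply) are all assumptions made for illustration.

```python
from typing import Callable, List


def build_zero_shot_prompt(lyrics: str, topics: List[str]) -> str:
    """Build a zero-shot prompt: no labeled examples, only a fixed topic list."""
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(topics, start=1))
    return (
        "You are given song lyrics and a list of topics.\n"
        "Answer with the number of the single topic that best fits the lyrics.\n\n"
        f"Topics:\n{numbered}\n\n"
        f"Lyrics:\n{lyrics}\n\n"
        "Answer:"
    )


def classify_lyrics(lyrics: str, topics: List[str],
                    llm: Callable[[str], str]) -> str:
    """Ask the (hypothetical) llm callable for a topic number and map it to a label."""
    reply = llm(build_zero_shot_prompt(lyrics, topics)).strip()
    # Extract the first run of ASCII digits from the reply, e.g. "Topic 2." -> "2".
    digits = ""
    for ch in reply:
        if ch in "0123456789":
            digits += ch
        elif digits:
            break
    # Accept the answer only if it points at an existing topic.
    if digits and 1 <= int(digits) <= len(topics):
        return topics[int(digits) - 1]
    return "unknown"


if __name__ == "__main__":
    # Placeholder topics and a stubbed model; a real run would call an actual LLM.
    example_topics = ["party", "romance", "social critique"]
    stub_llm = lambda prompt: "2"
    print(classify_lyrics("placeholder lyrics", example_topics, stub_llm))
```

Passing the model as a plain callable keeps the sketch independent of any specific LLM provider. Replies that do not name a valid topic number fall back to "unknown" rather than being forced into a topic.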

Keywords: LLMs, prompt engineering, topic modeling, BERTopic

Published
2024-10-14
ROJAS, Jesus Daniel Yepez; BECKER, Karin. LLMusic: Topic Modeling for lyrics Combining LLM, Prompt Engineering, and BERTopic. In: WORKSHOP ON THESIS AND DISSERTATION (WTDBD) - BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 39., 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 158-164. DOI: https://doi.org/10.5753/sbbd_estendido.2024.243767.