On the Beat of Funk: Topic Modeling Combining LLM, Prompt Engineering, and BERTopic

  • Jesus Yepez, Federal University of Rio Grande do Sul (UFRGS)
  • Bruno Tavares, Federal University of Rio Grande do Sul (UFRGS)
  • Fabíola Peres, Federal University of Rio Grande do Sul (UFRGS)
  • Karin Becker, Federal University of Rio Grande do Sul (UFRGS), https://orcid.org/0000-0003-4967-1027

Abstract


Song lyrics pose additional challenges for topic modeling: the discourse is often implicit, must be understood within its context, and relies on figurative and poetic language and slang. This paper proposes LLMusic, a new topic modeling approach that leverages Large Language Models (LLMs) to analyze lyrics, using Brazilian funk, a genre that offers a rich social portrait of the periphery, as a case study. We use LLMs and prompting to summarize song excerpts into central themes, in an iterative and unsupervised process applied to a corpus representative of the genre. These themes are then grouped into a lean, coherent set of topics using BERTopic. New lyrics can be classified against these topics through zero-shot prompts. We applied LLMusic to analyze the discourse in the 100 most popular funk songs, showing its potential for large-scale analysis.
Keywords: Topic Modeling, Large Language Model, Prompt Engineering, BERTopic, Brazilian Funk
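
The zero-shot classification step mentioned in the abstract can be illustrated with a minimal sketch. It only shows how a prompt might be assembled from a fixed topic list and how a numeric reply might be mapped back to a topic name. The prompt wording, the function names (build_zero_shot_prompt, parse_topic), and the example topics are all assumptions made for illustration; they are not the prompts or topics used in the paper, and the call to an actual LLM is left out.

```python
from typing import Optional, Sequence


def build_zero_shot_prompt(topics: Sequence[str], lyrics: str) -> str:
    """Assemble a zero-shot classification prompt (no labeled examples).

    The wording is illustrative only; it is not the prompt from the paper.
    """
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(topics, start=1))
    return (
        "You are given a list of topics and a song excerpt.\n"
        "Answer with the number of the single topic that best fits.\n\n"
        f"Topics:\n{numbered}\n\n"
        f"Excerpt:\n{lyrics.strip()}\n\n"
        "Answer:"
    )


def parse_topic(response: str, topics: Sequence[str]) -> Optional[str]:
    """Map a model reply such as '2' or '2. Love' back to a topic name.

    Returns None when the reply does not start with a valid topic number.
    """
    digits = ""
    for ch in response.strip():
        if ch in "0123456789":
            digits += ch
        else:
            break
    if not digits:
        return None
    index = int(digits)
    return topics[index - 1] if 1 <= index <= len(topics) else None


# Hypothetical topic names, for illustration only.
EXAMPLE_TOPICS = [
    "Party and dance",
    "Romantic relationships",
    "Life in the periphery",
]
```

In use, the string returned by build_zero_shot_prompt would be sent to whichever LLM is available, and the raw reply passed to parse_topic. Asking for a topic number rather than free text is one possible design choice that keeps the reply easy to validate against the fixed topic list.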

Published
2024-10-14
YEPEZ, Jesus; TAVARES, Bruno; PERES, Fabíola; BECKER, Karin. On the Beat of Funk: Topic Modeling Combining LLM, Prompt Engineering, and BERTopic. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 39., 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 613-625. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2024.243148.