Brazilian Music under the Military Dictatorship: a topic analysis with BERTopic and GSDMM
Abstract
During the military dictatorship in Brazil, artists turned to music as a form of expression, relying on poetic and metaphorical language. In this context, the sociopolitical analysis of these lyrics poses an interpretive challenge. This work analyzes the lyrics of Brazilian songs released during this period, using BERTopic and GSDMM for topic modeling, in order to identify key themes that reflect social, political, and historical aspects of the period. The models revealed simple, everyday language, with little use of erudite or political vocabulary, suggesting that artists opted for direct and emotional expression, possibly to broaden their dialogue with the public. In particular, BERTopic stood out for mapping thematic diversity with little redundancy, while GSDMM segmented dominant themes into highly cohesive subtopics. This work shows how Brazilian music, through poetic and accessible language, portrayed the everyday life of the time, and it highlights the potential of topic modeling as a tool for cultural analysis.
Published
29/09/2025
How to Cite
PICENI, Henry R.; ALEXANDRE, Pedro V.; BALREIRA, Dennis G. A Música Brasileira na Ditadura Militar: uma análise de tópicos com BERTopic e GSDMM. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 16., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 349-360. DOI: https://doi.org/10.5753/stil.2025.37837.
