Topic Modeling in Feminist Debates on Instagram: A Generative AI Approach
Abstract
This study identifies the feminist themes addressed by Brazilian Instagram profiles using Natural Language Processing (NLP) and topic modeling. To analyze recurring discussions and their variations, we apply BERTopic for topic modeling and use the large language model (LLM) LLaMA to label the identified topics. The modeling process produced 90 topics, highlighting issues such as domestic violence, reproductive rights, and mental health, which reflect current debates in the Brazilian context. In addition, an experiment compared topic labels produced by human participants with those produced by the LLM and measured the similarity between the two sets of responses. BERTScore, a metric that assesses semantic similarity, yielded the highest scores, with values between 0.68 and 0.79, indicating that the LLM's labels were semantically similar to the human ones. The results underscore the role of NLP techniques and language models in identifying complex social themes and provide a foundation for future studies on the impact of social media on awareness and the promotion of social change.
References
Bérubé, M., Tang, T.-U., Fortin, F., Ozalp, S., Williams, M. L., and Burnap, P. (2020). Social media forensics applied to assessment of post–critical incident social reaction: The case of the 2017 Manchester Arena terrorist attack. Forensic Science International, 313:110364.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.
Brown, N. M. (2019). Methodological cyborg as black feminist technology: constructing the social self using computational digital autoethnography and social media. Cultural Studies: Critical Methodologies, 19(1):55–67.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, volume 1, page 2. Minneapolis, Minnesota.
Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794.
Ibrahim, N. F. and Wang, X. (2019). A text analytics approach for online retailing service improvement: Evidence from Twitter. Decision Support Systems, 121:37–50.
Joo, S., Lu, K., and Lee, T. (2020). Analysis of content topics, user engagement and library factors in public library social media based on text mining. Online Information Review, 44(1):258–277.
Kirilenko, A. and Stepchenkova, S. (2024). Automated topic analysis with large language models. In ENTER e-Tourism Conference, pages 29–34. Springer.
Kurten, S. and Beullens, K. (2021). #coronavirus: Monitoring the Belgian Twitter discourse on the severe acute respiratory syndrome coronavirus 2 pandemic. Cyberpsychology, Behavior, and Social Networking, 24(2):117–122.
Kwon, O. H., Vu, K., Bhargava, N., Radaideh, M. I., Cooper, J., Joynt, V., and Radaideh, M. I. (2024). Sentiment analysis of the United States public support of nuclear power on social media using large language models. Renewable and Sustainable Energy Reviews, 200:114570.
Laureate, C. D. P., Buntine, W., and Linger, H. (2023). A systematic review of the use of topic models for short text social media analysis. Artificial Intelligence Review, 56(12):14223–14255.
Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out. ACL. Workshop of the ACL 2004.
Lowenthal, M. M. (2020). Intelligence: From Secrets to Policy. CQ Press.
Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.
Mazarura, J. and de Waal, A. (2016). A comparison of the performance of latent Dirichlet allocation and the Dirichlet multinomial mixture model on short text. In 2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), pages 1–6.
Nobles, A. L., Leas, E. C., Latkin, C. A., Dredze, M., Strathdee, S. A., and Ayers, J. W. (2020). #hiv: alignment of HIV-related visual content on Instagram with public health priorities in the US. AIDS and Behavior, 24:2045–2053.
Paasonen, S. (2011). Revisiting cyberfeminism. Communications.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318. ACL.
Peres, D., Silva, G., Faria, E., and Barioni, M. (2023). Análise do estresse e tópicos discutidos no Twitter durante a pandemia da COVID-19 no Brasil. In Anais do XII Brazilian Workshop on Social Network Analysis and Mining, pages 43–54, Porto Alegre, RS, Brasil. SBC.
Statista (2024). Instagram: number of global users 2020-2025. Available at: [link]. Accessed on: March 29, 2025.
Vachhani, S. J. (2024). Networked feminism in a digital age—mobilizing vulnerability and reconfiguring feminist politics in digital activism. Gender, Work & Organization, 31(3):1031–1048.
Wahid, J. A., Xu, M., Ayoub, M., Jiang, X., Lei, S., Gao, Y., Hussain, S., and Yang, Y. (2025). AI-driven social media text analysis during crisis: A review for natural disasters and pandemics. Applied Soft Computing, page 112774.
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., and Schmidt, D. C. (2023). A prompt pattern catalog to enhance prompt engineering with ChatGPT. arXiv preprint arXiv:2302.11382.
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., and Artzi, Y. (2019). BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675.
Published
20/07/2025
How to Cite
ALMEIDA, Thalia; BARBOSA, Keila; FERNANDES, Sheyla; AQUINO, André. Topic Modeling in Feminist Debates on Instagram: A Generative AI Approach. In: BRAZILIAN WORKSHOP ON SOCIAL NETWORK ANALYSIS AND MINING (BRASNAM), 14., 2025, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 173-186. ISSN 2595-6094. DOI: https://doi.org/10.5753/brasnam.2025.8980.
