Clustering Discourses: Racial Biases in Short Stories about Women Generated by Large Language Models
Abstract
This study investigates how large language models, in particular LLaMA 3.2-3B, construct narratives about Black and white women in short stories generated in Portuguese. From 2,100 generated texts, we applied computational methods to group semantically similar stories, allowing a selection of stories for qualitative analysis. Three main discursive representations emerge: social overcoming, ancestral mythification, and subjective self-realization. The analysis uncovers how grammatically coherent, seemingly neutral texts materialize a crystallized, colonially structured framing of the female body, reinforcing historical inequalities. The study proposes an integrated approach that combines machine learning techniques with qualitative, manual discourse analysis.
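The abstract only summarizes the clustering step. The sketch below shows one plausible way such a pipeline could be implemented; the embedding model (BGE-M3), the UMAP settings, the k-means step, the helper name cluster_stories, and the range of cluster counts scored with the Caliński-Harabasz index are illustrative assumptions, not the authors' documented setup.

# Hedged sketch: group generated stories by semantic similarity.
# All model names and parameters below are assumptions for illustration.
from sentence_transformers import SentenceTransformer
import umap
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

def cluster_stories(stories, k_range=range(2, 11)):
    # 1. Encode each story with a multilingual sentence-embedding model
    #    (BGE-M3 assumed here because it covers Portuguese).
    encoder = SentenceTransformer("BAAI/bge-m3")
    embeddings = encoder.encode(stories, normalize_embeddings=True)

    # 2. Reduce dimensionality so distance-based clustering is more stable.
    reduced = umap.UMAP(n_components=5, metric="cosine",
                        random_state=42).fit_transform(embeddings)

    # 3. Score several cluster counts and keep the best one according to the
    #    Calinski-Harabasz index (higher = denser, better-separated clusters).
    best_k, best_score, best_labels = None, -1.0, None
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=42).fit_predict(reduced)
        score = calinski_harabasz_score(reduced, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels

Stories closest to each cluster centroid could then be sampled for the manual discourse analysis described in the abstract.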
Published
2025-09-29
How to Cite
BONIL, Gustavo; GONDIM, João; SANTOS, Marina dos; HASHIGUTI, Simone; MAIA, Helena; SILVA, Nadia; PEDRINI, Helio; AVILA, Sandra. Clustering Discourses: Racial Biases in Short Stories about Women Generated by Large Language Models. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 16., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 66-77. DOI: https://doi.org/10.5753/stil.2025.37814.
