A study on gender biases in NLP models applied to stories generated by GPT-3.5 and Gemini

  • Maria Clara Ramalho Medeiros (IFPB)
  • Francisco Paulo de Freitas Neto (IFPB)

Abstract


This work addresses the importance of studying gender biases in Natural Language Processing (NLP) models, particularly in generative artificial intelligence. The research aimed to understand how these biases are reproduced in texts generated by models such as GPT-3.5 and Gemini. To this end, the BERT model was trained to infer the gender referenced in a text, using the md_gender_bias dataset to investigate these biases and highlighting the importance of analyzing the social impact of AI systems, especially when they are used without these biases being taken into account. The analysis of the results confirmed the presence of historical bias, confirmation bias, and selection bias in these models.
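As a concrete illustration of the pipeline summarized above, the sketch below fine-tunes a BERT classifier to infer the gender referenced in a text. It is a minimal sketch, not the authors' exact setup: it assumes the Hugging Face transformers and datasets libraries, the public md_gender_bias dataset with its funpedia configuration (whose examples are assumed to carry text and gender fields), the bert-base-uncased checkpoint, and illustrative hyperparameters.

```python
# Sketch: fine-tune BERT to classify the gender referenced in a text.
# Assumptions (not the authors' exact configuration): the Hugging Face
# "md_gender_bias" dataset, "funpedia" config with "text" and "gender"
# fields and train/validation splits, and bert-base-uncased.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("md_gender_bias", "funpedia")          # assumed config
num_labels = dataset["train"].features["gender"].num_classes  # assumed field

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Truncate/pad each example to BERT's fixed-length input format.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)
encoded = encoded.rename_column("gender", "labels")  # Trainer expects "labels"

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=num_labels)

args = TrainingArguments(output_dir="bert-gender-clf",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"])
trainer.train()
```

Once trained, such a classifier can be applied to stories generated by GPT-3.5 and Gemini to estimate which gender each story refers to, which is the kind of analysis the abstract describes.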

References

Assi, F. and Caseli, H. (2024). Biases in GPT-3.5 Turbo model: a case study regarding gender and language. In Anais do XV Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana, pages 294–305, Porto Alegre, RS, Brasil. SBC.

Autran, F. (2018). IA da Amazon usada em análise de currículos discriminava mulheres.

Caseli, H. and Nunes, M. (2024). Processamento de Linguagem Natural: Conceitos, Técnicas e Aplicações em Português – 3ª Edição. BPLN, São Carlos.

Costa-jussà, M. R. (2019). An analysis of gender bias studies in natural language processing. Nature Machine Intelligence, 1.

Perna, C. L. (2010). Linguagens especializadas em corpora: modos de dizer e interfaces de pesquisa. EDIPUCRS.

Dev, S., Monajatipoor, M., Ovalle, A., Subramonian, A., Phillips, J., and Chang, K.-W. (2021). Harms of gender exclusivity and challenges in non-binary representation in language technologies. In Moens, M.-F., Huang, X., Specia, L., and Yih, S. W.-t., editors, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1968–1994, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Devinney, H., Björklund, J., and Björklund, H. (2022). Theories of "gender" in NLP bias research.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding.

Dinan, E., Fan, A., Wu, L., Weston, J., Kiela, D., and Williams, A. (2020). Multi-dimensional gender bias classification.

II, S. M. W. (2023). Comparative analysis: Google Gemini Pro vs. OpenAI GPT-3.5.

Koroteev, M. V. (2021). BERT: A review of applications in natural language processing and understanding. CoRR, abs/2103.11943.

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553):436–444.

Lison, P. and Tiedemann, J. (2016). OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles. In Calzolari, N., Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., and Piperidis, S., editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), pages 923–929, Portorož, Slovenia. European Language Resources Association (ELRA).

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., and Galstyan, A. (2022). A survey on bias and fairness in machine learning.

Nemani, P., Joel, Y., Vijay, P., and Liza, F. (2024). Gender bias in transformers: A comprehensive review of detection and mitigation strategies. Natural Language Processing Journal, 6.

OpenAI (2022). ChatGPT: Language models are few-shot learners.

Pichai, S. and Hassabis, D. (2023). Apresentando o Gemini: nosso maior e mais hábil modelo de IA.

Qasim, R., Bangyal, W. H., Alqarni, M. A., and Almazroi, A. A. (2022). A fine-tuned bert-based transfer learning approach for text classification. Journal of Healthcare Engineering, 2022:1–17.

Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. (2018). Improving language understanding by generative pre-training.

Rodrigues, G., Albuquerque, D., and Chagas, J. (2023). Análise de vieses ideológicos em produções textuais do assistente de bate-papo chatgpt. In Anais do IV Workshop sobre as Implicações da Computação na Sociedade, pages 148–155, Porto Alegre, RS, Brasil. SBC.

Russell, S. and Norvig, P. (2019). Artificial Intelligence: A Modern Approach. Pearson, Harlow, England, 3rd edition.

Stanovsky, G., Smith, N. A., and Zettlemoyer, L. (2019). Evaluating gender bias in machine translation. In Korhonen, A., Traum, D., and Màrquez, L., editors, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1679–1684, Florence, Italy. Association for Computational Linguistics.

TechCrunch (2024). Google still hasn't fixed Gemini's biased image generator.

Torfi, A., Shirvani, R. A., Keneshloo, Y., Tavaf, N., and Fox, E. A. (2020). Natural language processing advancements by deep learning: A survey. CoRR, abs/2003.01200.

Zack, T., Lehman, E., Suzgun, M., Rodriguez, J. A., Celi, L. A., Gichoya, J., Jurafsky, D., Szolovits, P., Bates, D. W., Abdulnour, R.-E. E., Butte, A. J., and Alsentzer, E. (2023). Coding inequity: Assessing gpt-4’s potential for perpetuating racial and gender biases in healthcare. medRxiv.
Published

2025-07-20

How to Cite

MEDEIROS, Maria Clara Ramalho; FREITAS NETO, Francisco Paulo de. A study on gender biases in NLP models applied to stories generated by GPT-3.5 and Gemini. In: WORKSHOP ON THE IMPLICATIONS OF COMPUTING IN SOCIETY (WICS), 6., 2025, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 41-52. ISSN 2763-8707. DOI: https://doi.org/10.5753/wics.2025.7999.