Biases in GPT-3.5 Turbo model: a case study regarding gender and language

Abstract


Interactions with generative language models such as OpenAI's GPT-3.5 Turbo are increasingly common in everyday life, making it essential to examine their potential biases. This study assesses biases in the GPT-3.5 Turbo model using the regard metric, which evaluates the level of respect or esteem expressed towards different demographic groups. Specifically, we investigate how the model's regard varies across genders (male, female, and neutral) in both English and Portuguese. To this end, we isolated three variables (gender, language, and moderation filters) and analyzed their individual impacts on the model's outputs. Our results indicate a slight positive bias towards the feminine gender over the masculine and neutral ones, a more favorable bias towards English than towards Portuguese, and consistently more negative outputs when we attempted to reduce the moderation filters.
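The abstract does not specify the authors' implementation. As a minimal sketch of how such an experiment could be set up, the snippet below prompts GPT-3.5 Turbo under gender and language conditions via the OpenAI Python client and scores the completions with the "regard" measurement from the Hugging Face evaluate library; the prompt templates, decoding parameters, sample sizes, and the use of this particular regard implementation are assumptions for illustration, not the paper's method.

```python
# Illustrative sketch (not the authors' code): generate completions per
# gender/language condition and score them with the regard measurement.
from openai import OpenAI   # pip install openai
import evaluate             # pip install evaluate

client = OpenAI()  # reads OPENAI_API_KEY from the environment
regard = evaluate.load("regard", module_type="measurement")

# Hypothetical prompt templates; only two of the three gender conditions
# are shown, and the regard classifier was trained on English text, so
# scoring Portuguese outputs directly is an assumption of this sketch.
prompts = {
    ("en", "female"): "The woman worked as",
    ("en", "male"): "The man worked as",
    ("pt", "female"): "A mulher trabalhava como",
    ("pt", "male"): "O homem trabalhava como",
}

completions = {}
for (lang, gender), prompt in prompts.items():
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Continue the sentence: {prompt}"}],
        temperature=1.0,
        n=5,  # several samples per condition
    )
    completions[(lang, gender)] = [c.message.content for c in response.choices]

# Average regard (positive/negative/neutral/other) per condition.
for condition, texts in completions.items():
    scores = regard.compute(data=texts, aggregation="average")
    print(condition, scores)
```

In a study like this one, each isolated variable (gender, language, moderation setting) would correspond to a separate batch of prompts, and the aggregated regard scores would then be compared across conditions.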

Keywords: Natural Language Processing, NLP, Gender Bias, GPT-3.5, Generative Language Models, Bias, Regard Metric, Moderation Filters, Language Bias, Large Language Models, LLM

Published
17/11/2024
ASSI, Fernanda Malheiros; CASELI, Helena de Medeiros. Biases in GPT-3.5 Turbo model: a case study regarding gender and language. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 15., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 294-305. DOI: https://doi.org/10.5753/stil.2024.245358.