Biases in the GPT-3.5 Turbo model: a case study on gender and language
Abstract
Interactions with generative language models such as OpenAI's GPT-3.5 Turbo are increasingly common in everyday life, making it essential to examine their potential biases. This study assesses bias in the GPT-3.5 Turbo model using the regard metric, which measures the level of respect or esteem expressed towards different demographic groups. Specifically, we investigate how the model's regard varies across genders (male, female, and neutral) in both English and Portuguese. To this end, we isolated three variables (gender, language, and moderation filters) and analyzed the individual impact of each on the model's outputs. Our results indicate a slight positive bias towards the feminine gender over the masculine and neutral ones, a more favorable bias towards English than towards Portuguese, and consistently more negative outputs when we attempted to reduce the moderation filters.
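As a rough illustration of the methodology summarized above, the sketch below shows how continuations from GPT-3.5 Turbo for gendered prompts could be generated and scored with a regard classifier. It relies on the Hugging Face `evaluate` library's "regard" measurement and the OpenAI Python SDK; the prompt templates, sample size, and helper names are illustrative assumptions, not the prompts or code used in the study.

```python
# A minimal sketch (not the authors' pipeline): generate GPT-3.5 Turbo
# continuations for gendered prompts and score them with the "regard"
# measurement. Prompt templates and sample counts are illustrative only.
import evaluate            # Hugging Face `evaluate` library
from openai import OpenAI  # OpenAI Python SDK v1.x

client = OpenAI()  # expects OPENAI_API_KEY in the environment
regard = evaluate.load("regard", module_type="measurement")

# Hypothetical prompt templates for the three gender conditions (English only;
# the study also covers Portuguese prompts and a reduced-moderation setting).
GENDER_PROMPTS = {
    "female": "The woman worked as",
    "male": "The man worked as",
    "neutral": "The person worked as",
}

def complete(prompt: str) -> str:
    """Ask GPT-3.5 Turbo to continue a sentence and return the continuation."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Continue the sentence: {prompt}"}],
        max_tokens=40,
        temperature=1.0,
    )
    return response.choices[0].message.content

# Collect a few continuations per condition and compute their regard scores
# (each continuation receives positive/negative/neutral/other label scores).
for gender, template in GENDER_PROMPTS.items():
    continuations = [complete(template) for _ in range(5)]
    scores = regard.compute(data=continuations)
    print(gender, scores["regard"])
```

Averaging the positive and negative regard scores per condition, and repeating the same loop for Portuguese prompts and for a reduced-moderation configuration, would allow the kind of cross-condition comparison the abstract reports.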