Sexism in Brazil: analysis of a Word Embedding through tests based on implicit association

Abstract


This work reports experiments based on the Implicit Association Test from psychology to identify and quantify biases in a Word Embedding (WE) of the Portuguese language. For this, we use a GloVe model trained on a collection of Internet corpora. The results show that several common-sense and gender stereotypes are present in the WE. In the context of professions, we observe a historical sexism, since the identified biases often mirror the statistics of gender participation in occupational groups in Brazil. The results reveal discrimination similar to that reported in international studies and support a discussion of the impact of language models on our society.
Keywords: Artificial Intelligence, Natural Language Processing, Word Embedding, Implicit Association, Gender bias, Discrimination analysis
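Tests of this kind are commonly implemented as the Word Embedding Association Test (WEAT) of Caliskan, Bryson, and Narayanan (2017), which adapts the Implicit Association Test to vector spaces by comparing cosine similarities between target and attribute word sets. The following is a minimal sketch of the WEAT effect size under that assumption; the Portuguese word lists and the random stand-in vectors are illustrative, not the paper's actual stimuli or its GloVe model.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B, emb):
    """s(w, A, B): mean similarity of w to attribute set A minus to attribute set B."""
    return (np.mean([cosine(emb[w], emb[a]) for a in A])
            - np.mean([cosine(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B, emb):
    """Cohen's-d-style effect size: how much more strongly the targets in X
    associate with attributes A (vs. B) than the targets in Y do."""
    sx = [association(x, A, B, emb) for x in X]
    sy = [association(y, A, B, emb) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy)

# Illustrative Portuguese word lists (hypothetical, not the paper's stimuli):
X = ["engenheiro", "programador"]  # target set: stereotypically male professions
Y = ["enfermeira", "professora"]   # target set: stereotypically female professions
A = ["homem", "ele", "masculino"]  # attribute set: male terms
B = ["mulher", "ela", "feminino"]  # attribute set: female terms

# Stand-in for real GloVe vectors (in practice, loaded from a trained model):
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=300) for w in X + Y + A + B}

print(f"WEAT effect size d = {weat_effect_size(X, Y, A, B, emb):.3f}")
```

With real embeddings, effect sizes approaching ±2 indicate strong associations, and significance is usually assessed with a permutation test over partitions of the combined target sets.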

Published: 2023-09-25
TASO, Fernanda Tiemi de S.; REIS, Valéria Q.; MARTINEZ, Fábio V. Sexism in Brazil: analysis of a Word Embedding through tests based on implicit association. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 14., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 53-62. DOI: https://doi.org/10.5753/stil.2023.233845.