Algorithmic Gender Discrimination: Case Study and Analysis in the Brazilian Context
Abstract
This paper identifies discriminatory trends in Natural Language Processing models that represent words as vectors, known as word embeddings. Previously defined bias-identification metrics were adapted and revealed gender stereotypes in traditional occupations, as well as a correlation between those stereotypes and the proportion of women in the national labor market. Moreover, stereotyped analogies between feminine and masculine pronouns were found. The results reveal sexist patterns consistent with findings from other studies and support a discussion of the societal impact of language models. Finally, this work paves the way for using such metrics to identify other types of discrimination in the Brazilian context.
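To make the analogy-based probe described above concrete, the sketch below shows one common way such stereotyped analogies are surfaced in word embeddings, using the gensim library. The vector file name ("cc.pt.300.vec") and the example words are illustrative assumptions, not taken from the paper; it assumes publicly available pre-trained Portuguese fastText vectors in word2vec text format.

```python
# Minimal sketch of an analogy-based bias probe over Portuguese
# word embeddings. Assumes pre-trained fastText vectors saved
# locally as "cc.pt.300.vec" (an illustrative file name).
from gensim.models import KeyedVectors

# Load the pre-trained vectors (word2vec text format).
vectors = KeyedVectors.load_word2vec_format("cc.pt.300.vec", binary=False)

# Analogy probe: "homem" (man) is to "programador" (programmer)
# as "mulher" (woman) is to ... ?
# most_similar computes vec(programador) - vec(homem) + vec(mulher)
# and returns the nearest vocabulary words by cosine similarity.
for word, score in vectors.most_similar(
        positive=["programador", "mulher"], negative=["homem"], topn=5):
    print(f"{word}\t{score:.3f}")
```

If the top-ranked completions for the feminine term are stereotyped occupations, the embedding encodes the kind of gender association the paper's metrics are designed to detect.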
