Sexism in Brazil: analysis of a Word Embedding through tests based on implicit association
Abstract
This work reports experiments based on the Implicit Association Test from psychology to identify and quantify biases in a Word Embedding (WE) of the Portuguese language. For this, we use a GloVe model trained on a collection of Internet corpora. The results show that several common-sense and gender stereotypes can be found in the WE. Within the context of professions, we observe historical sexism, since the identified bias often reflects the statistics of gender distribution across occupation groups in Brazil. The results show discrimination similar to that reported in international studies and allow us to discuss the impact of language models on our society.
Keywords:
Artificial Intelligence, Natural Language Processing, Word Embedding, Implicit Association, Gender bias, Discrimination analysis
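The abstract describes implicit-association tests applied to word embeddings, in the spirit of the WEAT of Caliskan et al. (2017). A minimal sketch of the differential-association score and effect size is shown below, in Python/NumPy; the toy vectors in the usage example are hypothetical placeholders, not the paper's actual stimuli or its GloVe model.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two word vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    # s(w, A, B): mean similarity of word w to attribute set A
    # minus its mean similarity to attribute set B.
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # WEAT-style effect size: difference between the mean associations of
    # the two target sets X and Y, normalized by the pooled (sample)
    # standard deviation of all per-word association scores.
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# Toy illustration: targets X lean toward attribute set A, Y toward B,
# so the effect size is positive (a stereotypical association).
A = [np.array([1.0, 0.0])]          # e.g. "male" attribute vectors
B = [np.array([0.0, 1.0])]          # e.g. "female" attribute vectors
X = [np.array([0.9, 0.1])]          # e.g. a stereotypically male profession
Y = [np.array([0.1, 0.9])]          # e.g. a stereotypically female profession
print(weat_effect_size(X, Y, A, B))
```

The sign of the effect size indicates the direction of the association (swapping X and Y flips it), which is how profession words can be scored against gendered attribute sets in an embedding.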
References
Blodgett, S. L., Barocas, S., Daumé III, H., and Wallach, H. (2020). Language (technology) is power: A critical survey of “bias” in NLP. arXiv preprint arXiv:2005.14050. https://doi.org/10.48550/arXiv.2005.14050
Caliskan, A., Ajay, P. P., Charlesworth, T., Wolfe, R., and Banaji, M. R. (2022). Gender bias in word embeddings: A comprehensive analysis of frequency, syntax, and semantics. In Proc. of AAAI/ACM AIES, pages 156–170. https://doi.org/10.1145/3514094.3534162
Caliskan, A., Bryson, J. J., and Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186. https://doi.org/10.1126/science.aal4230
Chen, X., Li, M., Yan, R., Gao, X., and Zhang, X. (2022). Unsupervised mitigating gender bias by character components: A case study of Chinese word embedding. In Proc. of GeBNLP, pages 121–128. ACL. https://doi.org/10.18653/v1/2022.gebnlp-1.14
Ethayarajh, K., Duvenaud, D., and Hirst, G. (2019). Understanding undesirable word embedding associations. arXiv preprint arXiv:1908.06361. https://doi.org/10.18653/v1/P19-1166
Fortuna, P., da Silva, J. R., Wanner, L., Nunes, S., et al. (2019). A hierarchically-labeled Portuguese hate speech dataset. In Proc. of ALW, pages 94–104. https://doi.org/10.18653/v1/W19-3510
Gamboa, L. C. and Justina Estuar, M. R. (2023). Evaluating gender bias in pre-trained Filipino fastText embeddings. In Proc. of ITIKD, pages 1–7. https://doi.org/10.1109/ITIKD56332.2023.10100022
Garcia, K. and Berton, L. (2021). Topic detection and sentiment analysis in Twitter content related to COVID-19 from Brazil and the USA. Appl Soft Comput, 101:107057. https://doi.org/10.1016/j.asoc.2020.107057
Gonen, H. and Goldberg, Y. (2019). Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. arXiv preprint arXiv:1903.03862. https://doi.org/10.48550/arXiv.1903.03862
Grave, E., Bojanowski, P., Gupta, P., Joulin, A., and Mikolov, T. (2018). Learning word vectors for 157 languages. arXiv preprint arXiv:1802.06893. https://doi.org/10.48550/arXiv.1802.06893
Greenwald, A. G., McGhee, D. E., and Schwartz, J. L. K. (1998). Measuring individual differences in implicit cognition: The implicit association test. J Pers Soc Psychol, 74(6):1464–80. https://doi.org/10.1037/0022-3514.74.6.1464
Hansal, O., Le, N. T., and Sadat, F. (2022). Indigenous language revitalization and the dilemma of gender bias. In Proc. of GeBNLP, pages 244–254. ACL. https://doi.org/10.18653/v1/2022.gebnlp-1.25
Hartmann, N., Fonseca, E., Shulby, C., Treviso, M., Rodrigues, J., and Aluisio, S. (2017). Portuguese word embeddings: Evaluating on word analogies and natural language tasks. arXiv preprint arXiv:1708.06025. https://doi.org/10.48550/arXiv.1708.06025
Jiang, T., Li, Y., Fu, S., and Chen, Y. (2023). Creating a Chinese gender lexicon for detecting gendered wording in job advertisements. Inform Process Manag, 60(5):103424. https://doi.org/10.1016/j.ipm.2023.103424
Kiefer, A. K. and Sekaquaptewa, D. (2007). Implicit stereotypes and women’s math performance: How implicit gender-math stereotypes influence women’s susceptibility to stereotype threat. J Exp Soc Psychol, 43(5):825–832. https://doi.org/10.1016/j.jesp.2006.08.004
Li, J., Zhu, S., Liu, Y., and Liu, P. (2022). Analysis of gender bias in social perception and judgement using Chinese word embeddings. In Proc. of GeBNLP, pages 8–16. ACL. https://doi.org/10.18653/v1/2022.gebnlp-1.2
Nosek, B. A., Banaji, M. R., and Greenwald, A. G. (2002). Math = male, me = female, therefore math ≠ me. J Pers Soc Psychol, 83(1):44–59. https://doi.org/10.1037/0022-3514.83.1.44
Prates, M. O., Avelar, P. H., and Lamb, L. C. (2020). Assessing gender bias in machine translation: A case study with Google Translate. Neural Comput Appl, 32(10):6363–6381. https://doi.org/10.1007/s00521-019-04144-6
Qin, C., Zhang, X., Zhou, C., and Liu, Y. (2023). An interactive method for measuring gender bias and evaluating bias in Chinese word embeddings. In Imane, H., editor, Proc. of CVAA, volume 12613, page 126130U. https://doi.org/10.1117/12.2673321
Silva, R. M., Santos, R. L., Almeida, T. A., and Pardo, T. A. (2020). Towards automatically filtering fake news in Portuguese. Expert Syst Appl, 146:113199. https://doi.org/10.1016/j.eswa.2020.113199
Sun, T., Gaut, A., Tang, S., Huang, Y., ElSherief, M., Zhao, J., Mirza, D., Belding, E., Chang, K.-W., and Wang, W. Y. (2019). Mitigating gender bias in natural language processing: Literature review. arXiv preprint arXiv:1906.08976. https://doi.org/10.18653/v1/P19-1159
Suresh, H. and Guttag, J. (2021). A framework for understanding sources of harm throughout the machine learning life cycle. In Proc. of EAAMO, volume 17, pages 1–9. https://doi.org/10.1145/3465416.3483305
Taso, F. T., Reis, V. Q., and Martinez, F. V. (2023). Discriminação algorítmica de gênero: Estudo de caso e análise no contexto brasileiro. In Anais do WICS, pages 13–25. SBC. https://doi.org/10.5753/wics.2023.229980
Tatman, R. (2017). Gender and dialect bias in YouTube’s automatic captions. In Proc. of EthNLP, pages 53–59. ACL. https://doi.org/10.18653/v1/W17-1606
Torres Berrú, Y., Batista, V., and Zhingre, L. (2023). A data mining approach to detecting bias and favoritism in public procurement. Intell Autom Soft Co, 36(3):3501–3516. https://doi.org/10.32604/iasc.2023.035367
Wagner, J. and Zarrieß, S. (2022). Do gender neutral affixes naturally reduce gender bias in static word embeddings? In Proc. of KONVENS, pages 88–97.
Wairagala, E. P., Mukiibi, J., Tusubira, J. F., Babirye, C., Nakatumba-Nabende, J., Katumba, A., and Ssenkungu, I. (2022). Gender bias evaluation in Luganda-English machine translation. In Proc. of AMTA, pages 274–286. AMTA.
Zhang, H., Sneyd, A., and Stevenson, M. (2020). Robustness and reliability of gender bias assessment in word embeddings: The role of base pairs. arXiv preprint arXiv:2010.02847. https://doi.org/10.48550/arXiv.2010.02847
Published
2023-09-25
How to Cite
TASO, Fernanda Tiemi de S.; REIS, Valéria Q.; MARTINEZ, Fábio V. Sexism in Brazil: analysis of a Word Embedding through tests based on implicit association. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 14., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 53-62. DOI: https://doi.org/10.5753/stil.2023.233845.
