Gender bias in GPT-3.5 turbo machine translation: evaluating the English-Portuguese language pair
Abstract
This study evaluated the quality of machine translations generated by GPT-3.5 turbo. We used GPT-3.5 turbo to translate into Portuguese the WinoMT Challenge Test Set. WinoMT assesses how well machine translation models translate the grammatical gender of nouns that refer to professions. We adapted the automatic evaluation code developed by Stanovsky et al. (2019) to evaluate the resulting translations. The results indicate that GPT-3.5 turbo tends to promote gender bias when translating professions.
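Stanovsky et al. (2019) summarize WinoMT results with three scores: overall gender accuracy (Acc), ΔG (the difference in F1 between masculine and feminine entities) and ΔS (the difference in accuracy between pro-stereotypical and anti-stereotypical examples). The minimal sketch below computes these scores, assuming the gender of each profession noun has already been extracted from the Portuguese translation. The abstract does not describe how the adapted code performs that extraction step. The `Example` class and the function names are hypothetical and are not taken from the original evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Example:
    gold: str        # gold gender of the profession entity: "male" or "female"
    predicted: str   # gender found in the translation: "male", "female" or "unknown"
    stereotype: str  # "pro" or "anti": does the gold gender match the occupational stereotype?


def accuracy(examples: list[Example]) -> float:
    """Share of examples whose translated gender matches the gold gender."""
    if not examples:
        return 0.0
    return sum(e.gold == e.predicted for e in examples) / len(examples)


def f1(examples: list[Example], gender: str) -> float:
    """F1 score treating `gender` as the positive class."""
    tp = sum(1 for e in examples if e.gold == gender and e.predicted == gender)
    fp = sum(1 for e in examples if e.gold != gender and e.predicted == gender)
    fn = sum(1 for e in examples if e.gold == gender and e.predicted != gender)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def winomt_scores(examples: list[Example]) -> dict[str, float]:
    """Acc, delta_G and delta_S as percentages (hypothetical helper)."""
    pro = [e for e in examples if e.stereotype == "pro"]
    anti = [e for e in examples if e.stereotype == "anti"]
    return {
        "acc": 100 * accuracy(examples),
        # Positive delta_G: masculine entities are translated more reliably.
        "delta_g": 100 * (f1(examples, "male") - f1(examples, "female")),
        # Positive delta_S: the model does better when the gender fits the stereotype.
        "delta_s": 100 * (accuracy(pro) - accuracy(anti)),
    }


if __name__ == "__main__":
    # Toy data, not results from the study.
    sample = [
        Example("male", "male", "pro"),
        Example("female", "female", "pro"),
        Example("female", "male", "anti"),  # a female professional rendered as masculine
        Example("male", "male", "anti"),
    ]
    print(winomt_scores(sample))
```

Under this reading, a model that defaults to masculine forms shows both a positive ΔG and a positive ΔS, which is the pattern the abstract describes as gender bias.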
Keywords:
NLP task evaluation methods, Machine Translation, Large language models
References
Bender, E. M., Gebru, T., McMillan-Major, A., Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, p. 610-623. https://doi.org/10.1145/3442188.3445922
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, v. 33, p. 1877-1901. https://dl.acm.org/doi/abs/10.5555/3495724.3495883
Caseli, H. de M. (2017). Tradução Automática: estratégias e limitações. Domínios de Lingu@gem, v. 11, n. 5, p. 1782-1796. https://doi.org/10.14393/DL32-v11n5a2017-21
Castilho, S., Mallon, C., Meister, R., Yue, S. (2023). Do online machine translation systems care for context? What about a GPT model? In: 24th Annual Conference of the European Association for Machine Translation (EAMT 2023), 12-15 June 2023, Tampere, Finland. (In press) https://doras.dcu.ie/28297/
Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, v. 20, n. 1, p. 37-46. https://doi.org/10.1177/001316446002000104
Devinney, H., Björklund, J., Björklund, H. (2022). Theories of “Gender” in NLP Bias Research. arXiv:2205.02526 [cs].
Halliday, M. A. K. (1978). Language as social semiotic: The social interpretation of language and meaning. London: Edward Arnold.
Jakobson, R. (1959). On Linguistic Aspects of Translation. In: Brower, R. A. (ed.). On Translation. Cambridge, MA: Harvard University Press. https://doi.org/10.4159/harvard.9780674731615.c18
Kocmi, T., Federmann, C. (2023). Large language models are state-of-the-art evaluators of translation quality. arXiv:2302.14520. https://doi.org/10.48550/arXiv.2302.14520
Levesque, H. J. (2011). The Winograd schema challenge. In: AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning.
Lewis, M., Lupyan, G. (2020). Gender stereotypes are reflected in the distributional structure of 25 languages. Nature Human Behaviour, v. 4, n. 10, p. 1021-1028. https://doi.org/10.1038/s41562-020-0918-6
Popović, M., Castilho, S. (2019). Challenge Test Sets for MT Evaluation. In: Proceedings of Machine Translation Summit XVII: Tutorial Abstracts, Dublin, Ireland. European Association for Machine Translation. https://aclanthology.org/W19-7602
Rudinger, R., Naradowsky, J., Leonard, B., Van Durme, B. (2018). Gender Bias in Coreference Resolution. arXiv:1804.09301 [cs]. https://doi.org/10.18653/v1/N18-2002
Savoldi, B., Gaido, M., Bentivogli, L., Negri, M., Turchi, M. (2021). Gender Bias in Machine Translation. Transactions of the Association for Computational Linguistics, v. 9, p. 845-874. https://doi.org/10.1162/tacl_a_00401
Stanovsky, G., Smith, N. A., Zettlemoyer, L. (2019). Evaluating Gender Bias in Machine Translation. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, p. 1679-1684. https://doi.org/10.18653/v1/P19-1164
Zhao, J., Wang, T., Yatskar, M., Ordonez, V., Chang, K.-W. (2018). Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). https://doi.org/10.18653/v1/N18-2003
Published
25/09/2023
How to Cite
SOARES, Tayane Arantes; GUMIEL, Yohan Bonescki; JUNQUEIRA, Rafael; GOMES, Tácio; PAGANO, Adriana. Viés de gênero na tradução automática do GPT-3.5 turbo: avaliando o par linguístico inglês-português. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 14., 2023, Belo Horizonte/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 167-176. DOI: https://doi.org/10.5753/stil.2023.234186.