Eu, Tu, Ele, Ela, Elu, Nós, Vós, Eles, Elas, Elus: Toward a Neutral Language Model

Washington Roberto Lopes; Be Zilberman; Bruna Magrini da Cruz; Leticia dos Santos Monte Cruz; Rafaella Alves Lucena Gomes; Renata Wassermann; Sarajane Marques Peres; Valdinei Freire

doi:10.5753/eniac.2023.234546

Washington Roberto Lopes, University of São Paulo
Be Zilberman, University of São Paulo
Bruna Magrini da Cruz, University of São Paulo
Leticia dos Santos Monte Cruz, University of São Paulo
Rafaella Alves Lucena Gomes, University of São Paulo
Renata Wassermann, University of São Paulo
Sarajane Marques Peres, University of São Paulo
Valdinei Freire, University of São Paulo

Abstract

Neutral language is at the center of discussions about inclusion and the fight against gender bias. It is based on gender neutralization, which can take two forms: adding new gender-neutral elements to a language, or prioritizing writing with gender-neutral syntax. Both approaches can be processed automatically and therefore fall within the scope of natural language processing. This article presents an initiative to optimize a language model for translating sentences from traditional Portuguese into neutral language, using the new gender-neutral elements. To this end, a bilingual corpus was constructed from manually translated paragraphs of news articles, words and phrases from an official guide on neutral language, and automatically generated sentences. The results obtained with the optimized language models demonstrate the feasibility of building inclusive language models.

Keywords: Neutral Language, Language Models, Transformers, Natural Language Processing
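
The automatically generated part of such a bilingual corpus can be illustrated with a minimal sketch. It is not the authors' procedure: the templates, the function names, and the pronoun mapping are assumptions made for illustration, and the mapping uses only the pronoun forms listed in the title (ele/ela to elu, eles/elas to elus). Articles, adjectives, and other gendered words are deliberately left untouched.

```python
import re

# Hypothetical mapping, built only from the pronoun forms named in the title.
NEUTRAL_PRONOUNS = {
    "ele": "elu",
    "ela": "elu",
    "eles": "elus",
    "elas": "elus",
}

# Match whole words only, so that words such as "dele" are not rewritten.
_PATTERN = re.compile(
    r"\b(" + "|".join(NEUTRAL_PRONOUNS) + r")\b", re.IGNORECASE
)


def neutralize_pronouns(sentence: str) -> str:
    """Replace gendered third-person pronouns with the neutral forms."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        neutral = NEUTRAL_PRONOUNS[word.lower()]
        # Preserve sentence-initial capitalization.
        return neutral.capitalize() if word[0].isupper() else neutral

    return _PATTERN.sub(swap, sentence)


def build_pairs(templates, pronouns):
    """Fill each template with each pronoun and pair it with its neutral form."""
    pairs = []
    for template in templates:
        for pronoun in pronouns:
            source = template.format(pronoun=pronoun)
            pairs.append((source, neutralize_pronouns(source)))
    return pairs


if __name__ == "__main__":
    # Illustrative templates; the verbs carry no gender agreement.
    templates = ["Ontem {pronoun} viajou.", "Hoje {pronoun} trabalha."]
    for source, target in build_pairs(templates, ["ele", "ela"]):
        print(source, "->", target)
```

Pairs produced this way could serve as (source, target) examples when optimizing a sequence-to-sequence model, alongside manually translated text. A rule-based generator like this covers only the patterns it encodes, which is one reason a corpus would also need human-translated material.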
