Eu, Tu, Ele, Ela, Elu, Nós, Vós, Eles, Elas, Elus: Toward a Neutral Language Model
Abstract
Neutral language is at the center of discussions on inclusion and the fight against gender bias. Built on gender neutralization, it can involve either adding new gender-neutral elements to a language or prioritizing writing with a neutral syntax. Both approaches can be processed automatically and therefore fall within the scope of natural language processing. This article presents an initiative to optimize a language model for translating sentences from traditional Portuguese into neutral language, taking the new gender-neutral elements into account. To this end, a bilingual corpus was built from manually translated paragraphs of news articles, words and phrases from an official guide on neutral language, and automatically generated sentences. The results obtained with the optimized language models demonstrate the feasibility of building inclusive language models.
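As a rough illustration of how automatically generated sentence pairs for such a bilingual corpus might look, the sketch below pairs each traditional-Portuguese sentence with a version in which third-person pronouns are replaced by the neutral forms named in the title (elu, elus). The mapping table and the function names `neutralize_pronouns` and `build_pairs` are assumptions made for this example. It is not the generation procedure used in the article, and it handles pronouns only, not articles, nouns, or adjective agreement.

```python
import re

# Hypothetical mapping, based only on the pronouns listed in the title.
# A real neutral-language guide covers far more than these four forms.
NEUTRAL_PRONOUNS = {
    "ele": "elu",
    "ela": "elu",
    "eles": "elus",
    "elas": "elus",
}

# Match the pronouns only as whole words, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(NEUTRAL_PRONOUNS) + r")\b", re.IGNORECASE
)


def neutralize_pronouns(sentence: str) -> str:
    """Replace gendered third-person pronouns with their neutral forms."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        neutral = NEUTRAL_PRONOUNS[word.lower()]
        # Keep a leading capital (e.g. at the start of a sentence).
        return neutral.capitalize() if word[0].isupper() else neutral

    return _PATTERN.sub(swap, sentence)


def build_pairs(sentences):
    """Return (traditional, neutral) pairs for a list of sentences."""
    return [(s, neutralize_pronouns(s)) for s in sentences]


if __name__ == "__main__":
    for source, target in build_pairs(["Ele chegou cedo e elas saíram."]):
        print(source, "->", target)
```

Pairs produced this way could at most supplement manually translated data, since a pronoun swap alone leaves gendered articles and agreement untouched.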
