Exploring Deep Neural Networks for a Linguistic Acceptability Task
Abstract
Linguistic acceptability is the task of determining whether a sentence is grammatically well-formed. Although some grammar-correction tools exist for Portuguese, they rely on manually defined rules, which are laborious to create and maintain. This work investigated deep neural networks and pretrained language models for the linguistic acceptability task, aiming to develop more robust methods for Portuguese that outperform existing tools. Recurrent networks, convolutional networks, and the BERTimbau and Albertina language models were explored. The models were trained on a corpus translated from English into Portuguese and evaluated on the Probi corpus. The recurrent and convolutional networks achieved the best results (F1 of 0.37), competitive with the LanguageTool grammar checker (F1 of 0.40).
References
Batra, S., Jain, S., Heidari, P., Arun, A., Youngs, C., Li, X., Donmez, P., Mei, S., Kuo, S., Bhardwaj, V., Kumar, A., and White, M. (2021). Building adaptive acceptability classifiers for neural NLG. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 682–697, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Choshen, L., Hacohen, G., Weinshall, D., and Abend, O. (2022). The grammar-learning trajectories of neural language models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8281–8297, Dublin, Ireland. Association for Computational Linguistics.
Hartmann, N., Fonseca, E., Shulby, C., Treviso, M., Silva, J., and Aluísio, S. (2017). Portuguese word embeddings: Evaluating on word analogies and natural language tasks. In Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology, pages 122–131, Uberlândia, Brazil. Sociedade Brasileira de Computação.
Kinoshita, J., Salvador, L., Menezes, C., and Silva, W. (2007). CoGrOO: An OpenOffice grammar checker. In Seventh International Conference on Intelligent Systems Design and Applications (ISDA 2007), pages 525–530, Rio de Janeiro, Brazil. IEEE.
Klezl, J., Mohammed, Y. A., and Volodina, E. (2022). Exploring linguistic acceptability in Swedish learners’ language. In Proceedings of the 11th Workshop on NLP for Computer Assisted Language Learning, pages 84–94, Louvain-la-Neuve, Belgium. LiU Electronic Press.
Martins, R. (2002). Probi: um corpus de teste para o revisor gramatical ReGra. Technical report, ICMC-USP.
Nunes, M. d. G. V. and Oliveira Jr., O. (2000). O processo de desenvolvimento do revisor gramatical ReGra. In XXVII Seminário Integrado de Software e Hardware, pages 1–15, Curitiba, Brazil. SBC.
Rodrigues, J., Gomes, L., Silva, J., Branco, A., Santos, R., Cardoso, H. L., and Osório, T. (2023). Advancing neural encoding of Portuguese with transformer Albertina PT-*.
Souza, F., Nogueira, R., and Lotufo, R. (2020). BERTimbau: Pretrained BERT models for Brazilian Portuguese. In 9th Brazilian Conference on Intelligent Systems, pages 403–417, Rio Grande, Brazil. Springer.
Schütze, C. T. (2016). The empirical base of linguistics: Grammaticality judgments and linguistic methodology. Language Science Press.
Warstadt, A., Singh, A., and Bowman, S. R. (2019). Neural network acceptability judgments. Transactions of the Association for Computational Linguistics, 7:625–641.
Yin, F., Long, Q., Meng, T., and Chang, K.-W. (2020). On the robustness of language encoders against grammatical errors. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3386–3403, Online. Association for Computational Linguistics.
Zhang, Y., Warstadt, A., Li, X., and Bowman, S. R. (2021). When do you need billions of words of pretraining data? In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1112–1125, Online. Association for Computational Linguistics.
