Exploring Deep Neural Networks for a Linguistic Acceptability Task
Abstract
Linguistic acceptability is the task of determining whether a sentence is grammatically well-formed. Although some grammar-correction tools exist for Portuguese, they rely on manually defined rules, which are laborious to create and maintain. This work investigated deep neural networks and pretrained language models for the linguistic acceptability task, aiming to develop more robust methods for Portuguese that outperform existing tools. Recurrent networks, convolutional networks, and the BERTimbau and Albertina language models were explored. The models were trained on a corpus translated from English into Portuguese and evaluated on the Probi corpus. The recurrent and convolutional networks achieved the best results (F1 of 0.37), competitive with the LanguageTool grammar checker (F1 of 0.40).
References
Batra, S., Jain, S., Heidari, P., Arun, A., Youngs, C., Li, X., Donmez, P., Mei, S., Kuo, S., Bhardwaj, V., Kumar, A., and White, M. (2021). Building adaptive acceptability classifiers for neural NLG. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 682–697, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Choshen, L., Hacohen, G., Weinshall, D., and Abend, O. (2022). The grammar-learning trajectories of neural language models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8281–8297, Dublin, Ireland. Association for Computational Linguistics.
Hartmann, N., Fonseca, E., Shulby, C., Treviso, M., Silva, J., and Aluísio, S. (2017). Portuguese word embeddings: Evaluating on word analogies and natural language tasks. In Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology, pages 122–131, Uberlândia, Brazil. Sociedade Brasileira de Computação.
Kinoshita, J., Salvador, L., Menezes, C., and Silva, W. (2007). CoGrOO: An OpenOffice grammar checker. In Seventh International Conference on Intelligent Systems Design and Applications (ISDA 2007), pages 525–530, Rio de Janeiro, Brazil. IEEE.
Klezl, J., Mohammed, Y. A., and Volodina, E. (2022). Exploring linguistic acceptability in Swedish learners’ language. In Proceedings of the 11th Workshop on NLP for Computer Assisted Language Learning, pages 84–94, Louvain-la-Neuve, Belgium. LiU Electronic Press.
Martins, R. (2002). Probi: um corpus de teste para o revisor gramatical ReGra. Technical report, ICMC-USP.
Nunes, M. d. G. V. and Oliveira Jr., O. (2000). O processo de desenvolvimento do revisor gramatical ReGra. In XXVII Seminário Integrado de Software e Hardware, pages 1–15, Curitiba, Brazil. SBC.
Rodrigues, J., Gomes, L., Silva, J., Branco, A., Santos, R., Cardoso, H. L., and Osório, T. (2023). Advancing neural encoding of Portuguese with transformer Albertina PT-*.
Souza, F., Nogueira, R., and Lotufo, R. (2020). BERTimbau: Pretrained BERT models for Brazilian Portuguese. In 9th Brazilian Conference on Intelligent Systems, pages 403–417, Rio Grande, Brazil. Springer.
Schütze, C. T. (2016). The empirical base of linguistics: Grammaticality judgments and linguistic methodology. Language Science Press.
Warstadt, A., Singh, A., and Bowman, S. R. (2019). Neural network acceptability judgments. Transactions of the Association for Computational Linguistics, 7:625–641.
Yin, F., Long, Q., Meng, T., and Chang, K.-W. (2020). On the robustness of language encoders against grammatical errors. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3386–3403, Online. Association for Computational Linguistics.
Zhang, Y., Warstadt, A., Li, X., and Bowman, S. R. (2021). When do you need billions of words of pretraining data? In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1112–1125, Online. Association for Computational Linguistics.
