LegalNLP - Natural Language Processing methods for the Brazilian Legal Language

Felipe Maia Polo; Gabriel Caiaffa Floriano Mendonça; Kauê Capellato J. Parreira; Lucka Gianvechio; Peterson Cordeiro; Jonathan Batista Ferreira; Leticia Maria Paz de Lima; Antônio Carlos do Amaral Maia; Renato Vicente

doi:10.5753/eniac.2021.18301

Felipe Maia Polo University of Michigan
Gabriel Caiaffa Floriano Mendonça USP
Kauê Capellato J. Parreira USP
Lucka Gianvechio USP
Peterson Cordeiro USP
Jonathan Batista Ferreira USP
Leticia Maria Paz de Lima USP
Antônio Carlos do Amaral Maia Tikal Tech
Renato Vicente USP / Latam Datalab Serasa Experian

Abstract

We present and make available pre-trained language models (Phraser, Word2Vec, Doc2Vec, FastText, and BERT) for the Brazilian legal language, a Python package with functions that facilitate their use, and a set of demonstrations and tutorials covering some of their applications. Because our material is built on legal texts from several Brazilian courts, it addresses a gap in the Brazilian legal field, which lacks other open, domain-specific tools and language models. Our main objective is to catalyze the use of natural language processing for legal text analysis in Brazilian industry, government, and academia by providing the necessary tools and accessible material.
