Text Classification in Law Area: a Systematic Review

V. S. Martins; C. D. Silva

doi:10.5753/kdmile.2021.17458

V. S. Martins UFPA http://orcid.org/0000-0002-4789-2646
C. D. Silva UFPA http://orcid.org/0000-0001-8280-2928

Abstract

Automatic text classification can substantially improve workflows in the legal domain, particularly in the migration from physical to electronic lawsuits. A systematic review of studies on text classification in the legal domain, covering January 2017 to February 2020, was conducted. The search strategy identified 20 studies, which were analyzed and compared. The review is guided by the following research questions: what the state-of-the-art language models are; how they have been applied to text classification on English and Brazilian Portuguese legal datasets; whether language models trained on Brazilian Portuguese are available; and whether datasets from the Brazilian legal domain are available. It concludes that automatic text classification has been applied in Brazil, although the use of language models lags behind that of studies on English-language datasets; that in-domain pre-training of language models is important for improving results; and that two studies make Brazilian Portuguese language models available, while one introduces a dataset in the Brazilian legal domain.

Keywords: text classification, law, machine learning
