Text Classification in Legal Documents Extracted from Lawsuits in Brazilian Courts

Aguiar, André; Silveira, Raquel; Pinheiro, Vládia; Furtado, Vasco; Neto, João Araújo

doi:10.1007/978-3-030-91699-2_40

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13074)

Included in the following conference series:

Brazilian Conference on Intelligent Systems


Abstract

Recently, Brazil’s National Council of Justice (CNJ) highlighted the importance of robust solutions for automated lawsuit classification. Correct lawsuit classification substantially improves the accuracy of (i) case distribution, (ii) scheduling of court hearings and sessions, (iii) classification of urgent measures and evidence, (iv) identification of prescription (the statute of limitations), and (v) identification of prevention (the competence of the judge who first took cognizance of related cases). This paper investigates different text classification methods and different combinations of embeddings extracted from Portuguese language models with information about the legislation cited in the initial documents. The models were trained on a Golden Collection of 16,000 initial petitions and indictments from the Court of Justice of the State of Ceará, Brazil, whose lawsuits were classified into the five most representative CNJ classes: Common Civil Procedure, Execution of Extrajudicial Title, Criminal Action - Ordinary Procedure, Special Civil Court Procedure, and Tax Enforcement. Our best result was obtained by the BERT model, which achieved a macro F1 score of 0.88 in the experimental scenario where each lawsuit is represented by an embedding of the concatenated texts of all its petitions that contain at least one citation to legislation. Legal documents have specific characteristics, such as long texts, specialized vocabulary, formal syntax, semantics grounded in a broad domain-specific body of knowledge, and citations to laws. Our interpretation is that representing documents through contextual embeddings generated by BERT, together with the model's bidirectional architecture, makes it possible to capture the specific context of the legal domain.
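The best-performing scenario represents a lawsuit by concatenating only those petitions that cite at least one piece of legislation. The sketch below illustrates that filtering step under a loud assumption: citations are detected with a simple regular expression (`LEGISLATION_PATTERN`, a hypothetical pattern for forms like "Lei nº 8.078/1990"). The paper's notes link Stanford's CRFClassifier and official legislation portals, which suggests the authors used a more careful extractor; this regex is only a stand-in, and the function names are illustrative.

```python
import re

# ASSUMPTION: illustrative stand-in for legislation-citation detection, not the
# paper's extractor. Matches forms such as "Lei nº 8.078/1990", "Lei n. 9.099/95",
# "Lei Complementar nº 123/2006" and "Decreto-Lei nº 3.689/1941".
LEGISLATION_PATTERN = re.compile(
    r"\b(?:Lei(?:\s+Complementar)?|Decreto(?:-Lei)?)"  # kind of legal act
    r"\s+n(?:º|°|o|\.)?\s*"                            # "nº", "n°", "no", "n."
    r"\d[\d.]*"                                        # act number, e.g. 8.078
    r"(?:/\d{2,4})?",                                  # optional year, e.g. /1990
    re.IGNORECASE,
)


def cites_legislation(text: str) -> bool:
    """True if the text contains at least one pattern-detected legislation citation."""
    return LEGISLATION_PATTERN.search(text) is not None


def build_lawsuit_text(petitions: list[str], separator: str = "\n") -> str:
    """Concatenate, in their original order, only the petitions that cite legislation.

    The returned string is what would be passed to an embedding model such as
    BERT; tokenization, truncation and embedding are not shown here.
    """
    return separator.join(p for p in petitions if cites_legislation(p))


if __name__ == "__main__":
    petitions = [
        "Requer a condenação do réu nos termos da Lei nº 8.078/1990.",
        "Junta procuração e documentos.",
        "Aplica-se o Decreto-Lei nº 3.689/1941 ao caso.",
    ]
    # Keeps the first and third petitions, joined by a newline.
    print(build_lawsuit_text(petitions))
```

A lawsuit in which no petition cites legislation yields an empty string, so a real pipeline would need a policy for such cases (for example, falling back to all petitions); that policy is not described in the abstract.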


Notes

1. https://www.cnj.jus.br/sgt/consulta_publica_classes.php
2. https://github.com/MPMG-DCC-UFMG/M02
3. https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/crf/CRFClassifier.html
4. http://www4.planalto.gov.br/legislacao/
5. https://www.lexml.gov.br/
6. https://www.al.ce.gov.br/index.php/tividades-legislativas/leis
7. Given the specifics of BERT’s original architecture, this model was trained only in scenarios (S1), (S2), (S3), and (S4).
8. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
9. https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
10. https://xgboost.readthedocs.io/en/latest/python/python_api.html
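The abstract reports its best result as a macro F1 score, and notes 8 to 10 link scikit-learn's RandomForestClassifier and SVC and the XGBoost Python API. Whichever classifier produces the predictions, macro F1 averages the per-class F1 scores without weighting by class size, so each of the five CNJ classes counts equally. The dependency-free sketch below illustrates the metric; in practice, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity. The labels in the usage example are invented toy data, not results from the paper.

```python
from collections import Counter


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Macro-averaged F1: the unweighted mean of per-class F1 scores.

    Per class, F1 = 2*TP / (2*TP + FP + FN). Every label seen in y_true or
    y_pred counts equally, so small classes weigh as much as large ones.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # predicted this class, but wrongly
            fn[true_label] += 1  # missed the true class

    # Each label in the union has at least one TP, FP or FN, so no division by zero.
    labels = set(y_true) | set(y_pred)
    per_class = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Toy labels (hypothetical), using four of the five CNJ classes.
    y_true = ["Common Civil Procedure", "Common Civil Procedure",
              "Execution of Extrajudicial Title", "Tax Enforcement",
              "Criminal Action - Ordinary Procedure"]
    y_pred = ["Common Civil Procedure", "Execution of Extrajudicial Title",
              "Execution of Extrajudicial Title", "Tax Enforcement",
              "Criminal Action - Ordinary Procedure"]
    print(round(macro_f1(y_true, y_pred), 4))  # 0.8333
```

In the toy example, the single misclassified Common Civil Procedure lawsuit lowers the F1 of two classes to 2/3 each, while the other two classes stay at 1.0, giving a macro F1 of 10/12.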


Author information

Authors and Affiliations

University of Fortaleza, Fortaleza, Brazil
André Aguiar, Vládia Pinheiro, Vasco Furtado & João Araújo Neto
Federal Institute of Education, Science and Technology of Ceará, Fortaleza, Brazil
Raquel Silveira


Corresponding author

Correspondence to Raquel Silveira.

Editor information

Editors and Affiliations

Universidade Federal de Sergipe, São Cristóvão, Brazil
André Britto
Universidade de São Paulo, São Paulo, Brazil
Karina Valdivia Delgado


About this paper

Cite this paper

Aguiar, A., Silveira, R., Pinheiro, V., Furtado, V., Neto, J.A. (2021). Text Classification in Legal Documents Extracted from Lawsuits in Brazilian Courts. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science (LNAI), vol 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_40


DOI: https://doi.org/10.1007/978-3-030-91699-2_40
Published: 28 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91698-5
Online ISBN: 978-3-030-91699-2
eBook Packages: Computer Science, Computer Science (R0)
