NLP and Legal Certainty: Identifying Jurisprudential Divergences with Natural Language Processing

Abstract


This paper proposes using Natural Language Processing (NLP) and machine learning techniques to identify judicial rulings that diverge from the majority understanding of a subject in Brazilian courts, with the aim of enhancing legal certainty. The methodology comprises data cleaning and preprocessing of each ruling's summary, word embedding with the Word2Vec neural network technique, and k-means clustering of 3,165 court decisions to identify semantic similarities and divergences. Specific examples of jurisprudential divergences are presented, demonstrating how technology can help make judicial decisions more uniform.
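The pipeline described in the abstract (embed each ruling summary, cluster the embeddings, look for rulings that sit apart from their group) can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy 3-dimensional `WORD_VECS` table stands in for a trained Word2Vec model, document vectors are built by averaging word vectors, centroids are seeded with a deterministic farthest-first rule, and the `flag_divergent` heuristic (distance to centroid above a multiple of the cluster median) is a hypothetical way to surface candidate divergences.

```python
import math
from statistics import median

# Hypothetical word vectors standing in for a trained Word2Vec model.
# Axes (illustrative only): consumer topic, tax topic, outcome.
WORD_VECS = {
    "flight": (1.0, 0.0, 0.0),
    "delay": (1.0, 0.0, 0.0),
    "damages": (1.0, 0.0, 0.0),
    "tax": (0.0, 1.0, 0.0),
    "exemption": (0.0, 1.0, 0.0),
    "granted": (0.0, 0.0, 1.0),
    "denied": (0.0, 0.0, -1.0),
}

# Toy, already-preprocessed ruling summaries (invented for this example).
DOCS = [
    ["flight", "delay", "damages", "granted"],
    ["flight", "damages", "granted"],
    ["delay", "damages", "granted"],
    ["flight", "delay", "damages", "denied"],
    ["tax", "exemption", "granted"],
    ["tax", "granted"],
    ["tax", "exemption", "tax", "granted"],
]


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def embed(tokens):
    """Document vector: average of the vectors of the known tokens."""
    return mean_vector([WORD_VECS[t] for t in tokens if t in WORD_VECS])


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_vector(members)
    return labels, centroids


def flag_divergent(points, labels, centroids, factor=2.0):
    """Hypothetical heuristic: flag documents whose distance to their own
    centroid exceeds `factor` times the median distance in that cluster."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged = []
    for c in range(len(centroids)):
        in_cluster = [d for d, lab in zip(dists, labels) if lab == c]
        if not in_cluster:
            continue
        cutoff = factor * median(in_cluster)
        flagged += [
            i for i, (d, lab) in enumerate(zip(dists, labels))
            if lab == c and d > cutoff
        ]
    return sorted(flagged)


vectors = [embed(doc) for doc in DOCS]
labels, centroids = kmeans(vectors, k=2)
divergent = flag_divergent(vectors, labels, centroids)
print(labels, divergent)
```

On this toy data the two topics separate into two clusters, and the only flagged document is index 3, the one consumer ruling that was denied while its neighbours were granted. At realistic scale one would more likely train embeddings with gensim's `Word2Vec` and cluster with scikit-learn's `KMeans`; the pure-Python version is used here only to keep the sketch self-contained.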

Keywords: Legal Certainty, Natural Language Processing, Semantic Textual Similarity, Jurisprudence, Word2Vec

Published
2024-11-17
CASTRO, Marcella Queiroz de; NEVES, Ana Régia. NLP and Legal Certainty: Identifying Jurisprudential Divergences with Natural Language Processing. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 15., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 451-456. DOI: https://doi.org/10.5753/stil.2024.245333.