Machine Learning Algorithms Applied on Classification of Processes for Conciliation on Brazilian Labour Judiciary
Abstract
The Labour Judiciary ensures protection and justice in labour relations, resolving conflicts such as unfair dismissals and wage delays. Artificial intelligence has emerged as a way to expedite legal activities and to help the Judiciary handle a caseload that has grown over recent years. In labour dispute resolution, conciliation is a recommended solution because it offers speed and cost reduction. This study therefore evaluates models that predict whether a labour case will be resolved through conciliation. The dataset used to train the models consists of initial petitions from cases extracted from the Processo Judicial Eletrônico (PJe) system maintained by the Tribunal Regional do Trabalho da 8ª Região. These documents were pre-processed through the removal of accents, special characters, numerals, punctuation, and stopwords; conversion of the text to lowercase; stemming; and tokenization. The text was then vectorized using Term Frequency-Inverse Document Frequency (TF-IDF) for model training. Three machine learning algorithms were considered: Support Vector Machines (SVM), logistic regression, and decision trees. A boosted tree model (XGBoost) was also trained. In this analysis, the SVM with an RBF kernel yielded the best results, achieving an accuracy of 83% and an F1-score of 82%, with a Matthews Correlation Coefficient (MCC) of 0.66 and an Area Under the ROC Curve (AUC) of 0.83.
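The pre-processing steps listed above can be illustrated with a minimal Python sketch that uses only the standard library. It is not the authors' implementation: the function names, the tiny stopword list, and the sample sentence are hypothetical, and the order of the steps is an assumption. Stemming is left out because it would require an external Portuguese stemmer.

```python
import re
import unicodedata
from typing import List

# Hypothetical, deliberately tiny stopword list used only for illustration;
# a real pipeline would use a complete Portuguese stopword list.
STOPWORDS = {"a", "o", "as", "os", "de", "da", "do", "das", "dos",
             "e", "em", "que", "para", "com", "por", "no", "na"}


def strip_accents(text: str) -> str:
    """Remove accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def preprocess(text: str) -> List[str]:
    """Lowercase, strip accents, drop non-letters, tokenize, remove stopwords."""
    text = strip_accents(text.lower())
    # Replace numerals, punctuation, and special characters with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Whitespace tokenization (an assumption; the abstract does not name a tokenizer).
    tokens = text.split()
    return [token for token in tokens if token not in STOPWORDS]


# Hypothetical sample sentence, not taken from the dataset.
sample = "O reclamante não recebeu 3 salários, conforme a petição inicial!"
print(preprocess(sample))
```

The resulting token lists could then be joined back into strings and passed to a TF-IDF vectorizer before model training.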