Assessment of text clustering approaches for legal documents
Abstract
The judicial system comprises countless documents related to legal proceedings. These documents may contain relevant information that supports decision-making in future cases. However, collecting this information is not a trivial task. This paper proposes the use of clustering to group similar cases and ease the collection of information. To that end, different approaches were evaluated in order to identify the one best suited to the task. The approaches were applied to a dataset of 1515 texts taken from the statement-of-facts sections of initial petitions. They were evaluated using internal evaluation metrics and by inspecting the texts of the clustered cases. The results indicated that the best approach for clustering legal cases combines the K-Means algorithm with the TF-IDF representation technique together with PCA.
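The combination reported as best (TF-IDF representation, PCA, then K-Means) can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the helper name `cluster_texts`, the toy English texts, the number of clusters, the number of PCA components, and the choice of silhouette and Calinski-Harabasz as internal metrics are all assumptions made here for the example.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import calinski_harabasz_score, silhouette_score


def cluster_texts(texts, n_clusters, n_components, seed=0):
    """Hypothetical helper: TF-IDF -> PCA -> K-Means, plus internal metrics."""
    # Represent each text as a TF-IDF weighted bag-of-words vector.
    tfidf = TfidfVectorizer().fit_transform(texts)
    # Reduce dimensionality with PCA; the sparse matrix is densified first.
    pca = PCA(n_components=n_components, random_state=seed)
    reduced = pca.fit_transform(tfidf.toarray())
    # Group the reduced vectors with K-Means.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(reduced)
    # Internal metrics need no ground-truth labels (assumed choice of metrics).
    scores = {
        "silhouette": silhouette_score(reduced, labels),
        "calinski_harabasz": calinski_harabasz_score(reduced, labels),
    }
    return labels, scores


# Toy placeholder texts (hypothetical, not taken from the paper's dataset).
texts = [
    "The plaintiff purchased a phone that stopped working within a week.",
    "The customer bought a defective phone and the store refused a refund.",
    "The buyer received a broken phone and asks for a refund.",
    "The flight was cancelled and the airline offered no assistance.",
    "The passenger's flight was delayed and the airline lost the luggage.",
    "The airline cancelled the flight without notice to the passenger.",
]

# n_clusters=2 and n_components=2 are illustrative values only.
labels, scores = cluster_texts(texts, n_clusters=2, n_components=2)
print(labels)
print(scores)
```

On a real collection, the number of clusters and of PCA components would need to be tuned, for instance by comparing the internal metric values across several settings and reading a sample of the texts in each cluster.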