Topic Modeling for the Legal Case Retrieval Task

Luisa Pereira Novaes; Daniela Vianna; Altigran Soares da Silva

doi:10.5753/sbbd.2023.232576

Luisa Pereira Novaes, Universidade Federal do Amazonas
Daniela Vianna, Universidade Federal do Amazonas
Altigran Soares da Silva, Universidade Federal do Amazonas, https://orcid.org/0000-0002-8992-495X

Abstract

This paper describes a topic-based approach to the legal case retrieval problem. The method consists of two phases: filtering and ranking. In the first phase, a topic modeling technique is applied to the entire dataset to select an initial set of candidate cases for each query. In the second phase, a ranking function is used to produce a ranked list of cases relevant to the given query. Experimental results obtained with three different ranking functions, on data collections in different languages, indicate that the proposed approach is competitive, which is due to the strong correlation, observed in our experiments, between the topics of a query document and the topics of the relevant legal cases. In fact, our approach achieved higher precision values than those reported in the recently held Competition on Legal Information Extraction/Entailment (COLIEE) 2023, competing with groups from around the world.

Keywords: topic modeling, information retrieval, legal cases, legal case retrieval
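The two-phase structure described in the abstract (filter by topic, then rank) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy corpus, the precomputed `topic_of` labels (standing in for the output of any topic model), the exact-topic-match filter, and the term-frequency cosine score, which is a simple stand-in and not one of the three ranking functions evaluated in the paper.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_id, texts, topic_of, top_k=5):
    """Two-phase retrieval: filter by shared topic, then rank the survivors.

    texts:    dict case_id -> raw text (hypothetical toy corpus)
    topic_of: dict case_id -> topic label, assumed to come from a topic model
    """
    # Phase 1 (filtering): keep only cases assigned to the query's topic.
    # Exact topic match is an assumption; the paper may filter differently.
    candidates = [
        c for c in texts
        if c != query_id and topic_of[c] == topic_of[query_id]
    ]

    # Phase 2 (ranking): score each candidate against the query.
    # Term-frequency cosine is an illustrative stand-in ranking function.
    q_vec = Counter(texts[query_id].lower().split())
    scored = [
        (c, cosine(q_vec, Counter(texts[c].lower().split())))
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical data: three candidate cases and one query case.
texts = {
    "q1": "appeal immigration visa refusal judicial review",
    "c1": "judicial review of visa refusal by immigration officer",
    "c2": "immigration appeal dismissed",
    "c3": "patent infringement damages claim",
}
topic_of = {"q1": 0, "c1": 0, "c2": 0, "c3": 1}

ranking = retrieve("q1", texts, topic_of)
print([case_id for case_id, _ in ranking])
```

In this toy setting the patent case never reaches the ranking phase because it was assigned a different topic, which is the point of the filtering step: the ranking function only has to order a small, topically coherent candidate set.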
