Framework Evaluation for Legislative Document Retrieval: A Case Study in the Brazilian Chamber of Deputies
Abstract
This work evaluates information retrieval frameworks as a way to address difficulties in the law-making process of the Brazilian Chamber of Deputies. Two open-source frameworks were selected and tested with different pre-processing techniques, including stemmers and n-gram language models. Two legislative corpora from the Chamber were used to build and validate the experiments, and the results were compared against a baseline already in use at the Chamber of Deputies. The baseline achieved the best result, with a Recall of 0.7376 at 20 retrieved documents (Recall@20).
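To make the reported metric concrete, the following is a minimal sketch of Recall at a cutoff of k retrieved documents. It assumes the standard definition (the share of a query's relevant documents that appear among the top k results, averaged over queries); this section does not spell out the exact evaluation protocol, so the function names and document identifiers below are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the relevant documents found in the top-k ranked results.

    Assumes the standard Recall@k definition; the function name and the
    argument layout are illustrative, not taken from the paper.
    """
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("at least one relevant document is required")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs, k=20):
    """Average Recall@k over (ranked_ids, relevant_ids) pairs, one per query."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores)


# Hypothetical example: two queries with made-up document identifiers.
runs = [
    (["d1", "d2", "d3"], ["d2", "d9"]),  # one of two relevant documents retrieved
    (["d4", "d5", "d6"], ["d4"]),        # the only relevant document retrieved
]
score = mean_recall_at_k(runs, k=20)
```

Under this definition, a score of 0.7376 at k = 20 would mean that, on average, roughly 74% of each query's relevant documents appear among the first 20 results.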
