Evaluating BERT Models for Semantic Retrieval in Long Portuguese Legal Documents
Abstract
The growing volume of digital documents in the Brazilian judiciary poses new challenges for procedural efficiency. This study evaluated five BERT models for dense information retrieval over long court documents, using text segmentation and vector retrieval with Elasticsearch. General-purpose, domain-specific, and task-specific models were compared using intra-cluster coherence as the evaluation measure. BumbaBERT (domain-specific) performed best, which indicates that domain specialization is crucial for effective semantic retrieval in “zero-shot” scenarios in the Brazilian legal context.
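The pipeline summarized above (segmenting long documents, retrieving by vector similarity, and scoring intra-cluster coherence) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the window size and overlap, the function names `segment`, `rank`, and `intra_cluster_coherence`, and the reading of coherence as mean pairwise cosine similarity. It calls neither a BERT model nor Elasticsearch; the brute-force cosine ranking only stands in for a vector index, and the vectors are hand-written placeholders for model embeddings.

```python
import math
from itertools import combinations


def segment(tokens, size=4, overlap=1):
    """Split a token list into overlapping windows of at most `size` tokens.

    `size` and `overlap` are illustrative defaults, not values from the study.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    segments = []
    for start in range(0, len(tokens), step):
        segments.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return segments


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_vec, segment_vecs, k=3):
    """Return indices of the k segments most similar to the query.

    Brute-force stand-in for a vector search engine.
    """
    order = sorted(
        range(len(segment_vecs)),
        key=lambda i: cosine(query_vec, segment_vecs[i]),
        reverse=True,
    )
    return order[:k]


def intra_cluster_coherence(vectors):
    """Mean pairwise cosine similarity within one cluster.

    Assumed definition; the study's exact formula may differ.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0  # a cluster with fewer than two members is trivially coherent
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


if __name__ == "__main__":
    tokens = "o réu interpôs recurso de apelação contra a sentença".split()
    print(segment(tokens))
    toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # placeholder embeddings
    print(rank([1.0, 0.1], toy_vectors, k=2))
    print(intra_cluster_coherence(toy_vectors))
```

In a real setting, each segment would be embedded by the model under evaluation and stored in a vector index, and coherence would be computed over groups of documents expected to be semantically related.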
