Comparing Prompt-based LLMs, Fine-Tuning, and Classical Models for Legal Text Classification in Portuguese

  • Willgnner Ferreira Santos UFG
  • Arlindo Rodrigues Galvão Filho AKCIT
  • Sávio Salvarino Teles de Oliveira UFG
  • João Paulo Cavalcante Presa UFG

Abstract


This study compares fine-tuned Transformer models, prompt-based LLMs, and traditional models for classifying legal texts in Portuguese. A new Portuguese dataset is introduced for the empirical evaluation. Four input formats were tested for prompting: complete text, summaries, centroids, and descriptions. Summaries performed well while using fewer tokens. Among the traditional models, KNN proved competitive in resource-limited scenarios. Both input format and model capacity affected performance. The study discusses the trade-offs between efficiency and interpretability and proposes guidelines for choosing effective strategies in legal NLP tasks.
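To make the KNN baseline mentioned above concrete, the following is a minimal sketch of k-nearest-neighbor text classification. It is not the authors' pipeline: the abstract does not state which features or distance measure were used, so the bag-of-words representation, the cosine similarity, the toy sentences, and the labels `familia` and `criminal` are all assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, whitespace-tokenize, and count terms (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, query, k=3):
    """Return the majority label among the k training texts most similar to query."""
    q = bag_of_words(query)
    scored = sorted(
        ((cosine(q, bag_of_words(text)), label) for text, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy examples; not taken from the dataset introduced in the study.
TRAIN = [
    ("pedido de pensão alimentícia para filho menor", "familia"),
    ("ação de divórcio e guarda dos filhos", "familia"),
    ("revisão de pensão alimentícia e guarda", "familia"),
    ("denúncia de furto em estabelecimento comercial", "criminal"),
    ("prisão em flagrante por roubo", "criminal"),
    ("defesa em processo por furto qualificado", "criminal"),
]

print(knn_predict(TRAIN, "pedido de guarda e pensão para filho"))
print(knn_predict(TRAIN, "prisão por furto em flagrante"))
```

A classifier of this kind needs no training step and no GPU, which is one plausible reason a KNN baseline can remain attractive when compute is limited.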

Published
29/09/2025
SANTOS, Willgnner Ferreira; GALVÃO FILHO, Arlindo Rodrigues; OLIVEIRA, Sávio Salvarino Teles de; PRESA, João Paulo Cavalcante. Comparing Prompt-based LLMs, Fine-Tuning, and Classical Models for Legal Text Classification in Portuguese. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 1138-1149. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.14393.
