Numerical information extraction in legal texts using open and closed Large Language Models
Abstract
This paper investigates numerical named-entity recognition in Portuguese legal texts using both closed and open-source decoder-only Large Language Models (LLMs). We conduct a quantitative and qualitative evaluation of two paradigms: (1) fine-tuning a modified version of LLaMA 2 via LoRA on a new corpus of over 600 manually annotated judicial rulings, and (2) prompt engineering with closed models (OpenAI’s GPT and Google’s Gemini). We compare instruction tuning and prompt construction with closed models against a parameter-efficient fine-tuning approach that bridges decoder-only LLMs with traditional encoder-only architectures. The results reveal that the modified, LoRA-tuned LLaMA 2 achieves competitive entity-recognition performance while offering greater transparency and parameter efficiency, whereas prompt-engineered closed models simplify deployment but incur limitations in consistency and fine-grained control.
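The two paradigms evaluated above can be illustrated with short sketches. First, the open-model paradigm: a decoder-only LLaMA 2 backbone fitted with a BERT-style token-classification head and fine-tuned with LoRA adapters, in the spirit of Hu et al. (2021) and Li et al. (2023b). This is a minimal sketch, not the authors' released code: the checkpoint name, label set, and LoRA hyperparameters are illustrative placeholders, and it assumes a transformers release that ships a token-classification head for LLaMA.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from peft import LoraConfig, TaskType, get_peft_model

# Hypothetical BIO tag set for numerical entities in judicial rulings
# (monetary amounts and dates); the paper's actual schema may differ.
LABELS = ["O", "B-VALOR", "I-VALOR", "B-DATA", "I-DATA"]
BACKBONE = "meta-llama/Llama-2-7b-hf"  # the open backbone named in the abstract

tokenizer = AutoTokenizer.from_pretrained(BACKBONE)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA 2 ships without a pad token

# Attach a token-classification head to the decoder-only model, mirroring
# the usual encoder-only (BERT-style) NER setup.
model = AutoModelForTokenClassification.from_pretrained(
    BACKBONE,
    num_labels=len(LABELS),
    id2label=dict(enumerate(LABELS)),
    label2id={label: i for i, label in enumerate(LABELS)},
)

# LoRA (Hu et al., 2021): freeze the backbone and train only low-rank updates
# to the attention projections; rank and target modules here are illustrative.
peft_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only a small fraction of weights are trained
```

Second, the closed-model paradigm: prompting a hosted chat model to return the numerical entities as structured output. Again a sketch under stated assumptions: the model name, prompt wording, and JSON schema are illustrative, not the prompts used in the paper.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

excerpt = "Condeno a ré ao pagamento de R$ 5.000,00, corrigido desde 15/03/2024."
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; the paper evaluates OpenAI's GPT models
    temperature=0,        # reduces, but does not eliminate, run-to-run variation
    messages=[
        {"role": "system",
         "content": 'Extract every monetary amount and date from the Brazilian '
                    'legal text. Reply only with JSON of the form '
                    '{"valores": [...], "datas": [...]}.'},
        {"role": "user", "content": excerpt},
    ],
)
print(response.choices[0].message.content)
```

The contrast between the two sketches mirrors the trade-off reported in the results: the LoRA route trains only a small adapter over an inspectable open model, while the prompting route needs no training at all but leaves output consistency and granularity of control to the hosted model.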
References

Agrawal, M., Hegselmann, S., Lang, H., Kim, Y., and Sontag, D. (2022). Large language models are few-shot clinical information extractors. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 1998–2022.
Anil, R., Borgeaud, S., Alayrac, J.-B., Yu, J., et al. (2024). Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805.
Bitelli, B. and Finger, M. (2024). Numerical information extraction in legal texts using large language models. Master’s thesis, Universidade de São Paulo.
Cabral, B., Souza, M., and Claro, D. B. (2022). PortNOIE: A neural framework for open information extraction for the Portuguese language. In Computational Processing of the Portuguese Language, pages 243–255, Cham. Springer International Publishing.
Chen, X., Li, L., Deng, S., Tan, C., Xu, C., Huang, F., Si, L., Chen, H., and Zhang, N. (2021). LightNER: A lightweight tuning paradigm for low-resource NER via pluggable prompting. arXiv preprint arXiv:2109.00720.
Cui, L., Wu, Y., Liu, J., Yang, S., and Zhang, Y. (2021). Template-based named entity recognition using BART. arXiv preprint arXiv:2106.01760.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Furquim, L. O. d. C. and de Lima, V. L. S. (2012). Clustering and categorization of Brazilian Portuguese legal documents. In Computational Processing of the Portuguese Language, pages 272–283, Berlin, Heidelberg. Springer Berlin Heidelberg.
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2021). LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
Jurafsky, D. and Martin, J. H. (2024). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models. 3rd edition. Online manuscript released August 20, 2024.
Li, B., Fang, G., Yang, Y., Wang, Q., Ye, W., Zhao, W., and Zhang, S. (2023a). Evaluating ChatGPT’s information extraction capabilities: An assessment of performance, explainability, calibration, and faithfulness. arXiv preprint arXiv:2304.11633.
Li, Z., Li, X., Liu, Y., Xie, H., Li, J., Wang, F. L., Li, Q., and Zhong, X. (2023b). Label supervised LLaMA finetuning. arXiv preprint arXiv:2310.01208.
Naik, A., Ravichander, A., Rose, C., and Hovy, E. (2019). Exploring numeracy in word embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3374–3380.
Nunes, R. O., Spritzer, A. S., Freitas, C. M. D. S., and Balreira, D. G. (2024). Reconhecimento de entidades nomeadas e vazamento de dados em textos legislativos. Linguamática, 16(2).
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744.
Sanh, V., Webson, A., Raffel, C., Bach, S. H., Sutawika, L., Alyafeai, Z., Chaffin, A., Stiegler, A., Scao, T. L., Raja, A., et al. (2021). Multitask prompted training enables zero-shot task generalization. arXiv preprint arXiv:2110.08207.
Serras, F. R. and Finger, M. (2021). verBERT: Automating Brazilian case law document multi-label categorization using BERT. In Anais do XIII Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana, pages 237–246. SBC.
Sundararaman, D., Subramanian, V., Wang, G., Xu, L., and Carin, L. (2022). Number entity recognition. arXiv preprint arXiv:2205.03559.
Touvron, H., Martin, L., Stone, K., Albert, P., et al. (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, volume 30.
Wang, X., Zhou, W., Zu, C., Xia, H., Chen, T., Zhang, Y., Zheng, R., Ye, J., Zhang, Q., Gui, T., et al. (2023). InstructUIE: Multi-task instruction tuning for unified information extraction. arXiv preprint arXiv:2304.08085.
Wei, X., Cui, X., Cheng, N., Wang, X., Zhang, X., Huang, S., Xie, P., Xu, J., Chen, Y., Zhang, M., et al. (2023). Zero-shot information extraction via chatting with ChatGPT. arXiv preprint arXiv:2302.10205.
Published
29/09/2025
How to Cite
BITELLI, Bruno V.; FINGER, Marcelo. Numerical information extraction in legal texts using open and closed Large Language Models. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 1197-1208. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.14453.
