Domain-adaptive T5 for structured information extraction in Brazilian legislative texts using Semantic Role Labels

  • Frederico Thiers Dutra de Oliveira da Silva UNIRIO
  • Ana Cristina Bicharra Garcia UNIRIO

Abstract


This paper presents a preliminary investigation into how well generative sequence-to-sequence architectures extract structured normative information from Brazilian legislative amendments. We evaluate the capacity of a compact, domain-adapted T5 model to map complex legal provisions into a functional Semantic Role Labeling (SRL) schema. After fine-tuning a Portuguese T5-base model on a specialized corpus, our initial results suggest that this text-to-structure approach can reconstruct regulatory intent with higher juridical fidelity than larger general-purpose LLMs used zero-shot. The findings suggest that, in highly conventionalized legal settings, domain-aligned supervision may matter more for successful extraction than model scale alone. This study provides early evidence for computationally efficient alternatives that preserve institutional drafting patterns, laying the groundwork for more robust scaling in legislative text processing.
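To make the text-to-structure idea concrete, the sketch below shows one way an SRL frame could be serialized into a flat target string for a sequence-to-sequence model and parsed back from generated text. The role names (`AGENT`, `ACTION`, `OBJECT`, `CONDITION`), the bracketed tag format, and the example provision are assumptions made for illustration; the abstract does not specify the schema or target format the authors use.

```python
import re

# Hypothetical role inventory; the paper's actual SRL schema is not given in the abstract.
ROLES = ("AGENT", "ACTION", "OBJECT", "CONDITION")

# Matches "[ROLE] span" up to the next role tag or the end of the string.
_TAGGED_SPAN = re.compile(r"\[([A-Z]+)\]\s*(.*?)(?=\s*\[[A-Z]+\]|$)", re.S)


def linearize(frame: dict[str, str]) -> str:
    """Serialize a role -> span mapping into a flat seq2seq target string."""
    return " ".join(f"[{role}] {frame[role]}" for role in ROLES if role in frame)


def parse(target: str) -> dict[str, str]:
    """Recover the role -> span mapping from generated text, ignoring unknown tags."""
    return {
        role: span.strip()
        for role, span in _TAGGED_SPAN.findall(target)
        if role in ROLES
    }


# Illustrative provision, not taken from the paper's corpus.
frame = {
    "AGENT": "o Poder Executivo",
    "ACTION": "regulamentará",
    "OBJECT": "o disposto nesta Lei",
}
target = linearize(frame)
recovered = parse(target)
```

Under these assumptions, a fine-tuned model would be trained to emit strings shaped like `target`, and `parse` would turn its output back into a structured record that can be compared role by role against a gold annotation.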

Published
25/05/2026
SILVA, Frederico Thiers Dutra de Oliveira da; GARCIA, Ana Cristina Bicharra. Domain-adaptive T5 for structured information extraction in Brazilian legislative texts using Semantic Role Labels. In: TRILHA DE NOVAS IDEIAS E RESULTADOS EMERGENTES EM SI - DESENHOS DE PESQUISA - SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 22., 2026, Vitória/ES. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 191-196. DOI: https://doi.org/10.5753/sbsi_estendido.2026.249094.