Structuring Information from Initial Petitions Using LLMs: A Study in Brazilian Courts of Justice
Abstract
This study proposes the use of Large Language Models to extract structured information from Brazilian legal petitions. We guided four models (from the Gemini and Gemma3 families) to generate structured JSON outputs, providing a comparative performance benchmark for this novel task. The main contribution is a validated workflow, where results show Gemini models significantly outperform Gemma in capturing complex semantic data. This work establishes a robust evaluation and an important methodological baseline for a critical task in Brazilian legal Natural Language Processing (NLP).
Published
2025-09-29
How to Cite
ESASHIKA, Rhedson; FIGUEIREDO, Carlos M. S.; MELO, Tiago de. Structuring Information from Initial Petitions Using LLMs: A Study in Brazilian Courts of Justice. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 546-557. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.13876.
