Hybrid Summarization for Brazilian Judicial Decisions

Gabriele S. Araújo; Ewaldo E. C. Santana; Fábio M. F. Lobato

doi:10.5753/sbsi.2026.248671

Gabriele S. Araújo UEMA
Ewaldo E. C. Santana UEMA / UFMA
Fábio M. F. Lobato UEMA / UFOPA / USP


Abstract

Research Context: Judicial decisions in Brazil have a complex textual structure, comprising reports, votes, and summaries, which hinders systematic analysis and reduces the intelligibility of the information they contain. The same holds in many other countries of the Global South. This situation challenges not only legal professionals but also the Information Systems (IS) that support the processing and organization of large volumes of documents.

Scientific and/or Practical Problem: Many systems struggle with lengthy texts because of language model input limitations and the need to preserve technical accuracy and cohesion. This restricts the development of reliable solutions for judicial activities and transparency.

Proposed Solution and/or Analysis: This study proposes and evaluates a hybrid summarization pipeline for decisions from the Federal Supreme Court, integrating extractive and abstractive methods to handle long documents without information loss.

Related IS Theory: The research is grounded in Task-Technology Fit (TTF), which justifies the alignment between the pipeline and the legal summarization task in support of information-processing objectives. Additionally, Design Science Research is employed as the methodology for artifact development and evaluation.

Research Method: The pipeline was implemented and tested on the RulingBR corpus, which comprises Brazilian Supreme Court decisions. Five summarization methods were compared: TF-IDF (baseline), BumbaBERT, Gemini-2.5, and two hybrid variants. All were evaluated using standard metrics (ROUGE, BERTScore, and METEOR).

Summary of Results: Hybrid approaches balanced technical accuracy and textual cohesion, while chunked processing expanded the applicability of specialized models to long documents. The results indicate practical impact through faster case review and improved jurisprudential research.

Contributions and Impact on the IS Area: The study contributes to IS by proposing strategies that support information governance in organizational environments while taking processing limitations into account. Aligned with the Grand Challenge of IS in the Open World, it provides guidance for scalable, transparent, and interoperable decision-support systems, improving institutional transparency and supporting legal activities in complex digital ecosystems.
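
As an illustration of the general idea behind a chunked extract-then-abstract pipeline, the following minimal Python sketch may help. It is not the authors' implementation: the function names, the word-based chunk size, the per-chunk TF-IDF sentence scoring, and the `abstractive` callable (a stand-in for any rewriting model such as an LLM) are all assumptions made here for illustration, since the abstract does not specify how the hybrid variants combine the two stages.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_sentences(sentences: List[str], max_words: int) -> List[List[str]]:
    """Group consecutive sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    count = 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_words:
            chunks.append(current)
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(current)
    return chunks


def tfidf_top_sentences(sentences: List[str], k: int) -> List[str]:
    """Return the k sentences with the highest TF-IDF score, in original order."""
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    # Document frequency: in how many sentences each term occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    scores = []
    for toks in tokenized:
        if not toks:
            scores.append(0.0)
            continue
        tf = Counter(toks)
        # Smoothed IDF, so terms present in every sentence keep a positive weight.
        scores.append(sum((c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
                          for t, c in tf.items()))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def hybrid_summarize(text: str,
                     abstractive: Callable[[str], str],
                     max_words: int = 200,
                     k: int = 2) -> str:
    """Extract k salient sentences per chunk, then rewrite them abstractively.

    `abstractive` is a hypothetical placeholder for a generative model call.
    """
    extracted: List[str] = []
    for chunk in chunk_sentences(split_sentences(text), max_words):
        extracted.extend(tfidf_top_sentences(chunk, k))
    return abstractive(" ".join(extracted))
```

Scoring each chunk separately keeps every part of a long decision represented in the extract, and the abstractive step then only sees a text short enough for a model with a limited input window. A production system would likely replace the naive sentence splitter and the word-count budget with a tokenizer matched to the chosen model.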

