FREEsum: A Conceptual Framework for Evaluating Text Summarization Approaches

  • Lucas V. Alves, Instituto Atlântico
  • Cleilton L. Rocha, Instituto Atlântico

Abstract


Research Context: The growing volume of digital information across many areas, combined with the use of large language models (LLMs) in intelligent systems, chatbots, and LLM-based agents, underscores the importance of automatic text summarization for processing and disseminating information efficiently.
Scientific and/or Practical Problem: Despite advances in summarization techniques, practitioners struggle to compare strategies systematically, because automatic evaluation relies on metrics and tools with limited scope that integrate poorly with reproducible experimental pipelines. This hinders rigorous benchmarking and the transparent selection of the best solutions.
Proposed Solution and/or Analysis: This work proposes FREEsum, a conceptual, reproducible framework for building end-to-end pipelines for automatic summarization evaluation. It covers all stages of an experiment, from data ingestion to results analysis, through declarative workflows and traceable artifacts, enabling systematic experimentation and direct comparison of summarization strategies.
Related IS Theory: The framework applies Design Science Research principles and open-science practices, ensuring transparency, traceability, and reproducibility in Information Systems experiments.
Research Method: FREEsum was developed and validated by implementing a configurable, reproducible experimental pipeline, including a proof of concept that demonstrates its applicability in realistic evaluation scenarios.
Summary of Results: Experiments show that FREEsum standardizes benchmarking, streamlines configuration, supports method-and-metric trade-off analysis, and facilitates auditing across all experimental stages.
Contributions and Impact to IS area: This work contributes to the field of Information Systems by connecting AI techniques for summarization (LLM- and NLP-based) with core IS concerns, such as transparency, governance, and technological and economic impacts. It provides a standardized, auditable, and reusable infrastructure for reliable and extensible summarization evaluation across domains, addressing IS demands for empirical rigor and transparent experimentation and supporting evidence-based adoption of AI methods, particularly in large-scale and complex-text settings where summarization mitigates information overload and improves interpretability.
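
The idea of a declarative workflow with traceable artifacts can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not FREEsum's actual schema or API: the `CONFIG` keys, the toy `lead_1` and `last_1` baselines, the simplified `unigram_f1` overlap score (a rough stand-in for a ROUGE-1-style metric, not the official implementation), and the `run_pipeline` function are all hypothetical.

```python
import hashlib
import json
from collections import Counter

# Hypothetical declarative experiment description (illustrative keys only).
CONFIG = {
    "dataset": [
        {
            "id": "doc1",
            "text": "The cat sat on the mat. It was a sunny day. The cat slept.",
            "reference": "The cat sat on the mat.",
        }
    ],
    "summarizers": ["lead_1", "last_1"],
    "metrics": ["unigram_f1"],
}


def split_sentences(text):
    """Naive sentence splitter on periods; enough for this toy example."""
    return [part.strip() + "." for part in text.split(".") if part.strip()]


def lead_1(text):
    """Toy baseline: return the first sentence."""
    return split_sentences(text)[0]


def last_1(text):
    """Toy baseline: return the last sentence."""
    return split_sentences(text)[-1]


def unigram_f1(candidate, reference):
    """Simplified unigram-overlap F1 on lowercased whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Registries map the names used in the config to callables.
SUMMARIZERS = {"lead_1": lead_1, "last_1": last_1}
METRICS = {"unigram_f1": unigram_f1}


def run_pipeline(config):
    """Run every summarizer/metric pair and tag each record with a config hash."""
    serialized = json.dumps(config, sort_keys=True).encode("utf-8")
    config_hash = hashlib.sha256(serialized).hexdigest()
    records = []
    for doc in config["dataset"]:
        for s_name in config["summarizers"]:
            summary = SUMMARIZERS[s_name](doc["text"])
            for m_name in config["metrics"]:
                score = METRICS[m_name](summary, doc["reference"])
                records.append(
                    {
                        "config_hash": config_hash,
                        "doc_id": doc["id"],
                        "summarizer": s_name,
                        "metric": m_name,
                        "summary": summary,
                        "score": score,
                    }
                )
    return records


if __name__ == "__main__":
    for record in run_pipeline(CONFIG):
        print(record["summarizer"], record["metric"], round(record["score"], 3))
```

Because each record carries a hash of the exact configuration that produced it, two runs can be compared or audited later without guessing which settings were used. A real pipeline would presumably also version the data and the model outputs.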

Published
25 May 2026
ALVES, Lucas V.; ROCHA, Cleilton L. FREEsum: A Conceptual Framework for Evaluating Text Summarization Approaches. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 22., 2026, Vitória/ES. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 871-889. DOI: https://doi.org/10.5753/sbsi.2026.248662.