A Generic Extractive Multi-document Text Summarization Method Using Memetic Algorithm and Combinatorial Optimization

  • Alysson Guimarães (UFS)
  • Methanias Colaço Junior (UFS / UFRN)

Abstract


Research Context: Automatic text summarization remains a subject of considerable relevance across multiple domains. In particular, generic extractive multi-document summarization has attracted increasing attention because it can mitigate information overload in a wide range of applications.

Scientific and/or Practical Problem: The volume of unstructured text produced on the internet has grown exponentially in recent years, driven by advances in information and communication technologies (ICTs). This massive volume of data makes it difficult for users to find relevant information.

Proposed Solution and/or Analysis: This study introduces, implements, and applies a memetic algorithm, Holistic Text Summarization with the Shuffled Frog-Leaping Algorithm (HSSFLA), to the generic extractive multi-document, multi-language text summarization problem, using combinatorial optimization techniques.

Related IS Theory: This research integrates swarm intelligence, memetic algorithms, and combinatorial optimization.

Research Method: An in vitro experiment was conducted to quantitatively compare the quality of the summaries produced by the proposed method with that of similar methods in the literature.

Summary of Results: Experiments were carried out on the DUC2001 and DUC2002 benchmark datasets, and performance was evaluated with the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. On DUC2001, the proposed approach yielded average improvements of 25.12% in ROUGE-1 and 34.91% in ROUGE-2 over the compared methods; on DUC2002, it achieved average gains of 35.42% in ROUGE-1 and 36.08% in ROUGE-2.

Contributions and Impact to the IS Area: HSSFLA, a memetic algorithm based on swarm intelligence, was developed and applied to this problem for the first time. It produces holistic summaries: it evaluates the quality of the summary as a whole rather than exhaustively searching for the best individual sentences. HSSFLA outperforms the results reported in the literature on DUC2001 and DUC2002.
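The abstract frames summarization as combinatorial optimization solved by a shuffled frog-leaping memetic search. The Python sketch below illustrates that general idea under our own assumptions; it is not the authors' HSSFLA. A candidate summary is a binary mask over sentences (1 = selected), a word budget is enforced by a hypothetical `repair` step, and the hypothetical "holistic" fitness scores the assembled summary as a whole, as the cosine similarity between its term frequencies and those of the full collection. The loop follows the SFLA outline of Eusuff et al. (2006), simplified so that each whole memeplex is used instead of a randomly drawn submemeplex: rank the frogs, deal them into memeplexes, let each memeplex's worst frog leap toward the memeplex leader, then toward the global best, else replace it with a random frog, and finally shuffle the memeplexes back together. The bitwise `leap` operator is an assumed binary adaptation.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, split on whitespace, trim punctuation.
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors (Counters).
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SummaryProblem:
    """Hypothetical formulation: choose a binary sentence mask under a word budget."""

    def __init__(self, sentences, max_words):
        self.tokens = [tokenize(s) for s in sentences]
        self.max_words = max_words
        self.collection = Counter(t for toks in self.tokens for t in toks)

    def length(self, mask):
        return sum(len(toks) for toks, bit in zip(self.tokens, mask) if bit)

    def repair(self, mask, rng):
        # Drop randomly chosen selected sentences until the summary fits the budget.
        mask = list(mask)
        while self.length(mask) > self.max_words:
            selected = [i for i, bit in enumerate(mask) if bit]
            mask[rng.choice(selected)] = 0
        return mask

    def fitness(self, mask):
        # "Holistic" score (assumption): the whole summary vs. the whole collection.
        summary = Counter(t for toks, bit in zip(self.tokens, mask) if bit for t in toks)
        return cosine(summary, self.collection)


def sfla(problem, n_frogs=20, n_memeplexes=4, local_steps=5, generations=30, seed=0):
    rng = random.Random(seed)
    n = len(problem.tokens)

    def random_frog():
        return problem.repair([rng.randint(0, 1) for _ in range(n)], rng)

    def leap(worst, leader):
        # Assumed binary leap: copy each differing bit from the leader with probability r.
        r = rng.random()
        moved = [lb if wb != lb and rng.random() < r else wb for wb, lb in zip(worst, leader)]
        return problem.repair(moved, rng)

    frogs = [random_frog() for _ in range(n_frogs)]
    for _ in range(generations):
        frogs.sort(key=problem.fitness, reverse=True)  # rank the whole population
        global_best = frogs[0]
        # Deal ranked frogs into memeplexes: frog k goes to memeplex k mod m.
        memeplexes = [frogs[j::n_memeplexes] for j in range(n_memeplexes)]
        for plex in memeplexes:
            for _ in range(local_steps):
                plex.sort(key=problem.fitness, reverse=True)
                worst = plex[-1]
                for leader in (plex[0], global_best):  # memeplex best, then global best
                    candidate = leap(worst, leader)
                    if problem.fitness(candidate) > problem.fitness(worst):
                        plex[-1] = candidate
                        break
                else:
                    plex[-1] = random_frog()  # no improvement: replace with a random frog
        frogs = [frog for plex in memeplexes for frog in plex]  # shuffle step
    return max(frogs, key=problem.fitness)


docs = [
    "The shuffled frog-leaping algorithm partitions frogs into memeplexes.",
    "Each memeplex evolves locally before the population is shuffled.",
    "Extractive summarization selects sentences from the source documents.",
    "The weather was pleasant on the day of the conference.",
]
problem = SummaryProblem(docs, max_words=20)
best = sfla(problem)
print([docs[i] for i, bit in enumerate(best) if bit])
```

Because the fitness only ever sees the assembled summary, sentences are judged by what they add to the whole rather than by an individual score, which is the intuition the abstract calls holistic. Richer objectives (for example TF-IDF weighting or explicit redundancy penalties) would slot into `fitness` without changing the search loop; which objective HSSFLA actually uses is not stated here.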
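ROUGE-N (Lin, 2004), the family behind the ROUGE-1 and ROUGE-2 scores reported in the abstract, is essentially n-gram recall against one or more human reference summaries: clipped n-gram matches divided by the total number of reference n-grams. The minimal Python sketch below computes it with lowercase whitespace tokenization only. The official ROUGE package adds options such as stemming, stop-word removal, and summary truncation, so this simplified version is illustrative and will not reproduce the reported figures.

```python
from collections import Counter


def ngrams(tokens, n):
    # Multiset of contiguous n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n):
    """Simplified ROUGE-N recall: clipped n-gram matches / total reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    matches = total = 0
    for ref in references:
        ref_grams = ngrams(ref.lower().split(), n)
        matches += sum((cand & ref_grams).values())  # '&' keeps min(count_cand, count_ref)
        total += sum(ref_grams.values())
    return matches / total if total else 0.0


reference = ["the cat sat on the mat"]
candidate = "the cat lay on the mat"
print(rouge_n_recall(candidate, reference, 1))  # 5 of 6 reference unigrams matched
print(rouge_n_recall(candidate, reference, 2))  # 3 of 5 reference bigrams matched
```

Clipping via `Counter` intersection keeps a candidate from earning credit for repeating an n-gram more often than the references contain it.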

References

Abbasi-ghalehtaki, R., Khotanlou, H., and Esmaeilpour, M. (2016). Fuzzy evolutionary cellular learning automata model for text summarization. Swarm and Evolutionary Computation, 30:11–26.

Alguliev, R. M., Aliguliyev, R. M., and Isazade, N. R. (2012). DESAMC+DocSum: Differential evolution with self-adaptive mutation and crossover parameters for multi-document summarization. Knowledge-Based Systems, 36:21–38.

Alguliyev, R. M., Aliguliyev, R. M., and Isazade, N. R. (2015). An unsupervised approach to generating generic summaries of documents. Applied Soft Computing Journal, 34:236–250.

Alqaisi, R., Ghanem, W., and Qaroush, A. (2020). Extractive multi-document Arabic text summarization using evolutionary multi-objective optimization with k-medoid clustering. IEEE Access, 8:228206–228224.

ChatGPT (2025). ChatGPT. Available at: [link]. Accessed: January 2025.

DUC (2024). Document Understanding Conference.

El-Kassas, W. S., Salama, C. R., Rafea, A. A., and Mohamed, H. K. (2021). Automatic text summarization: A comprehensive survey. Expert Systems with Applications, 165:113679.

Eusuff, M., Lansey, K., and Pasha, F. (2006). Shuffled frog-leaping algorithm: A memetic meta-heuristic for discrete optimization. Engineering Optimization, 38(2):129–154.

Gomes, L. and Oliveira, H. (2019). A multi-document summarization system for news articles in Portuguese using integer linear programming. In Anais do XVI Encontro Nacional de Inteligência Artificial e Computacional, pages 622–633, Porto Alegre, RS, Brasil. SBC.

Huang, L., He, Y., Wei, F., and Li, W. (2010). Modeling document summarization as multi-objective optimization. Pages 382–386.

Jorge, G. A. Z., Bezerra, D. A., Xavier, C. C., and Pardo, T. A. S. (2025). Multilingual extractive summarization: Investigating state-of-the-art methods for English and Brazilian Portuguese. In Paes, A. and Verri, F. A. N., editors, Intelligent Systems, pages 212–223, Cham. Springer Nature Switzerland.

Khurana, A. and Bhatnagar, V. (2022). Investigating entropy for extractive document summarization. Expert Systems with Applications, 187:115820.

Kumar, Y. J., Salim, N., Abuobieda, A., and Albaham, A. T. (2014). Multi document summarization based on news components using fuzzy cross-document relations. Applied Soft Computing Journal, 21:265–279.

Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.

Mendoza, M., Bonilla, S., Noguera, C., Cobos, C., and León, E. (2014). Extractive single-document summarization based on genetic operators and guided local search. Expert Systems with Applications, 41(9):4158–4169.

Saini, N., Saha, S., Jangra, A., and Bhattacharyya, P. (2019). Extractive single document summarization using multi-objective optimization: Exploring self-organized differential evolution, grey wolf optimizer and water cycle algorithm. Knowledge-Based Systems, 164:45–67.

Salton, G. and Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523.

Sanchez-Gomez, J. M., Vega-Rodríguez, M. A., and Pérez, C. J. (2018). Extractive multi-document text summarization using a multi-objective artificial bee colony optimization approach. Knowledge-Based Systems, 159:1–8.

Sanchez-Gomez, J. M., Vega-Rodríguez, M. A., and Pérez, C. J. (2020). A decomposition-based multi-objective optimization approach for extractive multi-document text summarization. Applied Soft Computing Journal, 91.

Sanchez-Gomez, J. M., Vega-Rodríguez, M. A., and Pérez, C. J. (2022). A multi-objective memetic algorithm for query-oriented text summarization: Medicine texts as a case study. Expert Systems with Applications, 198:116769.

Sanchez-Gomez, J. M., Vega-Rodríguez, M. A., and Pérez, C. J. (2024). An indicator-based multi-objective variable neighborhood search approach for query-focused summarization. Swarm and Evolutionary Computation, 91.

Sarmento, M. and de Oliveira, H. (2024). Sumarização automática de artigos de notícias em português: Da extração à abstração com abordagens clássicas e modelos de neurais. In Anais do XV Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana, pages 139–148, Porto Alegre, RS, Brasil. SBC.

Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423.

Tomer, M. and Kumar, M. (2022). Multi-document extractive text summarization based on firefly algorithm. Journal of King Saud University - Computer and Information Sciences, 34(8, Part B):6057–6065.

Verma, P., Verma, A., and Pal, S. (2022). An approach for extractive text summarization using fuzzy evolutionary and clustering algorithms. Applied Soft Computing, 120.
Published
25/05/2026
GUIMARÃES, Alysson; COLAÇO JUNIOR, Methanias. A Generic Extractive Multi-document Text Summarization Method Using Memetic Algorithm and Combinatorial Optimization. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 22., 2026, Vitória/ES. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 81-100. DOI: https://doi.org/10.5753/sbsi.2026.248295.