A Multi-document Summarization System for News Articles in Portuguese using Integer Linear Programming

  • Laerth Gomes, Centro Universitário de João Pessoa
  • Hilário de Oliveira, Instituto Federal do Espírito Santo

Abstract


Automatic Text Summarization (ATS) has attracted intense research in recent years. It matters because ATS systems help process large volumes of textual documents. The ATS task aims to create a summary of one or more documents by extracting their most relevant information. Although several works exist, research on ATS systems for documents written in Brazilian Portuguese remains scarce. In this paper, we propose a multi-document summarization system that follows a concept-based approach using Integer Linear Programming to generate summaries from news articles written in Portuguese. Experiments on the CSTNews corpus were performed to evaluate different aspects of the proposed system. The experimental results, measured with ROUGE, show that the developed system achieves encouraging results, outperforming other works in the literature.
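The concept-based idea can be illustrated with a toy sketch. It is not the system described in this paper: it assumes that concepts are lowercase word bigrams, that a concept's weight is the number of sentences containing it, and that the summary is limited by a word budget. It also replaces the ILP solver with exhaustive search over sentence subsets, which optimizes the same objective but is feasible only for tiny inputs. The names `bigrams` and `select_sentences` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def bigrams(sentence):
    """Return the set of lowercase word bigrams (assumed concept definition)."""
    words = sentence.lower().split()
    return set(zip(words, words[1:]))


def select_sentences(sentences, budget):
    """Pick the sentence subset that maximizes the total weight of the
    distinct concepts it covers, subject to a word budget.

    Exhaustive search stands in for an ILP solver here (illustration only).
    """
    concept_sets = [bigrams(s) for s in sentences]
    lengths = [len(s.split()) for s in sentences]
    # Assumed weighting: number of sentences in which each concept occurs.
    weights = Counter(c for cs in concept_sets for c in cs)

    best_score, best_subset = 0, ()
    for k in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), k):
            if sum(lengths[j] for j in subset) > budget:
                continue  # violates the length constraint
            # Each covered concept counts once, however often it repeats.
            covered = set().union(*(concept_sets[j] for j in subset))
            score = sum(weights[c] for c in covered)
            if score > best_score:
                best_score, best_subset = score, subset
    return [sentences[j] for j in best_subset], best_score


sentences = [
    "the storm hit the coast",
    "the storm hit the coast overnight",
    "rescue teams reached the coast",
]
summary, score = select_sentences(sentences, budget=10)
print(summary, score)
```

Because each concept is counted once, the near-duplicate second sentence adds little, and the search prefers the pair of sentences that covers more distinct concepts within the budget.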

Keywords: Automatic Text Summarization, Multi-document Summarization, Integer Linear Programming, CSTNews

Published
15/10/2019
How to Cite

GOMES, Laerth; OLIVEIRA, Hilário de. A Multi-document Summarization System for News Articles in Portuguese using Integer Linear Programming. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 16., 2019, Salvador. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 622-633. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2019.9320.