Evaluating Regression Algorithms for Automatic Text Summarization in Brazilian Portuguese

  • Lucas Sodré, Centro Universitário de João Pessoa
  • Hilário de Oliveira, Instituto Federal do Espírito Santo

Abstract


Automatic Text Summarization (ATS) is a prominent research area whose aim is to automatically create a summary containing the most relevant information from one or more documents. One of the main challenges of ATS is identifying which information is relevant enough to be included in the generated summary. This paper analyzes the application of regression algorithms to estimate sentence relevance scores for summarizing a collection of news articles written in Brazilian Portuguese. Experiments were performed to evaluate different sentence scoring methods and regression algorithms, and to compare the results with other works in the literature. The experimental results showed that the Bayesian regression algorithm obtained the best results according to the ROUGE evaluation measures, reaching a coverage rate of 62.09%.

Keywords: Automatic Text Summarization, Multi-document summarization, Sentence Scoring Methods, Regression algorithms
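
The ROUGE measures mentioned in the abstract compare a generated summary against a human-written reference. As an illustration only, the following minimal sketch computes a unigram recall in the spirit of ROUGE-1. It assumes lowercase whitespace tokenization and a single reference summary, and the function name `rouge1_recall` and the example sentences are hypothetical. It is not the evaluation setup used in the paper, and it is only an assumption that the reported coverage rate corresponds to a recall of this kind.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams also found in the candidate (clipped counts).

    Simplified illustration: lowercase whitespace tokens, one reference,
    no stemming and no stopword removal.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    total_ref = sum(ref_counts.values())
    if total_ref == 0:
        return 0.0
    # Each reference token is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    return overlap / total_ref


# Hypothetical example sentences, not taken from the paper's corpus.
reference = "o governo anunciou novas medidas econômicas"
candidate = "o governo anunciou medidas"
score = rouge1_recall(candidate, reference)
print(f"ROUGE-1 recall (simplified): {score:.4f}")
```

In this example four of the six reference tokens appear in the candidate, so the recall is 4/6. A full ROUGE implementation adds further variants (for example bigram and longest-common-subsequence matching) and preprocessing options that this sketch leaves out.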

Published
2019-10-15
SODRÉ, Lucas; OLIVEIRA, Hilário de. Evaluating Regression Algorithms for Automatic Text Summarization in Brazilian Portuguese. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 16., 2019, Salvador. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 634-645. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2019.9321.