A Framework for Multi-document Extractive Summarization of Reviews with Aspect-based Sentiment Analysis

  • André Oliveira, Universidade de São Paulo
  • Anna Costa, Universidade de São Paulo
  • Eduardo Hruschka, Universidade de São Paulo

Abstract


We propose an integrated framework, Multi-Document Aspect-based Sentiment Extractive Summarization (MD-ASES), that automatically generates aspect-based extractive summaries from a large database of reviews of items such as films, businesses, and companies. Such summaries are obtained by extracting a subset of sentences verbatim from the reviews according to relevance criteria. In MD-ASES, sentences are first grouped by the aspects identified as predominant in the reviews. Sentences are then selected according to how closely the sentiment they express about a particular aspect matches the overall sentiment of the reviews in the dataset. Our results show that MD-ASES can preserve the average sentiment of the reviews while including the most important aspects in the summary.
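
The two steps described in the abstract (grouping sentences by predominant aspect, then selecting those whose sentiment is closest to the overall sentiment) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `summarize`, the parameters `top_k_aspects` and `per_aspect`, the toy data, the use of mention frequency to decide which aspects are "predominant", and the use of a plain mean as the "overall sentiment" are all assumptions made for illustration. The sketch also assumes each sentence has already been labeled with an aspect and a sentiment score in [-1, 1] by some upstream aspect-based sentiment analysis step.

```python
from collections import defaultdict
from statistics import mean


def summarize(sentences, top_k_aspects=2, per_aspect=1):
    """Pick sentences per aspect whose sentiment is closest to the overall mean.

    sentences: list of (text, aspect, sentiment) tuples, sentiment in [-1, 1].
    All names and defaults here are hypothetical, chosen for illustration.
    """
    # Step 1: group sentences by the aspect they mention.
    by_aspect = defaultdict(list)
    for text, aspect, sentiment in sentences:
        by_aspect[aspect].append((text, sentiment))

    # Assumption: "predominant" aspects are the most frequently mentioned ones.
    predominant = sorted(
        by_aspect, key=lambda a: len(by_aspect[a]), reverse=True
    )[:top_k_aspects]

    # Assumption: overall sentiment is the plain mean over all sentences.
    overall = mean(score for _, _, score in sentences)

    # Step 2: within each predominant aspect, keep the sentences whose
    # sentiment score is closest to the overall sentiment.
    summary = []
    for aspect in predominant:
        ranked = sorted(by_aspect[aspect], key=lambda ts: abs(ts[1] - overall))
        summary.extend(text for text, _ in ranked[:per_aspect])
    return summary


# Toy, made-up data: (sentence, aspect, sentiment score).
reviews = [
    ("The plot was gripping.", "plot", 0.8),
    ("The plot dragged in the middle.", "plot", -0.4),
    ("The plot was fine.", "plot", 0.2),
    ("Great acting overall.", "acting", 0.7),
    ("The acting felt flat.", "acting", -0.1),
    ("Tickets were expensive.", "price", -0.6),
]

print(summarize(reviews))
```

In this toy data the overall sentiment is mildly positive, so the sketch favors mildly worded sentences over strongly positive or negative ones, which is the intuition behind preserving the average sentiment of the reviews in the summary.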

Keywords: Machine Learning, Text and Web mining, Natural Language Processing, Decision Support Systems, Data Science

Published
October 20, 2020
OLIVEIRA, André; COSTA, Anna; HRUSCHKA, Eduardo. A Framework for Multi-document Extractive Summarization of Reviews with Aspect-based Sentiment Analysis. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 17., 2020, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 471-482. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2020.12152.