A Framework for Multi-document Extractive Summarization of Reviews with Aspect-based Sentiment Analysis
Abstract
We propose an integrated framework, named Multi-Document Aspect-based Sentiment Extractive Summarization (MD-ASES for short), to automatically generate aspect-based extractive summaries from a large database of reviews of items such as films, businesses, and companies. Such summaries are obtained by extracting a subset of sentences verbatim from the reviews, based on some relevance criteria. In MD-ASES, sentences are first grouped by the aspects identified as predominant in the reviews. Then, sentences are selected according to how closely the sentiment they express about a particular aspect matches the overall sentiment of the reviews in the dataset. Our results show that MD-ASES can successfully preserve the average sentiment of the reviews while including the most important aspects in the summary.
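The two steps described above (grouping sentences by predominant aspect, then selecting sentences whose sentiment is close to the overall sentiment) can be illustrated with a minimal sketch. This is not the MD-ASES implementation: it assumes that each sentence already carries an aspect label and a sentiment score in [-1, 1], it treats the most frequent aspects as the predominant ones, and it reads "similarity to the overall sentiment" as the absolute distance to the mean score. The function name `summarize`, its parameters, and the toy sentences are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize(sentences, top_aspects=2, per_aspect=1):
    """Pick sentences for the most frequent aspects (hypothetical helper).

    sentences: list of (text, aspect, sentiment) tuples, where the aspect
    label and the sentiment score in [-1, 1] are assumed to be precomputed.
    """
    # Overall sentiment, taken here as the mean over all sentences (an assumption).
    overall = mean(score for _, _, score in sentences)

    # Step 1: group sentences by aspect.
    groups = defaultdict(list)
    for text, aspect, score in sentences:
        groups[aspect].append((text, score))

    # Treat the aspects with the most sentences as the predominant ones.
    ranked = sorted(groups, key=lambda a: len(groups[a]), reverse=True)

    # Step 2: within each predominant aspect, keep the sentences whose
    # sentiment lies closest to the overall sentiment.
    summary = []
    for aspect in ranked[:top_aspects]:
        closest = sorted(groups[aspect], key=lambda ts: abs(ts[1] - overall))
        summary.extend(text for text, _ in closest[:per_aspect])
    return summary


# Toy data, invented for illustration only.
sentences = [
    ("The plot was gripping.", "plot", 0.8),
    ("The plot dragged a little.", "plot", 0.2),
    ("The plot made no sense.", "plot", -0.6),
    ("Great acting overall.", "acting", 0.7),
    ("The acting felt flat.", "acting", -0.1),
    ("Tickets were expensive.", "price", -0.4),
]

print(summarize(sentences))
```

Because the selected sentences are copied unchanged from the input, the result is extractive; a real system would also need an aspect extractor and a sentiment classifier to produce the labels that this sketch takes as given.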