A Long Texts Summarization Approach to Scientific Articles
Abstract
Automatic text summarization aims to condense the contents of a text into a simple and descriptive summary. Summarization techniques have benefited drastically from recent advances in Deep Learning. Nevertheless, these techniques are still unable to properly handle long texts. In this work, we investigate whether combining summaries extracted from multiple sections of a long scientific text can enhance the quality of the summary for the whole document. We conduct experiments on a real-world corpus to assess the effectiveness of our proposal. The results show that our multi-section proposal is as good as summaries generated using the entire text as input and twice as good as those generated from a single section.
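The multi-section strategy described in the abstract lends itself to a simple divide-and-conquer pipeline: summarize each section independently, then concatenate the partial summaries into one document summary. The sketch below is a minimal illustration of that idea, not the paper's implementation; the frequency-based sentence scorer merely stands in for whatever extractive or abstractive model is actually used, and all names and inputs are hypothetical.

```python
import re
from collections import Counter

def split_sentences(text):
    # Naive sentence splitter; a real pipeline would use a proper tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summarize_section(text, k=2):
    # Score each sentence by the average corpus frequency of its word
    # tokens (a SumBasic-style salience heuristic), then keep the top-k
    # sentences in their original order.
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in re.findall(r"\w+", s.lower()))

    def score(s):
        tokens = re.findall(r"\w+", s.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]),
                 reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))

def summarize_document(sections, k_per_section=1):
    # Multi-section strategy: summarize each section independently and
    # concatenate the partial summaries, so no single call has to fit
    # the whole document into the summarizer's input window.
    return " ".join(summarize_section(body, k_per_section) for body in sections)

if __name__ == "__main__":
    # Toy document split into sections (hypothetical content).
    sections = [
        "We study summarization of long texts. Long texts exceed model limits.",
        "Our method splits the text into sections. Each section is summarized alone.",
    ]
    print(summarize_document(sections))
```

The same skeleton applies when the per-section summarizer is a neural model: only summarize_section changes, while the split-then-concatenate logic stays the same.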
Published
29/11/2021
How to Cite
SOUZA, Cinthia M.; VIMIEIRO, Renato. A Long Texts Summarization Approach to Scientific Articles. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 13., 2021, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 182-189. DOI: https://doi.org/10.5753/stil.2021.17797.