Siamese Network-Based Prioritization for Enhanced Multi-document Summarization
Abstract
Document summarization methods are important in many applications, particularly those involving large volumes of content, such as news and social media monitoring. Nevertheless, multi-document summarization still has room for improvement. Systems that generate summaries from multiple documents must handle additional challenges such as redundancy and inconsistency across documents. On the other hand, because multi-document datasets contain multiple descriptions of the same content, they offer benefits beyond the mere availability of more content, and we believe these benefits have not been explored in the literature. Different authors may describe the same event differently: some more objectively, others in more detail, or with different terms or writing styles. To take advantage of this variety of descriptions, we present a new approach that evaluates documents in order to identify those that can contribute most effectively to automatic summary generation. For this, we employ a Siamese network trained using the ROUGE scores observed for individual documents. We also show how to apply the results of this document evaluation to different summarization techniques. Our experiments compare against state-of-the-art approaches (LeadSum, TextRank, PacSum, BertSum) on multiple datasets covering news, events, Wikipedia texts, and scientific publications (Multi-News, WCEP, WikiSum, arXiv, Multi-XScience). The results indicate that our approach yields a relevant improvement in the summaries produced, with statistical significance under the Wilcoxon rank-sum test at the 95% confidence level.
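As a rough illustration of how ROUGE observed on individual documents could supervise a pairwise (Siamese-style) ranker, the sketch below scores each source document against a reference summary and derives ordered training pairs. It is a minimal sketch under stated assumptions, not the authors' implementation: the simplified whitespace-tokenized ROUGE-1 F1, the function names `rouge1_f1` and `make_training_pairs`, and the pairwise labeling scheme are all hypothetical choices made here for clarity.

```python
from collections import Counter
from itertools import combinations


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercase whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def make_training_pairs(documents, reference):
    """Build (doc_a, doc_b, label) triples; label is 1 if doc_a scores higher.

    Assumption: a Siamese ranker would consume such pairs; the abstract
    does not specify the actual pair construction.
    """
    scores = [rouge1_f1(doc, reference) for doc in documents]
    pairs = []
    for i, j in combinations(range(len(documents)), 2):
        if scores[i] == scores[j]:
            continue  # ties carry no ranking signal
        pairs.append((documents[i], documents[j], int(scores[i] > scores[j])))
    return pairs
```

Given a cluster of documents about the same event and its reference summary, `make_training_pairs` returns the ordered pairs that a pairwise model could learn from; at inference time, such a model would rank documents without access to the reference.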
Published
17/11/2024
How to Cite
GARCIA, Klaifer; BERTON, Lilian. Siamese Network-Based Prioritization for Enhanced Multi-document Summarization. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 400-415. ISSN 2643-6264.