SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section

  • Leandro Carísio Fernandes IPE Digital
  • Gustavo Bartz Guedes UNICAMP
  • Thiago Soares Laitz UNICAMP / Maritaca AI
  • Thales Sales Almeida UNICAMP / Maritaca AI
  • Rodrigo Nogueira UNICAMP / Maritaca AI
  • Roberto Lotufo UNICAMP / NeuralMind
  • Jayr Pereira UNICAMP / UFCA

Abstract


Document summarization is the task of condensing texts into concise, informative summaries. This paper introduces a novel dataset designed for summarizing multiple scientific articles into a section of a survey. Our contributions are: (1) SurveySum, a new dataset addressing the gap in domain-specific summarization tools; (2) two pipelines for summarizing scientific articles into a section of a survey; and (3) an evaluation of these pipelines using multiple metrics to compare their performance. Our results highlight the importance of high-quality retrieval stages and the impact of different configurations on the quality of the generated summaries.
Published
17/11/2024
FERNANDES, Leandro Carísio; GUEDES, Gustavo Bartz; LAITZ, Thiago Soares; ALMEIDA, Thales Sales; NOGUEIRA, Rodrigo; LOTUFO, Roberto; PEREIRA, Jayr. SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 431-444. ISSN 2643-6264.