Multi-Document Summarization Using Complex and Rich Features

  • Maria Lucía del Rosario Castro Jorge USP
  • Verônica Agostini USP
  • Thiago Alexandre Salgueiro Pardo USP


Multi-document summarization consists of automatically producing a single informative summary from a collection of texts on the same topic. In this paper, we model the multi-document summarization task as a machine learning classification problem in which each sentence from the source texts is classified as belonging to the summary or not. To this end, we combine superficial features (e.g., sentence position in the text) with deep linguistic features (e.g., semantic relations across documents). In particular, the linguistic features are given by CST (Cross-document Structure Theory). We conduct our experiments on a CST-annotated corpus of news texts. The results show that linguistic features help to produce a better classification model, yielding state-of-the-art results.
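
The sentence-classification framing described in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifier or list the exact features, so everything here is an assumption made for illustration: the `Sentence` fields, the use of a raw count of CST relations as the deep feature, the perceptron as a stand-in learner, and the four toy training sentences with their labels.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    position: int       # 0-based index of the sentence in its source document
    doc_length: int     # number of sentences in that document
    cst_relations: int  # hypothetical: number of CST relations linking it to other documents


def extract_features(s: Sentence) -> list[float]:
    """Combine one superficial and one deep feature (both illustrative)."""
    # Superficial feature: relative position, 1.0 for the first sentence.
    rel_pos = 1.0 - s.position / max(s.doc_length - 1, 1)
    # Deep feature: how many cross-document relations touch this sentence.
    return [rel_pos, float(s.cst_relations)]


def score(w: list[float], b: float, x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_perceptron(X, y, epochs=20, lr=0.1):
    """Stand-in binary classifier; not the learner used in the paper."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            pred = 1 if score(w, b, x) > 0 else 0
            err = label - pred
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b


# Toy data invented for this sketch: label 1 means "belongs to the summary".
training = [
    (Sentence("Lead sentence repeated across sources.", 0, 4, 3), 1),
    (Sentence("Closing remark found in one text only.", 3, 4, 0), 0),
    (Sentence("Key detail confirmed by other texts.", 1, 4, 2), 1),
    (Sentence("Background aside.", 2, 4, 0), 0),
]
X = [extract_features(s) for s, _ in training]
y = [label for _, label in training]

weights, bias = train_perceptron(X, y)
predictions = [1 if score(weights, bias, x) > 0 else 0 for x in X]
```

In this sketch a summary would be assembled from the sentences predicted as 1. How the selected sentences are ordered and how redundancy among them is handled is not covered here.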


JORGE, Maria Lucía del Rosario Castro; AGOSTINI, Verônica; PARDO, Thiago Alexandre Salgueiro. Multi-Document Summarization Using Complex and Rich Features. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 8., 2011, Natal/RN.

