Multidocument opinion summarization for Portuguese: comparing a graph-based method with an LLM
Abstract
In this paper, we explore a graph-based method for multidocument opinion summarization in Portuguese. The method, an updated version of the well-known Opinosis method (Ganesan et al., 2010), is compared against a large language model, Mistral, on the same task over a small corpus.
References
Correa, N.K.; Falk, S.; Fatimah, S.; Sen, A.; Oliveira, N. (2024). TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese. Machine Learning With Applications, Vol. 16.
de Marneffe, M.C.; Manning, C.D.; Nivre, J.; Zeman, D. (2021). Universal Dependencies. Computational Linguistics, Vol. 47, N. 2, pp. 255-308.
Duran, M.S.; Lopes, L.; Nunes, M.G.V.; Pardo, T.A.S. (2023). The Dawn of the Porttinari Multigenre Treebank: Introducing its Journalistic Portion. In the Proceedings of the 14th Symposium in Information and Human Language Technology (STIL), pp. 115-124.
Ganesan, K.; Zhai, C.; Han, J. (2010). Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions. In the Proceedings of the 23rd International Conference on Computational Linguistics (COLING), pp. 340-348.
Jiang, A.Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D.S.; Casas, D.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; Lavaud, L.R.; Lachaux, M.A.; Stock, P.; Scao, T.L.; Lavril, T.; Wang, T.; Lacroix, T.; Sayed, W.E. (2023). Mistral 7B. Available at [link]. Accessed on June 23, 2025.
Lin, C.Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. In the Proceedings of the Text Summarization Branches Out Workshop, pp. 74-81.
López Condori, R.E.; Avanço, L.V.; Balage Filho, P.P.; Bokan Garan, A.Y.; Cardoso, P.C.F.; Dias, M.S.; Nóbrega, F.A.A.; Sobrevilla Cabezudo, M.A.; Souza, J.W.C.; Zacarias, A.C.I.; Seno, E.M.R.; Di Felippo, A.; Pardo, T.A.S. (2015). A Qualitative Analysis of a Corpus of Opinion Summaries based on Aspects. In the Proceedings of the 9th Linguistic Annotation Workshop (LAW), pp. 62-71.
Mistral AI Team (2024). Mistral 7B Instruct v0.3. [S. l.]: Hugging Face, 2024. 1 language model. Available at [link]. Accessed on June 23, 2025.
Pardo, T.A.S.; Duran, M.S.; Lopes, L.; Di Felippo, A.; Roman, N.T.; Nunes, M.G.V. (2021). Porttinari - a large multi-genre treebank for Brazilian Portuguese. In the Proceedings of the XIII Symposium in Information and Human Language Technology (STIL), pp. 110.
Souza, J.W.C.; Cardoso, P.C.F.; Paixão, C.A. (2024). Sumarização Automática. In H. M. Caseli and M. G. V. Nunes (eds), Processamento de Linguagem Natural: Conceitos, Técnicas e Aplicações em Português. 3rd edition, BPLN.
Straka, M. (2018). UDPipe 2.0 Prototype at CoNLL 2018 UD Shared Task. In the Proceedings of the SIGNLL Conference on Computational Natural Language Learning (CoNLL), pp. 197-207.
Vasilyev, O.; Dharnidharka, V.; Bohannon, J. (2020). Fill in the BLANC: Human-free quality estimation of document summaries. In the Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, pp. 11-20.
Published
2025-09-29
How to Cite
LIMA, Gustavo Sampaio; SILVA, Davi Fagundes Ferreira da; PARDO, Thiago Alexandre Salgueiro. Multidocument opinion summarization for Portuguese: comparing a graph-based method with an LLM. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 16., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 698-702. DOI: https://doi.org/10.5753/stil.2025.37874.
