Multi-document opinion summarization for Portuguese: comparing a graph-based method with an LLM

Gustavo Sampaio Lima; Davi Fagundes Ferreira da Silva; Thiago Alexandre Salgueiro Pardo

doi:10.5753/stil.2025.37874

Gustavo Sampaio Lima USP
Davi Fagundes Ferreira da Silva USP
Thiago Alexandre Salgueiro Pardo USP

Abstract

In this paper, we explore a graph-based method for multi-document opinion summarization for Portuguese. The method, an updated version of the well-known Opinosis (Ganesan et al., 2010), is compared against a large language model, Mistral, on the same task over a small corpus.

References

Correa, N.K.; Falk, S.; Fatimah, S.; Sen, A.; Oliveira, N. (2024). TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese. Machine Learning With Applications, Vol. 16.

de Marneffe, M.C.; Manning, C.D.; Nivre, J.; Zeman, D. (2021). Universal Dependencies. Computational Linguistics, Vol. 47, N. 2, pp. 255-308.

Duran, M.S.; Lopes, L.; Nunes, M.G.V.; Pardo, T.A.S. (2023). The Dawn of the Porttinari Multigenre Treebank: Introducing its Journalistic Portion. In the Proceedings of the 14th Symposium in Information and Human Language Technology (STIL), pp. 115-124.

Ganesan, K.; Zhai, C.; Han, J. (2010). Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions. In the Proceedings of the 23rd International Conference on Computational Linguistics (COLING), pp. 340-348.

Jiang, A.Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D.S.; Casas, D.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; Lavaud, L.R.; Lachaux, M.A.; Stock, P.; Scao, T.L.; Lavril, T.; Wang, T.; Lacroix, T.; Sayed, W.E. (2023). Mistral 7B. Available at [link]. Accessed June 23, 2025.

Lin, C.Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. In the Proceedings of the Text Summarization Branches Out Workshop, pp. 74-81.

López Condori, R.E.; Avanço, L.V.; Balage Filho, P.P.; Bokan Garan, A.Y.; Cardoso, P.C.F.; Dias, M.S.; Nóbrega, F.A.A.; Sobrevilla Cabezudo, M.A.; Souza, J.W.C.; Zacarias, A.C.I.; Seno, E.M.R.; Di Felippo, A.; Pardo, T.A.S. (2015). A Qualitative Analysis of a Corpus of Opinion Summaries based on Aspects. In the Proceedings of the 9th Linguistic Annotation Workshop (LAW), pp. 62-71.

Mistral AI Team (2024). Mistral 7B Instruct v0.3. [S. l.]: Hugging Face, 2024. 1 language model. Available at [link]. Accessed June 23, 2025.

Pardo, T.A.S.; Duran, M.S.; Lopes, L.; Di Felippo, A.; Roman, N.T.; Nunes, M.G.V. (2021). Porttinari: a large multi-genre treebank for Brazilian Portuguese. In the Proceedings of the XIII Symposium in Information and Human Language Technology (STIL), pp. 1-10.

Souza, J.W.C.; Cardoso, P.C.F.; Paixão, C.A. (2024). Sumarização Automática. In H. M. Caseli and M. G. V. Nunes (eds), Processamento de Linguagem Natural: Conceitos, Técnicas e Aplicações em Português. 3rd edition, BPLN.

Straka, M. (2018). UDPipe 2.0 Prototype at CoNLL 2018 UD Shared Task. In the Proceedings of the SIGNLL Conference on Computational Natural Language Learning (CoNLL), pp. 197-207.

Vasilyev, O.; Dharnidharka, V.; Bohannon, J. (2020). Fill in the BLANC: Human-free quality estimation of document summaries. In the Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, pp. 11-20.