LLMs as Tools for Evaluating Textual Coherence: A Comparative Analysis

Abstract

This study evaluates the performance of recent Large Language Models (LLMs), such as GPT-4o, GPT-3.5, Claude Opus, and LLaMA 2, in the automatic analysis of textual coherence. The research focuses on three aspects: local coherence, where GPT-4o and Claude Opus stand out; global coherence, where Claude Opus is the most effective; and incoherence detection, where GPT-4o performs best. These results reveal the capabilities and limitations of current models, contributing to the understanding of their applications in Natural Language Processing and to continued advances in the field.

Keywords: textual coherence, incoherence, comparison, NLP

Published
17/11/2024
BARBOSA, Bryan K. S.; CAMPELO, Cláudio E. C. LLMs as Tools for Evaluating Textual Coherence: A Comparative Analysis. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 15., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 278-287. DOI: https://doi.org/10.5753/stil.2024.245379.