LLMs as Tools for Evaluating Textual Coherence: A Comparative Analysis

Abstract


This study evaluates recent Large Language Models (LLMs), including GPT-4o, GPT-3.5, Claude Opus, and LLaMA 2, for their ability to analyze textual coherence. The research focuses on three areas: local coherence, where GPT-4o and Claude Opus excel; global coherence, where Claude Opus is most effective; and incoherence detection, where GPT-4o demonstrates strong performance. These findings reveal both the capabilities of current models and the areas where they need improvement, shed light on their potential applications in natural language processing, and point the way to further advances in the field.

Keywords: textual coherence, incoherence, comparison, NLP

Published
2024-11-17
BARBOSA, Bryan K. S.; CAMPELO, Cláudio E. C. LLMs as Tools for Evaluating Textual Coherence: A Comparative Analysis. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 15., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 278-287. DOI: https://doi.org/10.5753/stil.2024.245379.