Investigating coherence in posts from a question forum in a virtual learning environment with BERT

Abstract


Incoherence can hinder the interpretation of discourse and degrade the performance of conversational agents and intelligent tutoring systems, among other applications. Contextualized language models, such as BERT, have not yet been exploited in coherence analysis, despite their proven efficacy in several related tasks. This work employs Portuguese-language variants of BERT to classify and measure text coherence. Experiments with news texts and an educational forum of student questions show that BERT supports sentence order discrimination with up to 99.20% accuracy and yields measures of (in)coherence consistent with that classification, with most of the best results obtained on the forum texts.

Keywords: Semantic coherence, Coherence models, Contextualized embeddings, BERT

Published
2021-11-22
BRAZ JUNIOR, Osmar Oliveira; FILETO, Renato. Investigating coherence in posts from a question forum in a virtual learning environment with BERT. In: BRAZILIAN SYMPOSIUM ON COMPUTERS IN EDUCATION (SBIE), 32., 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 749-759. DOI: https://doi.org/10.5753/sbie.2021.217397.