Automatic Clustering of Messages in Educational Forums

  • Fábio Mariano, Universidade Federal Rural de Pernambuco
  • Valmir Macário, Universidade Federal Rural de Pernambuco
  • Rafael Ferreira Mello, Universidade Federal Rural de Pernambuco / Centro de Estudos e Sistemas Avançados do Recife


The internet has made access to information far easier. However, the resulting information overload is a common problem that makes it hard for instructors to keep up with their students. To mitigate this, this paper applies the K-Means, K-Medoids, and agglomerative clustering algorithms to 1652 posts from 4 different educational forums of a higher-education course, grouping similar messages so that the instructor has a smaller amount of information to handle. For each post, it extracts features, applies natural language processing (NLP) techniques, and uses a vector representation of the post text. Finally, it evaluates the quality of the resulting clusters using two metrics: the silhouette coefficient and the Davies-Bouldin index.
Keywords: Clustering, Natural Language Processing, Learning Analytics, Discussion Forum
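
To make the evaluation step concrete, the following is a minimal sketch of the two metrics named in the abstract, the silhouette coefficient and the Davies-Bouldin index, written in plain Python. It is not the authors' implementation: the function names, the Euclidean distance, and the toy 2-D points (standing in for vectorized posts) are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return clusters


def silhouette_score(points, labels):
    """Mean silhouette coefficient; higher is better. Needs >= 2 clusters."""
    clusters = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def davies_bouldin_score(points, labels):
    """Davies-Bouldin index; lower is better. Needs >= 2 clusters."""
    clusters = group_by_label(points, labels)
    centroids = {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in clusters.items()
    }
    # Scatter: mean distance from each member to its cluster centroid
    scatter = {
        label: sum(euclidean(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst_ratios) / len(worst_ratios)


# Toy data (assumption): two well-separated groups of 2-D "post vectors"
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible structure
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    print(name, silhouette_score(points, labels), davies_bouldin_score(points, labels))
```

On this toy data the labeling that matches the two visible groups scores a higher silhouette and a lower Davies-Bouldin index than the mixed labeling, which is the direction in which each metric signals better clusters. In practice the same comparison could be run across the labelings produced by K-Means, K-Medoids, and agglomerative clustering on the vectorized posts.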


How to Cite

MARIANO, Fábio; MACÁRIO, Valmir; FERREIRA MELLO, Rafael. Agrupamento automático de mensagens em fóruns educacionais. In: SIMPÓSIO BRASILEIRO DE INFORMÁTICA NA EDUCAÇÃO (SBIE), 33., 2022, Manaus. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 798-809. DOI: