Automatic Classification of Social Presence in Online Discussions Written in Portuguese
Abstract
This paper presents a method for automatically classifying messages written in Brazilian Portuguese and exchanged in online distance-learning forums according to the categories of social presence. To this end, the proposed method uses a set of 116 features extracted with text mining and word-count tools such as LIWC and Coh-Metrix. The best-performing classifier achieved an accuracy of 0.97 and a Cohen's kappa of 0.95. The paper also analyzes the nature of social presence by examining which classification features were most relevant for distinguishing each of the three categories.
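The two reported metrics can be illustrated with a minimal sketch. The category names (affective, interactive, cohesive) are assumed from the Community of Inquiry literature, and the six gold and predicted labels are toy data invented for illustration; neither the data nor the helper functions come from the paper. Accuracy is the fraction of messages labeled correctly, while Cohen's kappa corrects that agreement for chance: kappa = (p_o - p_e) / (1 - p_e). Both are values on a 0 to 1 scale rather than percentages.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of messages whose predicted category matches the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def cohen_kappa(gold, predicted):
    """Chance-corrected agreement: kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(gold)
    p_o = accuracy(gold, predicted)  # observed agreement
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Expected agreement if both labelings were independent,
    # each keeping its own category frequencies.
    p_e = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when only one category occurs
    return (p_o - p_e) / (1 - p_e)


# Toy labels (hypothetical, not from the paper's dataset).
gold = ["affective", "affective", "interactive",
        "interactive", "cohesive", "cohesive"]
predicted = ["affective", "affective", "interactive",
             "cohesive", "cohesive", "cohesive"]

print(f"accuracy = {accuracy(gold, predicted):.2f}")
print(f"kappa    = {cohen_kappa(gold, predicted):.2f}")
```

On this toy data, five of six labels agree (accuracy of about 0.83), while kappa is lower (0.75) because one third of that agreement would be expected by chance alone.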
Keywords:
Social Presence, Community of Inquiry Model, Online Discussions, Text Classification
Published
November 24, 2020
How to Cite
TEIXEIRA, Jean Barros; COSTA, Evandro de Barros; DIONÍSIO, Máverick; NASCIMENTO, André Câmara; MELLO, Rafael Ferreira Leite de. Classificação Automática da Presença Social em Discussões Online Escritas em Português. In: SIMPÓSIO BRASILEIRO DE INFORMÁTICA NA EDUCAÇÃO (SBIE), 31., 2020, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 942-951. DOI: https://doi.org/10.5753/cbie.sbie.2020.942.