Automatic Classification of Social Presence in Online Discussions Written in Portuguese
Abstract
This work presents a method for automatically classifying messages exchanged in online distance-learning forums, written in Brazilian Portuguese, according to three categories of social presence: Affective, Interactive and Cohesive. To achieve this goal, the method uses a set of 116 features extracted with text mining and word-counting tools such as LIWC and Coh-Metrix. The best-performing classifier reached 0.97 for precision and 0.95 for Cohen's kappa. This work also analyzes the nature of social presence by examining which features are most relevant for distinguishing the three categories.
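The two reported metrics can be illustrated with a minimal sketch. It assumes each message carries exactly one of the three labels and that precision is macro-averaged over the categories; the abstract states neither, and the toy labels below are hypothetical, not data from the study.

```python
from collections import Counter

# Category names come from the abstract; the label lists are made-up toy data.
LABELS = ("Affective", "Interactive", "Cohesive")
y_true = ["Affective", "Affective", "Interactive", "Interactive", "Cohesive", "Cohesive"]
y_pred = ["Affective", "Interactive", "Interactive", "Interactive", "Cohesive", "Cohesive"]


def macro_precision(true, pred, labels=LABELS):
    """Mean of per-label precision TP / (TP + FP); a label never predicted scores 0."""
    scores = []
    for label in labels:
        # True labels of the messages that the classifier assigned to this label.
        assigned = [t for t, p in zip(true, pred) if p == label]
        hits = sum(t == label for t in assigned)
        scores.append(hits / len(assigned) if assigned else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(true, pred):
    """Agreement between true and predicted labels, corrected for chance agreement."""
    n = len(true)
    observed = sum(t == p for t, p in zip(true, pred)) / n
    true_counts, pred_counts = Counter(true), Counter(pred)
    # Chance agreement: probability that both label sources pick the same label
    # independently, given their marginal label frequencies.
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: a single label everywhere
    return (observed - expected) / (1.0 - expected)


print(f"macro precision: {macro_precision(y_true, y_pred):.2f}")
print(f"Cohen's kappa:   {cohen_kappa(y_true, y_pred):.2f}")
```

Unlike precision, kappa discounts the agreement expected by chance from the label distribution alone, which is why it is commonly reported alongside precision or accuracy when class frequencies are uneven.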
Keywords:
Social Presence, Community of Inquiry (CoI) model, Online Discussion, Text Classification
Published
2020-11-24
How to Cite
TEIXEIRA, Jean Barros; COSTA, Evandro de Barros; DIONÍSIO, Máverick; NASCIMENTO, André Câmara; MELLO, Rafael Ferreira Leite de. Automatic Classification of Social Presence in Online Discussions Written in Portuguese. In: BRAZILIAN SYMPOSIUM ON COMPUTERS IN EDUCATION (SBIE), 31., 2020, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 942-951. DOI: https://doi.org/10.5753/cbie.sbie.2020.942.
