Probabilistic classification of educational videos considering comments: an experimental analysis on Youtube

  • Henrique C. F. B. Carvalho UFU
  • Cristiano G. Pitangui UFSJ
  • Fabiano A. Dorça UFU
  • Catrine S. Oliveira UFSJ
  • Eduardo A. C. Trindade UFVJM
  • Alessandro V. Andrade UFVJM
  • Luciana P. Assis UFVJM


Youtube is a constantly growing video platform that is widely used by teachers and students in the teaching and learning process. Previous studies point out an important issue with Youtube's search mechanism: in many cases, the number of results returned by the platform is very large, and many of them are unrelated to the search performed. To address this, some studies have proposed methodologies to classify Youtube videos as educational or non-educational, making it easier to find specific educational content. This work develops a new methodology that probabilistically classifies Youtube videos as educational or non-educational using their comments. Preliminary results show that comments can be used to probabilistically classify a video with high accuracy.
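The abstract does not say which probabilistic model the methodology uses, so the following is only a minimal sketch of the general idea under stated assumptions: it swaps in a multinomial naive Bayes classifier (a standard probabilistic text classifier, not necessarily the authors' choice), pools all comments of a video into one bag of words, and returns a posterior probability for each class. The names `tokenize` and `CommentNaiveBayes`, the smoothing constant `alpha`, and the toy comments are all hypothetical illustrations, not taken from the paper or its dataset.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical, deliberately simple preprocessing: lowercase, keep runs of a-z.
    return re.findall(r"[a-z]+", text.lower())


class CommentNaiveBayes:
    """Hypothetical sketch: multinomial naive Bayes over the pooled comments of a video."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant
        self.word_counts: dict[str, Counter] = {}
        self.total_words: dict[str, int] = {}
        self.video_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def fit(self, videos: list[list[str]], labels: list[str]) -> "CommentNaiveBayes":
        # Each training video is a list of comment strings with one class label.
        for comments, label in zip(videos, labels):
            self.video_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for comment in comments:
                tokens = tokenize(comment)
                counts.update(tokens)
                self.vocab.update(tokens)
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict_proba(self, comments: list[str]) -> dict[str, float]:
        # Log prior plus smoothed log likelihood of every known token in the comments.
        n_videos = sum(self.video_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for label, counts in self.word_counts.items():
            score = math.log(self.video_counts[label] / n_videos)
            denom = self.total_words[label] + self.alpha * v
            for comment in comments:
                for tok in tokenize(comment):
                    if tok in self.vocab:  # words never seen in training are ignored
                        score += math.log((counts[tok] + self.alpha) / denom)
            log_scores[label] = score
        # Normalize with the log-sum-exp trick to get posterior probabilities.
        m = max(log_scores.values())
        z = sum(math.exp(s - m) for s in log_scores.values())
        return {label: math.exp(s - m) / z for label, s in log_scores.items()}


# Made-up toy comments for illustration only (not from the paper's dataset).
train_videos = [
    ["great explanation of derivatives", "this lesson helped me pass my exam"],
    ["the teacher explains the theorem clearly", "thanks for the class"],
    ["lol this prank is hilarious", "best music video ever"],
    ["funny cat compilation", "love this song"],
]
train_labels = ["educational", "educational", "non-educational", "non-educational"]

model = CommentNaiveBayes().fit(train_videos, train_labels)
probs = model.predict_proba(["clear explanation, helped with my exam"])
print(probs)
```

Pooling comments per video keeps the output at the video level, which matches the task of labeling a whole video; a real pipeline would also need richer preprocessing and an evaluation protocol such as cross-validation.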


CARVALHO, Henrique C. F. B.; PITANGUI, Cristiano G.; DORÇA, Fabiano A.; OLIVEIRA, Catrine S.; TRINDADE, Eduardo A. C.; ANDRADE, Alessandro V.; ASSIS, Luciana P. Probabilistic classification of educational videos considering comments: an experimental analysis on Youtube. In: SIMPÓSIO BRASILEIRO DE INFORMÁTICA NA EDUCAÇÃO (SBIE), 34., 2023, Passo Fundo/RS. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 1408-1418. DOI: