A Framework for prediction of dropout in distance learning through XAI techniques in Virtual Learning Environment
Abstract
A central challenge in distance learning is preventing student dropout, which, according to ABED, can range from 21% to 50%. To address it, several data mining methods have been applied to student interaction data from the Virtual Learning Environment (VLE). A relevant open problem, however, is selecting the best features (variables/attributes) for early prediction of student dropout. In this paper, we propose a framework that uses explainable AI methods (XAI-SHAP) to identify the attributes with the greatest predictive power in a VLE integrated with a third-party CMS. After feature selection, the proposed model achieved a recall of 0.96 and a precision of 0.95, comparable to the state of the art, while using a smaller set of attributes and a database with fewer instances.
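To make the idea of Shapley-based feature selection concrete, the following is a minimal sketch, not the framework's actual implementation. It computes exact Shapley values by enumerating feature coalitions (feasible only for a handful of features), ranks features by mean absolute attribution, and keeps the top ones. The feature names, the toy risk function, the baseline, and the sample rows are all hypothetical and chosen only for illustration; a real pipeline would more likely use the SHAP library on a trained classifier.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for instance x relative to a baseline instance.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_features(predict, rows, baseline, names):
    """Rank features by mean absolute Shapley value over all rows."""
    totals = [0.0] * len(names)
    for x in rows:
        for i, p in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(p)
    means = [t / len(rows) for t in totals]
    return sorted(zip(names, means), key=lambda pair: pair[1], reverse=True)


# Hypothetical VLE interaction features (assumed names, not from the paper).
NAMES = ["weekly_logins", "forum_posts", "theme_id"]


def toy_risk(z):
    """Hypothetical dropout-risk score; theme_id deliberately has no effect."""
    weekly_logins, forum_posts, theme_id = z
    return 0.8 - 0.05 * weekly_logins - 0.02 * forum_posts + 0.0 * theme_id


ROWS = [[2, 1, 3], [10, 5, 1], [0, 0, 2]]  # made-up student records
BASELINE = [4, 2, 2]                       # per-feature mean of ROWS

ranking = rank_features(toy_risk, ROWS, BASELINE, NAMES)
selected = [name for name, _ in ranking[:2]]  # keep the two strongest features
print(selected)  # ['weekly_logins', 'forum_posts']
```

In this toy setting the irrelevant `theme_id` feature receives zero attribution and is dropped, which mirrors the goal of reaching comparable predictive quality with a smaller attribute set.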
