Predicting ENEM Candidates' Performance Through Socioeconomic Data
Abstract
This work analyzes whether students' performance can be predicted from their socioeconomic status alone. The dataset was extracted from the most important exam for admission to Brazilian universities, the National High School Exam (Exame Nacional do Ensino Médio, ENEM). The study compares the performance of two decision-tree ensemble methods on the task of predicting students' exam scores from socioeconomic data. The results show that socioeconomic indicators can partly explain a bias in student scores.
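The abstract does not name the two ensembles at this point, so the following is only a hypothetical sketch of this kind of comparison, not the authors' pipeline. It assumes least-squares gradient boosting of decision stumps versus bagged stumps, trained on synthetic records with invented features (income bracket, parental education, private-school flag) and invented coefficients rather than ENEM data. A mean-only baseline indicates how much of the score variation the features explain.

```python
import math
import random
import statistics


def make_synthetic_data(n, rng):
    """Generate toy records; coefficients are invented, not estimated from ENEM."""
    xs, ys = [], []
    for _ in range(n):
        income = rng.randint(1, 5)        # hypothetical family-income bracket
        parent_edu = rng.randint(0, 4)    # hypothetical parental-education level
        private = 1 if rng.random() < 0.3 else 0  # hypothetical private-school flag
        score = 450 + 20 * income + 15 * parent_edu + 30 * private + rng.gauss(0, 25)
        xs.append((income, parent_edu, private))
        ys.append(score)
    return xs, ys


def fit_stump(xs, ys):
    """Fit a one-split regression tree (a stump) by minimizing squared error."""
    best = None
    for j in range(len(xs[0])):
        values = sorted(set(x[j] for x in xs))
        for t in values[:-1]:  # skip the max so both branches are non-empty
            left = [y for x, y in zip(xs, ys) if x[j] <= t]
            right = [y for x, y in zip(xs, ys) if x[j] > t]
            lm, rm = statistics.fmean(left), statistics.fmean(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: fall back to the mean
        mean = statistics.fmean(ys)
        return lambda x: mean
    _, j, t, lm, rm = best
    return lambda x: lm if x[j] <= t else rm


def fit_boosted_stumps(xs, ys, n_rounds=100, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = statistics.fmean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        stumps.append(h)
        preds = [p + learning_rate * h(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(h(x) for h in stumps)


def fit_bagged_stumps(xs, ys, n_estimators=30, seed=1):
    """Bagging: average stumps fitted on bootstrap resamples of the training set."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(h(x) for h in stumps) / len(stumps)


def rmse(model, xs, ys):
    """Root-mean-square error of a model on a held-out set."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys))


xs, ys = make_synthetic_data(600, random.Random(42))
train_x, train_y, test_x, test_y = xs[:400], ys[:400], xs[400:], ys[400:]
train_mean = statistics.fmean(train_y)
models = {
    "baseline (train mean)": lambda x: train_mean,
    "gradient-boosted stumps": fit_boosted_stumps(train_x, train_y),
    "bagged stumps": fit_bagged_stumps(train_x, train_y),
}
results = {name: rmse(model, test_x, test_y) for name, model in models.items()}
for name, err in results.items():
    print(f"{name}: test RMSE = {err:.1f}")
```

Comparing each ensemble's test RMSE with the mean-only baseline is one simple way to quantify how much of the score variation socioeconomic features account for. With real data, the single held-out split would typically be replaced by k-fold cross-validation.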