Predicting the Performance of ENEM Candidates from Socioeconomic Data

Bernardo Stearns; Flavio Rangel; Fabrício Firmino; Fabio Rangel; Jonice Oliveira

Universidade Federal do Rio de Janeiro (UFRJ)

Abstract

This paper investigates whether students' performance can be predicted from their socioeconomic information alone. The study uses data from the most important exam for admission to Brazilian universities, the Exame Nacional do Ensino Médio (ENEM, Brazil's national high-school exam). It compares the generalization ability of two decision-tree ensemble methods on the task of regressing exam scores from socioeconomic data. The results indicate that students' sociocultural characteristics exert a significant bias on their scores.
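The abstract does not name the two ensemble methods or the evaluation protocol. As a minimal sketch of what such a comparison can look like, the Python below pits gradient boosting against bagging, both built from depth-1 regression trees (stumps), and compares their generalization through root-mean-squared error (RMSE) on a held-out split. Everything here is an illustrative assumption: the choice of boosting versus bagging, the features (`income`, `parent_edu`, `private_school`), the score formula, and the helper names (`fit_stump`, `fit_boosting`, `fit_bagging`) are hypothetical, and the data are synthetic, not ENEM microdata.

```python
import math
import random


def make_synthetic_data(n, seed=42):
    """Hypothetical socioeconomic features and scores; NOT real ENEM data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        income = rng.randint(0, 4)          # hypothetical income bracket
        parent_edu = rng.randint(0, 4)      # hypothetical parental-education level
        private_school = rng.randint(0, 1)  # hypothetical school-type flag
        score = (450 + 20 * income + 15 * parent_edu
                 + 30 * private_school + rng.gauss(0, 30))
        X.append((income, parent_edu, private_school))
        y.append(score)
    return X, y


def fit_stump(X, y):
    """Fit a depth-1 regression tree (stump) minimizing squared error.

    Returns (feature, threshold, left_mean, right_mean)."""
    n, total = len(y), sum(y)
    best = None  # (gain, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        # Aggregate target sums and counts per distinct value of feature j.
        sums, counts = {}, {}
        for row, target in zip(X, y):
            sums[row[j]] = sums.get(row[j], 0.0) + target
            counts[row[j]] = counts.get(row[j], 0) + 1
        values = sorted(sums)
        left_sum, left_n = 0.0, 0
        for a, b in zip(values, values[1:]):
            left_sum += sums[a]
            left_n += counts[a]
            right_sum, right_n = total - left_sum, n - left_n
            # Minimizing SSE is equivalent to maximizing L^2/nL + R^2/nR.
            gain = left_sum ** 2 / left_n + right_sum ** 2 / right_n
            if best is None or gain > best[0]:
                best = (gain, j, (a + b) / 2, left_sum / left_n, right_sum / right_n)
    if best is None:  # all features constant: predict the mean everywhere
        return (0, math.inf, total / n, total / n)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean


def fit_boosting(X, y, rounds=150, lr=0.1):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [t - p for t, p in zip(y, pred)]  # negative gradient
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + lr * predict_stump(stump, row) for p, row in zip(pred, X)]
    return lambda row: base + lr * sum(predict_stump(s, row) for s in stumps)


def fit_bagging(X, y, n_estimators=50, seed=0):
    """Bagging: average stumps fit on bootstrap resamples of the training set."""
    rng = random.Random(seed)
    n = len(y)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: sum(predict_stump(s, row) for s in stumps) / len(stumps)


def rmse(model, X, y):
    return math.sqrt(sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y))


# Hold out the last 500 synthetic candidates to measure generalization.
X, y = make_synthetic_data(2000)
X_train, y_train, X_test, y_test = X[:1500], y[:1500], X[1500:], y[1500:]
train_mean = sum(y_train) / len(y_train)
baseline = rmse(lambda row: train_mean, X_test, y_test)
boosted = rmse(fit_boosting(X_train, y_train), X_test, y_test)
bagged = rmse(fit_bagging(X_train, y_train), X_test, y_test)
print(f"mean baseline={baseline:.1f}  boosting={boosted:.1f}  bagging={bagged:.1f}")
```

Gradient boosting fits each new stump to the residuals of the current ensemble (the negative gradient of squared loss), so it can accumulate the effects of several features; bagged stumps only average single shallow splits drawn from bootstrap resamples. On this synthetic additive target one would therefore expect boosting to generalize better, but with real socioeconomic data, deeper trees, and tuned hyperparameters the ranking may differ.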
