A genetic algorithm with flexible fitness function for feature selection in educational data.

  • Danielle F. de Albuquerque Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)
  • Diego N. Brandão Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)
  • Rafaelli Coutinho Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ) http://orcid.org/0000-0002-1735-1718

Abstract


Due to the growing volume and increasing availability of educational data, data mining techniques have been frequently applied to help understand phenomena related to education. However, much of this data can be sparse, redundant, irrelevant, and noisy, which can degrade both the quality and the computational performance of predictive models. One way to minimize these problems is to select attributes during the modeling process using Feature Selection (FS) techniques. This article proposes an FS approach based on a genetic algorithm adapted to the educational context. The results indicate that the proposal improves classification performance and gives education specialists greater flexibility in selecting attributes according to their needs and realities.
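
As a rough illustration of the idea, and not the authors' implementation, the sketch below shows one way a genetic algorithm can search over feature subsets with an adjustable fitness function. The weight `alpha`, the function names, the operator choices (tournament selection, one-point crossover, bit-flip mutation, elitism), and the toy scoring function are all assumptions introduced here. In practice the score would likely come from a classifier evaluated on the selected columns.

```python
import random
from typing import Callable, List, Sequence

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def make_fitness(score: Callable[[Sequence[int]], float],
                 alpha: float) -> Callable[[Sequence[int]], float]:
    """Blend a quality score with a penalty on subset size.

    `alpha` is a hypothetical knob: 1.0 ignores subset size,
    lower values favor smaller subsets.
    """
    def fitness(mask: Sequence[int]) -> float:
        if not any(mask):
            return float("-inf")  # an empty subset cannot train a model
        size_penalty = sum(mask) / len(mask)
        return alpha * score(mask) - (1.0 - alpha) * size_penalty
    return fitness


def _tournament(population: List[Mask], fitness, rng: random.Random) -> Mask:
    """Pick two distinct individuals at random and return the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def select_features(n_features: int, fitness, pop_size: int = 30,
                    generations: int = 40, mutation_rate: float = 0.05,
                    seed: int = 0) -> Mask:
    """Evolve bit masks over the features; requires n_features >= 2."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1 = _tournament(population, fitness, rng)
            p2 = _tournament(population, fitness, rng)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


# Toy stand-in for a classifier score: the fraction of (assumed)
# informative features that the mask keeps.
INFORMATIVE = (0, 2, 5)


def toy_score(mask: Sequence[int]) -> float:
    return sum(mask[i] for i in INFORMATIVE) / len(INFORMATIVE)


fitness = make_fitness(toy_score, alpha=0.7)
best = select_features(8, fitness)
print(best, fitness(best))
```

Changing `alpha`, or swapping `toy_score` for a different scoring function, is one way the same search could be steered toward different priorities, which is the kind of flexibility the abstract refers to.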

Keywords: Feature Selection, Educational Data Mining, Genetic Algorithm

Published
2021-10-04
DE ALBUQUERQUE, Danielle F.; BRANDÃO, Diego N.; COUTINHO, Rafaelli. A genetic algorithm with flexible fitness function for feature selection in educational data. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 36., 2021, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 355-360. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2021.17898.