Analysis of ENEM participants between 2012 and 2017 using a clustering approach
Keywords: ENEM, KDD, Clustering, k-means, Elbow Method
Data analysis is increasingly used as an unbiased and accurate way to evaluate many aspects of society and how they evolve over the years. This article analyzes the characteristics of the students who took the Exame Nacional do Ensino Médio (ENEM, the National High School Exam) between 2012 and 2017. ENEM is the most important exam for admission to higher education in Brazil. The goal is to use a clustering method (k-means) to gain insights into Brazilian regions, ENEM's areas of knowledge, type of school, and accessibility. The database was extensively and carefully cleaned to homogenize it and to avoid certain types of statistical bias. The results are presented objectively so that they can serve as a numerical basis for work in socio-educational disciplines, or for studies seeking a better understanding of how ENEM has evolved in recent years. Finally, the clustering results and their limitations are discussed.
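The abstract names k-means, and the keywords name the elbow method for choosing the number of clusters k. The pure-Python sketch below illustrates how the two fit together. It runs Lloyd's k-means for several values of k, records the within-cluster sum of squares (WCSS) for each, and then picks k at the "knee" of that curve. Several choices here are assumptions of this sketch rather than the authors' setup: the farthest-first initialization (used so the result is deterministic), the chord-distance rule for locating the elbow, and the function names `kmeans`, `farthest_first` and `elbow_k`. The demo data are synthetic, not ENEM records.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points: List[Point], k: int) -> List[List[float]]:
    """Assumed deterministic init: start at points[0], then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100):
    """Lloyd's k-means. Returns (centroids, labels, wcss)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Within-cluster sum of squares (WCSS), the quantity an elbow plot shows.
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss


def elbow_k(wcss: List[float]) -> int:
    """Given WCSS for k = 1..len(wcss), return the k whose point lies farthest
    from the straight line joining the first and last points of the curve."""
    n = len(wcss)
    x1, y1, x2, y2 = 1, wcss[0], n, wcss[-1]

    def gap(i: int) -> float:
        # Perpendicular distance to the chord, up to a constant factor.
        x, y = i + 1, wcss[i]
        return abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1)

    return max(range(n), key=gap) + 1


if __name__ == "__main__":
    # Synthetic data: three well-separated 2-D blobs (not ENEM data).
    rng = random.Random(42)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(30)]
    curve = [kmeans(data, k)[2] for k in range(1, 7)]
    print([round(w, 1) for w in curve])
    print("elbow at k =", elbow_k(curve))
```

On these three well-separated blobs, the WCSS falls steeply up to k = 3 and only slightly afterwards, so `elbow_k` selects k = 3. The chord-distance rule is only one way to read an elbow curve. The choice of k can also be made by inspecting the plot, and this sketch does not claim to reproduce how the article selected it.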