ENEM under a Socioeconomic Perspective: Analysis and Evaluation Through Dimensionality Reduction

  • Cristiano C. Mendieta UFPR
  • André L. Vignatti UFPR

Abstract


This study investigates the relationship between socioeconomic factors and student academic performance in the 2022 ENEM by applying dimensionality reduction techniques to the microdata provided by INEP. The dataset contains information collected from the exam, such as test scores, answer keys, evaluated items, participant scores, and responses to the socioeconomic questionnaire. The research compares linear methods, such as Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and Independent Component Analysis (ICA), with non-linear methods, such as Autoencoders and Pairwise Controlled Manifold Approximation Projection (PaCMAP), in binary and multiclass classification scenarios. The results indicate that linear methods offer a good balance between accuracy and computational efficiency, especially in binary classification. Non-linear methods, despite their higher computational cost, are better suited to capturing complex structures in multiclass classification. Feature selection with XGBoost proved effective in identifying the key variables that differentiate students by socioeconomic characteristics and academic performance. The study provides a comprehensive analysis of large educational datasets, and its results can guide the formulation of public policies that promote equity in the Brazilian educational system.
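The linear pipeline described in the abstract (reduce the feature space, then classify in the reduced space) can be illustrated with a minimal sketch. This is not the authors' code. It uses synthetic data rather than the ENEM microdata, PCA computed through an SVD of the centered matrix, and a nearest-centroid classifier as a deliberately simple stand-in for whatever classifier the paper uses. The function names, the sample sizes, and the choice of two components are assumptions made for illustration only.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the training mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    # SVD of the centered matrix: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, components):
    """Project X onto the retained principal directions."""
    return (X - mean) @ components.T


def nearest_centroid_accuracy(Z_train, y_train, Z_test, y_test):
    """Assign each test point to the closest class centroid; return accuracy."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_test).mean())


# Synthetic stand-in for questionnaire features (NOT ENEM data).
rng = np.random.default_rng(0)
n_samples, n_features = 600, 30
y = rng.integers(0, 2, size=n_samples)   # hypothetical binary label
X = rng.normal(size=(n_samples, n_features))
X[:, :3] += 2.0 * y[:, None]             # only three features carry class signal

X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

# Fit PCA on the training split only, then project both splits.
mean, components = pca_fit(X_train, n_components=2)
Z_train = pca_transform(X_train, mean, components)
Z_test = pca_transform(X_test, mean, components)

acc_full = nearest_centroid_accuracy(X_train, y_train, X_test, y_test)
acc_pca = nearest_centroid_accuracy(Z_train, y_train, Z_test, y_test)
print(f"accuracy with all {n_features} features: {acc_full:.3f}")
print(f"accuracy with 2 principal components: {acc_pca:.3f}")
```

In this toy setting the class signal lies along a high-variance direction, so two components retain most of it. On real data that is not guaranteed, because PCA ignores the labels when choosing directions. The non-linear methods named in the abstract (Autoencoders, PaCMAP) would replace the `pca_fit` and `pca_transform` steps and are not shown here.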

Published
19/07/2026
MENDIETA, Cristiano C.; VIGNATTI, André L. ENEM under a Socioeconomic Perspective: Analysis and Evaluation Through Dimensionality Reduction. In: BRAZILIAN WORKSHOP ON SOCIAL NETWORK ANALYSIS AND MINING (BRASNAM), 15., 2026, Gramado/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 14-27. ISSN 2595-6094. DOI: https://doi.org/10.5753/brasnam.2026.21861.