Educational Inequalities in Brazil: A Clustering Analysis of Educational and School Performance Indicators

  • Matheus L. de Melo Silva, Federal University of Ceará (UFC)
  • Lívia Almada Cruz, Federal University of Ceará (UFC)
  • Regis Pires Magalhães, Federal University of Ceará (UFC)
  • Tatieures Gomes Pires, Federal University of Ceará (UFC)
  • José Antonio Macedo, Federal University of Ceará (UFC)
  • Rossana Maria de Castro Andrade, Federal University of Ceará (UFC), https://orcid.org/0000-0002-0186-2994

Abstract


The Brazilian education system faces structural and socioeconomic challenges, reflected in unequal access to education and low academic performance, especially in vulnerable regions. Analyzing educational indicators helps identify structural changes in education, assess the effectiveness of implemented policies, and monitor how educational quality evolves. This work clusters educational and school performance indicators to identify factors associated with educational inequalities in Brazil. Using several educational indicators for 2015, 2019, and 2021 provided by INEP, we identify groups of municipalities with similar profiles. In addition, a temporal analysis of the clusters shows how inequalities evolved over these years, providing information that can support the formulation of more effective public policies and the strategic allocation of resources.

Keywords: clustering, k-means, educational indicators, data analysis
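
As a rough illustration of the k-means technique named in the keywords, the sketch below groups a handful of municipalities by two indicators. It is not the authors' pipeline: the indicator values, the choice of two features, the number of clusters, and the helper name `kmeans` are all assumptions made for this example, and the data are invented rather than taken from INEP.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct data points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical municipalities described by two indicators scaled to [0, 1]:
# (approval rate, normalized assessment score). Values are invented.
municipalities = [
    (0.95, 0.80), (0.92, 0.78), (0.97, 0.83),  # higher-performing profile
    (0.70, 0.45), (0.68, 0.50), (0.72, 0.42),  # lower-performing profile
]

labels, centroids = kmeans(municipalities, k=2)
for point, label in zip(municipalities, labels):
    print(point, "-> cluster", label)
```

With real data, the indicators would typically need to be brought to a common scale before clustering, and the number of clusters would be chosen with a criterion such as the elbow method or the silhouette coefficient rather than fixed in advance.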

Published
2025-09-29
DE MELO SILVA, Matheus L.; CRUZ, Lívia Almada; MAGALHÃES, Regis Pires; PIRES, Tatieures Gomes; MACEDO, José Antonio; CASTRO ANDRADE, Rossana Maria de. Educational Inequalities in Brazil: A Clustering Analysis of Educational and School Performance Indicators. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 40., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 154-167. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2025.247053.