AcolheEdu: Predicting Psychosocial Vulnerability in the School Context with Histogram-Based Gradient Boosting Trees
Abstract
AcolheEdu proposes a data-driven screening approach to help schools identify, at an early stage, students who may be vulnerable to psychological distress. As a proof of concept, it uses public, anonymized PeNSE 2019 microdata and a calibrated machine learning model to estimate a risk score from self-reported psychosocial factors. The calibrated HistGradientBoosting model achieved ROC-AUC 0.86 and PR-AUC 0.77 in stratified cross-validation, and ROC-AUC 0.859 on a holdout test set. At an operational threshold of 0.30, recall reached 80.1% with 61.7% precision. The product flow currently exists as a navigable Figma prototype; integration with a REST API (FastAPI) is planned. These results indicate that high-sensitivity, data-driven mental health screening in schools is feasible.
References
Andifes (2019). V pesquisa nacional de perfil socioeconômico e cultural dos(as) graduandos(as) das IFES – 2018. Technical report, Associação Nacional dos Dirigentes das Instituições Federais de Ensino Superior (Andifes), Brasília.
Brasil (2018). Lei geral de proteção de dados pessoais (LGPD). Lei nº 13.709, de 14 de agosto de 2018.
IBGE (2021). Pesquisa nacional de saúde do escolar: 2019 (PeNSE 2019). Technical report, Instituto Brasileiro de Geografia e Estatística (IBGE), Rio de Janeiro.
Martinez-Plumed, F., Contreras-Ochando, L., Ferri, C., Hernandez-Orallo, J., Kull, M., Lachiche, N., Ramirez-Quintana, M. J., and Flach, P. (2021). CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories. IEEE Transactions on Knowledge and Data Engineering, 33(8):3048–3061.
Niculescu-Mizil, A. and Caruana, R. (2005). Predicting good probabilities with supervised learning. In Proceedings of the 22nd International Conference on Machine Learning (ICML ’05), pages 625–632.
Shearer, C. (2000). The CRISP-DM model: the new blueprint for data mining. Journal of Data Warehousing, 5(4):13–22.
Silva Filho, T., Song, H., Perello-Nieto, M., Santos-Rodriguez, R., Kull, M., and Flach, P. (2023). Classifier calibration: a survey on how to assess and improve predicted class probabilities. Machine Learning, 112(9):3211–3260.
Xia, L., Zheng, P., Li, J., Huang, X., and Gao, R. X. (2024). Histogram-based gradient boosting tree: A federated learning approach for collaborative fault diagnosis. IEEE/ASME Transactions on Mechatronics, 29(4):2637–2648.
Published
1 June 2026
How to Cite
BRUM, Beatriz; DEMBINSKI, Iasmin; VASCONCELLOS, Cristhiano; SANTOS, Carlos. AcolheEdu: Predicting Psychosocial Vulnerability in the School Context with Histogram-Based Gradient Boosting Trees. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 26., 2026, Ouro Preto/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 1319-1324. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2026.20169.
