Data Mining to Characterize Hypertensive Individuals with Cardiovascular Diseases in Brazil

  • Gustavo Costa, Pontifical Catholic University of Minas Gerais
  • Luis Enrique Zárate Gálvez, Pontifical Catholic University of Minas Gerais

Abstract


This study applied data mining to classify healthy individuals and those with hypertension and cardiovascular diseases (HA + CVD) in Brazil, using data from the 2019 National Health Survey (PNS). Algorithms such as Decision Tree, Random Forest, and Naive Bayes were tested. The models performed similarly, with Random Forest achieving 97% accuracy and sensitivity in identifying healthy individuals. However, classifying HA + CVD cases was more challenging, with lower sensitivity, possibly due to the absence of formal diagnoses and lifestyle factors. The results highlight the importance of more detailed and longitudinal data to improve the identification of chronic diseases.
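The abstract contrasts high overall accuracy with lower sensitivity for the HA + CVD class. The minimal sketch below shows how those two metrics can diverge. The labels and counts are hypothetical and are not taken from the PNS 2019 data or from the study's results, and the study may have used a different evaluation protocol. Sensitivity for a class is computed here as TP / (TP + FN): the fraction of individuals truly in that class whom the model labels correctly.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_sensitivity(y_true, y_pred):
    """Return {label: TP / (TP + FN)} for every label present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    actual = Counter(y_true)  # TP + FN for each true label
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # TP per label
    return {label: hits[label] / n for label, n in actual.items()}


# Hypothetical, imbalanced labels (NOT the study's data): 90 healthy, 10 HA+CVD.
y_true = ["healthy"] * 90 + ["HA+CVD"] * 10
# Hypothetical predictions: 88/90 healthy correct, only 4/10 HA+CVD correct.
y_pred = (["healthy"] * 88 + ["HA+CVD"] * 2
          + ["HA+CVD"] * 4 + ["healthy"] * 6)

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
for label, sens in per_class_sensitivity(y_true, y_pred).items():
    print(f"sensitivity[{label}] = {sens:.2f}")
```

With these made-up counts, accuracy is 0.92 and sensitivity for the healthy class is 0.98, but sensitivity for HA+CVD is only 0.40. The majority class dominates the overall score, which is why per-class sensitivity is reported alongside accuracy.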

Keywords: Data Mining, Hypertension, Cardiovascular Diseases, Machine Learning, Public Health, Brazilian National Health Survey

Published: 2025-09-29

COSTA, Gustavo; GÁLVEZ, Luis Enrique Zárate. Data Mining to Characterize Hypertensive Individuals with Cardiovascular Diseases in Brazil. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 40., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 168-181. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2025.247058.