Application of the KDD Process to COVID-19 Data: A Case Study in Rio Grande do Sul, Brazil

  • Gabriel V. Heisler (UFSM)
  • Joaquim V. C. Assunção (UFSM)

Abstract


As healthcare systems grow more complex and generate more data, it becomes harder to use patterns in that data to improve decision-making. Data mining is one way to extract useful insights from it. This study applies the Knowledge Discovery in Databases (KDD) process, with emphasis on the preliminary and data mining stages, to identify patterns in COVID-19 pandemic data from Rio Grande do Sul, Brazil. Our analyses revealed patterns such as associations between specific symptoms and patient outcomes. This study does not aim to draw definitive conclusions about causal relationships between symptoms and patient outcomes; the goal is to present the patterns identified in the data without interpreting their clinical significance. These findings can inform future research and support proactive decision-making in public health.

Published
2024-04-10
HEISLER, Gabriel V.; ASSUNÇÃO, Joaquim V. C. Application of the KDD Process to COVID-19 Data: A Case Study in Rio Grande do Sul, Brazil. In: REGIONAL DATABASE SCHOOL (ERBD), 19., 2024, Farroupilha/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 91-100. ISSN 2595-413X. DOI: https://doi.org/10.5753/erbd.2024.238871.