Unsupervised machine learning and pandemics spread: the case of COVID-19

  • Roberto Silva USP
  • Fernando Xavier USP
  • Antonio Saraiva USP
  • Carlos Cugnasca USP


Epidemics have severe impacts on people's health. COVID-19 has infected more than 3 million people in 3 months. In this work, we explore the use of unsupervised machine learning to evaluate and monitor the worldwide spread of the disease at three points in time: January, February, and March of 2020. In addition to features related to the spread of the disease, we consider the Human Development Index (HDI), population density, and age structure. We choose candidate numbers of clusters using the elbow method and agglomerative clustering, then implement and evaluate the k-means algorithm with 3, 4, and 5 clusters. We conclude that four clusters best represent the data, analyze how the clusters change over time, and discuss the impacts on each cluster depending on the measures adopted.
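The elbow step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic two-dimensional points standing in for per-country features, and the names `kmeans`, `best_inertia`, `points`, and `inertias` are hypothetical. The k-means routine is a plain Lloyd's iteration written with the standard library only; in practice a library implementation would normally be used, and the agglomerative clustering check is not shown.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    # Inertia: within-cluster sum of squared distances.
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def best_inertia(points, k, restarts=10):
    """Lowest inertia over several random initialisations."""
    return min(kmeans(points, k, seed=s)[2] for s in range(restarts))


# Synthetic stand-in data (assumption): four well-separated groups of 25 points.
rng = random.Random(42)
centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [
    (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
    for cx, cy in centers
    for _ in range(25)
]

# Elbow method: compute inertia for a range of k and look for the bend.
inertias = {k: best_inertia(points, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(f"k={k}: inertia={value:.1f}")
```

On data like this, the inertia typically drops sharply up to the true number of groups and flattens afterwards; that bend is the "elbow". Real features such as HDI and population density sit on very different scales, so they would usually be standardised before clustering.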


SILVA, Roberto; XAVIER, Fernando; SARAIVA, Antonio; CUGNASCA, Carlos. Unsupervised machine learning and pandemics spread: the case of COVID-19. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 20., 2020, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 506-511. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2020.11548.