Unsupervised machine learning and pandemics spread: the case of COVID-19

  • Roberto Silva USP
  • Fernando Xavier USP
  • Antonio Saraiva USP
  • Carlos Cugnasca USP


Epidemics have severe impacts on people's health. COVID-19 infected more than 3 million people in 3 months. In this work, we explore the use of unsupervised machine learning to evaluate and monitor the worldwide spread of the disease at three points in time: January, February, and March of 2020. In addition to features describing the spread itself, we consider the Human Development Index (HDI), population density, and age structure. We estimate the number of clusters using the elbow method and agglomerative clustering, then implement and evaluate the k-means algorithm with 3, 4, and 5 clusters. We conclude that four clusters best represent the data, analyze how the clusters change over time, and discuss the impact on each cluster in light of the measures adopted.
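The elbow step and the k-means step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic two-dimensional points (the paper's real features and data set are not reproduced here), the helper names `make_toy_data`, `kmeans`, and `best_kmeans` are hypothetical, and agglomerative clustering is left out. The sketch runs plain Lloyd's k-means for several values of k and records the within-cluster sum of squares, whose flattening point is what the elbow method looks for.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_toy_data(seed=42):
    """Synthetic stand-in data: four Gaussian blobs of 25 points each (assumption)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0), (5.0, 0.0)]
    return [
        (cx + rng.gauss(0, 0.4), cy + rng.gauss(0, 0.4))
        for cx, cy in centers
        for _ in range(25)
    ]


def kmeans(points, k, seed=0, max_iter=100):
    """Plain Lloyd's algorithm. Returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Inertia: within-cluster sum of squared distances.
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


def best_kmeans(points, k, restarts=10):
    """Run k-means from several random starts and keep the lowest-inertia result."""
    return min((kmeans(points, k, seed=s) for s in range(restarts)), key=lambda r: r[2])


points = make_toy_data()

# Elbow inspection: inertia for k = 1..6; the curve typically flattens near a suitable k.
inertia_by_k = {k: best_kmeans(points, k)[2] for k in range(1, 7)}
for k, value in inertia_by_k.items():
    print(f"k={k}  inertia={value:.2f}")

# Final clustering with four clusters, the count the abstract reports as the best fit.
labels, centroids, _ = best_kmeans(points, 4)
```

On real country-level data, features with very different scales (for example HDI versus population density) would normally be standardized before clustering, since k-means relies on Euclidean distance.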


How to Cite

SILVA, Roberto; XAVIER, Fernando; SARAIVA, Antonio; CUGNASCA, Carlos. Unsupervised machine learning and pandemics spread: the case of COVID-19. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 20., 2020, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 506-511. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2020.11548.