Use of the PAM algorithm in k-medoids for clustering mental health textual data
Abstract
This study investigated the use of the PAM (Partitioning Around Medoids) algorithm for k-medoids clustering of responses to open-ended questions in a questionnaire on the mental health of university students. PAM was run with the Euclidean distance computed on the document-term matrix. The corpus consisted of 427 open-ended responses, with a volume of 2101 words. With two, three and four medoids, processing took 40.05, 40.39 and 48.69 seconds, respectively. These processing times indicate that PAM performs cluster analysis on textual data of this size efficiently.
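The pipeline described in the abstract (document-term matrix, Euclidean distances, PAM) can be illustrated with a minimal sketch. It is written in plain Python for self-containment and is not the study's own code, which appears to rely on the R packages tm and cluster listed in the references. The toy responses, the function names and the whitespace tokenizer are all assumptions made for illustration. The SWAP phase accepts the first improving exchange instead of searching for the best one, which is a simplification of classical PAM.

```python
import math
from collections import Counter
from itertools import product


def document_term_matrix(docs):
    """Return (vocabulary, rows); each row holds raw term counts for one document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def euclidean(a, b):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_cost(dist, medoids):
    """Sum of distances from every document to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Simplified PAM on a precomputed distance matrix; returns (medoids, labels, cost)."""
    n = len(dist)
    # BUILD: greedily add the document that lowers the total cost the most.
    medoids = []
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        best = min(candidates, key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)
    # SWAP: exchange a medoid with a non-medoid while the cost strictly decreases.
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            trial_cost = total_cost(dist, trial)
            if trial_cost < cost - 1e-12:
                medoids, cost, improved = trial, trial_cost, True
    # Assign each document to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels, cost


# Hypothetical open-ended responses, NOT taken from the study's data.
docs = [
    "anxiety before exams",
    "exams cause anxiety",
    "anxiety about exams",
    "lack of sleep",
    "sleep problems at night",
    "poor sleep at night",
]

vocab, dtm = document_term_matrix(docs)
dist = [[euclidean(a, b) for b in dtm] for a in dtm]
medoids, labels, cost = pam(dist, k=2)

for doc, label in zip(docs, labels):
    print(label, doc)
```

Because each medoid is an actual response, it can be read directly as the representative text of its cluster, which is convenient when interpreting clusters of short survey answers. The sketch recomputes the total cost for every candidate swap, so it is only suited to small examples.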
References
Brito, J. A. M., Ochi, L. S., Brito, L. R., and Montenegro, F. M. T. (2010). Um algoritmo para o agrupamento baseado em k-medoids. Revista Brasileira de Estatística, 71(234):75–100.
Feinerer, I. and Hornik, K. (2023). tm: Text Mining Package. R package version 0.7-11.
García, R. G., Beltrán, B., Vilariño, D., Zepeda, C., and Martínez, R. (2020). Comparison of clustering algorithms in text clustering tasks. Computación y Sistemas, 24(2):499–437.
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., and Hornik, K. (2023). cluster: Cluster Analysis Basics and Extensions.
R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Vishwakarma, S., Nair, D. P. S., and Rao, D. S. (2017). Comparative study of k-means and k-medoid clustering for social media text mining. International Journal of Advance Scientific Research and Engineering Trends, 2(1):297–302.
