Use of the PAM algorithm in k-medoids for clustering mental health textual data

  • Bruno G. Silva UFMT
  • Anderson C. S. Oliveira UFMT
  • Lia H. M. Morita UFMT
  • Thiago M. Brito UFMT

Abstract


The aim of this study was to investigate the use of the PAM (Partitioning Around Medoids) algorithm for k-medoids clustering of the open-ended questions in a questionnaire on the mental health of university students. PAM was applied with the Euclidean distance computed from the document-term matrix. With k = 2, 3 and 4 medoids, PAM clustered 427 open-ended responses totaling 2,101 words, with processing times of 40.05, 40.39 and 48.69 seconds, respectively. The PAM algorithm proved efficient for cluster analysis of textual data.
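The abstract compresses the pipeline into three steps: build a document-term matrix, compute Euclidean distances between its rows, and run PAM for k = 2, 3 and 4. The Python sketch below illustrates those steps on six hypothetical responses. It is not the authors' implementation. Tokenization is assumed to be naive (lowercase, whitespace split, no stop-word removal or stemming). The helper names `document_term_matrix`, `euclidean_matrix` and `pam` are illustrative. PAM is written as the classic greedy BUILD phase followed by best-improvement SWAP passes.

```python
import math
from itertools import product


def document_term_matrix(docs):
    """Raw term counts: one row per document, one column per term.
    Assumption: naive lowercase whitespace tokenization, no stop-word removal."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for t in toks:
            row[index[t]] += 1
        matrix.append(row)
    return vocab, matrix


def euclidean_matrix(rows):
    """Pairwise Euclidean distances between document rows."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def total_cost(dist, medoids):
    """Sum over all points of the distance to their nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Classic PAM: greedy BUILD, then best-improvement SWAP until no swap lowers the cost."""
    n = len(dist)
    # BUILD: start with the point of smallest total distance, then greedily
    # add the point that lowers the total cost the most.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(dist, medoids + [c])))
    # SWAP: evaluate every (medoid, non-medoid) exchange, apply the best one
    # if it strictly reduces the cost, and repeat.
    cost = total_cost(dist, medoids)
    while True:
        best_cost, best_pos, best_o = cost, None, None
        for pos, o in product(range(k), range(n)):
            if o in medoids:
                continue
            trial = medoids[:pos] + [o] + medoids[pos + 1:]
            trial_cost = total_cost(dist, trial)
            if trial_cost < best_cost:
                best_cost, best_pos, best_o = trial_cost, pos, o
        if best_pos is None:
            break
        medoids[best_pos] = best_o
        cost = best_cost
    # Assign each document to the position of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels, cost


# Hypothetical open-ended responses (not data from the study).
responses = [
    "anxiety stress exams",
    "stress anxiety sleep",
    "anxiety exams pressure",
    "family support friends",
    "friends support therapy",
    "family support counseling",
]

vocab, dtm = document_term_matrix(responses)
dist = euclidean_matrix(dtm)
for k in (2, 3, 4):
    medoids, labels, cost = pam(dist, k)
    print(k, medoids, labels, round(cost, 3))
```

This naive version recomputes the full cost for every candidate swap, so one SWAP pass costs on the order of k(n − k) · nk distance lookups; optimized implementations update the cost incrementally instead. Timing each call with `time.perf_counter()` would give per-k processing times analogous to those reported above, though not comparable in absolute terms.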

Keywords: Text mining, Document-term matrix, Mental health perception, Unstructured data, Questionnaires

Published: 2023-11-28
SILVA, Bruno G.; OLIVEIRA, Anderson C. S.; MORITA, Lia H. M.; BRITO, Thiago M. Use of the PAM algorithm in k-medoids for clustering mental health textual data. In: REGIONAL SCHOOL ON INFORMATICS OF MATO GROSSO (ERI-MT), 12., 2023, Cuiabá/MT. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 201-205. ISSN 2447-5386. DOI: https://doi.org/10.5753/eri-mt.2023.236246.