Model and Algorithm-Agnostic Clustering Interpretability

Guilherme S. Oliveira; Fabrício A. Silva; Ricardo V. Ferreira

doi:10.5753/kdmile.2023.232618

Guilherme S. Oliveira, Universidade Federal de Viçosa
Fabrício A. Silva, Universidade Federal de Viçosa, ORCID: https://orcid.org/0000-0002-0713-0583
Ricardo V. Ferreira, Cinnecta do Brasil S/A

Abstract

Data clustering with unsupervised algorithms is an important technique in many applications, in both research and industrial projects, because it associates similar elements with each other for knowledge extraction. After grouping, interpreting and understanding the resulting clusters is a crucial step before they can be used in decision-making. However, this is not a trivial task, since it requires manual and repetitive analyses that consume the time and resources of the people involved. In this work, we propose a solution for interpreting clusters generated by unsupervised learning. Unlike existing solutions in the literature, the proposed approach is independent of the model and algorithm used for clustering, and it generates easy-to-understand descriptions for end users, making the clusters usable by teams from different areas of a company. The results showed that the solution provided user-friendly descriptions for interpreting the 13 clusters created to segment 263,684 customers of a company.

Keywords: clustering, explainable, unsupervised learning
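The abstract does not describe how the descriptions are produced. The sketch below is offered only as an illustration of what "independent of the model and algorithm" can mean in practice. It is not the authors' implementation. It uses nothing but the data rows and their cluster labels, so any clustering algorithm could have produced those labels. For each cluster, it ranks features by how far the cluster mean lies from the overall mean, measured in population standard deviations, and turns the top features into a short sentence. The function name `describe_clusters`, the z-score ranking, the `top_k` parameter and the toy customer data are all hypothetical choices.

```python
from collections import defaultdict
from statistics import mean, pstdev


def describe_clusters(rows, labels, feature_names, top_k=2):
    """Hypothetical helper: return a short text description per cluster label.

    Only the data rows and their cluster labels are used, so this sketch
    does not depend on which clustering algorithm produced the labels.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")

    n_features = len(feature_names)
    # Global mean and population standard deviation of each feature.
    columns = [[row[j] for row in rows] for j in range(n_features)]
    global_mean = [mean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid division by zero.
    global_std = [pstdev(col) or 1.0 for col in columns]

    # Group rows by their cluster label.
    members = defaultdict(list)
    for row, label in zip(rows, labels):
        members[label].append(row)

    descriptions = {}
    for label, cluster_rows in sorted(members.items()):
        scores = []
        for j, name in enumerate(feature_names):
            cluster_mean = mean(r[j] for r in cluster_rows)
            # Standardized distance of the cluster mean from the global mean.
            z = (cluster_mean - global_mean[j]) / global_std[j]
            scores.append((abs(z), z, name))
        # Keep the top_k features that deviate most from the whole population.
        scores.sort(reverse=True)
        parts = [
            f"{'high' if z > 0 else 'low'} {name} ({z:+.1f} sd)"
            for _, z, name in scores[:top_k]
        ]
        descriptions[label] = (
            f"Cluster {label} ({len(cluster_rows)} items): " + ", ".join(parts)
        )
    return descriptions


if __name__ == "__main__":
    # Toy data (hypothetical): [age, monthly_spend] with labels from any algorithm.
    rows = [
        [25, 1200.0], [27, 1500.0], [24, 1100.0],
        [60, 5200.0], [58, 4800.0], [62, 5000.0],
    ]
    labels = [0, 0, 0, 1, 1, 1]
    for text in describe_clusters(rows, labels, ["age", "monthly_spend"]).values():
        print(text)
```

Ranking by standardized mean difference keeps the sketch simple and readable. A practical system would likely need more than this, for example handling categorical features or using a surrogate decision tree to derive explicit rules.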
