Spending pattern visualization using unsupervised machine learning

Gabriel Porto Oliveira; Jadson Castro Gertrudes; Roberta B. Oliveira

doi:10.5753/sbbd.2023.231577

Gabriel Porto Oliveira Universidade de Brasília (UnB) http://orcid.org/0009-0000-5758-5503
Jadson Castro Gertrudes Universidade Federal de Ouro Preto (UFOP) https://orcid.org/0000-0002-0861-6681
Roberta B. Oliveira Universidade de Brasília (UnB) https://orcid.org/0000-0002-5373-9402

Abstract

As the volume of financial data grows each year, so does the need to use it to develop customized financial products that meet individual users' needs and preferences. This study proposes a method for identifying potential spending patterns from categorized financial transactions. Several clustering and outlier detection algorithms are compared using internal validation metrics and an empirical analysis of cluster balance. The proposed method is used to build a visualization of the spending patterns, which is validated by a domain expert in order to extract further insights into user behavior. The visualization was found to be helpful for drawing insights about spending patterns.

Keywords: unsupervised machine learning, clustering, outlier removal, spending patterns
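
The comparison described in the abstract, internal validation plus a check on cluster balance, can be illustrated with a minimal sketch. It is not the authors' implementation. The silhouette coefficient stands in for the internal validation metrics, which the abstract does not name, and the smallest-to-largest cluster size ratio is an assumed balance measure. The `users` data (each user's share of spending in two hypothetical categories) and both labelings are invented for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (higher is better)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(euclidean(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def balance_ratio(labels):
    """Assumed balance measure: smallest cluster size / largest cluster size."""
    counts = Counter(labels).values()
    return min(counts) / max(counts)


# Hypothetical data: (share spent on groceries, share spent on leisure) per user.
users = [(0.80, 0.20), (0.75, 0.25), (0.70, 0.30),
         (0.20, 0.80), (0.25, 0.75), (0.30, 0.70)]

# Two candidate cluster assignments, e.g. from two different algorithms.
labeling_a = [0, 0, 0, 1, 1, 1]
labeling_b = [0, 0, 0, 0, 0, 1]

for name, labels in [("A", labeling_a), ("B", labeling_b)]:
    print(name, round(silhouette_score(users, labels), 3), balance_ratio(labels))
```

In this toy case labeling A scores higher on both measures. In practice the two can disagree, which is presumably why the abstract pairs the metrics with an empirical look at cluster balance.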
