Predicting the Next Transaction on Anonymized Payment Datasets with Deep Learning Models

  • Claudia Francesca Suarez Mariscal Universidade Federal do Rio Grande do Sul (UFRGS)
  • Renata Galante Universidade Federal do Rio Grande do Sul (UFRGS)
  • Weverton Cordeiro Universidade Federal do Rio Grande do Sul (UFRGS) https://orcid.org/0000-0001-7536-0586

Abstract


Predicting customer behavior has long been a critical concern for many companies, which often analyze purchase histories to uncover behavioral trends and improve their services. However, analyzing large volumes of personal customer data while remaining compliant with data protection regulations such as GDPR or LGPD is challenging. In this paper, we propose three models that address the problem of recognizing purchasing patterns in anonymized data for diverse applications. First, we evaluate deep learning (DL) architectures for predicting the next purchase transaction, using a dataset that safeguards confidential customer data and adheres to data protection regulations. The proposed models rely on Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU) to discern behaviors within a dataset devoid of personal information, allowing for comparison with other models pursuing the same goal. Then, we optimize each model's parameters; the findings indicate that the GRU-based model generalizes best.
Keywords: Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), General Data Protection Regulation (GDPR)
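
To make the idea of next-transaction prediction with a GRU concrete, the following is a minimal, illustrative sketch in plain Python. It is not the architecture evaluated in the paper, whose layers, features, and hyperparameters are not described in this abstract. Everything here is an assumption made for illustration: the class name `TinyGRU`, the representation of each transaction as an anonymized category ID, the one-hot encoding, the omission of bias terms, and the random (untrained) weights. It only shows how a GRU folds a purchase history into a hidden state and turns that state into a probability distribution over the next category.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyGRU:
    """Hypothetical, untrained GRU over anonymized transaction category IDs."""

    def __init__(self, n_categories, hidden_size, seed=0):
        rng = random.Random(seed)
        d, h = n_categories, hidden_size
        # Input (W) and recurrent (U) weights for the update gate z,
        # the reset gate r, and the candidate state n. Biases are omitted.
        self.W = {g: rand_matrix(h, d, rng) for g in "zrn"}
        self.U = {g: rand_matrix(h, h, rng) for g in "zrn"}
        self.V = rand_matrix(d, h, rng)  # hidden state -> category logits
        self.n_categories = d
        self.hidden_size = h

    def step(self, x, h_prev):
        """One GRU update: combine the current input with the previous state."""
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h_prev))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h_prev))]
        gated = [ri * hi for ri, hi in zip(r, h_prev)]
        n = [math.tanh(a + b) for a, b in
             zip(matvec(self.W["n"], x), matvec(self.U["n"], gated))]
        # z close to 1 keeps the old state; z close to 0 adopts the candidate.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h_prev)]

    def predict_next(self, history):
        """Return a probability for each category being the next transaction."""
        h = [0.0] * self.hidden_size
        for cat in history:
            x = [1.0 if i == cat else 0.0 for i in range(self.n_categories)]
            h = self.step(x, h)
        logits = matvec(self.V, h)
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Usage: a made-up history of four category IDs drawn from {0, 1, 2, 3}.
model = TinyGRU(n_categories=4, hidden_size=3)
probs = model.predict_next([0, 2, 2, 1])
```

Because the weights are random, the resulting probabilities carry no meaning. In practice the weights would be learned from transaction sequences, typically with a deep learning framework rather than hand-written loops.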

Published
14/10/2024
MARISCAL, Claudia Francesca Suarez; GALANTE, Renata; CORDEIRO, Weverton. Predicting the Next Transaction on Anonymized Payment Datasets with Deep Learning Models. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 39., 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 639-651. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2024.243511.