Evaluation of Machine Learning Models for Estimating Sales in Physical Retail


A store's sales volume is a key indicator for managers' decision making. Unlike e-commerce, physical retail makes sales and customer-behavior metrics harder to collect, because doing so depends on extensive sensing and on integration between systems. In a shopping mall scenario, we use real WiFi, people-flow, and sales data to create a dataset. In this article we evaluate machine learning models for classifying the next hour's sales as Low, Medium, or High, thus providing a tool to assist decision making. We use the PyCaret library to train the 13 algorithms compared, and the F1-score to evaluate the models. The Gradient Boosting Classifier achieved the best result, with an F1-score of 84.75%. Among the estimated classes, High showed the greatest error in the confusion matrix, reaching 60%, possibly reflecting the small number of records in that class.
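To make the two evaluation quantities above concrete, the following is a minimal sketch of how a per-class error can be read from a confusion matrix and how an F1-score can be aggregated over the Low, Medium, and High classes. The labels are toy values invented for illustration, not the study's data, and the function names are hypothetical. The support-weighted averaging of F1 is an assumption, since the abstract does not state which averaging was used.

```python
from collections import Counter

CLASSES = ["Low", "Medium", "High"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def per_class_error(matrix):
    """Fraction of each true class that was misclassified (1 - recall)."""
    errors = []
    for i, row in enumerate(matrix):
        total = sum(row)
        errors.append(1 - row[i] / total if total else 0.0)
    return errors


def weighted_f1(matrix):
    """F1 per class, averaged with weights proportional to class support.

    Assumption: support-weighted averaging; the source does not specify it.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    score = 0.0
    for i in range(n):
        tp = matrix[i][i]
        support = sum(matrix[i])                         # true members of class i
        predicted = sum(matrix[r][i] for r in range(n))  # predictions of class i
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * support / total
    return score


# Toy, imbalanced labels (illustrative only): High has the fewest correct hits.
y_true = ["Low"] * 10 + ["Medium"] * 5 + ["High"] * 5
y_pred = (["Low"] * 9 + ["Medium"] * 1      # true Low
          + ["Medium"] * 4 + ["Low"] * 1    # true Medium
          + ["High"] * 2 + ["Medium"] * 3)  # true High

matrix = confusion_matrix(y_true, y_pred)
for name, err in zip(CLASSES, per_class_error(matrix)):
    print(f"{name}: error {err:.0%}")
print(f"weighted F1: {weighted_f1(matrix):.4f}")
```

In this toy example the minority class has the highest per-class error even though the aggregate score stays comparatively high, which is the kind of pattern the abstract attributes to the small number of High records.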

Keywords: data mining, estimate sales, evaluation models, machine learning


ALVES, Geovanne O.; FONSÊCA, Jorge C. B.; MACIEL, Alexandre M. A. Evaluation of Machine Learning Models for Estimating Sales in Physical Retail. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 9., 2021, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 41-48. ISSN 2763-8944. DOI: https://doi.org/10.5753/kdmile.2021.17459.