Evaluation of balancing techniques in car acceptability classification

Lucas Ferreira Paiva; Allan Fernando Oliveira de Mattos; Lucas de Assis Silva; Juscimara Gomes Avelino; George Darmiton da Cunha Cavalcanti

doi:10.5753/eniac.2023.233712

Lucas Ferreira Paiva, Embraer S.A. / Universidade Federal de Pernambuco
Allan Fernando Oliveira de Mattos, Embraer S.A. / Universidade Federal de Pernambuco
Lucas de Assis Silva, Embraer S.A.
Juscimara Gomes Avelino, Universidade Federal de Pernambuco
George Darmiton da Cunha Cavalcanti, Universidade Federal de Pernambuco

Abstract

Car acceptability consists of classifying a vehicle based on its physical and financial characteristics. This type of analysis supports the decision of whether or not to purchase a given car model. The goal of this study was to evaluate the impact of undersampling, oversampling, and a combination of the two techniques on eight machine learning models. Hyperparameter optimization and feature selection were applied for each combination of balancing technique and model. The results obtained in this study surpassed the state of the art for the SVM. In addition, simpler models improved when the balancing techniques were used.

Keywords: Balancing Techniques, Machine Learning, Hyperparameter Optimization, Metric Comparison
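To make the three balancing families named in the abstract concrete, the sketch below resamples a toy labeled dataset with random undersampling, random oversampling, and a combination of both. It is a minimal illustration and not the authors' implementation: the abstract does not say which specific resampling algorithms were used, so plain random resampling is an assumption, as are the function names `resample_to_size` and `balance`, the choice of the median class size as the combined target, and the toy labels `a`, `b`, `c`.

```python
import random
from collections import Counter, defaultdict


def resample_to_size(samples, labels, target_size, seed=0):
    """Randomly resample every class to exactly `target_size` rows.

    Classes at or above the target are undersampled (drawn without
    replacement). Smaller classes are oversampled: the original rows
    are kept and the shortfall is drawn with replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_x, out_y = [], []
    for y, rows in by_class.items():
        if len(rows) >= target_size:
            chosen = rng.sample(rows, target_size)
        else:
            chosen = rows + rng.choices(rows, k=target_size - len(rows))
        out_x.extend(chosen)
        out_y.extend([y] * target_size)
    return out_x, out_y


def balance(samples, labels, strategy="combined", seed=0):
    """Balance class counts using one of three hypothetical strategies.

    "under":    shrink every class to the smallest class size.
    "over":     grow every class to the largest class size.
    "combined": move every class to the median class size, shrinking
                large classes and growing small ones (an assumed rule).
    """
    sizes = sorted(Counter(labels).values())
    if strategy == "under":
        target = sizes[0]
    elif strategy == "over":
        target = sizes[-1]
    elif strategy == "combined":
        target = sizes[len(sizes) // 2]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return resample_to_size(samples, labels, target, seed)


# Toy imbalanced data: 10 rows of class "a", 4 of "b", 2 of "c".
toy_y = ["a"] * 10 + ["b"] * 4 + ["c"] * 2
toy_x = list(range(len(toy_y)))

for name in ("under", "over", "combined"):
    _, balanced_y = balance(toy_x, toy_y, strategy=name)
    print(name, dict(Counter(balanced_y)))
```

In an evaluation like the one the abstract describes, resampling of this kind would normally be applied only to the training portion of each split, so that the test data keeps its original class distribution.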
