Exploring Supervised Learning Models for Multi-Label Text Classification in Brazilian Restaurant Reviews
Abstract
This paper investigates the use of Natural Language Processing (NLP) methods to classify customer reviews of Brazilian restaurants, exploring several preprocessing techniques to improve supervised learning models. Among the models evaluated, the combination of Logistic Regression (LR) with stemming as the preprocessing technique proved the most effective, achieving a micro F1-score of 0.89 and performing notably well on multi-label text classification. When applied to a real-world dataset, the model proved useful for identifying subtle differences in customer opinions, even across units of the same restaurant franchise.
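For readers unfamiliar with the reported metric, the sketch below shows one common way to compute a micro-averaged F1-score in a multi-label setting: true positives, false positives, and false negatives are pooled over all reviews and labels before a single F1 value is computed. The function name `micro_f1` and the label names ("food", "service", "price", "ambience") are hypothetical and chosen only for illustration; they are not taken from the paper's dataset or code.

```python
from typing import List, Set


def micro_f1(true_labels: List[Set[str]], predicted_labels: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label data.

    Counts are pooled over every (review, label) pair, so frequent labels
    weigh more heavily than rare ones.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("Both lists must contain one label set per review.")

    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not present
        fn += len(truth - pred)   # labels present but missed

    denominator = 2 * tp + fp + fn
    # By convention here, return 0.0 when there is nothing to score.
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical example: three reviews, each with a set of aspect labels.
y_true = [{"food", "service"}, {"price"}, {"food", "ambience"}]
y_pred = [{"food"}, {"price", "service"}, {"food", "ambience"}]

score = micro_f1(y_true, y_pred)
print(f"micro F1 = {score:.2f}")
```

In this toy example the pooled counts are 4 true positives, 1 false positive, and 1 false negative, which gives 2·4 / (2·4 + 1 + 1) = 0.80.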