Fine-tuning Open-source Large Language Models for Automated Response to Customer Feedback

doi:10.5753/kdmile.2024.244556

M. Albuquerque UFPE
L. Barbosa UFPE
J. Moreira UFPE
A. da Silva UFAM
T. Melo UEA

Abstract

Online reviews play a key role in influencing customer decisions during the purchase journey. Negative feedback can therefore hurt sales of products or services, potentially reducing revenue and market share. This effect can be mitigated by crafting thoughtful responses to such comments. This paper proposes using open-source pre-trained large language models, specifically their smaller versions, to respond to negative reviews effectively. Because these models are pre-trained on large datasets, they require little additional data for fine-tuning. To validate the approach, we apply it to the domain of restaurant reviews. Our results show that the fine-tuned models perform comparably to larger models, such as ChatGPT-3.5, in generating respectful, specific, and corrective responses that encourage customers to revisit the restaurant.

Keywords: large language model, natural language processing, supervised fine-tuning
