Evaluating LLM-Based Chatbots through Touchpoint-Driven Process Models

  • Carlos H. Camillo da Silva (USP)
  • Ana R. Cárdenas Maita (USP)
  • Gregorio Assagra de Almeida Filho (USP)
  • Caio V. Melo da Silva (USP)
  • Gabriel Dimant (USP)
  • Renato A. Almeida (Instituto de Ciência e Tecnologia Itaú)
  • Guilherme M. Lopes Costa (Instituto de Ciência e Tecnologia Itaú)
  • Anna M. Cintra Araujo (Instituto de Ciência e Tecnologia Itaú)
  • Barbara Correia Dos Santos Santana (Instituto de Ciência e Tecnologia Itaú)
  • Enio Alterman Blay (USP)
  • Sarajane Marques Peres (USP)

Abstract


This article introduces a method for evaluating the quality of large language model (LLM)-based chatbots from the perspective of the user journey. Each dialogue utterance, whether from the user or the bot, is mapped to a predefined set of touchpoints that represent key interaction goals. The resulting touchpoint sequences are treated as sequences of business process activities, which allows process mining to discover and analyze the process model underlying the dialogues. The method identifies inefficiencies such as bottlenecks, conversational loops, and handoffs of responsibility to human agents. It was tested on simulated dialogues generated by an LLM-based financial chatbot, with user personas and dialogue goals also synthesized by LLMs. Although based on synthetic data, the results demonstrate the potential of process mining to uncover structural strengths and weaknesses in LLM-based chatbots.
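The idea of treating touchpoint-labelled dialogues as an event log can be illustrated with a minimal sketch. It is not the authors' implementation: the touchpoint labels, the dialogue data, and the helper names (`directly_follows`, `repeated_touchpoints`, `handoff_rate`) are all hypothetical, and a plain directly-follows count stands in for whatever discovery algorithm the paper actually uses.

```python
from collections import Counter

# Hypothetical event log: one trace per dialogue, each event a
# (speaker, touchpoint) pair. Labels are illustrative assumptions,
# not the touchpoint set defined in the paper.
dialogues = {
    "d1": [("user", "greet"), ("bot", "ask_goal"), ("user", "state_goal"),
           ("bot", "provide_info"), ("user", "close")],
    "d2": [("user", "greet"), ("bot", "ask_goal"), ("user", "state_goal"),
           ("bot", "ask_clarification"), ("user", "state_goal"),
           ("bot", "ask_clarification"), ("user", "state_goal"),
           ("bot", "handoff_to_human")],
}


def directly_follows(traces):
    """Count how often touchpoint a is immediately followed by touchpoint b."""
    dfg = Counter()
    for events in traces.values():
        labels = [touchpoint for _, touchpoint in events]
        for a, b in zip(labels, labels[1:]):
            dfg[(a, b)] += 1
    return dfg


def repeated_touchpoints(traces):
    """Per dialogue, touchpoints visited more than once (a simple loop signal)."""
    loops = {}
    for case_id, events in traces.items():
        counts = Counter(touchpoint for _, touchpoint in events)
        repeated = {tp: n for tp, n in counts.items() if n > 1}
        if repeated:
            loops[case_id] = repeated
    return loops


def handoff_rate(traces, handoff_label="handoff_to_human"):
    """Fraction of dialogues containing at least one handoff touchpoint."""
    if not traces:
        return 0.0
    hits = sum(
        any(touchpoint == handoff_label for _, touchpoint in events)
        for events in traces.values()
    )
    return hits / len(traces)


if __name__ == "__main__":
    for (a, b), n in sorted(directly_follows(dialogues).items()):
        print(f"{a} -> {b}: {n}")
    print("loops:", repeated_touchpoints(dialogues))
    print("handoff rate:", handoff_rate(dialogues))
```

In this toy log, the second dialogue alternates between the user restating the goal and the bot asking for clarification before ending in a handoff, which is the kind of conversational loop and shift of responsibility the abstract refers to. Detecting bottlenecks would additionally require timestamps on each event, which this sketch omits.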

Published
22/09/2025
SILVA, Carlos H. Camillo da et al. Evaluating LLM-Based Chatbots through Touchpoint-Driven Process Models. In: WORKSHOP SOBRE BOTS NA ENGENHARIA DE SOFTWARE (WBOTS), 2., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 1-10. DOI: https://doi.org/10.5753/wbots.2025.14148.