Are You Listening to Me? Fine-Tuning Chatbots for Empathetic Dialogue
Abstract
Conversational agents have made significant progress since ELIZA, expanding their role across domains such as healthcare, education, and customer service. As these agents become increasingly integrated into daily human interactions, emotional intelligence, and empathetic listening in particular, becomes essential. In this study, we explore how Large Language Models (LLMs) respond when tasked with generating emotionally rich interactions, and we analyze the emotional progression of the resulting dialogues using both sentiment analysis (via VADER) and expert assessments. While the generated conversations often mirrored the intended emotional structure, human evaluation revealed important differences in the perceived empathy and coherence of the responses. These findings suggest that emotion modeling in dialogue requires not only structural alignment of the expressed emotions but also qualitative depth, highlighting the importance of combining automated and human-centered methods in the development of emotionally competent agents.
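The abstract describes tracking the emotional progression of dialogues with VADER sentiment analysis. As a minimal sketch, not the authors' actual pipeline, the snippet below uses the vaderSentiment package to score each turn of a hypothetical dialogue and follow how the compound sentiment evolves; the example dialogue and the final sentiment-shift check are illustrative assumptions.

# Minimal sketch (assumption: the vaderSentiment package; not the paper's exact pipeline)
# showing how per-turn VADER compound scores can trace a dialogue's emotional trajectory.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# Hypothetical multi-turn dialogue as (speaker, utterance) pairs; content is illustrative only.
dialogue = [
    ("user", "I've been feeling really overwhelmed at work lately."),
    ("bot",  "That sounds exhausting. Do you want to tell me more about it?"),
    ("user", "Talking about it actually helps. Thank you for listening."),
]

# VADER's compound score ranges from -1 (most negative) to +1 (most positive).
for speaker, utterance in dialogue:
    compound = analyzer.polarity_scores(utterance)["compound"]
    print(f"{speaker:>4}: {compound:+.3f}  {utterance}")

# Simple structural check: did the user's sentiment improve between first and last turn?
user_scores = [analyzer.polarity_scores(u)["compound"] for s, u in dialogue if s == "user"]
print("User sentiment shift:", round(user_scores[-1] - user_scores[0], 3))

Comparing such automatically derived trajectories with the intended emotional structure corresponds to the structural alignment mentioned in the abstract, while the expert assessments capture the qualitative depth that the automatic scores miss.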
References
P. Manickam, S. A. Mariappan, S. M. Murugesan, S. Hansda, A. Kaushik, R. Shinde, and S. Thipperudraswamy, “Artificial intelligence (AI) and internet of medical things (IoMT) assisted biomedical systems for intelligent healthcare,” Biosensors, vol. 12, no. 8, p. 562, 2022.
J. Weizenbaum, “ELIZA—a computer program for the study of natural language communication between man and machine,” Communications of the ACM, vol. 9, no. 1, pp. 36–45, 1966.
A. Talyshinskii, N. Naik, B. Z. Hameed, P. Juliebø-Jones, and B. K. Somani, “Potential of AI-driven chatbots in urology: revolutionizing patient care through artificial intelligence,” Current Urology Reports, vol. 25, no. 1, pp. 9–18, 2024.
T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
Gemini Team, R. Anil, S. Borgeaud, J.-B. Alayrac, J. Yu, R. Soricut, J. Schalkwyk, A. M. Dai, A. Hauth, K. Millican et al., “Gemini: a family of highly capable multimodal models,” arXiv preprint arXiv:2312.11805, 2023.
A. Abd-Alrazaq, R. AlSaad, D. Alhuwail, A. Ahmed, P. M. Healy, S. Latifi, S. Aziz, R. Damseh, S. A. Alrazak, J. Sheikh et al., “Large language models in medical education: opportunities, challenges, and future directions,” JMIR Medical Education, vol. 9, no. 1, p. e48291, 2023.
Z. Elyoseph, D. Hadar-Shoval, K. Asraf, and M. Lvovsky, “ChatGPT outperforms humans in emotional awareness evaluations,” Frontiers in Psychology, vol. 14, p. 1199058, 2023.
H. Prendinger, J. Mori, and M. Ishizuka, “Using human physiology to evaluate subtle expressivity of a virtual quizmaster in a mathematical game,” International Journal of Human-Computer Studies, vol. 62, no. 2, pp. 231–245, 2005.
M. Arjmand, F. Nouraei, I. Steenstra, and T. Bickmore, “Empathic grounding: Explorations using multimodal interaction and large language models with conversational agents,” in Proceedings of the ACM International Conference on Intelligent Virtual Agents, 2024, pp. 1–10.
D. Bill and T. Eriksson, “Fine-tuning an LLM using reinforcement learning from human feedback for a therapy chatbot application,” 2023.
Y. Chen, X. Xing, J. Lin, Z. Wang, Q. Liu, X. Xu et al., “SoulChat: Improving LLMs’ empathy, listening, and comfort abilities through fine-tuning with multi-turn empathy conversations,” in The 2023 Conference on Empirical Methods in Natural Language Processing.
C. Hutto and E. Gilbert, “VADER: A parsimonious rule-based model for sentiment analysis of social media text,” in Proceedings of the International AAAI Conference on Web and Social Media, vol. 8, no. 1, 2014, pp. 216–225.
C. Bartneck, E. Croft, and D. Kulic, “Measuring the anthropomorphism, animacy, likeability, perceived intelligence and perceived safety of robots,” 2008.
Published
30/09/2025
How to Cite
KNOB, Paulo; SCHOLLER, Leonardo; RIGATTI, Juliano; MUSSE, Soraia. Are You Listening to Me? Fine-Tuning Chatbots for Empathetic Dialogue. In: WORKSHOP DE TRABALHOS EM ANDAMENTO - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 38., 2025, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 79-84.
