Multi-Agent LLM Approach for Moderating E-Commerce Customer Service Responses

  • Tiago Gomes UNICAMP
  • André Gomes Regino Centro de Tecnologia da Informação Renato Archer
  • Rodrigo Caus UNICAMP
  • Victor Sotelo UNICAMP
  • Julio Cesar dos Reis UNICAMP

Abstract


Solutions based on large language models (LLMs) have been widely adopted in automated customer service systems, particularly on e-commerce platforms. However, such solutions still face challenges related to the accuracy, contextualization, and reliability of the responses they generate. This study proposes an LLM-based multi-agent architecture for the automatic moderation of textual responses. The architecture is composed of specialized agents that operate in an iterative review workflow comprising semantic and contextual evaluation, improvement recommendations, textual rewriting, and final decision-making. The agents share a common context and coordinate to identify deficiencies, propose corrections, and validate response quality. The approach was evaluated on real-world data from a multilingual e-commerce platform using two models from the Qwen3 family (32B and 30B-A3B). The results indicate that the approach improves response quality, with average gains of more than two points on the evaluation scale, and corrects over 60% of initially inadequate responses. The solution also offers advantages in auditability, modularity, and potential adaptability to other domains.
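The iterative review workflow described in the abstract (evaluation, recommendation, rewriting, final decision over a shared context) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the agent functions are deterministic stand-ins for LLM calls, and the names `SharedContext`, `moderate`, the 1-to-5 score, the approval threshold, and the two toy quality checks are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical advice strings; a real recommender agent would be an LLM call.
ADVICE = {
    "greeting": "Open with a greeting.",
    "topic": "Mention what the customer asked about.",
}


@dataclass
class SharedContext:
    """State visible to every agent; `log` doubles as an audit trail."""
    question: str
    response: str
    failed: List[str] = field(default_factory=list)
    recommendations: List[str] = field(default_factory=list)
    scores: List[int] = field(default_factory=list)
    approved: bool = False
    log: List[str] = field(default_factory=list)


def keywords(text: str) -> List[str]:
    """Toy topic extraction: lowercase words longer than five characters."""
    words = (w.strip("?.,!").lower() for w in text.split())
    return [w for w in words if len(w) > 5]


def evaluate(ctx: SharedContext) -> None:
    """Evaluator agent (stub): run toy checks and record a score (assumed 1-5)."""
    text = ctx.response.lower()
    ctx.failed = []
    if not text.startswith("hello"):
        ctx.failed.append("greeting")
    topics = keywords(ctx.question)
    if topics and not any(t in text for t in topics):
        ctx.failed.append("topic")
    ctx.scores.append(5 - 2 * len(ctx.failed))
    ctx.log.append(f"evaluate: score={ctx.scores[-1]} failed={ctx.failed}")


def recommend(ctx: SharedContext) -> None:
    """Recommender agent (stub): turn failed checks into improvement advice."""
    ctx.recommendations = [ADVICE[name] for name in ctx.failed]
    ctx.log.append(f"recommend: {ctx.recommendations}")


def rewrite(ctx: SharedContext) -> None:
    """Rewriter agent (stub): a real one would prompt an LLM with the advice."""
    if "greeting" in ctx.failed:
        ctx.response = "Hello! " + ctx.response
    if "topic" in ctx.failed:
        ctx.response += f" Here is an update about your {keywords(ctx.question)[0]}."
    ctx.log.append(f"rewrite: {ctx.response!r}")


def decide(ctx: SharedContext, threshold: int) -> bool:
    """Decision agent (stub): approve once the latest score reaches the threshold."""
    ctx.approved = ctx.scores[-1] >= threshold
    ctx.log.append(f"decide: approved={ctx.approved}")
    return ctx.approved


def moderate(ctx: SharedContext, max_rounds: int = 3, threshold: int = 4) -> SharedContext:
    """Evaluate, then loop recommend -> rewrite -> re-evaluate until approved."""
    evaluate(ctx)
    rounds = 0
    while not decide(ctx, threshold) and rounds < max_rounds:
        recommend(ctx)
        rewrite(ctx)
        evaluate(ctx)
        rounds += 1
    return ctx


if __name__ == "__main__":
    result = moderate(SharedContext("Where is my package delivery?", "It ships soon."))
    print("\n".join(result.log))
```

Keeping every agent's output in one shared object is what makes a design like this auditable: the log records each score, recommendation, and rewrite that led to the final decision, and any stub can be swapped for a model call without changing the loop.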

Keywords: LLM Multi-Agent Architecture, Automatic Moderation of Textual Content, E-Commerce, Large Language Models

Published
10 November 2025
GOMES, Tiago; REGINO, André Gomes; CAUS, Rodrigo; SOTELO, Victor; REIS, Julio Cesar dos. Multi-Agent LLM Approach for Moderating E-Commerce Customer Service Responses. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 31., 2025, Rio de Janeiro/RJ. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 349-357. DOI: https://doi.org/10.5753/webmedia.2025.16162.
