Enhancing LLM Agent Effectiveness via a Reflective Multi-Agent System
Abstract
In recent years, agent systems that incorporate Large Language Models (LLMs) as their core components have developed rapidly, performing tasks such as content generation, task planning, and conversational interaction. Reflection memory, a key component of such systems, enables LLM agents to learn from past mistakes and improve their results. This study presents a novel reflective multi-agent system designed to enhance the effectiveness of LLM agents. The solution employs N independent agents to generate diverse responses to user prompts (questions), which are then aggregated and analyzed by a decision-making agent to produce a final answer. The reflection mechanism is triggered by user feedback, prompting the agents to self-critique and accumulate diverse error patterns in their respective memories. Our experimental evaluation demonstrates that this approach outperforms individual agents on the ARC Challenge dataset: using the small distilled DeepSeek model with 1.5 billion parameters, our solution reaches 56.85% accuracy, compared to an average of 54.83% for single agents with reflection memory. These results highlight the effectiveness of a reflective multi-agent system in improving the overall task performance of LLM agents.
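To make the described pipeline concrete, the following is a minimal Python sketch of the architecture outlined above: N independent agents answer a question, a decision-making agent aggregates the candidates into a final answer, and negative user feedback triggers a self-critique that is accumulated in each agent's reflection memory. All names here (`ReflectiveAgent`, `multi_agent_answer`, the `llm` callable) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the reflective multi-agent pipeline; `llm` stands in
# for any text-generation callable (e.g., a small distilled model).
from dataclasses import dataclass, field
from typing import Callable, List

LLM = Callable[[str], str]  # prompt -> completion


@dataclass
class ReflectiveAgent:
    llm: LLM
    memory: List[str] = field(default_factory=list)  # accumulated error patterns

    def answer(self, question: str) -> str:
        # Prepend reflection memory so past mistakes steer the new answer.
        context = "\n".join(f"Past mistake: {m}" for m in self.memory)
        return self.llm(f"{context}\nQuestion: {question}\nAnswer:")

    def reflect(self, question: str, wrong_answer: str) -> None:
        # Triggered by negative user feedback: self-critique, then store the lesson.
        critique = self.llm(
            f"Question: {question}\nYour answer was wrong: {wrong_answer}\n"
            "Describe the error pattern in one sentence:"
        )
        self.memory.append(critique)


def multi_agent_answer(agents: List[ReflectiveAgent], decider: LLM, question: str) -> str:
    # N independent agents produce diverse candidate answers...
    candidates = [a.answer(question) for a in agents]
    # ...which the decision-making agent aggregates into a final answer.
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return decider(f"Question: {question}\nCandidates:\n{numbered}\nBest final answer:")


if __name__ == "__main__":
    # Toy stand-in for a real model call, just to show the control flow.
    dummy: LLM = lambda prompt: "B"
    agents = [ReflectiveAgent(dummy) for _ in range(3)]  # N = 3
    print(multi_agent_answer(agents, dummy, "Which gas do plants absorb?"))
```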
