Evaluating Domain-Specialized LLMs in Multi-Agent RAG for Enterprise Retrieval
Abstract
This paper evaluates a multi-agent architecture for enterprise knowledge retrieval in which a semantic router directs queries to specialized LLM agents, each covering a thematic domain (e.g., legal, regulatory). We benchmark prominent models, including GPT-4, LLaMA 4, and Gemini, on metrics such as answer relevance, faithfulness, and execution time. Our results show that this specialized approach achieves higher precision and lower latency than centralized configurations. Among the models tested, GPT-4o, LLaMA 4, and Gemini-2.5-Flash offer the best balance of accuracy and efficiency. These findings provide practical guidance for designing scalable, high-fidelity retrieval systems in regulated, multi-domain environments.
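To make the routing step concrete, the following is a minimal sketch of how a semantic router might pick a domain agent for a query. It is an illustration under stated assumptions, not the system evaluated here: the domain profiles, the bag-of-words `embed` function (a stand-in for a real embedding model), and the `route` helper with its `general` fallback are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical domain descriptions (assumption: the real router's
# domain definitions are not given in this text).
DOMAIN_PROFILES = {
    "legal": "contract clause liability court litigation agreement law",
    "regulatory": "regulation compliance audit agency reporting standard rule",
}


def embed(text):
    """Toy embedding: lowercase bag-of-words counts.

    Assumption: stands in for a learned sentence-embedding model.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def route(query, profiles=DOMAIN_PROFILES, fallback="general"):
    """Return the domain whose profile is most similar to the query.

    Falls back to a generic agent when no profile shares any term.
    """
    q = embed(query)
    scores = {name: cosine(q, embed(desc)) for name, desc in profiles.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else fallback


if __name__ == "__main__":
    print(route("Which clause limits liability in the contract?"))
    print(route("What compliance reporting does the agency require?"))
    print(route("hello world"))
```

In a deployed system the returned domain name would select the specialized agent and its retrieval index; the similarity threshold and fallback policy are design choices that this sketch leaves deliberately simple.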
