Automated testing framework to evaluate multi-agent chat assistants

  • Lucas Ramalho, INDT
  • Jose Sousa, INDT
  • Maria Nascimento, INDT
  • Raiza Hanada, INDT
  • Cristian Souza, INDT
  • Eliane Collins, INDT

Abstract


This applied R&D project proposes an automated framework for evaluating multi-agent conversational assistants equipped with retrieval-augmented generation (RAG) capabilities. The solution addresses the high cost and time demands of manual evaluation by introducing a synthetic persona dataset and an automated pipeline that executes large-scale tests on mobile devices. Tests with English and Portuguese personas revealed recurring weaknesses in multi-agent systems, particularly in Portuguese interactions, highlighting the importance of multilingual evaluation. The project is a collaboration between INDT and Motorola Mobility and aims to provide a systematic methodology and testing infrastructure for industry conversational systems.

Published
11/05/2026
RAMALHO, Lucas; SOUSA, Jose; NASCIMENTO, Maria; HANADA, Raiza; SOUZA, Cristian; COLLINS, Eliane. Automated testing framework to evaluate multi-agent chat assistants. In: CONGRESSO IBERO-AMERICANO EM ENGENHARIA DE SOFTWARE (CIBSE), 29., 2026, Recife/PE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 400-403.