On the Limits of Automated Root Cause Analysis in Network Virtualization Scenarios using Language Models

Ana Beatriz L. Romero; Pedro R. X. do Carmo; Assis T. Oliveira Filho; Judith Kelner; Djamel Sadok

doi:10.5753/sbrc.2026.19300

Ana Beatriz L. Romero UFPE / UNICAP http://orcid.org/0009-0004-5553-3688
Pedro R. X. do Carmo UFPE / UNICAP http://orcid.org/0000-0002-7952-3239
Assis T. Oliveira Filho UFPE / UNICAP https://orcid.org/0000-0001-9873-6929
Judith Kelner UFPE http://orcid.org/0000-0002-2673-5887
Djamel Sadok UFPE https://orcid.org/0000-0001-5378-4732

Abstract

Root Cause Analysis (RCA) in networked and virtualized infrastructures is a complex task due to the volume of low-level metrics and the ambiguity of observable symptoms. Although Large Language Models (LLMs) have recently been explored for automated diagnosis, their effectiveness in realistic network scenarios remains unclear. This paper investigates the use of small-scale LLMs for network fault diagnosis through a systematic experimental study. We introduce the NetPerf-RCA Benchmark, composed of 24 representative network and virtualization scenarios, and evaluate multiple diagnostic approaches. Our results show that diagnostic effectiveness is primarily constrained by scenario characteristics and system observability.

