A Reproducible Semantic Benchmark for Multivendor DSM-to-CLI Translation

Abstract

Translating high-level network intents into correct multivendor configurations remains challenging, because syntactically valid outputs may still diverge from the intended behavior. This paper presents a semantic benchmark for DSM-to-CLI translation that spans multiple LLM translators, vendors, use cases, and repeated runs, using fixed judges and a failure taxonomy. Results show that semantic quality and reliability should be evaluated separately, that vendor effects dominate use-case variation, and that outcome dispersion correlates with vote instability. Huawei VRP scenarios expose vendor-specific failures that aggregate metrics do not capture. Overall, multivendor benchmarking supports the comparison of LLM-based configuration systems while highlighting the need for complementary validation.

Published
25/05/2026
MENEZES, Jerônimo; BITZKI, Leonardo; KREUTZ, Diego; ALMEIDA, Gefte; POHLMANN, Marcio; MANSILHA, Rodrigo. A Reproducible Semantic Benchmark for Multivendor DSM-to-CLI Translation. In: WORKSHOP DE INTELIGÊNCIA ARTIFICIAL PARA REDES DE COMPUTADORES (WIARC), 1., 2026, Praia do Forte/BA. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 1-14. DOI: https://doi.org/10.5753/wiarc.2026.23707.