A Reproducible Semantic Benchmark for Multivendor DSM-to-CLI Translation
Abstract
Translating high-level network intents into correct multivendor configurations remains challenging, as syntactically valid outputs may diverge from intended behavior. This paper presents a DSM-to-CLI semantic benchmark with multiple LLM translators, vendors, use cases, and repeated runs, using fixed judges and a failure taxonomy. Results show that semantic quality and reliability should be evaluated separately, vendor effects dominate use-case variation, and outcome dispersion correlates with vote instability. Huawei VRP scenarios expose vendor-specific failures not captured by aggregate metrics. Overall, multivendor benchmarking supports comparison of LLM-based configuration systems, while highlighting the need for complementary validation.
References
Aykurt, K., Blenk, A., and Kellerer, W. (2024). NetLLMBench: A benchmark framework for large language models in network configuration tasks. In IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN). IEEE.
Boateng, G. O., Sami, H., Alagha, A., Elmekki, H., Hammoud, A., Mizouni, R., Mourad, A., Otrok, H., Bentahar, J., Muhaidat, S., et al. (2025). A survey on large language models for communication, network, and service management: Application insights, challenges, and future directions. IEEE Communications Surveys & Tutorials.
Hong, J., Tu, N. V., and Hong, J. W.-K. (2025). A comprehensive survey on LLM-based network management and operations. International Journal of Network Management, 35(6):e70029.
Liu, C., Xie, X., Zhang, X., and Cui, Y. (2024). Large language models for networking: Workflow, advances, and challenges. IEEE Network, 39(5):165–172.
Long, S., Tan, J., Mao, B., Tang, F., Li, Y., Zhao, M., and Kato, N. (2025). A survey on intelligent network operations and performance optimization based on large language models. IEEE Communications Surveys & Tutorials, 27(6):3915–3949.
Mendoza, J. R. and Ocampo, R. (2025). PeeringLLM-Bench: Evaluating LLMs for BGP configuration tasks. In Proceedings of the 20th Asian Internet Engineering Conference.
Menezes, J., Bitzki, L., and Kreutz, D. (2025). Net2d-LLM: Translating Structured Network Intents into CLI using LLMs with Execution in a Network Digital Twin. In Anais da XXII ERRC. SBC.
Menezes, J., Bitzki, L., Kreutz, D., Almeida, G., Pohlmann, M., and Mansilha, R. (2026). dsm2cli: An observable pipeline for translating network intents into multivendor CLI with independent semantic assessment. In Anais do Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos (SBRC). SBC.
Raptis, N., Adhane, G., Fonseca, J. P., Ramantas, K., and Verikoukis, C. (2025). ARGVI: Adaptive routing, generation, and validation of intents for intent-driven management. In 2025 IEEE Conference on Network Function Virtualization and Software-Defined Networking (NFV-SDN), pages 1–6.
Tageldien, M., Selim, B., and Sboui, L. (2025). Large language models in intent-based networking: A comprehensive survey across the intent lifecycle. In ITC-Egypt, pages 810–817. IEEE.
Wang, C., Scazzariello, M., Farshin, A., Ferlin, S., Kostić, D., and Chiesa, M. (2024). NetConfEval: Can LLMs facilitate network configuration? Proceedings of the ACM on Networking, 2(CoNEXT2):1–25.
Wang, J., He, B., Zhao, J., Xuan, Y., Sun, H., Qi, Q., Liang, J., Zhuang, Z., and Liao, J. (2026). LLM-powered intent-driven configuration generation for multi-vendor networks. IEEE Transactions on Network and Service Management, PP:1–1.
Wei, Y., Xie, X., Hu, T., Zuo, Y., Chen, X., Chi, K., and Cui, Y. (2025). INTA: Intent-Based Translation for Network Configuration with LLM Agents. In 2025 IEEE 33rd International Conference on Network Protocols (ICNP), pages 1–16, Seoul, Republic of Korea. IEEE.
Published
25 May 2026
How to Cite
MENEZES, Jerônimo; BITZKI, Leonardo; KREUTZ, Diego; ALMEIDA, Gefte; POHLMANN, Marcio; MANSILHA, Rodrigo. A Reproducible Semantic Benchmark for Multivendor DSM-to-CLI Translation. In: WORKSHOP DE INTELIGÊNCIA ARTIFICIAL PARA REDES DE COMPUTADORES (WIARC), 1., 2026, Praia do Forte/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 1-14. DOI: https://doi.org/10.5753/wiarc.2026.23707.
