LLM-Driven Reward Design for Intelligent RAN Slice Scheduling in Open RAN

Pedro Sousa; Frank B. Morte; Andrey Oliveira; Cleverson Nahum; Silvia Lins; Andrey Silva; Aldebaro Klautau

doi:10.5753/wiarc.2026.22975

Pedro Sousa, UFPA
Frank B. Morte, UFPA
Andrey Oliveira, UFPA
Cleverson Nahum, UFPA
Silvia Lins, Ericsson Telecomunicações S.A.
Andrey Silva, Ericsson Telecomunicações S.A.
Aldebaro Klautau, UFPA

Abstract

Radio resource scheduling is a fundamental task in network slicing scenarios, where different services impose heterogeneous performance requirements. This paper proposes using large language models (LLMs) to automatically generate reward functions for reinforcement learning (RL)-based radio resource schedulers in Open Radio Access Network (Open RAN) environments. By adapting the reward formulation to the characteristics and requirements of two use cases, the approach increases the flexibility of the scheduling policy. Simulation results demonstrate the feasibility of the method and indicate that LLM-assisted reward design can support efficient resource allocation with stable throughput while satisfying slice-level requirements.
