P4eBalancer: Leveraging P4 and eBPF for Optimized Load Balancing with Network and Host Insights
Abstract
Load balancing solutions monitor the infrastructure and redistribute network traffic or application requests to adapt to variations in network or server load. However, these schemes rely on monitoring mechanisms that observe only specific segments of the infrastructure, i.e., the network core or the end-hosts, which may lead them to make suboptimal decisions. In this paper, we present P4eBalancer, a system that leverages a monitoring mechanism capable of observing the entire distributed infrastructure to make load-balancing decisions. P4eBalancer combines two control loops: an offline loop in the control plane, which makes global decisions using a reinforcement learning agent, and an online loop in the edge switches, which makes quick decisions and reacts to network variations. We implemented a prototype using the BMv2 P4 software switch. Our results show that P4eBalancer can make better load-balancing decisions by using end-to-end metrics.
