Optimizing Network Performance: Benchmarking of NVIDIA Bluefield-2 Offloading Capabilities

  • Arthur Vinícius Camargo, Universidade Federal do Rio Grande do Sul (UFRGS)
  • Leandro Bertholdo, Universidade Federal do Rio Grande do Sul (UFRGS)
  • Lisandro Granville, Universidade Federal do Rio Grande do Sul (UFRGS)

Abstract


The growing demand for higher Internet speeds is placing significant strain on general-purpose processors, particularly for network processing tasks. This trend has driven a shift toward offloading such tasks to specialized hardware. SmartNICs reduce CPU load by handling network processing themselves while remaining programmable and customizable. However, deploying and using SmartNICs effectively is challenging because of their steep learning curve. In this study, we evaluate the performance of NVIDIA’s Bluefield-2 SmartNIC under a range of conditions, with the objective of achieving 100 Gbps throughput on a single server. Our results show that specific libraries can speed up the processing of common traffic types, such as TCP and UDP, by up to ten times, enabling the system to reach the 100 Gbps line rate.

Keywords: SmartNICs, Bluefield-2, Network task offloading, Network performance optimization

Published
27/11/2024
CAMARGO, Arthur Vinícius; BERTHOLDO, Leandro; GRANVILLE, Lisandro. Optimizing Network Performance: Benchmarking of NVIDIA Bluefield-2 Offloading Capabilities. In: ESCOLA REGIONAL DE REDES DE COMPUTADORES (ERRC), 21., 2024, Rio Grande/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 13-18. DOI: https://doi.org/10.5753/errc.2024.4561.