Scaling Stateful Network Services on Multicore Architectures

Abstract


This thesis investigates how to schedule TCP stacks effectively alongside applications on multicore architectures, focusing on the trade-offs in allocating workers to TCP processing and to application processing. It explores the interplay between stateful network protocols with strong guarantees and the challenges of scheduling such protocols alongside multicore applications. To allow fair comparisons, we design and implement Demieagle, a benchmark framework for running “apples-to-apples” experiments that uncover the trade-offs of different multicore scheduling policies and architectures. We also address the complexity of scaling stateful network functions, which require per-packet state updates. During a scaling operation, workers need to synchronize access to shared state to avoid race conditions and to guarantee that network functions process packets in arrival order. Unfortunately, the classic approach of controlling concurrent access to shared state with locks does not scale to today’s throughput and latency requirements. To address these challenges, we design, implement, and evaluate Dyssect, a system that enables dynamic scaling of stateful network functions by disaggregating their states. Dyssect’s state disaggregation allows stateful network functions to be offloaded to programmable NICs and makes it easier to explore the hardware-software trade-offs that best suit specific network functions and traffic loads. Our experimental evaluation shows that Dyssect reduces tail latency by up to 32.04% and increases throughput by up to 19.36% compared to state-of-the-art competing solutions.
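
The general idea of avoiding locks by partitioning per-flow state can be illustrated with a minimal, single-threaded sketch. It is not Dyssect's implementation: the names `ShardedNF`, `Shard`, `shard_of`, and `NUM_SHARDS`, the CRC32 hash, and the packet-counter state are all assumptions made for illustration. Each flow hashes to one shard, each shard has exactly one owning worker, and scaling reassigns whole shards instead of locking individual entries.

```python
from dataclasses import dataclass, field
from zlib import crc32

NUM_SHARDS = 8  # hypothetical: fixed number of state shards


@dataclass
class Shard:
    """Per-flow state for the flows that hash to this shard."""
    flows: dict = field(default_factory=dict)  # flow key -> packet count


def shard_of(flow: tuple) -> int:
    """Map a flow identifier to a shard with a deterministic hash."""
    return crc32(repr(flow).encode()) % NUM_SHARDS


class ShardedNF:
    """Toy packet counter whose state is split into worker-owned shards."""

    def __init__(self, num_workers: int):
        self.shards = [Shard() for _ in range(NUM_SHARDS)]
        # Round-robin initial ownership: shard index -> worker id.
        self.owner = {s: s % num_workers for s in range(NUM_SHARDS)}

    def process(self, flow: tuple) -> int:
        """Update the flow's state and return the id of the owning worker."""
        s = shard_of(flow)
        shard = self.shards[s]
        # In a real multicore system only the owning worker would touch this
        # shard, so no lock would be needed; this model just records ownership.
        shard.flows[flow] = shard.flows.get(flow, 0) + 1
        return self.owner[s]

    def migrate(self, shard_id: int, new_worker: int) -> None:
        """Scale by reassigning a whole shard; its state moves with it."""
        self.owner[shard_id] = new_worker


# Usage: one flow keeps its state while its shard changes owner.
nf = ShardedNF(num_workers=2)
flow = ("10.0.0.1", 1234, "10.0.0.2", 80, "tcp")
first_worker = nf.process(flow)
nf.migrate(shard_of(flow), new_worker=1 - first_worker)
second_worker = nf.process(flow)
```

Because all packets of a flow map to the same shard, arrival order within a flow is preserved as long as a shard has a single owner at any time. A real system would also have to handle packets that are in flight during a migration, which this sketch ignores.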
Keywords: Network Services, TCP, Stateful Network Functions, NFV

Published
19/05/2025
CARVALHO, Fabrício B.; FERREIRA, Ronaldo A. Scaling Stateful Network Services on Multicore Architectures. In: CONCURSO DE TESES E DISSERTAÇÕES - SIMPÓSIO BRASILEIRO DE REDES DE COMPUTADORES E SISTEMAS DISTRIBUÍDOS (SBRC), 43., 2025, Natal/RN. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 192-201. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc_estendido.2025.6893.