A High Availability Architecture for Stateful Virtual Network Functions
Abstract
Virtualization has revolutionized the way networks are built and managed. In particular, dedicated hardware can be replaced by Virtualized Network Functions (VNFs), which can even be downloaded from Internet marketplaces. However, VNFs are more susceptible to failures than the dedicated hardware they replace. In this work we propose a high availability architecture for VNFs that provides fault management and a variety of recovery techniques. Because stateful VNFs run in virtualized environments, copying their entire state is an attractive strategy: it requires no modifications to the VNF code. The architecture is based on Checkpoint/Restore and was designed within the NFV-MANO reference architecture. A proof-of-concept prototype was implemented, and experimental results are presented.
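The Checkpoint/Restore idea mentioned above can be illustrated with a minimal sketch. It is not the prototype described in this work: the names `CounterVNF`, `CheckpointStore`, and `run_with_recovery` are hypothetical, the failure is simulated, and the whole state is copied in-process with Python's `copy.deepcopy` rather than at the virtualization layer. The point it shows is that the VNF class itself contains no checkpointing logic, and that any state changed after the last checkpoint is lost on recovery.

```python
import copy


class CounterVNF:
    """Hypothetical stateful VNF: counts packets per flow."""

    def __init__(self):
        self.flows = {}

    def process(self, flow_id):
        self.flows[flow_id] = self.flows.get(flow_id, 0) + 1


class CheckpointStore:
    """Keeps the most recent full copy of a VNF instance (assumed design)."""

    def __init__(self):
        self._snapshot = None

    def checkpoint(self, vnf):
        # Copy the entire object; the VNF code is not modified or consulted.
        self._snapshot = copy.deepcopy(vnf)

    def restore(self):
        if self._snapshot is None:
            raise RuntimeError("no checkpoint available")
        # Return a fresh copy so the stored snapshot stays untouched.
        return copy.deepcopy(self._snapshot)


def run_with_recovery(packets, checkpoint_every, fail_at):
    """Process packets, checkpointing every `checkpoint_every` packets.

    If `fail_at` is an index, the instance is replaced by the last
    checkpoint just before that packet is processed (simulated failure).
    """
    vnf = CounterVNF()
    store = CheckpointStore()
    store.checkpoint(vnf)  # initial checkpoint of the empty state
    for i, flow_id in enumerate(packets):
        if i == fail_at:
            vnf = store.restore()
        vnf.process(flow_id)
        if (i + 1) % checkpoint_every == 0:
            store.checkpoint(vnf)
    return vnf.flows


if __name__ == "__main__":
    packets = ["a", "b", "a", "a", "b"]
    print(run_with_recovery(packets, 2, None))  # no failure
    print(run_with_recovery(packets, 2, 3))     # failure before the 4th packet
```

In the failure run, the update made by the third packet is rolled back because it happened after the last checkpoint. This is the usual trade-off in such a scheme: more frequent checkpoints reduce the amount of lost state but increase the copying overhead.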
