An Analysis of the Overhead Imposed by the Remus Virtual Machine Replication Mechanism
Abstract
Remus is a primary-backup replication mechanism for the Xen hypervisor, providing high availability to virtual machines by frequent checkpointing (on the order of tens of checkpoints per second). Despite providing good fault tolerance for crash and omission failures, the performance implications of using Remus are not well understood. We experimentally characterize Remus performance under different scenarios and show that (i) the checkpointing frequency that provides the best performance is highly dependent on application behavior, and (ii) the overhead might be prohibitive for latency-sensitive network applications.
References
Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., and Warfield, A. “Xen and the art of virtualization”. Proc. of the 19th ACM Symp. on Operating Systems Principles (SOSP), 2003.
Bressoud, T. C., and Schneider, F. B. “Hypervisor-based fault tolerance”. ACM Trans. on Computer Systems, 14(1), 80-107, 1996.
Budhiraja, N., Marzullo, K., Schneider, F. B., and Toueg, S. “The primary-backup approach”. In Distributed Systems (2nd Ed.), Chap. 8. Sape Mullender (Ed.), ACM Press, 1993.
Clark, C., Fraser, K., Hand, S., Hansen, J. G., Jul, E., Limpach, C., Pratt, I., and Warfield, A. “Live migration of virtual machines”. Proc. of the 2nd USENIX Symp. on Networked Systems Design and Implementation (NSDI), 2005.
Cully, B., Lefebvre, G., Meyer, D., Feeley, M., Hutchinson, N., and Warfield, A. “Remus: High availability via asynchronous virtual machine replication”. Proc. of the 5th USENIX Symp. on Networked Systems Design and Implementation (NSDI), 2008.
Magalhães, D. M. V., Soares, J. M., and Gomes, D. G. “Análise do Impacto de Migração de Máquinas Virtuais em Ambiente Computacional Virtualizado”. Anais do XXIX SBRC, 2011.
Pearce, M., Zeadally, S., and Hunt, R. “Virtualization: Issues, security threats, and solutions”. ACM Computing Surveys, 45(2), 2013.
Rajagopalan, S., Cully, B., O’Connor, R., and Warfield, A. “SecondSite: Disaster tolerance as a service”. Proc. of the 8th ACM SIGPLAN/SIGOPS Conf. on Virtual Execution Environments (VEE), 2012.
Reisner, P., and Ellenberg, L. “DRBD v8 – replicated storage with shared disk semantics”. Proc. of the 12th International Linux System Technology Conference, 2005.
Stevens, W. R. TCP/IP Illustrated, Vol. 1: The Protocols. Addison-Wesley, 1994.
Wood, T., Ramakrishnan, K. K., Shenoy, P., and van der Merwe, J. “CloudNet: Dynamic pooling of cloud resources by live WAN migration of virtual machines”. Proc. of the 7th ACM SIGPLAN/SIGOPS Conf. on Virtual Execution Environments (VEE), 2011.
Published
2014-05-05
How to Cite
SILVA, Marcelo Pereira da; KOSLOVSKI, Guilherme; OBELHEIRO, Rafael R. Uma Análise da Sobrecarga Imposta pelo Mecanismo de Replicação de Máquinas Virtuais Remus. In: FAULT TOLERANCE WORKSHOP (WTF), 15., 2014, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2014. p. 160-173. ISSN 2595-2684. DOI: https://doi.org/10.5753/wtf.2014.22954.
