Evaluation of Amazon Web Services Storage Services for Writing and Recovering Checkpoints

  • Luan Teylo, Federal Fluminense University
  • Rafaela Brum, Federal Fluminense University
  • Luciana Arantes, Sorbonne Université
  • Pierre Sens, Sorbonne Université
  • Lúcia Drummond, Federal Fluminense University

Abstract


Cloud providers offer several resources for processing and storing data. Some of those resources are prone to failures or revocations, so fault tolerance techniques are necessary to minimize the impact and cost of those failures on users' applications and budgets. One of the most widely adopted solutions is checkpoint and recovery, which records the current state of the application and uses it to restart the application from its last saved state if a fault occurs. Such approaches therefore require a safe storage location. Fortunately, in cloud environments, users have several storage options offered by the provider itself in the form of cloud services. In this work, we evaluate three of those services: Amazon Simple Storage Service (S3), Amazon Elastic Block Store (EBS), and Amazon Elastic File System (EFS). The objective is to characterize and evaluate the performance of those services for the checkpoint and recovery process in the context of faults induced by the revocation of virtual machines.

Keywords: checkpoint and recovery, storage services, evaluation
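
The checkpoint and recovery idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the application state fits in a small dictionary and that the checkpoint path points to durable storage, for example a directory on a mounted volume. An object store would need its own client API instead of file operations. The function names (save_checkpoint, load_checkpoint, run) and the fail_at parameter, which simulates a virtual machine revocation, are hypothetical.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write state to path atomically: write a temp file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to storage before renaming
    os.replace(tmp_path, path)  # readers see the old or the new file, never a partial one


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, n_steps, checkpoint_every, fail_at=None):
    """Sum 0..n_steps-1, checkpointing periodically.

    fail_at (hypothetical) simulates a revocation at the given step.
    """
    # Recovery: resume from the last checkpoint if one exists.
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated VM revocation")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    return state["total"]
```

Calling run a second time with the same path after a simulated failure resumes from the last saved step instead of starting over. Only the work done since the most recent checkpoint is repeated, so the time to write and read the checkpoint on the chosen storage service directly affects total execution time.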

Published
2020-12-07
TEYLO, Luan; BRUM, Rafaela; ARANTES, Luciana; SENS, Pierre; DRUMMOND, Lúcia. Avaliação dos Serviços de Armazenamento da Amazon Web Services para Gravação e Recuperação de Checkpoints. In: FAULT TOLERANCE WORKSHOP (WTF), 21., 2020, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 29-40. ISSN 2595-2684. DOI: https://doi.org/10.5753/wtf.2020.12485.