A Framework for Executing Protein Sequence Alignment in Cloud Computing Services
Abstract
Protein sequence alignment is a highly relevant task in Bioinformatics, and the Hirschberg algorithm is widely used for it. This work proposes a framework for executing sequence alignment with the Hirschberg algorithm on different cloud computing services. In our experiments, the framework was used to align HIV-1 protease sequences on different AWS EC2 instance types and different AWS Lambda function configurations. The results show that, for this application, there is a trade-off between expected execution time and monetary cost: in most cases, AWS Lambda provides the best runtime, but at a higher cost in USD. In this context, a framework that helps decide which approach is most appropriate is valuable.
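The core idea behind the Hirschberg algorithm can be illustrated with a short sketch. Hirschberg's divide-and-conquer technique, originally described for longest common subsequences and commonly applied to global (Needleman-Wunsch-style) alignment, splits the first sequence in half, computes one forward and one reverse score row in linear space, picks the column of the second sequence where their sum is maximal, and recurses on the two halves. The Python below is our illustration, not the framework's code. The scoring scheme (match +1, mismatch -1, linear gap -2) and the sample strings are hypothetical placeholders; protein aligners usually score with a substitution matrix such as BLOSUM62.

```python
from __future__ import annotations

# Hypothetical scoring scheme, chosen only for illustration.
MATCH, MISMATCH, GAP = 1, -1, -2


def score(a: str, b: str) -> int:
    """Substitution score for a pair of residues (assumed match/mismatch scheme)."""
    return MATCH if a == b else MISMATCH


def nw_last_row(x: str, y: str) -> list[int]:
    """Last row of the Needleman-Wunsch score matrix for x vs y, using O(len(y)) memory."""
    prev = [j * GAP for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * GAP] + [0] * len(y)
        for j in range(1, len(y) + 1):
            curr[j] = max(
                prev[j - 1] + score(x[i - 1], y[j - 1]),  # substitution
                prev[j] + GAP,                            # gap in y
                curr[j - 1] + GAP,                        # gap in x
            )
        prev = curr
    return prev


def nw_full(x: str, y: str) -> tuple[str, str]:
    """Quadratic-space Needleman-Wunsch with traceback; used only for tiny base cases."""
    n, m = len(x), len(y)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        h[i][0] = i * GAP
    for j in range(1, m + 1):
        h[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(h[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                          h[i - 1][j] + GAP,
                          h[i][j - 1] + GAP)
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:  # walk back from the bottom-right cell
        if i > 0 and j > 0 and h[i][j] == h[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and h[i][j] == h[i - 1][j] + GAP:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return "".join(reversed(ax)), "".join(reversed(ay))


def hirschberg(x: str, y: str) -> tuple[str, str]:
    """Optimal global alignment of x and y in linear space (divide and conquer)."""
    if not x:
        return "-" * len(y), y
    if not y:
        return x, "-" * len(x)
    if len(x) == 1 or len(y) == 1:
        return nw_full(x, y)
    mid = len(x) // 2
    left = nw_last_row(x[:mid], y)                # prefix scores for x[:mid]
    right = nw_last_row(x[mid:][::-1], y[::-1])   # suffix scores for x[mid:]
    # Split y where the forward and reverse scores add up to the optimum.
    split = max(range(len(y) + 1), key=lambda j: left[j] + right[len(y) - j])
    ax1, ay1 = hirschberg(x[:mid], y[:split])
    ax2, ay2 = hirschberg(x[mid:], y[split:])
    return ax1 + ax2, ay1 + ay2


def alignment_score(ax: str, ay: str) -> int:
    """Score an existing alignment under the same assumed scheme."""
    return sum(GAP if "-" in (a, b) else score(a, b) for a, b in zip(ax, ay))


if __name__ == "__main__":
    # Illustrative strings only; not taken from the paper's HIV-1 data set.
    top, bottom = hirschberg("PQITLWQRPLV", "PQTLWQRPV")
    print(top)
    print(bottom)
```

Only two score rows are kept per pass, so memory grows linearly with sequence length while time stays quadratic, which is the property that makes the algorithm attractive for long sequences on memory-limited targets such as function-as-a-service runtimes.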
Published
26/10/2021
How to Cite
CARVALHO, Leonardo Reboucas de; MELO, Alba Cristina Alves; ARAUJO, Aleteia. A Framework for Executing Protein Sequence Alignment in Cloud Computing Services. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 22., 2021, Belo Horizonte. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 48-59. DOI: https://doi.org/10.5753/wscad.2021.18511.