Advancing Research on Bioinformatics and Cloud Infrastructure using AWS

  • Alba C. M. A. Melo UnB
  • Lucia M. A. Drummond UFF
  • Celia G. Ralha UFBA
  • Gustavo J. Portella Capes
  • Luan Teylo Inria Centre at the University of Bordeaux
  • Aldo H. D. Mendes UNIEURO

Abstract


This paper describes joint research to accelerate bioinformatics applications in the AWS cloud. The research followed two main axes: parallel sequence comparison tools for bioinformatics, and cloud schedulers that execute these parallel applications efficiently in the AWS cloud. The project involved three universities and one research institute, and produced 4 papers in prestigious international journals, 2 papers in international conferences, and 5 PhD theses. As scientific results, we developed (a) a cloud scheduler that aims to reduce both the running time and the cost of the execution; (b) a combined model that uses statistics and neural networks to predict the cost variation of spot instances; (c) a multiagent framework to provision and execute cloud applications; and (d) a fault-tolerant strategy to execute long-running applications with GPUs in the cloud. Additionally, since the CNPq-AWS project ran from 2020 to 2021, during the COVID-19 pandemic, we were able to compare hundreds of thousands of SARS-CoV-2 sequences using our cloud schedulers and parallel sequence comparison tools. Finally, members of our group edited a book on high performance computing in clouds, published in 2023 by Springer Nature.
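The trade-off behind result (a), reducing both running time and cost, can be illustrated with a minimal sketch. This is not the scheduler developed in the project: the `VmOption` type, the `pick_vm` function, the weight `alpha`, the VM labels, and all prices and runtimes below are hypothetical, chosen only to show how a deadline filter and a weighted score over normalized cost and time might interact.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VmOption:
    name: str             # hypothetical label, not a real AWS instance type
    hourly_price: float   # assumed price in USD per hour
    runtime_hours: float  # assumed estimated runtime of the job on this VM


def monetary_cost(vm: VmOption) -> float:
    """Estimated cost of running the whole job on this VM."""
    return vm.hourly_price * vm.runtime_hours


def pick_vm(options: List[VmOption], deadline_hours: float,
            alpha: float = 0.5) -> Optional[VmOption]:
    """Pick the VM with the lowest weighted score among those meeting the deadline.

    alpha weights the normalized cost and (1 - alpha) the normalized runtime.
    Returns None when no option can finish before the deadline.
    """
    feasible = [vm for vm in options if vm.runtime_hours <= deadline_hours]
    if not feasible:
        return None

    # Normalize by the largest feasible value so both terms lie in (0, 1].
    max_cost = max(monetary_cost(vm) for vm in feasible)
    max_time = max(vm.runtime_hours for vm in feasible)

    def score(vm: VmOption) -> float:
        return (alpha * monetary_cost(vm) / max_cost
                + (1.0 - alpha) * vm.runtime_hours / max_time)

    return min(feasible, key=score)


# Illustrative values only; they are not measurements from the project.
OPTIONS = [
    VmOption("small-spot", hourly_price=0.10, runtime_hours=10.0),
    VmOption("large-spot", hourly_price=0.50, runtime_hours=3.0),
    VmOption("large-on-demand", hourly_price=1.50, runtime_hours=3.0),
]

if __name__ == "__main__":
    choice = pick_vm(OPTIONS, deadline_hours=12.0)
    print(choice.name if choice else "no feasible VM")
```

With equal weights, the sketch favors the option that is both fast and cheap; setting `alpha=1.0` makes it purely cost-driven, and a tight deadline can leave no feasible option at all. A real scheduler for spot instances would also have to account for interruptions and price changes, which this sketch ignores.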

References

Banimfreg, B. H. (2023). A comprehensive review and conceptual framework for cloud computing adoption in bioinformatics. Healthcare Analytics, 3.

Borin, E., Drummond, L. M. A., Gaudiot, J.-L., Melo, A. C. M. A., Alves, M. M., and Navaux, P. O. A. (2023). High Performance Computing in Clouds: Moving HPC Applications to a Scalable and Cost-Effective Environment. Springer Nature.

Brum, R. C. (2024). Multi-FedLS: A Scheduler of Federated Learning Applications in a Multi-Cloud Environment. PhD thesis, Graduate Program in Computer Science, Federal Fluminense University and Sorbonne University. Available online: [link].

Brum, R. C., Sousa, W. P., Melo, A. C. M. A., Bentes, C., Castro, M. C. S., and Drummond, L. M. A. (2021). A fault tolerant and deadline constrained sequence alignment application on cloud-based spot GPU instances. In 27th International Conference on Parallel and Distributed Computing, Euro-Par, Virtual, pages 317–333.

Carvalho, L. R., Melo, A. C. M. A., and Araujo, A. P. F. (2023). AFMC: An alignment framework for multiple computing services and providers. Concurrency and Computation: Practice and Experience, 35(18).

Durbin, R., Eddy, S. R., Krogh, A., and Mitchison, G. J. (1998). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.

Figueiredo, M. A. C., Navarro, J. P., Sandes, E. F. O., Teodoro, G., and Melo, A. C. M. A. (2021). Parallel fine-grained comparison of long DNA sequences in homogeneous and heterogeneous GPU platforms with pruning. IEEE Transactions on Parallel and Distributed Systems, 32(12):3053–3065.

Gupta, V., Sengupta, M., Prakash, J., and Tripathy, B. C. (2017). Basic and Applied Aspects of Biotechnology. Springer Nature.

Jorge, C. A. C. (2022). Comparação paralela de sequências biológicas em plataformas de hardware uniformes e híbridas. PhD thesis, Graduate Program in Informatics, University of Brasilia. Available online: [link].

Mendes, A. H. D. (2024). Arquitetura multiagente com modelos de raciocínio distintos para gerenciamento de recursos em múltiplos provedores de nuvem. PhD thesis, Graduate Program in Informatics, University of Brasilia. Available online: [link].

Mendes, A. H. D., Rosa, M. J. F., Marotta, M. A., Araujo, A. P. F., Melo, A. C. M. A., and Ralha, C. G. (2024). MAS-Cloud+: A novel multi-agent architecture with reasoning models for resource management in multiple providers. Future Generation Computer Systems, 154:16–34.

Portella, G., Rodrigues, G. N., Nakano, E., and Melo, A. C. (2019a). Statistical analysis of Amazon EC2 cloud pricing models. Concurrency and Computation: Practice and Experience, 31(18):e4451.

Portella, G. J. (2021). Precificação em computação em nuvem para instâncias permanentes e transientes: modelagem e previsão. PhD thesis, Graduate Program in Informatics, University of Brasilia. Available online: [link].

Portella, G. J., Nakano, E. Y., Rodrigues, G. N., Boukerche, A., and Melo, A. C. M. A. (2024). A novel statistical and neural network combined approach for the cloud spot market. IEEE Transactions on Cloud Computing, 11(1):278–290.

Portella, G. J., Nakano, E. Y., Rodrigues, G. N., and Melo, A. C. M. A. (2019b). Utility-based strategy for balanced cost and availability at the cloud spot market. In 9th IEEE International Conference on Cloud Computing (IEEE CLOUD), Milan, pages 214–218.

Ralha, C. G., Mendes, A. H. D., Laranjeira, L. A., Araujo, A. P. F., and Melo, A. C. M. A. (2019). Multiagent system for dynamic resource provisioning in cloud computing platforms. Future Generation Computer Systems, 94:80–96.

Sandes, E. F. O., Miranda, G., Martorell, X., Ayguade, E., Teodoro, G., and Melo, A. C. M. A. (2016a). MASA: A multiplatform architecture for sequence aligners with block pruning. ACM Transactions on Parallel Computing, 2(4).

Sandes, E. F. O., Miranda, G., Martorell, X., Ayguade, E., Teodoro, G., and Melo, A. C. M. A. (2016b). CUDAlign 4.0: Incremental speculative traceback for exact chromosome-wide alignment in GPU clusters. IEEE Transactions on Parallel and Distributed Systems, 27(10):2838–2850.

Teylo, L. (2022). Scheduling Deadline Constrained Bag-of-Tasks in Cloud Environments Using Hibernation Prone Spot Instances. PhD thesis, Graduate Program in Computer Science, Federal Fluminense University. Available online: [link].

Teylo, L., Arantes, L., Sens, P., and Drummond, L. M. A. (2023). Scheduling bag-of-tasks in clouds using spot and burstable virtual machines. IEEE Transactions on Cloud Computing, 11(1):964–982.

Teylo, L., Nunes, A. L., Melo, A. C. M. A., Boeres, C., Drummond, L. M. A., and Martins, N. F. (2021). Comparing SARS-CoV-2 sequences using a commercial cloud with a spot instance based dynamic scheduler. In 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), Virtual, pages 247–256.
Published
19/07/2026
MELO, Alba C. M. A.; DRUMMOND, Lucia M. A.; RALHA, Celia G.; PORTELLA, Gustavo J.; TEYLO, Luan; MENDES, Aldo H. D. Advancing Research on Bioinformatics and Cloud Infrastructure using AWS. In: SIMPÓSIO DE INFRAESTRUTURA DIGITAL/NUVEM PARA PESQUISA (PESQUISA@NUVEM), 1., 2026, Gramado/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 1-10. DOI: https://doi.org/10.5753/pesquisanuvem.2026.21800.