Avaliação de Desempenho de um Workflow Científico para Experimentos de RNA-Seq no Supercomputador Santos Dumont

  • Lucas Cruz LNCC
  • Micaella Coelho LNCC
  • Luiz Gadelha LNCC
  • Kary Ocaña LNCC
  • Carla Osthoff LNCC

Abstract


Large-scale scientific experiments are considered complex because of the modeling of their activities, their execution, and the analysis of big data. In bioinformatics, these experiments are modeled as scientific workflows that draw on High Performance Computing (HPC) and data science concepts. This paper presents the ParslRNA-Seq workflow for RNA-Seq experiments and analyzes its execution performance on the Santos Dumont supercomputer using real data. Compared with traditional execution without parallelization and with execution through a Web application, ParslRNA-Seq reduces the running time from 3 days to 11 hours, and its biological results reproduce those of the traditional and Web executions. The multithreaded execution of the workflow also indicates that the choice of parameters depends on how Parsl and Bowtie are used.
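The headline numbers above can be checked with a little arithmetic, and the remark about multithreading can be made concrete. The Python sketch below is illustrative only and is not the ParslRNA-Seq code: `speedup` computes the usual ratio T_baseline / T_parallel (3 days = 72 h against 11 h), and `feasible_configs` enumerates the ways to split a node's cores between concurrent Parsl tasks and bowtie2's `-p`/`--threads` option. The one-alignment-per-task model, the hypothetical `workers` and `threads_per_task` names, and the 24-core node are assumptions, not details taken from the paper.

```python
"""Back-of-the-envelope helpers for the ParslRNA-Seq results (illustrative sketch).

Assumptions (not taken from the paper): one node with a fixed core count,
each concurrent Parsl task runs one bowtie2 alignment, and every task uses
the same bowtie2 -p/--threads value.
"""
from __future__ import annotations

HOURS_PER_DAY = 24


def speedup(baseline_hours: float, parallel_hours: float) -> float:
    """Classic speedup S = T_baseline / T_parallel."""
    if parallel_hours <= 0:
        raise ValueError("parallel_hours must be positive")
    return baseline_hours / parallel_hours


def feasible_configs(cores: int) -> list[tuple[int, int]]:
    """List (workers, threads_per_task) pairs whose product is exactly `cores`.

    workers: hypothetical number of concurrent Parsl tasks on the node.
    threads_per_task: hypothetical value passed to bowtie2 -p in each task.
    Such pairs neither leave cores idle nor oversubscribe them; which pair
    is fastest still has to be measured.
    """
    return [(w, cores // w) for w in range(1, cores + 1) if cores % w == 0]


if __name__ == "__main__":
    baseline = 3 * HOURS_PER_DAY  # 3 days, as reported in the abstract
    parallel = 11                 # 11 hours, as reported in the abstract
    print(f"speedup ~ {speedup(baseline, parallel):.2f}x")
    # A 24-core node is an assumption made for illustration only.
    for workers, threads in feasible_configs(24):
        print(f"{workers:2d} Parsl tasks x bowtie2 -p {threads:2d}")
```

Running the script prints `speedup ~ 6.55x` followed by the eight ways to split 24 cores. Which split runs fastest is an empirical question, consistent with the observation that the parametrization depends on how Parsl and Bowtie are used.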

Published
2020-10-21
CRUZ, Lucas; COELHO, Micaella; GADELHA, Luiz; OCAÑA, Kary; OSTHOFF, Carla. Avaliação de Desempenho de um Workflow Científico para Experimentos de RNA-Seq no Supercomputador Santos Dumont. In: UNDERGRADUATE RESEARCH WORKSHOP - SYMPOSIUM ON HIGH PERFORMANCE COMPUTING SYSTEMS (SSCAD), 21., 2020, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 86-93. DOI: https://doi.org/10.5753/wscad_estendido.2020.14093.