Experiencing DfAnalyzer for Runtime Analysis of Phylogenomic Dataflows

Abstract


Phylogenomic experiments provide the basis for inferences in evolutionary biology. They are data- and CPU-intensive by nature and aim to produce phylogenomic trees from an input dataset of protein sequences of genomes. These experiments can be modeled as scientific workflows. Although Workflow Management Systems (WfMS) can manage workflows efficiently, bioinformaticians seldom use them and traditionally implement their workflows as scripts instead. Collecting provenance from scripts, however, is a challenging task. In this paper, we specialize the DfAnalyzer tool for the phylogenomics domain. DfAnalyzer enables capturing, monitoring, debugging, and analyzing dataflows while they are being generated by the script. Additionally, it can be invoked from scripts in the same way that bioinformaticians already import libraries in their code. The proposed approach captures strategic domain data and registers provenance and telemetry (performance) data to enable queries at runtime. Another advantage of specializing DfAnalyzer for phylogenomic experiments is the ability to capture data from experiments that execute either locally or in HPC environments. We evaluated the proposed specialization of DfAnalyzer using the SciPhylomics workflow, and the approach revealed relevant telemetry scenarios and supported rich data analyses.
Keywords: Scientific workflow, Provenance, Dataflow analysis
Published
23/11/2020
DIAS, Luiz Gustavo; MATTOSO, Marta; LOPES, Bruno; DE OLIVEIRA, Daniel. Experiencing DfAnalyzer for Runtime Analysis of Phylogenomic Dataflows. In: SIMPÓSIO BRASILEIRO DE BIOINFORMÁTICA (BSB), 13., 2020, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 105-116. ISSN 2316-1248.