PIMBA: A PIpeline for MetaBarcoding Analysis

Renato R. M. Oliveira; Raíssa Silva; Gisele L. Nunes; Guilherme Oliveira

Renato R. M. Oliveira, Instituto Tecnológico Vale / UFMG
Raíssa Silva, Instituto Tecnológico Vale
Gisele L. Nunes, Instituto Tecnológico Vale
Guilherme Oliveira, Instituto Tecnológico Vale

Abstract

DNA metabarcoding is an emerging monitoring method capable of assessing biodiversity from environmental DNA (eDNA) samples. The growing volume of Next-Generation Sequencing data has demanded advances in computational tools. Tools for DNA metabarcoding analysis, such as MOTHUR, QIIME, Obitools, PEMA, and mBRAVE, have been widely used in ecological studies; however, they become difficult to use when custom databases are needed. Here we present PIMBA, a PIpeline for MetaBarcoding Analysis, which supports customized databases as well as the reference databases used by the tools mentioned above. PIMBA is an open-source, user-friendly pipeline that consolidates all analyses into just three command lines. PIMBA’s implementation is available at https://github.com/reinator/pimba.

Keywords: DNA metabarcoding, Flexible pipeline, OTU, ASV
