PIMBA: A PIpeline for MetaBarcoding Analysis
Abstract
DNA metabarcoding is an emerging monitoring method capable of assessing biodiversity from environmental samples (eDNA). The growing volume of Next-Generation Sequencing data has required advances in computational tools. Tools for DNA metabarcoding analysis, such as MOTHUR, QIIME, Obitools, PEMA, and mBRAVE, have been widely used in ecological studies; however, some difficulties arise when custom databases are needed. Here we present PIMBA, a PIpeline for MetaBarcoding Analysis, which supports customized databases as well as the reference databases used by the tools mentioned above. PIMBA is an open-source, user-friendly pipeline that consolidates all analyses into just three command lines. PIMBA’s implementation is available at https://github.com/reinator/pimba.