An External Memory Approach for Large Genome De Novo Assembly

de Armas, Elvismary Molina; Lifschitz, Sérgio

doi:10.1007/978-3-031-21175-1_9

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 13523))

Included in the following conference series:

Brazilian Symposium on Bioinformatics

229 Accesses

Abstract

De novo genome assembly of sequenced reads is a fundamental problem in bioinformatics. When there is no reference genome sequence to guide the process, many assemblers programs consider using the de Bruijn Graph data structure to improve performances. However, the construction of such a graph has a high computational cost, mainly due to internal RAM consumption in the presence of very large and repeated read datasets. Building a de Bruijn Graph relies on a broad set of k-mers. Some existing approaches use external memory processing to make it feasible. This work proposes an approach for constructing the de Bruijn graph that does not generate all k-mers during the execution. An external memory processing allows reducing the high number of duplicate k-mers and, consequently, reduces the total number of k-mers that incur on the number of I/O operations. Some practical experiments are presented, showing the solution’s viability and its improvements over other common assemblers in the literature. Our solution reduces the computational requirements and enables execution feasibility.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 44.99; Price excludes VAT (USA)

Softcover Book: USD 59.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Efficient Out-of-Core Contig Generation

Time- and memory-efficient genome assembly with Raven

Article 20 May 2021

Persistent memory as an effective alternative to random access memory in metagenome assembly

Article Open access 30 November 2022

References

Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988). https://doi.org/10.1145/48529.48535. https://doi.acm.org/10.1145/48529.48535
de Armas, E.M., Ferreira, P.C.G., Haeusler, E.H., de Holanda, M.T., Lifschitz, S.: K-mer mapping and RDBMS indexes. In: Kowada, L., de Oliveira, D. (eds.) BSB 2019. LNCS, vol. 11347, pp. 70–82. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-46417-2_7
Chapter Google Scholar
Bradnam, K.R., et al.: Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. GigaScience 2(1), 1–31 (2013)
Google Scholar
Chikhi, R., Limasset, A., Jackman, S., Simpson, J.T., Medvedev, P.: On the representation of de Bruijn graphs. In: Sharan, R. (ed.) RECOMB 2014. LNCS, vol. 8394, pp. 35–55. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05269-4_4
Chapter Google Scholar
Chikhi, R., Limasset, A., Medvedev, P.: Compacting de Bruijn graphs from sequencing data quickly and in low memory. Bioinformatics 32(12), i201 (2016)
Google Scholar
Cook, J.J., Zilles, C.: Characterizing and optimizing the memory footprint of de novo short read DNA sequence assembly. In: International Symposium on Performance Analysis of Systems and Software, ISPASS 2009, pp. 143–152 (2009). https://doi.org/10.1109/ISPASS.2009.4919646
de Armas, E.M., Castro, L.C., Holanda, M., Lifschitz, S.: A new approach for de bruijn graph construction in de novo genome assembling. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1842–1849 (2019)
Google Scholar
de Armas, E.M., Haeusler, E.H., Lifschitz, S., de Holanda, M.T., da Silva, W.M.C., Ferreira, P.C.G.: K-mer Mapping and de Bruijn graphs: the case for velvet fragment assembly. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 882–889 (2016). https://doi.org/10.1109/BIBM.2016.7822642
de Armas, E.M., Silva, M.V.M., Lifschitz, S.: A study of index structures for K-mer mapping. In: Proceedings Satellite Events of the 32nd Brazilian Symposium on Databases. Databases Meet Bioinformatics Workshop, pp. 326–333 (2017)
Google Scholar
Demaine, E.: Lecture notes in Advanced Data Structures, MIT course number 6.851 (Spring 2012). https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-851-advanced-data-structures-spring-2012/calendar-and-notes/MIT6_851S12_L7.pdf
Kundeti, V., Rajasekaran, S., Dinh, H.: Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs. arXiv e-prints (2010)
Google Scholar
Li, R., et al.: De novo assembly of human genomes with massively parallel short read sequencing. Genome Research (2009)
Google Scholar
Li, Y., Kamousi, P., Han, F., Yang, S., Yan, X., Suri, S.: Memory efficient minimum substring partitioning. Proc. VLDB Endow. 6(3), 169–180 (2013). https://doi.org/10.14778/2535569.2448951. https://dx.doi.org/10.14778/2535569.2448951
Salzberg, S.L., et al.: GAGE: a critical evaluation of genome assemblies and assembly algorithms. Genome Res. 22(3), 557–567 (2012)
Article CAS Google Scholar
Schatz, M.C., Delcher, A.L., Salzberg, S.L.: Assembly of large genomes using second-generation sequencing. Genome Res. 20(9), 1165–1173 (2010)
Article CAS Google Scholar
Silva, M.V.M., de Holanda, M.T., Haeusler, E.H., de Armas, E.M., Lifschitz, S.: VelvetH-DB: data persistency during fragment assembly. In: Proceedings Satellite Events of the 32nd Brazilian Symposium on Databases. Databases Meet Bioinformatics Workshop, pp. 334–341 (2017). (in Portuguese)
Google Scholar
Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J., Birol, I.: ABySS: a parallel assembler for short read sequence data. Genome Res. 19(6), 1117–1123 (2009)
Article CAS Google Scholar
Zerbino, D.: Velvet software. EMBL-EBI (2016). https://www.ebi.ac.uk/zerbino/velvet/. Accessed 15 June 2019

Download references

Author information

Authors and Affiliations

Depto. Informática, PUC-Rio, Rio de Janeiro, Brazil
Elvismary Molina de Armas & Sérgio Lifschitz

Authors

Elvismary Molina de Armas
View author publications
You can also search for this author in PubMed Google Scholar
Sérgio Lifschitz
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Elvismary Molina de Armas .

Editor information

Editors and Affiliations

Instituto Nacional de Câncer, Rio de Janeiro, Brazil
Nicole M. Scherer
Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Raquel C. de Melo-Minardi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

de Armas, E.M., Lifschitz, S. (2022). An External Memory Approach for Large Genome De Novo Assembly. In: Scherer, N.M., de Melo-Minardi, R.C. (eds) Advances in Bioinformatics and Computational Biology. BSB 2022. Lecture Notes in Computer Science(), vol 13523. Springer, Cham. https://doi.org/10.1007/978-3-031-21175-1_9

Download citation

DOI: https://doi.org/10.1007/978-3-031-21175-1_9
Published: 16 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21174-4
Online ISBN: 978-3-031-21175-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

An External Memory Approach for Large Genome De Novo Assembly

Abstract

Access this chapter

Similar content being viewed by others

Efficient Out-of-Core Contig Generation

Time- and memory-efficient genome assembly with Raven

Persistent memory as an effective alternative to random access memory in metagenome assembly

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

An External Memory Approach for Large Genome De Novo Assembly

Abstract

Access this chapter

Similar content being viewed by others

Efficient Out-of-Core Contig Generation

Time- and memory-efficient genome assembly with Raven

Persistent memory as an effective alternative to random access memory in metagenome assembly

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation