Abstract
BLAST, or Basic Local Alignment Search Tool, is one of the most widely used bioinformatics tools today. However, as biological data accumulates, its use can become a bottleneck limiting biological analyses both in time and memory usage. In this work we propose the use of a new data structure to re-implement BLAST. We use Binary Decision Diagrams (BDDs) to store the biological sequences and optimize resources, reducing memory requirements. This new approach has allowed us to construct the alignment of biological sequences with gains of up to 65.7% in memory usage for the allocation of the BLAST data structures and up to 16.3% faster search results, without altering the BLAST algorithm or its results.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Available at https://www.ncbi.nlm.nih.gov/protein/1004172080.
- 2.
The value for parameters W, T and X are used to explain a simplified example, and were not used in any experiments. W and T have either default or typical values. Parameter X is mentioned for completeness, but not used in the example.
- 3.
BDDs can have a worst case exponential memory complexity in cases where the dependencies between the variables is circular or very complex. These dependencies do not typically occur in BDDBlast.
References
Altschul, S.F., et al.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)
Altschul, S.F., et al.: Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic Acids Res. 25(17), 3389 (1997)
Clarke, E.M., Grumberg, O., Peled, D.: Model Checking. MIT Press, Cambridge (1999)
Kasap, S., Benkrid, K., Liu, Y.: Design and implementation of an FPGA-based core for gapped blast sequence alignment. Eng. Lett. 16 (2008)
Ye, W., et al.: H-blast: a fast protein sequence alignment toolkit on heterogeneous computers with GPUs. Bioinformatics 33(8), 1130–1138 (2017)
Nguyen, V., Lavenier, D.: PLAST: parallel local alignment search tool for database comparison. BMC Bioinform. 10(329), 1–13 (2009)
Fahim, S.: Comparative analysis of protein alignment algorithms in parallel environment using CUDA. Ph.D. dissertation, BRAC University (2016)
Boratyn, G.M., Schäffer, A.A., Agarwala, R., et al.: Domain enhanced lookup time accelerated blast. Biol. Direct 7(12), 1–14 (2012). https://doi.org/10.1186/1745-6150-7-12
Buchfink, B., Xie, C., Huson, D.: Fast and sensitive protein alignment using diamond. Nat. Methods 12, 59–60 (2015)
Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nat. Biotechnol. 22(8), 1035–1036 (2004)
Hess, M., et al.: Addressing inaccuracies in BLOSUM computation improves homology search performance. BMC Bioinform. 17(1), 1–10 (2016). https://doi.org/10.1186/s12859-016-1060-3
Lee, C.-Y.: Representation of switching circuits by binary-decision programs. Bell Labs Tech. J. 38(4), 985–999 (1959)
Bryant, R.E.: Graph-based algorithms for Boolean function manipulation. Comput. IEEE Trans. 100(8), 677–691 (1986)
Brace, K.S., Rudell, R.L., Bryant, R.E.: Efficient implementation of a BDD package. In: 27th ACM/IEEE Design Automation conference, p. 40. ACM (1991)
Pruitt, K.D., Tatusova, T., Maglott, D.R.: NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35(suppl 1), D61–D65 (2006)
Gastauer, M., et al.: Ti - a metagenomic survey of soil microbial communities along a rehabilitation chronosequence after iron ore mining. Sci. Data 6, 1–10 (2019)
Acknowledgments
The authors would like to thanks Ronnie Alves and Kleber de Souza with assistance with the Corumba sequences. This research was supported by in part by CNPq, CAPES and FAPEMIG, Brazilian Funding Agencies.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Ethics declarations
Availability
BDDBlast and the sequences used in this paper can be found at: http://www.luar.dcc.ufmg.br/bddblast/.
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
de Oliveira, D.B., Faria-Campos, A., Campos, S. (2022). BDDBlast—A Memory Efficient Architecture for Pairwise Alignments. In: Scherer, N.M., de Melo-Minardi, R.C. (eds) Advances in Bioinformatics and Computational Biology. BSB 2022. Lecture Notes in Computer Science(), vol 13523. Springer, Cham. https://doi.org/10.1007/978-3-031-21175-1_1
Download citation
DOI: https://doi.org/10.1007/978-3-031-21175-1_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21174-4
Online ISBN: 978-3-031-21175-1
eBook Packages: Computer ScienceComputer Science (R0)