Whole Genome Alignment using a Multithreaded Parallel Implementation

  • Wellington S. Martins University of Delaware
  • Juan del Cuvillo University of Delaware
  • Wenwu Cui University of Delaware
  • Guang R. Gao University of Delaware

Resumo


Alignment of long DNA sequences is a challenging task due to its high demands for computational power and memory. We have developed a multithreaded parallel implementation of a sequence alignment algorithm that is able to align whole genomes with reliable output and reasonable cost. The implementation is based on a fine-grain multithreaded execution model, the EARTH model, which effectively tolerates latency through the overlapping of computation and communication. Human and mice mitochondrial genomes, human and Drosophila mitochondrial genomes are aligned respectively to demonstrate that the implementation can be used to align both closely related as well as less similar genomes. Results from Mycoplasma genitalium and Mycoplasma pneumoniae genomes, which are much larger than the tested mitochondrial genomes, are also presented. From the output, the homologous regions can be easily detected. This tool should facilitate alignment of syntenic regions, strain to strain comparisons, identification of regulatory elements and evolutionary comparisons as well.

Palavras-chave: Multithreading, Parallelism, Alignment, Genome, DNA

Referências

Kun-Mao Chao, Jinghui Zhang, James Ostell, and Webb Miller. A local alignment tool for very long DNA sequences. Computer Applications in the Biosciences, 11 (2): 147-153. 1995.

Arthur L. Delcher, Simon Kasif, Robert D. Fleischmann, Jeremy Peterson, Owen White, and Steven L. Salzberg. Alignment of whole genomes. Nucleic Acids Research, 27(11):2369-2376, 1999.

Ralf Himmelreich, Helga Plagens, Helmut Hilbert, Berta Reiner, and Richard Herrmann. Comparative analysis of the genomes of the bacteria mycoplasma pneumoniae and mycoplasma genitalium. Nucleic Acids Research, 25(4):701-712, 1997.

E. Horowitz and S. Sahni. Fundamentals of Computer Algorithms. Computer Science Press, 1978.

Herbert H. J. Hum, Olivier Maquelin, Kevin B. Theobald, Xinmin Tian, Guang R. Gao, and Laurie J. Hendren. A study of the EARTH-MANNA multithreaded system. International Journal of Parallel Programming, 24(4):319-347, August 1996.

G. G. Loots, R. M. Locksley, C. M. Blankespoor. Z. E. Wang, W. Miller, E. M. Rubin, and K. A. Frazer. ldentification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science, 288(5463): 136-140, April 2000.

Wellington S. Martins, Juan B. del Cuvillo, Francisco J. Useche, Kevin B. Theobald, and Guang R. Gao. A multithreaded parallel implementation of a dynamic programming algorithm for sequence comparison. In Proceedings of the Pacific Symposium on Biocomputing, pages 311-322, Mauna Lani, Hawaii, January 3-7, 2001. World Scientific.

Saul B. Needleman and Christian O. Wunsch. A general method applicable to the search for similarities in the aminoacid sequence of two proteins. Journal of Molecular Biology, 48:443-453, 1970.

Temple F. Smith and Michael S. Waterman. ldentification of common molecular subsequences. Journal of Molecular Biology, 147:195-197. 1981.

Kevin B. Theobald, Rishi Kumar, Gagan Agrawal, Gerd Heber, Ruppa K. Thulasiram, and Guang R. Gao. Developing a communication intensive application on the EARTH multithreaded architecture. In Proceedings of the 6th International Euro-Par Conference, number 1900 in Lecture Notes in Computer Science, pages 625-637, Munich. Germany, August-September 2000. Springer-Verlag.

Kevin Bryan Theobald. EARTH: An Efficient Architecture for Running Threads. PhD thesis, McGill University, Montréal, Québec, May 1999.

Michael S. Waterman. Mathematical Methods for DNA Sequences. CRC Press lnc, Boca Ratón, Florida, 1989.
Publicado
10/09/2001
MARTINS, Wellington S.; DEL CUVILLO, Juan; CUI, Wenwu; GAO, Guang R.. Whole Genome Alignment using a Multithreaded Parallel Implementation. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 13. , 2001, Pirenópolis. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2001 . p. 1-8. DOI: https://doi.org/10.5753/sbac-pad.2001.22185.