Longest Common Subsequence Applied to Protein Comparison
Abstract
This work compares the performance of two algorithms in the search for the longest common subsequence (LCS) among protein sequences from living organisms. We used sequences of different sizes and carried out tests with the Wagner-Fischer and Needleman-Wunsch algorithms. The results revealed that as the input size increases, both execution time of algorithms increases linearly, providing the best results for the Wagner-Fischer algorithm.
References
Garey, M. and Johnson, D. (1979). Computers and Intractability: A Guide to the Theory of NP-completeness. Mathematical Sciences Series. W. H. Freeman.
Howe, K. L., Achuthan, P., Allen, J., Allen, J., Alvarez-Jarreta, J., Amode, M. R., Armean, I. M., Azov, A. G., Bennett, R., Bhai, J., Billis, K., Boddu, S., Charkhchi, M., Cummins, C., Da Rin Fioretto, L., Davidson, C., Dodiya, K., El Houdaigui, B., Fatima, R., Gall, A., Garcia Giron, C., Grego, T., Guijarro-Clarke, C., Haggerty, L., Hemrom, A., Hourlier, T., Izuogu, O. G., Juettemann, T., Kaikala, V., Kay, M., Lavidas, I., Le, T., Lemos, D., Gonzalez Martinez, J., Marug´an, J. C., Maurel, T., McMahon, A. C., Mohanan, S., Moore, B., Muffato, M., Oheh, D. N., Paraschas, D., Parker, A., Parton, A., Prosovetskaia, I., Sakthivel, M. P., Salam, A., Schmitt, B. M., Schuilenburg, H., Sheppard, D., Steed, E., Szpak, M., Szuba, M., Taylor, K., Thormann, A., Threadgold, G.,Walts, B.,Winterbottom, A., Chakiachvili, M., Chaubal, A., De Silva, N., Flint, B., Frankish, A., Hunt, S. E., IIsley, G. R., Langridge, N., Loveland, J. E., Martin, F. J., Mudge, J. M., Morales, J., Perry, E., Ruffier, M., Tate, J., Thybert, D., Trevanion, S. J., Cunningham, F., Yates, A. D., Zerbino, D. R., and Flicek, P. (2020). Ensembl 2021. Nucleic Acids Research, 49(D1):D884–D891.
Issa, M. and Elaziz, M. A. (2020). Analyzing covid-19 virus based on enhanced fragmented biological local aligner using improved ions motion optimization algorithm. Applied Soft Computing, 96:106683.
Kawade, G., Sahu, S., Upadhye, S., Korde, N., and Motghare, M. (2018). An analysis on computation of longest common subsequence algorithm.
Koonin, E. (2005). Orthologs, paralogs, and evolutionary genomics 1. Annual review of genetics, 39:309–38.
Marçais, G., Delcher, A. L., Phillippy, A. M., Coston, R., Salzberg, S. L., and Zimin, A. (2018). Mummer4: A fast and versatile genome alignment system. PLOS Computational Biology, 14:1–14.
Needleman, S. B. and Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48:443–453.
Reddy, B. (2020). Bioinformatics & Pairwise Sequence Alignment: Local and Global Sequence Alignment Algorithms. Independently published.
Shafiq, M., Polo, J., Dickov, B., and Hussain, T. (2016). Modeling and performance evaluation of smith-waterman algorithm. In 2016 13th International Bhurban Conference on Applied Sciences and Technology (IBCAST), pages 191–198.
Shikder, R., Thulasiraman, P., Irani, P., and Hu, P. (2019). An openmp-based tool for finding longest common subsequence in bioinformatics. BMC Research Notes, 12.
Wagner, R. A. and Fischer, M. J. (1974). The string-to-string correction problem. J. ACM, 21(1):168–173.
Wei, S., Wang, Y., Yang, Y., and Liu, S. (2020). A path recorder algorithm for multiple longest common subsequences (mlcs) problems. Bioinformatics (Oxford, England), 36.
Zhang, J., Misra, S., Wang, H., and Feng, W.-c. (2017). Eliminating irregularities of protein sequence search on multicore architectures. In 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 62–71.
