Linear Algorithm for Detecting Repeat Units in Tandem Arrays
Abstract
Detecting repetitive sequences is relevant in many areas of knowledge, such as genetics, data compression, and text processing. In this article, a solution to the problem of identifying the smallest sequential repetition in a string is addressed. Tandem arrays are a special group of strings made out of the repetition of smaller non-empty strings. The algorithm analyzed in this article has its focus on detecting the smallest repetitive unit of a tandem array in linear time, according to the size of the input string. It also validates the functionality of the algorithm.
References
C. S. Góes, A. (2005). Análise de regiões polimórficas do DNA com o objetivo de estabelecer vínculos genéticos, identificar restos mortais ou realizar perícias criminais. Revista do Biomedico, 65:22–23.
Knuth, D. E., Morris, Jr., J. H., and Pratt, V. R. (1977). Fast pattern matching in strings. SIAM Journal on Computing, 6(2):323–350.
Mala, F. and Ali, R. (2022). The big-O of mathematics and computer science. Journal of Applied Mathematics and Computation, 6:1–3.
Nakamura, A., Saito, T., Takigawa, I., Kudo, M., and Mamitsuka, H. (2013). Fast algorithms for finding a minimum repetition representation of strings and trees. Discrete Applied Mathematics, 161(10):1556–1575.
Peterson, L. L. and Davie, B. S. (2022). 7 - end-to-end data. In Peterson, L. L. and Davie, B. S., editors, Computer Networks (Sixth Edition), The Morgan Kaufmann Series in Networking, pages 554–605. Morgan Kaufmann, 6th edition edition.
Sokol, D., Benson, G., and Tojeira, J. (2007). Tandem repeats over the edit distance. Bioinformatics, 23(2):e30–e35.
