Linear Algorithm for Detecting Repeat Units in Tandem Arrays

  • Luiz H. B. Lago UFSM
  • Bruna da Silva Righi UFSM

Abstract


Detecting repetitive sequences is relevant in many areas of knowledge, such as genetics, data compression, and text processing. In this article, a solution to the problem of identifying the smallest sequential repetition in a string is addressed. Tandem arrays are a special group of strings made out of the repetition of smaller non-empty strings. The algorithm analyzed in this article has its focus on detecting the smallest repetitive unit of a tandem array in linear time, according to the size of the input string. It also validates the functionality of the algorithm.

Keywords: Evaluation, Measurement, and Performance Prediction, Data Science and High-Performance Computing

References

Aho, A. V. (1990). Chapter 5 - algorithms for finding patterns in strings. In Van Leeuwen, J., editor, Algorithms and Complexity, Handbook of Theoretical Computer Science, pages 255–300. Elsevier, Amsterdam.

C. S. Góes, A. (2005). Análise de regiões polimórficas do DNA com o objetivo de estabelecer vínculos genéticos, identificar restos mortais ou realizar perícias criminais. Revista do Biomedico, 65:22–23.

Knuth, D. E., Morris, Jr., J. H., and Pratt, V. R. (1977). Fast pattern matching in strings. SIAM Journal on Computing, 6(2):323–350.

Mala, F. and Ali, R. (2022). The big-O of mathematics and computer science. Journal of Applied Mathematics and Computation, 6:1–3.

Nakamura, A., Saito, T., Takigawa, I., Kudo, M., and Mamitsuka, H. (2013). Fast algorithms for finding a minimum repetition representation of strings and trees. Discrete Applied Mathematics, 161(10):1556–1575.

Peterson, L. L. and Davie, B. S. (2022). 7 - end-to-end data. In Peterson, L. L. and Davie, B. S., editors, Computer Networks (Sixth Edition), The Morgan Kaufmann Series in Networking, pages 554–605. Morgan Kaufmann, 6th edition edition.

Sokol, D., Benson, G., and Tojeira, J. (2007). Tandem repeats over the edit distance. Bioinformatics, 23(2):e30–e35.
Published
2025-04-23
LAGO, Luiz H. B.; RIGHI, Bruna da Silva. Linear Algorithm for Detecting Repeat Units in Tandem Arrays. In: REGIONAL SCHOOL OF HIGH PERFORMANCE COMPUTING FROM SOUTHERN BRAZIL (ERAD-RS), 25. , 2025, Foz do Iguaçu/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025 . p. 9-12. ISSN 2595-4164. DOI: https://doi.org/10.5753/eradrs.2025.6665.